暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 操作符与优化器详解 - 包含(选择性、JOIN方法、等效)等内容

digoal 2016-02-24
470

作者

digoal

日期

2016-02-24

标签

PostgreSQL , 优化器 , 操作符 , 选择性


背景

PostgreSQL 支持自定义操作符,本质上是调用函数来实现的。

同时不同的操作符有幂等,反转,NOT等相关性,数据库的优化器可以利用这些关系,对QUERY进行REWRITE,产生更好的执行计划。

operator语法

语法如下:

例如创建一个求两个值的平均值的操作符:

首选要创建函数

postgres=# create function f_avg(numeric,numeric) returns numeric as $$ postgres$# select ($1+$2)/2; postgres$# $$ language sql strict; CREATE FUNCTION

验证函数

```
postgres=# select f_avg(1,null);
f_avg


(1 row)
postgres=# select f_avg(1,2);
f_avg


1.5000000000000000
(1 row)
```

创建操作符,指定左右参数类型,调用的函数名,commutator是一个和优化器相关的选项,我后面会重点介绍:

```
postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');
CREATE OPERATOR
postgres=# select 1 ## 2;
?column?


1.5000000000000000
(1 row)
```

operator优化器开关

注意到在创建操作符的语法中有6个和优化器有关的关键字:

[, COMMUTATOR = com_op ] [, NEGATOR = neg_op ] [, RESTRICT = res_proc ] [, JOIN = join_proc ] [, HASHES ] [, MERGES ]

介绍如下:

假设x表示操作符左侧的参数,y表示操作符右侧的参数

1. commutator,指明x op1 y等效于y op2 x,即操作数调换,返回的值一样。例如2>1 和1<2结果是一致的。那么>就是<的commutator或者反之。又例如1+2和2+1是等价的,那么+就是+的commutator。commutator只需要在创建其中一个操作符时指定,创建另一个对应的操作符时可以不需要指定,PostgreSQL会自动建立这个关系。例如创建>操作符时指定了它的commutator是<,那么在创建<操作符时可以不需要指定>是它的commutator。

另外需要注意,有commutator操作符的操作符的左右两侧的参数类型必须一致,这样才能满足x op1 y等价于y op2 x。

优化器如何利用commutator呢?例如索引扫描,必须列在操作符的左侧才能使用索引。1 > tbl.c这个条件,如果>没有commutator的话,是不能使用索引的。

例子,以int4的>和<操作符为例,实验一下:

和<在PostgreSQL中是一对commutator

```
postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
oprcom


pg_catalog.<
(1 row)
postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
oprcom


pg_catalog.>
(1 row)
```

记录他们的oprcom对应的OID

```
postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc
ode | oprrest | oprjoin
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----
----+-------------+-----------------

  |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |     97 |       523 | int4

gt | scalargtsel | scalargtjoinsel
(1 row)
postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc
ode | oprrest | oprjoin
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----
----+-------------+-----------------
< | 11 | 10 | b | f | f | 23 | 23 | 16 | 521 | 525 | int4
lt | scalarltsel | scalarltjoinsel
(1 row)
```

接下来我要通过更新pg_operator解除他们的commutator关系,设置为0即可。

postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc; UPDATE 1 postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc; UPDATE 1

创建测试表,插入测试数据,创建索引:

postgres=# create table tbl(id int); CREATE TABLE postgres=# insert into tbl select generate_series(1,100000); INSERT 0 100000 postgres=# create index idx_tbl_id on tbl(id); CREATE INDEX

将列放在条件的左边可以走索引,但是放在右边不走索引。因为优化器不能决定>,<是否为commutator

```
postgres=# explain select * from tbl where id<10;
QUERY PLAN


Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)
Index Cond: (id < 10)
(2 rows)
postgres=# explain select * from tbl where 10>id;
QUERY PLAN


Seq Scan on tbl (cost=0.00..1361.00 rows=33333 width=4)
Filter: (10 > id)
(2 rows)
```

重新建立这两个 operator的commutator关系后,优化器会自动将10>id转换为id<10,并且走索引了:

```
postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;
UPDATE 1
postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;
UPDATE 1
postgres=# explain select * from tbl where 10>id;
QUERY PLAN


Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)
Index Cond: (id < 10)
(2 rows)
```

2. negator,指x op1 y 等价于 not(y op2 x),或者x op1等价于not( y op2),或者op1 x 等价于not(op2 y),因此negator支持一元和二元操作符。

例子:

如果=和<>是一对negator操作符,NOT (x = y) 可以简化为 x <> y。

```
postgres=# explain select * from tbl where 10=id;
QUERY PLAN


Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)
postgres=# explain select * from tbl where not(10<>id);
QUERY PLAN


Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)
```

同样,操作符两侧参数x,y的类型必须一致。并且仅适用于返回布尔逻辑类型的操作符。

3. restrict,是用于评估选择性的函数,仅适用于二元操作符,例如where col>100,这个查询条件,如何评估选择性呢?是通过操作符的restrict来指定的,选择性乘以pg_class.reltuples就可以评估得到这个查询条件的行数。

选择性函数的代码在 src/backend/utils/adt/

包括

-rw-r--r--. 1 1107 1107 33191 Jun 10 03:29 array_selfuncs.c -rw-r--r--. 1 1107 1107 2316 Jun 10 03:29 geo_selfuncs.c -rw-r--r--. 1 1107 1107 720 Jun 10 03:29 network_selfuncs.c -rw-r--r--. 1 1107 1107 33895 Jun 10 03:29 rangetypes_selfuncs.c -rw-r--r--. 1 1107 1107 218809 Jun 10 03:29 selfuncs.c

选择性函数,还需要依赖数据库的统计信息,从而计算选择性,常见的选择性计算函数有:

```
postgres=# select distinct oprrest from pg_operator order by 1;
oprrest


  • eqsel 相等
    neqsel 不相等
    scalarltsel 小于等于
    scalargtsel 大于等于
    areasel
    positionsel
    contsel
    iclikesel
    icnlikesel
    regexeqsel
    likesel
    icregexeqsel
    regexnesel
    nlikesel
    icregexnesel
    rangesel
    networksel
    tsmatchsel
    arraycontsel
    (20 rows)
    ```

当然,用户如果自定义数据类型的话,也可以自定义选择性函数,或者使用以上标准的选择性函数,只是可能需要实现一下类型转换。

源码中的介绍:

src/backend/utils/adt/selfuncs.c

/*---------- * Operator selectivity estimation functions are called to estimate the * selectivity of WHERE clauses whose top-level operator is their operator. * We divide the problem into two cases: * Restriction clause estimation: the clause involves vars of just * one relation. 一种是符合WHERE条件的选择性(百分比)。 * Join clause estimation: the clause involves vars of multiple rels. * Join selectivity estimation is far more difficult and usually less accurate * than restriction estimation. -- JOIN的选择性评估通常没有WHERE条件的选择性准确。 * * When dealing with the inner scan of a nestloop join, we consider the * join's joinclauses as restriction clauses for the inner relation, and * treat vars of the outer relation as parameters (a/k/a constants of unknown * values). So, restriction estimators need to be able to accept an argument * telling which relation is to be treated as the variable. 在使用nestloop JOIN时,一个表的字段将作为变量,另一个表的字段(及其统计信息)与操作符作为JOIN评估子句。 * * The call convention for a restriction estimator (oprrest function) is * * Selectivity oprrest (PlannerInfo *root, * Oid operator, * List *args, * int varRelid); * 评估选择性需要4个参数: * root: general information about the query (rtable and RelOptInfo lists * are particularly important for the estimator). plannerinfo信息。 * operator: OID of the specific operator in question. 操作符的OID * args: argument list from the operator clause. 操作符子句中的参数列表 * varRelid: if not zero, the relid (rtable index) of the relation to * be treated as the variable relation. May be zero if the args list * is known to contain vars of only one relation. 表示where条件所包含的参数来自哪些relation。 * * This is represented at the SQL level (in pg_proc) as * * float8 oprrest (internal, oid, internal, int4); 在pg_proc数据字典中表示为oprrest指定的函数。 * * The result is a selectivity, that is, a fraction (0 to 1) of the rows * of the relation that are expected to produce a TRUE result for the * given operator. 选择性函数的评估结果就是一个百分比。乘以pg_class.reltuples就可以得到记录数。 * * The call convention for a join estimator (oprjoin function) is similar * except that varRelid is not needed, and instead join information is * supplied: * JOIN选择性的计算函数与WHERE选择性的计算函数参数有轻微差别,么有varRelid, 增加了join信息的参数。 * Selectivity oprjoin (PlannerInfo *root, * Oid operator, * List *args, * JoinType jointype, * SpecialJoinInfo *sjinfo); * * float8 oprjoin (internal, oid, internal, int2, internal); * * (Before Postgres 8.4, join estimators had only the first four of these * parameters. That signature is still allowed, but deprecated.) The * relationship between jointype and sjinfo is explained in the comments for * clause_selectivity() --- the short version is that jointype is usually * best ignored in favor of examining sjinfo. * * Join selectivity for regular inner and outer joins is defined as the * fraction (0 to 1) of the cross product of the relations that is expected * to produce a TRUE result for the given operator. For both semi and anti (半连接与预连接) * joins, however, the selectivity is defined as the fraction of the left-hand * side relation's rows that are expected to have a match (ie, at least one * row with a TRUE result) in the right-hand side. * * For both oprrest and oprjoin functions, the operator's input collation OID * (if any) is passed using the standard fmgr mechanism, so that the estimator * function can fetch it with PG_GET_COLLATION(). Note, however, that all * statistics in pg_statistic are currently built using the database's default * collation. Thus, in most cases where we are looking at statistics, we * should ignore the actual operator collation and use DEFAULT_COLLATION_OID. * We expect that the error induced by doing this is usually not large enough * to justify complicating matters. *----------

4. join,是joinsel即join的选择性计算函数。

对应pg_operator.oprjoin

```
postgres=# select distinct oprjoin from pg_operator order by 1;
oprjoin


  • eqjoinsel
    neqjoinsel
    scalarltjoinsel
    scalargtjoinsel
    areajoinsel
    positionjoinsel
    contjoinsel
    iclikejoinsel
    icnlikejoinsel
    regexeqjoinsel
    likejoinsel
    icregexeqjoinsel
    regexnejoinsel
    nlikejoinsel
    icregexnejoinsel
    networkjoinsel
    tsmatchjoinsel
    arraycontjoinsel
    (19 rows)
    ```

5. hashes

6. merges

hashes和merges表示该操作符是否允许hash join和merge join, 只有返回布尔逻辑值的二元操作符满足这个要求。

我们在pg_operator这个catalog中也可以查看到对应的介绍:

Name| Type| References| Description
---|---|---|---
oid| oid| - | Row identifier (hidden attribute; must be explicitly selected)
oprname| name| - | Name of the operator
oprnamespace| oid| pg_namespace.oid| The OID of the namespace that contains this operator
oprowner| oid| pg_authid.oid| Owner of the operator
oprkind| char| - | b = infix ("between"), l = prefix ("left"), r = postfix ("right") 指定操作符在什么位置,例如中间,左侧,右侧
oprcanmerge| bool| - | This operator supports merge joins 此操作符是否支持merge join
oprcanhash| bool| - | This operator supports hash joins 此操作符是否支持hash join
oprleft| oid| pg_type.oid| Type of the left operand 操作符左侧的数据类型
oprright| oid| pg_type.oid| Type of the right operand 操作符右侧的数据类型
oprresult| oid| pg_type.oid| Type of the result 返回结果的数据类型
oprcom| oid pg_operator.oid| Commutator of this operator, if any
oprnegate| oid| pg_operator.oid| Negator of this operator, if any
oprcode| regproc| pg_proc.oid| Function that implements this operator
oprrest| regproc| pg_proc.oid| Restriction selectivity estimation function for this operator
oprjoin| regproc| pg_proc.oid| Join selectivity estimation function for this operator

参考

1. http://www.postgresql.org/docs/9.4/static/sql-createoperator.html

2. http://www.postgresql.org/docs/9.4/static/xoper-optimization.html

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论