暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 12 preview - query rewrite API 增强

digoal 2019-03-31
198

作者

digoal

日期

2019-03-31

标签

PostgreSQL , query rewrite , API , extension , lossy index 重写 , 放大范围 , 改写where条件走索引 , lossy 化 , recheck , filter


背景

在数据库中QUERY REWRITE是一种常用的SQL优化方法。

我们看一个QUERY REWRITE的例子。

1 /*------------------------------------------------------------------------- 2 * 3 * like_support.c 4 * Planner support functions for LIKE, regex, and related operators. 5 * 6 * These routines handle special optimization of operators that can be 7 * used with index scans even though they are not known to the executor's 8 * indexscan machinery. The key idea is that these operators allow us 9 * to derive approximate indexscan qual clauses, such that any tuples 10 * that pass the operator clause itself must also satisfy the simpler 11 * indexscan condition(s). Then we can use the indexscan machinery 12 * to avoid scanning as much of the table as we'd otherwise have to, 13 * while applying the original operator as a qpqual condition to ensure 14 * we deliver only the tuples we want. (In essence, we're using a regular 15 * index as if it were a lossy index.) 16 * 17 * An example of what we're doing is 18 * textfield LIKE 'abc%def' 19 * from which we can generate the indexscanable conditions 20 * textfield >= 'abc' AND textfield < 'abd' 21 * which allow efficient scanning of an index on textfield. 22 * (In reality, character set and collation issues make the transformation 23 * from LIKE to indexscan limits rather harder than one might think ... 24 * but that's the basic idea.) 25 * 26 * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group 27 * Portions Copyright (c) 1994, Regents of the University of California 28 * 29 * 30 * IDENTIFICATION 31 * src/backend/utils/adt/like_support.c 32 * 33 *------------------------------------------------------------------------- 34 */

例如,like可以改写成范围查询,走索引。如下:

```
postgres=# explain select * from t1 where info like 'abc%def';
QUERY PLAN


Index Scan using idx_info_t1_1 on t1 (cost=0.15..6.89 rows=1 width=36)
Index Cond: ((info >= 'abc'::text) AND (info < 'abd'::text))
改写成 ((info >= 'abc'::text) AND (info < 'abd'::text)),使得这个LIKE可以走索引。
Filter: (info ~~ 'abc%def'::text)
Filter重新检查,保证原查询条件的正确性。
(3 rows)
```

实际上~~就是like操作符,这个操作符支持了SupportRequestIndexCondition。

```
postgres=# \do+ ~~
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+------------+-------------------------
pg_catalog | ~~ | bytea | bytea | boolean | bytealike | matches LIKE expression
pg_catalog | ~~ | character | text | boolean | bpcharlike | matches LIKE expression
pg_catalog | ~~ | name | text | boolean | namelike | matches LIKE expression
pg_catalog | ~~ | text | text | boolean | textlike | matches LIKE expression
(4 rows)

postgres=# \df+ textlike
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description
------------+----------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-------------+-------------------------------
pg_catalog | textlike | boolean | text, text | func | immutable | safe | postgres | invoker | | internal | textlike | implementation of ~~ operator
(1 row)
```

以前这是一个内置的功能,PostgreSQL 12将这个功能API化,当用户创建的函数支持了SupportRequestIndexCondition时,可以将这个函数调用进行转换,转换为支持索引扫描的QUERY(重写where条件),同时在index 中增加Filter,check每一条返回的记录,确保返回的记录符合原来的where条件。

实际上这个功能是lossy化,也就是说只要能用上索引,放宽一下查询条件也可以,在每条返回的记录上,再增加recheck(Filter)来保证返回记录符合原来的where条件。

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74dfe58a5927b22c744b29534e67bfdd203ac028

```
Allow extensions to generate lossy index conditions.

For a long time, indxpath.c has had the ability to extract derived (lossy)
index conditions from certain operators such as LIKE. For just as long,
it's been obvious that we really ought to make that capability available
to extensions. This commit finally accomplishes that, by adding another
API for planner support functions that lets them create derived index
conditions for their functions. As proof of concept, the hardwired
"special index operator" code formerly present in indxpath.c is pushed
out to planner support functions attached to LIKE and other relevant
operators.

A weak spot in this design is that an extension needs to know OIDs for
the operators, datatypes, and opfamilies involved in the transformation
it wants to make. The core-code prototypes use hard-wired OID references
but extensions don't have that option for their own operators etc. It's
usually possible to look up the required info, but that may be slow and
inconvenient. However, improving that situation is a separate task.

I want to do some additional refactorization around selfuncs.c, but
that also seems like a separate task.

Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
```

diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index b486ef3..3403269 100644 (file) --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -3460,4 +3460,18 @@ supportfn(internal) returns internal This can be done by a support function that implements the <literal>SupportRequestRows</literal> request type. </para> + + <para> + For target functions that return boolean, it may be possible to + convert a function call appearing in WHERE into an indexable operator + clause or clauses. The converted clauses might be exactly equivalent + to the function's condition, or they could be somewhat weaker (that is, + they might accept some values that the function condition does not). + In the latter case the index condition is said to + be <firstterm>lossy</firstterm>; it can still be used to scan an index, + but the function call will have to be executed for each row returned by + the index to see if it really passes the WHERE condition or not. + To create such conditions, the support function must implement + the <literal>SupportRequestIndexCondition</literal> request type. + </para> </sect1>

小结

query rewrite API 增强 - Allow extensions to generate lossy index conditions - SupportRequestIndexCondition ,实际上就是让SQL改写成可以走索引(lossy化,通常是放大WHERE 条件,同时采用二次check来过滤放大的记录。)

同样具有lossy思想的还包括:

1、bloom index,放大搜索范围,再进行二次check。

2、bitmap scan,放大搜索范围,再进行二次check。

使用索引扫描降低了IO开销,但是二次check(Filter)需要一定的CPU开销,总体成本还是下降的。

如果要同时消除IO与check,可以使用精准索引(或者partial index)。

源码请参考

src/include/nodes/supportnodes.h:typedef struct SupportRequestIndexCondition src/include/nodes/supportnodes.h:} SupportRequestIndexCondition; src/include/nodes/nodes.h: T_SupportRequestIndexCondition /* in nodes/supportnodes.h */ src/backend/optimizer/path/indxpath.c: SupportRequestIndexCondition req; src/backend/optimizer/path/indxpath.c: req.type = T_SupportRequestIndexCondition; src/backend/utils/adt/like_support.c: else if (IsA(rawreq, SupportRequestIndexCondition)) src/backend/utils/adt/like_support.c: SupportRequestIndexCondition *req = (SupportRequestIndexCondition *) rawreq; src/backend/utils/adt/network.c: if (IsA(rawreq, SupportRequestIndexCondition)) src/backend/utils/adt/network.c: SupportRequestIndexCondition *req = (SupportRequestIndexCondition *) rawreq;

参考

https://www.postgresql.org/docs/devel/xfunc-optimization.html

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74dfe58a5927b22c744b29534e67bfdd203ac028

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/like_support.c;h=b001dde5fc7d73b24853296ebc27e20044fae2c7;hb=74dfe58a5927b22c744b29534e67bfdd203ac028

《PostgreSQL bitmap scan的IO放大的原理解释和优化》

《Recheck Cond filter IO\CPU放大 原理与优化CASE - 含 超级大表 不包含(反选) SQL优化》

《PostgreSQL multipolygon 空间索引查询过滤精简优化 - IO,CPU放大优化》

《索引顺序扫描引发的堆扫描IO放大背后的统计学原理与解决办法 - PostgreSQL index scan enlarge heap page scans when index and column correlation small.》

《PostgreSQL bloom filter index 扩展 for bigint》

《PostgreSQL 11 preview - BRIN索引接口功能扩展(BLOOM FILTER、min max分段)》

《PostgreSQL 9.6 黑科技 bloom 算法索引,一个索引支撑任意列组合查询》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论