作者
digoal
日期
2019-03-31
标签
PostgreSQL , query rewrite , API , extension , lossy index 重写 , 放大范围 , 改写where条件走索引 , lossy 化 , recheck , filter
背景
在数据库中QUERY REWRITE是一种常用的SQL优化方法。
我们看一个QUERY REWRITE的例子。
1 /*-------------------------------------------------------------------------
2 *
3 * like_support.c
4 * Planner support functions for LIKE, regex, and related operators.
5 *
6 * These routines handle special optimization of operators that can be
7 * used with index scans even though they are not known to the executor's
8 * indexscan machinery. The key idea is that these operators allow us
9 * to derive approximate indexscan qual clauses, such that any tuples
10 * that pass the operator clause itself must also satisfy the simpler
11 * indexscan condition(s). Then we can use the indexscan machinery
12 * to avoid scanning as much of the table as we'd otherwise have to,
13 * while applying the original operator as a qpqual condition to ensure
14 * we deliver only the tuples we want. (In essence, we're using a regular
15 * index as if it were a lossy index.)
16 *
17 * An example of what we're doing is
18 * textfield LIKE 'abc%def'
19 * from which we can generate the indexscanable conditions
20 * textfield >= 'abc' AND textfield < 'abd'
21 * which allow efficient scanning of an index on textfield.
22 * (In reality, character set and collation issues make the transformation
23 * from LIKE to indexscan limits rather harder than one might think ...
24 * but that's the basic idea.)
25 *
26 * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
27 * Portions Copyright (c) 1994, Regents of the University of California
28 *
29 *
30 * IDENTIFICATION
31 * src/backend/utils/adt/like_support.c
32 *
33 *-------------------------------------------------------------------------
34 */
例如,like可以改写成范围查询,走索引。如下:
```
postgres=# explain select * from t1 where info like 'abc%def';
QUERY PLAN
Index Scan using idx_info_t1_1 on t1 (cost=0.15..6.89 rows=1 width=36)
Index Cond: ((info >= 'abc'::text) AND (info < 'abd'::text))
改写成 ((info >= 'abc'::text) AND (info < 'abd'::text)),使得这个LIKE可以走索引。
Filter: (info ~~ 'abc%def'::text)
Filter重新检查,保证原查询条件的正确性。
(3 rows)
```
实际上~~
就是like操作符,这个操作符支持了SupportRequestIndexCondition。
```
postgres=# \do+ ~~
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+------------+-------------------------
pg_catalog | ~~ | bytea | bytea | boolean | bytealike | matches LIKE expression
pg_catalog | ~~ | character | text | boolean | bpcharlike | matches LIKE expression
pg_catalog | ~~ | name | text | boolean | namelike | matches LIKE expression
pg_catalog | ~~ | text | text | boolean | textlike | matches LIKE expression
(4 rows)
postgres=# \df+ textlike
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description
------------+----------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-------------+-------------------------------
pg_catalog | textlike | boolean | text, text | func | immutable | safe | postgres | invoker | | internal | textlike | implementation of ~~ operator
(1 row)
```
以前这是一个内置的功能,PostgreSQL 12将这个功能API化,当用户创建的函数支持了SupportRequestIndexCondition时,可以将这个函数调用进行转换,转换为支持索引扫描的QUERY(重写where条件),同时在index 中增加Filter,check每一条返回的记录,确保返回的记录符合原来的where条件。
实际上这个功能是lossy化,也就是说只要能用上索引,放宽一下查询条件也可以,在每条返回的记录上,再增加recheck(Filter)来保证返回记录符合原来的where条件。
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74dfe58a5927b22c744b29534e67bfdd203ac028
```
Allow extensions to generate lossy index conditions.
For a long time, indxpath.c has had the ability to extract derived (lossy)
index conditions from certain operators such as LIKE. For just as long,
it's been obvious that we really ought to make that capability available
to extensions. This commit finally accomplishes that, by adding another
API for planner support functions that lets them create derived index
conditions for their functions. As proof of concept, the hardwired
"special index operator" code formerly present in indxpath.c is pushed
out to planner support functions attached to LIKE and other relevant
operators.
A weak spot in this design is that an extension needs to know OIDs for
the operators, datatypes, and opfamilies involved in the transformation
it wants to make. The core-code prototypes use hard-wired OID references
but extensions don't have that option for their own operators etc. It's
usually possible to look up the required info, but that may be slow and
inconvenient. However, improving that situation is a separate task.
I want to do some additional refactorization around selfuncs.c, but
that also seems like a separate task.
Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
```
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index b486ef3..3403269 100644 (file)
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3460,4 +3460,18 @@ supportfn(internal) returns internal
This can be done by a support function that implements
the <literal>SupportRequestRows</literal> request type.
</para>
+
+ <para>
+ For target functions that return boolean, it may be possible to
+ convert a function call appearing in WHERE into an indexable operator
+ clause or clauses. The converted clauses might be exactly equivalent
+ to the function's condition, or they could be somewhat weaker (that is,
+ they might accept some values that the function condition does not).
+ In the latter case the index condition is said to
+ be <firstterm>lossy</firstterm>; it can still be used to scan an index,
+ but the function call will have to be executed for each row returned by
+ the index to see if it really passes the WHERE condition or not.
+ To create such conditions, the support function must implement
+ the <literal>SupportRequestIndexCondition</literal> request type.
+ </para>
</sect1>
小结
query rewrite API 增强 - Allow extensions to generate lossy index conditions - SupportRequestIndexCondition ,实际上就是让SQL改写成可以走索引(lossy化,通常是放大WHERE 条件,同时采用二次check来过滤放大的记录。)
同样具有lossy思想的还包括:
1、bloom index,放大搜索范围,再进行二次check。
2、bitmap scan,放大搜索范围,再进行二次check。
使用索引扫描降低了IO开销,但是二次check(Filter)需要一定的CPU开销,总体成本还是下降的。
如果要同时消除IO与check,可以使用精准索引(或者partial index)。
源码请参考
src/include/nodes/supportnodes.h:typedef struct SupportRequestIndexCondition
src/include/nodes/supportnodes.h:} SupportRequestIndexCondition;
src/include/nodes/nodes.h: T_SupportRequestIndexCondition /* in nodes/supportnodes.h */
src/backend/optimizer/path/indxpath.c: SupportRequestIndexCondition req;
src/backend/optimizer/path/indxpath.c: req.type = T_SupportRequestIndexCondition;
src/backend/utils/adt/like_support.c: else if (IsA(rawreq, SupportRequestIndexCondition))
src/backend/utils/adt/like_support.c: SupportRequestIndexCondition *req = (SupportRequestIndexCondition *) rawreq;
src/backend/utils/adt/network.c: if (IsA(rawreq, SupportRequestIndexCondition))
src/backend/utils/adt/network.c: SupportRequestIndexCondition *req = (SupportRequestIndexCondition *) rawreq;
参考
https://www.postgresql.org/docs/devel/xfunc-optimization.html
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=74dfe58a5927b22c744b29534e67bfdd203ac028
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/like_support.c;h=b001dde5fc7d73b24853296ebc27e20044fae2c7;hb=74dfe58a5927b22c744b29534e67bfdd203ac028
《PostgreSQL bitmap scan的IO放大的原理解释和优化》
《Recheck Cond filter IO\CPU放大 原理与优化CASE - 含 超级大表 不包含(反选) SQL优化》
《PostgreSQL multipolygon 空间索引查询过滤精简优化 - IO,CPU放大优化》
《PostgreSQL bloom filter index 扩展 for bigint》
《PostgreSQL 11 preview - BRIN索引接口功能扩展(BLOOM FILTER、min max分段)》
《PostgreSQL 9.6 黑科技 bloom 算法索引,一个索引支撑任意列组合查询》
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.