暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

复合条件c1=xx and (c2=xx or c2=xx)写法的优化

digoal 2019-04-17
162

作者

digoal

日期

2019-04-17

标签

PostgreSQL , Filter , bitmap or , index Cond


背景

c1=xx and (c2=xx or c2=xx)

如果你的SQL这么写即使有c1,c2的索引,你可能会得到这样的执行计划

```
postgres=# explain select * from td where c1=1 and (c2=1 or c2=2);
QUERY PLAN


Bitmap Heap Scan on td (cost=8.32..560.09 rows=499 width=16)
Recheck Cond: (((c1 = 1) AND (c2 = 1)) OR ((c1 = 1) AND (c2 = 2)))
-> BitmapOr (cost=8.32..8.32 rows=500 width=0)
-> Bitmap Index Scan on idx_td_1 (cost=0.00..4.04 rows=250 width=0)
Index Cond: ((c1 = 1) AND (c2 = 1))
-> Bitmap Index Scan on idx_td_1 (cost=0.00..4.04 rows=250 width=0)
Index Cond: ((c1 = 1) AND (c2 = 2))
(7 rows)
```

bitmapscan会引入recheck,所以实际上是lossy scan,然后再通过recheck的方式过滤数据。性能不是最优。

最优的方法应该是全部都在index cond里面

优化

```
postgres=# create table td(id int, c1 int, c2 int, c3 int);
CREATE TABLE
postgres=# insert into td select generate_series(1,10000000), random()100, random()1000, random()*10000;
INSERT 0 10000000
postgres=# create index idx_td_1 on td(c1,c2);
CREATE INDEX
postgres=# explain select * from td where c1=1 and (c2=1 or c2=2);
QUERY PLAN


Bitmap Heap Scan on td (cost=8.32..560.09 rows=499 width=16)
Recheck Cond: (((c1 = 1) AND (c2 = 1)) OR ((c1 = 1) AND (c2 = 2)))
-> BitmapOr (cost=8.32..8.32 rows=500 width=0)
-> Bitmap Index Scan on idx_td_1 (cost=0.00..4.04 rows=250 width=0)
Index Cond: ((c1 = 1) AND (c2 = 1))
-> Bitmap Index Scan on idx_td_1 (cost=0.00..4.04 rows=250 width=0)
Index Cond: ((c1 = 1) AND (c2 = 2))
(7 rows)

postgres=# explain select * from td where c2=1 and (c1=1 or c1=2);
QUERY PLAN


Bitmap Heap Scan on td (cost=5.17..226.34 rows=198 width=16)
Recheck Cond: (((c1 = 1) AND (c2 = 1)) OR ((c1 = 2) AND (c2 = 1)))
-> BitmapOr (cost=5.17..5.17 rows=199 width=0)
-> Bitmap Index Scan on idx_td_1 (cost=0.00..2.52 rows=99 width=0)
Index Cond: ((c1 = 1) AND (c2 = 1))
-> Bitmap Index Scan on idx_td_1 (cost=0.00..2.55 rows=101 width=0)
Index Cond: ((c1 = 2) AND (c2 = 1))
(7 rows)
```

优化手段,改写sql

```
postgres=# explain select * from td where c2=1 and c1 in (1,2);
QUERY PLAN


Index Scan using idx_td_1 on td (cost=0.43..225.96 rows=199 width=16)
Index Cond: ((c1 = ANY ('{1,2}'::integer[])) AND (c2 = 1))
(2 rows)

postgres=# explain select * from td where c2=1 and c1 = any(array [1,2]);
QUERY PLAN


Index Scan using idx_td_1 on td (cost=0.43..225.96 rows=199 width=16)
Index Cond: ((c1 = ANY ('{1,2}'::integer[])) AND (c2 = 1))
(2 rows)

postgres=# explain select * from td where c1=1 and c2 = any(array [1,2]);
QUERY PLAN


Index Scan using idx_td_1 on td (cost=0.43..224.83 rows=198 width=16)
Index Cond: ((c1 = 1) AND (c2 = ANY ('{1,2}'::integer[])))
(2 rows)

postgres=# explain (verbose,analyze,verbose,timing,costs) select * from td where c1=1 and c2 = any(array [1,2]);
QUERY PLAN


Index Scan using idx_td_1 on public.td (cost=0.43..224.83 rows=198 width=16) (actual time=0.104..0.556 rows=190 loops=1)
Output: id, c1, c2, c3
Index Cond: ((td.c1 = 1) AND (td.c2 = ANY ('{1,2}'::integer[])))
Planning Time: 0.141 ms
Execution Time: 0.583 ms
(5 rows)

postgres=# explain (verbose,analyze,verbose,timing,costs) select * from td where c2=1 and c1 = any(array [1,2]);
QUERY PLAN


Index Scan using idx_td_1 on public.td (cost=0.43..225.96 rows=199 width=16) (actual time=0.024..0.499 rows=184 loops=1)
Output: id, c1, c2, c3
Index Cond: ((td.c1 = ANY ('{1,2}'::integer[])) AND (td.c2 = 1))
Planning Time: 0.092 ms
Execution Time: 0.521 ms
(5 rows)
```

问题2

```
postgres=# explain (verbose,analyze,verbose,timing,costs) select * from td where c1=1 and c2::text = any(array ['1','2']);
QUERY PLAN


Index Scan using idx_td_1 on public.td (cost=0.43..59837.89 rows=997 width=16) (actual time=0.150..125.371 rows=190 loops=1)
Output: id, c1, c2, c3
Index Cond: (td.c1 = 1)
Filter: ((td.c2)::text = ANY ('{1,2}'::text[]))
Rows Removed by Filter: 99897
Planning Time: 0.103 ms
Execution Time: 125.400 ms
(7 rows)
```

当类型没有和索引对齐时,这个字段的索引将不能使用,会放到Filter里面,导致通过扫描过滤的行数不多,需要在filter里面再次过滤。

小结

1、写法,不建议使用c1=xx and (c2=xx or c2=xx),建议改成

```
c1=xx and c2 in (xx,yy)

或者

c1=xx and c2 =any(array[xx,yy])
```

2、类型对齐索引中的类型

```
只有当索引和条件类型一致时,才会走索引。

如果是表达式,那么表达式要和索引结构一致。
```

参考

https://www.postgresql.org/docs/11/sql-prepare.html

https://www.postgresql.org/docs/11/libpq-async.html

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论