PG Bitmap scan 和 Index Scan 区别
位图扫描:https://www.modb.pro/db/611186
索引:https://www.modb.pro/db/405851
| Index Scan | Bitmap Index Scan | |
|---|---|---|
| 前提 | 索引 | 索引 |
| 扫描过程 | 一次取一条数据 | 一次性的全部取出 |
| 需注意的问题 | 随机I/O | recheck的花销 |
| 适用场景 | 离散度非常高,并且离散IO的成本很高的场景;And/Or |
-
index scan: index -> ctid (heap tuple id) -> read heap tuple
索引扫描:找出匹配的ctid,读取heap表
-
bitmap scan: index -> heap blockid -> heap block scan -> recheck (filter tuple)
位图扫描:bitmap index scan阶段取得符合条件的所有行号, 获得对应的block id,构建位图;bitmap heap scan阶段根据bitmap扫描block id顺序从heap 表搜索数据,最后recheck, 根据查询条件再做一次 recheck , 过滤放大的无效记录.
bitmap scan的作用就是通过建立位图的方式,将回表过程中的访问随机IO的转换为顺序扫描行为,从而减少查询过程中IO的消耗。但是引入了recheck。
适用场景:
- I thought "Bitmap Index Scan" was only used when there are two or more applicable indexes in the plan, so I don't understand why is it used now?
True, we can combine multiple bitmaps via AND/OR operations to merge results from multiple indexes before visiting the table ... but it's still potentially worthwhile even for one index. A rule of thumb is that plain indexscan wins for fetching a small number of tuples, bitmap scan wins for a somewhat larger number of tuples, and seqscan wins if you're fetching a large percentage of the whole table.
👆引用自: https://www.postgresql.org/message-id/12553.1135634231@sss.pgh.pa.us
pg_stat_all_indexes的idx_tup_read和idx_tup_fetch字段:
- idx_tup_read:从指定索引的leap page获取tid的条数(垃圾版本计算)
- idx_tup_fetch:从索引的tid获取到的有效HEAP tuple的记数(垃圾版本不算)(每扫一条tuple计1)
bitmap scan不计算idx_tup_fetch。因为不是从leaf page的tid去查询heap tuple获取的,而是从HEAP BLOCK id去查询heap block然后再过滤出有效tuple的
👆摘自于:https://github.com/digoal/blog/blob/master/201610/20161018_03.md




