暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 14 preview - BYPASS_THRESHOLD_PAGES vacuum 优化, 避免每次都需要vacuum index, 当LP_DEAD覆盖的page较少时, 跳过index vacuum

digoal 2021-01-04
277

作者

digoal

日期

2021-04-08

标签

PostgreSQL , BYPASS_THRESHOLD_PAGES , vacuum index , skip , lp_dead , 优化


背景

数据库是怎么定位一条记录(tuple)的?
- 通过行号即ctid, 由heap page id+lp组成.
- heap page id -> lp -> tuple
- HEAP PAGE 的上半段里面头文件后是item point, 也叫line point, 即LP. LP 为4个字节, 存储一条记录在这个heap page内的offset.

如果一条记录已经被删除或更新, 老的tuple就是dead tuple, vacuum 时 , 回收垃圾信息需要回收dead tuple以及对应的lp.

并且, LP是可能被索引引用的, 如果清理垃圾时lp也被清除, 那么表的index也要做一次vacuum, 把引用这个LP的index item也清除掉.
为了避免每次vacuum都要清理index, PostgreSQL 14进行了优化, 当vacuum一个table时, 如果低于2%的PAGE有dead LP(例如一个表占用了100个page, 如果只有2个page里面有dead LP), 那么将跳过index vacuum, 并保留这些lp_dead.
当table中的dead lp积累到超过2% page时, 才需要执行index vacuum.

因为LP 只占用4字节, 所以不清理也影响不大, 但是大幅降低了index vacuum带来的vacuum负担.

为什么是2%, 代码写死的, 未来也许会支持索引级别配置, 或者支持GUC配置.

+/* + * Threshold that controls whether we bypass index vacuuming and heap + * vacuuming as an optimization + */ +#define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5100010ee4d5c8ef46619dbd1d17090c627e6d0a

```
Teach VACUUM to bypass unnecessary index vacuuming.
master github/master
author Peter Geoghegan pg@bowt.ie
Wed, 7 Apr 2021 23:14:54 +0000 (16:14 -0700)
committer Peter Geoghegan pg@bowt.ie
Wed, 7 Apr 2021 23:14:54 +0000 (16:14 -0700)
commit 5100010ee4d5c8ef46619dbd1d17090c627e6d0a
tree fba00f18c75ef6c605b7433fdbd66c926595ee2d tree
parent bc70728693bc2d28db7125e7a24d78ad7612f58c commit | diff
Teach VACUUM to bypass unnecessary index vacuuming.

VACUUM has never needed to call ambulkdelete() for each index in cases
where there are precisely zero TIDs in its dead_tuples array by the end
of its first pass over the heap (also its only pass over the heap in
this scenario). Index vacuuming is simply not required when this
happens. Index cleanup will still go ahead, but in practice most calls
to amvacuumcleanup() are usually no-ops when there were zero preceding
ambulkdelete() calls. In short, VACUUM has generally managed to avoid
index scans when there were clearly no index tuples to delete from
indexes. But cases with close to no index tuples to delete were
another matter -- a round of ambulkdelete() calls took place (one per
index), each of which performed a full index scan.

VACUUM now behaves just as if there were zero index tuples to delete in
cases where there are in fact "virtually zero" such tuples. That is, it
can now bypass index vacuuming and heap vacuuming as an optimization
(though not index cleanup). Whether or not VACUUM bypasses indexes is
determined dynamically, based on the just-observed number of heap pages
in the table that have one or more LP_DEAD items (LP_DEAD items in heap
pages have a 1:1 correspondence with index tuples that still need to be
deleted from each index in the worst case).

We only skip index vacuuming when 2% or less of the table's pages have
one or more LP_DEAD items -- bypassing index vacuuming as an
optimization must not noticeably impede setting bits in the visibility
map. As a further condition, the dead_tuples array (i.e. VACUUM's array
of LP_DEAD item TIDs) must not exceed 32MB at the point that the first
pass over the heap finishes, which is also when the decision to bypass
is made. (The VACUUM must also have been able to fit all TIDs in its
maintenance_work_mem-bound dead_tuples space, though with a default
maintenance_work_mem setting it can't matter.)

This avoids surprising jumps in the duration and overhead of routine
vacuuming with workloads where successive VACUUM operations consistently
have almost zero dead index tuples. The number of LP_DEAD items may
well accumulate over multiple VACUUM operations, before finally the
threshold is crossed and VACUUM performs conventional index vacuuming.
Even then, the optimization will have avoided a great deal of largely
unnecessary index vacuuming.

In the future we may teach VACUUM to skip index vacuuming on a per-index
basis, using a much more sophisticated approach. For now we only
consider the extreme cases, where we can be quite confident that index
vacuuming just isn't worth it using simple heuristics.

Also log information about how many heap pages have one or more LP_DEAD
items when autovacuum logging is enabled.

Author: Masahiko Sawada sawada.mshk@gmail.com
Author: Peter Geoghegan pg@bowt.ie
Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com
```

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论