VACUUM 是 PostgreSQL MVCC (Multiversion concurrency control) 实现的核心机制之一,是 PostgreSQL 正常运行的重要保证。
为什么需要做 VACUUM
为了方便看其底层数据,创建 extension pageinspect。pageinspect模块提供函数从低层次观察数据库页面的内容,这对于调试目的很有用。要求所有这些函数只能被超级用户使用。pg默认不带pageinspect,需要源码编译(位置在~/contrib/pageinspect)安装。
highgo=# CREATE EXTENSION IF NOT EXISTS pageinspect;
CREATE EXTENSION
复制
首先创建测试表
highgo=# create table tb1(id int,info text);
CREATE TABLE
复制
插入测试数据
highgo=# insert into tb1 values (generate_series(1,10),md5(random()::text));
INSERT 0 10
复制
查看其底层数据
highgo=# SELECT * FROM heap_page_items(get_raw_page('public.tb1', 0)) LIMIT 10; lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data ----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------------------------------------------------------------------ 1 | 8128 | 1 | 61 | 793 | 0 | 0 | (0,1) | 2 | 2050 | 24 | | | \x01000000433764363338346539666630656535653932333465356634623236653238383834 2 | 8064 | 1 | 61 | 793 | 0 | 0 | (0,2) | 2 | 2050 | 24 | | | \x02000000433764333365616664653461386164373962626435613364333737356138323938 3 | 8000 | 1 | 61 | 793 | 0 | 0 | (0,3) | 2 | 2050 | 24 | | | \x03000000436666313465646436613037623535383166303939613636393237656365653964 4 | 7936 | 1 | 61 | 793 | 0 | 0 | (0,4) | 2 | 2050 | 24 | | | \x04000000433430646164396132313065363964633539323734383061316235393664633730 5 | 7872 | 1 | 61 | 793 | 0 | 0 | (0,5) | 2 | 2050 | 24 | | | \x05000000433039373035386565313462313739616237346334616463376130363661663832 6 | 7808 | 1 | 61 | 793 | 0 | 0 | (0,6) | 2 | 2050 | 24 | | | \x06000000433639343930663932646135363062346338323339646238346265663931303461 7 | 7744 | 1 | 61 | 793 | 0 | 0 | (0,7) | 2 | 2050 | 24 | | | \x07000000433039383732353363363162383162623638343161323234663737333765636363 8 | 7680 | 1 | 61 | 793 | 0 | 0 | (0,8) | 2 | 2050 | 24 | | | \x08000000436632643339343431323833366232353930356530656330623062393962386434 9 | 7616 | 1 | 61 | 793 | 0 | 0 | (0,9) | 2 | 2050 | 24 | | | \x09000000433563643966656237326638633066653365316535623031333066633832613966 10 | 7552 | 1 | 61 | 793 | 0 | 0 | (0,10) | 2 | 2050 | 24 | | | \x0a000000433864343334323838336530633964653237623864306365303961356437366137 (10 rows)
复制
下面删除 2 条数据:
highgo=# DELETE FROM tb1 WHERE id < 3;
DELETE 2
复制
再次查看底层数据
highgo=# SELECT * FROM heap_page_items(get_raw_page('public.tb1', 0)) LIMIT 10; lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data ----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------------------------------------------------------------------ 1 | 8128 | 1 | 61 | 793 | 794 | 0 | (0,1) | 8194 | 258 | 24 | | | \x01000000433764363338346539666630656535653932333465356634623236653238383834 2 | 8064 | 1 | 61 | 793 | 794 | 0 | (0,2) | 8194 | 258 | 24 | | | \x02000000433764333365616664653461386164373962626435613364333737356138323938 3 | 8000 | 1 | 61 | 793 | 0 | 0 | (0,3) | 2 | 2306 | 24 | | | \x03000000436666313465646436613037623535383166303939613636393237656365653964 4 | 7936 | 1 | 61 | 793 | 0 | 0 | (0,4) | 2 | 2306 | 24 | | | \x04000000433430646164396132313065363964633539323734383061316235393664633730 5 | 7872 | 1 | 61 | 793 | 0 | 0 | (0,5) | 2 | 2306 | 24 | | | \x05000000433039373035386565313462313739616237346334616463376130363661663832 6 | 7808 | 1 | 61 | 793 | 0 | 0 | (0,6) | 2 | 2306 | 24 | | | \x06000000433639343930663932646135363062346338323339646238346265663931303461 7 | 7744 | 1 | 61 | 793 | 0 | 0 | (0,7) | 2 | 2306 | 24 | | | \x07000000433039383732353363363162383162623638343161323234663737333765636363 8 | 7680 | 1 | 61 | 793 | 0 | 0 | (0,8) | 2 | 2306 | 24 | | | \x08000000436632643339343431323833366232353930356530656330623062393962386434 9 | 7616 | 1 | 61 | 793 | 0 | 0 | (0,9) | 2 | 2306 | 24 | | | \x09000000433563643966656237326638633066653365316535623031333066633832613966 10 | 7552 | 1 | 61 | 793 | 0 | 0 | (0,10) | 2 | 2306 | 24 | | | \x0a000000433864343334323838336530633964653237623864306365303961356437366137 (10 rows)
复制
这时会发现,实际数据并未被删除。只是修改了 t_xmax
,t_infomask2
和 t_infomask
。t_xmax
为删除时的 transaction id,t_infomask2
和 t_infomask
为各种标志位,这里显示的是其二进制转换后的十进制。
为什么不直接物理删除数据呢?
主要是出于以下考虑:
这些被删除的数据可能还在被其他事务访问,所以不能直接删除。这就是所谓的 MVCC 中的 multi version,即多版本,不同事务访问的可能是不同版本的数据。transaction id 可以理解为版本号。其他事务可能还在访问 t_xmax 为 794 的数据。但是老版本的数据是可能有其他事务需要访问,但随着时间的推移,这些事务终将结束,对应老版本的数据终将不被需要,它们将不断占用甚至耗尽磁盘空间,使数据访问变得很慢,这就是 PostgreSQL 中的膨胀。
针对PostgreSQL 中的膨胀解决方法—— VACUUM
下面针对表tb1进行vacuum
highgo=# VACUUM VERBOSE tb1; INFO: vacuuming "public.tb1" INFO: table "tb1": removed 2 dead item identifiers in 1 pages INFO: table "tb1": found 2 removable, 8 nonremovable row versions in 1 out of 1 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 795 Skipped 0 pages due to buffer pins, 0 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: vacuuming "pg_toast.pg_toast_16495" INFO: table "pg_toast_16495": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 795 Skipped 0 pages due to buffer pins, 0 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM
复制
再次查看表的底层数据
highgo=# SELECT * FROM heap_page_items(get_raw_page('public.tb1', 0)) LIMIT 10;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------------------------------------------------------------------
1 | 0 | 0 | 0 | | | | | | | | | |
2 | 0 | 0 | 0 | | | | | | | | | |
3 | 8128 | 1 | 61 | 793 | 0 | 0 | (0,3) | 2 | 2306 | 24 | | | \x03000000436666313465646436613037623535383166303939613636393237656365653964
4 | 8064 | 1 | 61 | 793 | 0 | 0 | (0,4) | 2 | 2306 | 24 | | | \x04000000433430646164396132313065363964633539323734383061316235393664633730
5 | 8000 | 1 | 61 | 793 | 0 | 0 | (0,5) | 2 | 2306 | 24 | | | \x05000000433039373035386565313462313739616237346334616463376130363661663832
6 | 7936 | 1 | 61 | 793 | 0 | 0 | (0,6) | 2 | 2306 | 24 | | | \x06000000433639343930663932646135363062346338323339646238346265663931303461
7 | 7872 | 1 | 61 | 793 | 0 | 0 | (0,7) | 2 | 2306 | 24 | | | \x07000000433039383732353363363162383162623638343161323234663737333765636363
8 | 7808 | 1 | 61 | 793 | 0 | 0 | (0,8) | 2 | 2306 | 24 | | | \x08000000436632643339343431323833366232353930356530656330623062393962386434
9 | 7744 | 1 | 61 | 793 | 0 | 0 | (0,9) | 2 | 2306 | 24 | | | \x09000000433563643966656237326638633066653365316535623031333066633832613966
10 | 7680 | 1 | 61 | 793 | 0 | 0 | (0,10) | 2 | 2306 | 24 | | | \x0a000000433864343334323838336530633964653237623864306365303961356437366137
(10 rows)
复制
可以看到,老版本数据已被清除。此时回收的空间新插入的数据使用,但并未返回给操作系统。下面还需要进行 VACUUM FULL 操作
highgo=# VACUUM FULL VERBOSE tb1;
INFO: vacuuming "public.tb1"
INFO: "tb1": found 0 removable, 8 nonremovable row versions in 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
复制
再次查看发现底层数据已经被清除
highgo=# SELECT * FROM heap_page_items(get_raw_page('public.tb1', 0)) LIMIT 10; lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data ----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------------------------------------------------------------------ 1 | 8128 | 1 | 61 | 793 | 0 | 0 | (0,1) | 2 | 2818 | 24 | | | \x03000000436666313465646436613037623535383166303939613636393237656365653964 2 | 8064 | 1 | 61 | 793 | 0 | 0 | (0,2) | 2 | 2818 | 24 | | | \x04000000433430646164396132313065363964633539323734383061316235393664633730 3 | 8000 | 1 | 61 | 793 | 0 | 0 | (0,3) | 2 | 2818 | 24 | | | \x05000000433039373035386565313462313739616237346334616463376130363661663832 4 | 7936 | 1 | 61 | 793 | 0 | 0 | (0,4) | 2 | 2818 | 24 | | | \x06000000433639343930663932646135363062346338323339646238346265663931303461 5 | 7872 | 1 | 61 | 793 | 0 | 0 | (0,5) | 2 | 2818 | 24 | | | \x07000000433039383732353363363162383162623638343161323234663737333765636363 6 | 7808 | 1 | 61 | 793 | 0 | 0 | (0,6) | 2 | 2818 | 24 | | | \x08000000436632643339343431323833366232353930356530656330623062393962386434 7 | 7744 | 1 | 61 | 793 | 0 | 0 | (0,7) | 2 | 2818 | 24 | | | \x09000000433563643966656237326638633066653365316535623031333066633832613966 8 | 7680 | 1 | 61 | 793 | 0 | 0 | (0,8) | 2 | 2818 | 24 | | | \x0a000000433864343334323838336530633964653237623864306365303961356437366137 (8 rows)
复制
什么是autovacuum
zzq@localhost:/opt/pgsql/bin$ ps -ef|grep postgres
zzq 473286 1 0 16:53 ? 00:00:00 /opt/pgsql/bin/postgres -D ../data
zzq 473288 473286 0 16:53 ? 00:00:00 postgres: checkpointer
zzq 473289 473286 0 16:53 ? 00:00:00 postgres: background writer
zzq 473290 473286 0 16:53 ? 00:00:00 postgres: walwriter
zzq 473291 473286 0 16:53 ? 00:00:00 postgres: autovacuum launcher
zzq 473292 473286 0 16:53 ? 00:00:00 postgres: stats collector
zzq 473293 473286 0 16:53 ? 00:00:00 postgres: logical replication launcher
zzq 490861 473056 0 17:04 pts/0 00:00:00 grep postgres
复制
那么 AUTOVACUUM 多久运行一次?
autovacuum launcher 会每隔 autovacuum_naptime ,创建 autovacuum worker,检查是否需要做 autovacuum。autovacuum_naptime 默认为 1min:
highgo=# select * from pg_settings where name ='autovacuum_naptime';
-[ RECORD 1 ]---+---------------------------------------
name | autovacuum_naptime
setting | 60
unit | s
category | Autovacuum
short_desc | Time to sleep between autovacuum runs.
extra_desc |
context | sighup
vartype | integer
source | default
min_val | 1
max_val | 2147483
enumvals |
boot_val | 60
reset_val | 60
sourcefile |
sourceline |
pending_restart | f
复制
autovacuum 又是根据什么标准决定是否进行 VACUUM 和 ANALYZE 呢? 当 autovacuum worker 检查到,dead tuples 大于 vacuum threshold 时,会自动进行 VACUUM。vacuum threshold 公式如下:
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
复制
analyze threshold 公式如下:
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
复制
其对应的GUC参数如下:
#autovacuum_vacuum_threshold = 50 # min number of row updates before vacuum
#autovacuum_analyze_threshold = 50 # min number of row updates before analyze
#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
复制
触发autovacuum
highgo=# create table tb2(id int,info text);
CREATE TABLE
highgo=# insert into tb2 values (generate_series(1,1000),md5(random()::text));
INSERT 0 1000
复制
查看
highgo=# select * from pg_stat_user_tables where relname='tb2';
-[ RECORD 1 ]-------+------------------------------
relid | 16487
schemaname | public
relname | tb2
seq_scan | 0
seq_tup_read | 0
idx_scan |
idx_tup_fetch |
n_tup_ins | 1000
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 1000
n_dead_tup | 0
n_mod_since_analyze | 0
n_ins_since_vacuum | 1000
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze | 2023-07-14 16:58:23.993929+08
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 1
复制
可以看出一分钟后执行了autoanalyze
场景一:update更新大于70条小于100条记录,查看是否自动执行autovacuum操作
highgo=# update tb2 set info='aa' where id<80;
UPDATE 79
highgo=# select * from pg_stat_user_tables where relname='tb2';
-[ RECORD 1 ]-------+------------------------------
relid | 16487
schemaname | public
relname | tb2
seq_scan | 1
seq_tup_read | 1000
idx_scan |
idx_tup_fetch |
n_tup_ins | 1000
n_tup_upd | 79
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 1000
n_dead_tup | 79
n_mod_since_analyze | 79
n_ins_since_vacuum | 1000
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze | 2023-07-14 16:58:23.993929+08
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 1
复制
结论:autovacuum_naptime 1分钟以内只执行autoanalzye操作,而没有执行autovacuum操作。
场景二:delete更新大于100条记录,查看是否自动执行autovacuum、autoanalyze操作
highgo=# select * from pg_stat_user_tables where relname='tb2';
-[ RECORD 1 ]-------+------------------------------
relid | 16487
schemaname | public
relname | tb2
seq_scan | 3
seq_tup_read | 3000
idx_scan |
idx_tup_fetch |
n_tup_ins | 1000
n_tup_upd | 188
n_tup_del | 259
n_tup_hot_upd | 102
n_live_tup | 741
n_dead_tup | 0
n_mod_since_analyze | 0
n_ins_since_vacuum | 0
last_vacuum |
last_autovacuum | 2023-07-14 17:08:24.496111+08
last_analyze |
last_autoanalyze | 2023-07-14 17:08:24.498153+08
vacuum_count | 0
autovacuum_count | 1
analyze_count | 0
autoanalyze_count | 3
复制
本次执行删除100条以上数据,都会同时执行autovacuum和autoanalyze操作