一、添加pgstattuple
操作数据时,PostgreSQL会为每一个客户端提供单独的快照。客户端对快照中的数据进行更新和删除,一些数据最终变成过期数据,但它们仍存在于数据库中,还占用表数据文件和索引数据文件的空间。因此数据文件中会存在碎片,引起整体数据库性能下降。这时候vacuum出来干活了,vacuum的主要任务就是清理表和索引中不需要的数据(死数据),为新加入的数据清理出来空间。
vacuum也会存在缺陷。比如1,vacuum完成清理工作后,那些空间并没有真正被释放掉,只能被vacuum清理过的表和索引所利用。虽然看上去表和索引大小都已经减少了,但是实际上和vacuum清理前的大小是一样。这让很多刚开始使用Postgres的用户感到困惑。比如2,假设有一张表上有大量索引,表上还有密集的更新操作。遇到这样的情况,大量更新会触发一次新的vacuum,有可能会发现这张表上有无穷无尽的vacuum在执行。无论怎么修改vacuum的配置都没有用,结果导致——表膨胀了。比如3,执行空事务,vacuum会推迟清理死数据导致表和索引膨胀。空事务,甚至经常执行长事务都会对数据库有影响,应避免这些操作
pgcompacttable利用了PostgreSQL的一个有趣特性:在执行INSERT和UPDATE操作时,会将所有新版本的行移到表最开始的可用空间。此为pgcompacttable工具的关键,因为如果从末端反向开始更新所有行,最终所有可用空间被这些行填充,并将表尾部的空间全部释放以便让定期vacuum进行truncate。这样一来,pgcompacttable通过批量更新和vacuum强制移动,最终整个表被重新整理,达到压缩的效果。此工具对磁盘空间要求低,且性能影响可控。
pgcompacttable工具使用过程中需要依赖pgstattuple,因此需先添加pgstattuple。如果是源码安装的postgresql,则源码里包含了postgresql-contrib。
[root@pg01 contrib]# cd /soft/postgresql-14.5/contrib/
[root@pg01 contrib]# make && make install
[root@pg01 contrib]# su - postgres
Last login: Thu May 11 14:22:22 CST 2023 on pts/0
[postgres@pg01 ~]$ psql
psql (14.5)
Type "help" for help.
postgres=# select * from pg_available_extensions where name like 'pgstat%';
name | default_version | installed_version | comment
-------------+-----------------+-------------------+-----------------------------
pgstattuple | 1.5 | | show tuple-level statistics
(1 row)
postgres=# create extension pgstattuple;
CREATE EXTENSION
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
-----------------+---------+------------+----------------------------------------------------------------
pg_freespacemap | 1.2 | public | examine the free space map (FSM)
pg_visibility | 1.2 | public | examine the visibility map (VM) and page-level visibility info
pgstattuple | 1.5 | public | show tuple-level statistics
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(4 rows)
复制
二、准备环境
1、建表
create table test(
id int ,
sex varchar(50) ,
name varchar(50) ,
now_address varchar(100),
address varchar(100)
);
复制
2、插入数据
begin;
insert into test values(generate_series(1,10000),repeat( chr(int4(random()*26)+65),1),repeat( chr(int4(random()*26)+65),6),repeat( chr(int4(random()*26)+65),30),repeat( chr(int4(random()*26)+65),30));
commit;
复制
3、建立索引
create index on test(id,sex);
create index on test(name,now_address,address);
###数据如下所示
postgres=# select count(*) from test ;
count
-------
10000
(1 row)
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+----------+-------+----------+-------------+---------------+------------+-------------
public | t1 | table | postgres | permanent | heap | 8192 bytes |
public | test | table | postgres | permanent | heap | 1104 kB |
public | test_big | table | postgres | permanent | heap | 4408 kB |
public | test_tbs | table | postgres | permanent | heap | 0 bytes |
(4 rows)
postgres=# select * from test limit 10 ;
id | sex | name | now_address | address
----+-----+--------+--------------------------------+--------------------------------
1 | R | PPPPPP | EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE | YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
2 | N | EEEEEE | PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP | UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
3 | R | OOOOOO | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM | KKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
4 | Q | IIIIII | RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR | CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
5 | D | IIIIII | CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC | DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
6 | J | PPPPPP | LLLLLLLLLLLLLLLLLLLLLLLLLLLLLL | FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
7 | V | JJJJJJ | ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
8 | R | BBBBBB | VVVVVVVVVVVVVVVVVVVVVVVVVVVVVV | NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 | K | YYYYYY | RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
10 | R | FFFFFF | EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE | YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
(10 rows)
复制
可以看到,test表有10000条数据,表大小为1104KB。
三、模拟数据修改
[postgres@pg01 ~]$ vi a.sh
for((i=1;i<=10000;i++))
do
a=`tr -dc A-Z[ < /dev/urandom | head -c1`
psql -Upostgres -h 127.0.0.1 -d postgres -c "update test set name=repeat( chr(int4(random()*26)+65),6),now_address=repeat( chr(int4(random()*26)+65),30),address=repeat( chr(int4(random()*26)+65),30) where sex='$a';" >>/dev/null
if [ $? == 0 ]
then
echo "$a is ok" >>update.log
else
echo "$a is close" >>update.err.log
exit 1;
fi
done
[postgres@pg01 ~]$ chmod a+x a.sh
[postgres@pg01 ~]$ ./a.sh
复制
四、膨胀再现
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+----------+-------+----------+-------------+---------------+------------+-------------
public | t1 | table | postgres | permanent | heap | 8192 bytes |
public | test | table | postgres | permanent | heap | 24 MB |
public | test_big | table | postgres | permanent | heap | 4408 kB |
public | test_tbs | table | postgres | permanent | heap | 0 bytes |
(4 rows)
postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pgstattuple('public.test');
-[ RECORD 1 ]------+---------
table_len | 25550848
tuple_count | 10000
tuple_len | 990000
tuple_percent | 3.87
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 23868444
free_percent | 93.42
pgstattuple 输出列如下:
字段 类型 描述
table_len bigint 物理关系长度,以字节计
tuple_count bigint 活的元组的数量
tuple_len bigint 活的元组的总长度,以字节计
tuple_percent float8 活的元组的百分比
dead_tuple_count bigint 死的元组的数量
dead_tuple_len bigint 死的元组的总长度,以字节计
dead_tuple_percent float8 死的元组的百分比
free_space bigint 空闲空间总量,以字节计
free_percent float8 空闲空间的百分比
复制
可以看到,还是1w条数据,占用的空间有原先的1104KB上涨到了24MB。
五、pgcompacttable使用
1、安装pgcompacttable
[root@pg01 ~]# mount /dev/sr0 /mnt/cdrom/
[root@pg01 ~]# yum install perl-Time-HiRes perl-DBI perl-DBD-Pg -y
[root@pg01 ~]# yum install -y git
[root@pg01 ~]# su - postgres
[postgres@pg01 ~]$ git clone https://github.com/dataegret/pgcompacttable.git
Cloning into 'pgcompacttable'...
remote: Enumerating objects: 320, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 320 (delta 11), reused 41 (delta 10), pack-reused 272
Receiving objects: 100% (320/320), 83.84 KiB | 34.00 KiB/s, done.
Resolving deltas: 100% (74/74), done.
[postgres@pg01 ~]$ cd pgcompacttable
[postgres@pg01 pgcompacttable]$ ll
total 4
drwxrwxr-x 2 postgres postgres 27 May 11 15:02 bin
-rw-rw-r-- 1 postgres postgres 2547 May 11 15:02 README.md
复制
2、使用pgcompacttable
[postgres@pg01 pgcompacttable]$ cd /home/postgres/pgcompacttable/bin
[postgres@pg01 bin]$ ./pgcompacttable -h localhost -U postgres -d postgres -n public -t test
[Thu May 11 15:04:23 2023] (postgres) Connecting to database
[Thu May 11 15:04:23 2023] (postgres) Postgres backend pid: 106942
[Thu May 11 15:04:23 2023] (postgres) Handling tables. Attempt 1
[Thu May 11 15:04:23 2023] (postgres:public.test) Statistics: 3110 pages (4075 pages including toasts and indexes), it is expected that ~93.400% (2904 pages) can be compacted with the estimated space saving being 22.693MB.
[Thu May 11 15:04:49 2023] (postgres:public.test) Reindex: public.test_id_sex_idx, initial size 222 pages(1.734MB), has been reduced by 86% (1.500MB), duration 0 seconds.
[Thu May 11 15:04:49 2023] (postgres:public.test) Reindex: public.test_name_now_address_address_idx, initial size 739 pages(5.773MB), has been reduced by 86% (5.008MB), duration 0 seconds.
[Thu May 11 15:04:49 2023] (postgres:public.test) Processing results: 136 pages left (268 pages including toasts and indexes), size reduced by 23.305MB (29.812MB including toasts and indexes) in total.
[Thu May 11 15:04:49 2023] (postgres) Processing complete.
[Thu May 11 15:04:49 2023] (postgres) Processing results: size reduced by 23.305MB (29.812MB including toasts and indexes) in total.
[Thu May 11 15:04:49 2023] (postgres) Disconnecting from database
[Thu May 11 15:04:49 2023] Processing complete: 1 retries to process has been done
[Thu May 11 15:04:49 2023] Processing results: size reduced by 23.305MB (29.812MB including toasts and indexes) in total, 23.305MB (29.812MB) postgres.
[postgres@pg01 bin]$ psql
psql (14.5)
Type "help" for help.
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+----------+-------+----------+-------------+---------------+------------+-------------
public | t1 | table | postgres | permanent | heap | 8192 bytes |
public | test | table | postgres | permanent | heap | 1120 kB |
public | test_big | table | postgres | permanent | heap | 4408 kB |
public | test_tbs | table | postgres | permanent | heap | 0 bytes |
(4 rows)
postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pgstattuple('public.test');
-[ RECORD 1 ]------+--------
table_len | 1114112
tuple_count | 10000
tuple_len | 990000
tuple_percent | 88.86
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 19260
free_percent | 1.73
复制
可以看出,表大小从24MB缩小回1120KB。