暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

postgresql表膨胀清理演示

dm5250 2024-07-02
50

一、添加pgstattuple

操作数据时,PostgreSQL会为每一个客户端提供单独的快照。客户端对快照中的数据进行更新和删除,一些数据最终变成过期数据,但它们仍存在于数据库中,还占用表数据文件和索引数据文件的空间。因此数据文件中会存在碎片,引起整体数据库性能下降。这时候vacuum出来干活了,vacuum的主要任务就是清理表和索引中不需要的数据(死数据),为新加入的数据清理出来空间。

vacuum也会存在缺陷。比如1,vacuum完成清理工作后,那些空间并没有真正被释放掉,只能被vacuum清理过的表和索引所利用。虽然看上去表和索引大小都已经减少了,但是实际上和vacuum清理前的大小是一样。这让很多刚开始使用Postgres的用户感到困惑。比如2,假设有一张表上有大量索引,表上还有密集的更新操作。遇到这样的情况,大量更新会触发一次新的vacuum,有可能会发现这张表上有无穷无尽的vacuum在执行。无论怎么修改vacuum的配置都没有用,结果导致——表膨胀了。比如3,执行空事务,vacuum会推迟清理死数据导致表和索引膨胀。空事务,甚至经常执行长事务都会对数据库有影响,应避免这些操作

pgcompacttable利用了PostgreSQL的一个有趣特性:在执行INSERT和UPDATE操作时,会将所有新版本的行移到表最开始的可用空间。此为pgcompacttable工具的关键,因为如果从末端反向开始更新所有行,最终所有可用空间被这些行填充,并将表尾部的空间全部释放以便让定期vacuum进行truncate。这样一来,pgcompacttable通过批量更新和vacuum强制移动,最终整个表被重新整理,达到压缩的效果。此工具对磁盘空间要求低,且性能影响可控。

pgcompacttable工具使用过程中需要依赖pgstattuple,因此需先添加pgstattuple。如果是源码安装的postgresql,则源码里包含了postgresql-contrib。

[root@pg01 contrib]# cd /soft/postgresql-14.5/contrib/
[root@pg01 contrib]# make && make install
[root@pg01 contrib]# su - postgres 
Last login: Thu May 11 14:22:22 CST 2023 on pts/0
[postgres@pg01 ~]$ psql 
psql (14.5)
Type "help" for help.

postgres=# select * from pg_available_extensions where name like 'pgstat%';
    name     | default_version | installed_version |           comment           
-------------+-----------------+-------------------+-----------------------------
 pgstattuple | 1.5             |                   | show tuple-level statistics
(1 row)

postgres=# create extension pgstattuple;
CREATE EXTENSION

postgres=# \dx
                                      List of installed extensions
      Name       | Version |   Schema   |                          Description                           
-----------------+---------+------------+----------------------------------------------------------------
 pg_freespacemap | 1.2     | public     | examine the free space map (FSM)
 pg_visibility   | 1.2     | public     | examine the visibility map (VM) and page-level visibility info
 pgstattuple     | 1.5     | public     | show tuple-level statistics
 plpgsql         | 1.0     | pg_catalog | PL/pgSQL procedural language
(4 rows)
复制


二、准备环境

1、建表

create table test(
	id int ,
    sex varchar(50) , 
    name varchar(50) , 
    now_address varchar(100),
    address   varchar(100)
);
复制


2、插入数据

begin;
insert into test values(generate_series(1,10000),repeat( chr(int4(random()*26)+65),1),repeat( chr(int4(random()*26)+65),6),repeat( chr(int4(random()*26)+65),30),repeat( chr(int4(random()*26)+65),30));
commit;
复制


3、建立索引

create index on test(id,sex);
create index on test(name,now_address,address);

###数据如下所示
postgres=# select count(*) from test ;
 count 
-------
 10000
(1 row)

postgres=# \d+
                                       List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |    Size    | Description 
--------+----------+-------+----------+-------------+---------------+------------+-------------
 public | t1       | table | postgres | permanent   | heap          | 8192 bytes | 
 public | test     | table | postgres | permanent   | heap          | 1104 kB    | 
 public | test_big | table | postgres | permanent   | heap          | 4408 kB    | 
 public | test_tbs | table | postgres | permanent   | heap          | 0 bytes    | 
(4 rows)

postgres=#  select * from test limit 10 ;
 id | sex |  name  |          now_address           |            address             
----+-----+--------+--------------------------------+--------------------------------
  1 | R   | PPPPPP | EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE | YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
  2 | N   | EEEEEE | PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP | UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
  3 | R   | OOOOOO | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM | KKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
  4 | Q   | IIIIII | RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR | CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
  5 | D   | IIIIII | CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC | DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
  6 | J   | PPPPPP | LLLLLLLLLLLLLLLLLLLLLLLLLLLLLL | FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  7 | V   | JJJJJJ | ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  8 | R   | BBBBBB | VVVVVVVVVVVVVVVVVVVVVVVVVVVVVV | NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
  9 | K   | YYYYYY | RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
 10 | R   | FFFFFF | EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE | YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
(10 rows)
复制

可以看到,test表有10000条数据,表大小为1104KB。


三、模拟数据修改

[postgres@pg01 ~]$  vi a.sh 
for((i=1;i<=10000;i++))
do
a=`tr -dc A-Z[ < /dev/urandom | head -c1`
psql -Upostgres -h 127.0.0.1 -d postgres -c "update  test set name=repeat( chr(int4(random()*26)+65),6),now_address=repeat( chr(int4(random()*26)+65),30),address=repeat( chr(int4(random()*26)+65),30)  where sex='$a';" >>/dev/null
if [ $? == 0 ]
        then
         echo "$a is ok" >>update.log
       else
          echo "$a is close" >>update.err.log
       exit 1;
fi
done

[postgres@pg01 ~]$ chmod a+x a.sh 
[postgres@pg01 ~]$ ./a.sh 
复制


四、膨胀再现


postgres=# \d+
                                       List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |    Size    | Description 
--------+----------+-------+----------+-------------+---------------+------------+-------------
 public | t1       | table | postgres | permanent   | heap          | 8192 bytes | 
 public | test     | table | postgres | permanent   | heap          | 24 MB      | 
 public | test_big | table | postgres | permanent   | heap          | 4408 kB    | 
 public | test_tbs | table | postgres | permanent   | heap          | 0 bytes    | 
(4 rows)

postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pgstattuple('public.test');
-[ RECORD 1 ]------+---------
table_len          | 25550848
tuple_count        | 10000
tuple_len          | 990000
tuple_percent      | 3.87
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 23868444
free_percent       | 93.42

pgstattuple 输出列如下:
    字段              类型            描述
table_len           bigint       物理关系长度,以字节计
tuple_count         bigint       活的元组的数量
tuple_len           bigint       活的元组的总长度,以字节计
tuple_percent       float8       活的元组的百分比
dead_tuple_count    bigint       死的元组的数量
dead_tuple_len      bigint       死的元组的总长度,以字节计
dead_tuple_percent  float8       死的元组的百分比
free_space          bigint       空闲空间总量,以字节计
free_percent        float8       空闲空间的百分比
复制

可以看到,还是1w条数据,占用的空间有原先的1104KB上涨到了24MB。


五、pgcompacttable使用

1、安装pgcompacttable

[root@pg01 ~]# mount /dev/sr0  /mnt/cdrom/
[root@pg01 ~]# yum install perl-Time-HiRes perl-DBI perl-DBD-Pg -y
[root@pg01 ~]# yum install -y git
[root@pg01 ~]# su - postgres 
[postgres@pg01 ~]$ git clone https://github.com/dataegret/pgcompacttable.git
Cloning into 'pgcompacttable'...
remote: Enumerating objects: 320, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 320 (delta 11), reused 41 (delta 10), pack-reused 272
Receiving objects: 100% (320/320), 83.84 KiB | 34.00 KiB/s, done.
Resolving deltas: 100% (74/74), done.

[postgres@pg01 ~]$ cd pgcompacttable
[postgres@pg01 pgcompacttable]$ ll
total 4
drwxrwxr-x 2 postgres postgres   27 May 11 15:02 bin
-rw-rw-r-- 1 postgres postgres 2547 May 11 15:02 README.md
复制


2、使用pgcompacttable

[postgres@pg01 pgcompacttable]$ cd /home/postgres/pgcompacttable/bin
[postgres@pg01 bin]$ ./pgcompacttable -h localhost -U postgres -d postgres -n public -t test
[Thu May 11 15:04:23 2023] (postgres) Connecting to database
[Thu May 11 15:04:23 2023] (postgres) Postgres backend pid: 106942
[Thu May 11 15:04:23 2023] (postgres) Handling tables. Attempt 1
[Thu May 11 15:04:23 2023] (postgres:public.test) Statistics: 3110 pages (4075 pages including toasts and indexes), it is expected that ~93.400% (2904 pages) can be compacted with the estimated space saving being 22.693MB.
[Thu May 11 15:04:49 2023] (postgres:public.test) Reindex: public.test_id_sex_idx, initial size 222 pages(1.734MB), has been reduced by 86% (1.500MB), duration 0 seconds.
[Thu May 11 15:04:49 2023] (postgres:public.test) Reindex: public.test_name_now_address_address_idx, initial size 739 pages(5.773MB), has been reduced by 86% (5.008MB), duration 0 seconds.
[Thu May 11 15:04:49 2023] (postgres:public.test) Processing results: 136 pages left (268 pages including toasts and indexes), size reduced by 23.305MB (29.812MB including toasts and indexes) in total.
[Thu May 11 15:04:49 2023] (postgres) Processing complete.
[Thu May 11 15:04:49 2023] (postgres) Processing results: size reduced by 23.305MB (29.812MB including toasts and indexes) in total.
[Thu May 11 15:04:49 2023] (postgres) Disconnecting from database
[Thu May 11 15:04:49 2023] Processing complete: 1 retries to process has been done
[Thu May 11 15:04:49 2023] Processing results: size reduced by 23.305MB (29.812MB including toasts and indexes) in total, 23.305MB (29.812MB) postgres.

[postgres@pg01 bin]$ psql
psql (14.5)
Type "help" for help.

postgres=# \d+
                                       List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |    Size    | Description 
--------+----------+-------+----------+-------------+---------------+------------+-------------
 public | t1       | table | postgres | permanent   | heap          | 8192 bytes | 
 public | test     | table | postgres | permanent   | heap          | 1120 kB    | 
 public | test_big | table | postgres | permanent   | heap          | 4408 kB    | 
 public | test_tbs | table | postgres | permanent   | heap          | 0 bytes    | 
(4 rows)

postgres=# \x
Expanded display is on.
postgres=#  SELECT * FROM pgstattuple('public.test');
-[ RECORD 1 ]------+--------
table_len          | 1114112
tuple_count        | 10000
tuple_len          | 990000
tuple_percent      | 88.86
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 19260
free_percent       | 1.73
复制

可以看出,表大小从24MB缩小回1120KB。


「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论