暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

PG数据库表膨胀不能清理问题分析处理

PostgreSQL数据库工作学习随笔 2022-07-09
1349

    我们知道默认情况下表在条件达到autovacuum_vacuum_scale_factor和autovacuum_vacuum_threshold的限制时会触发autovacuum,比如基于1000W的表,需要有20W以上数据进行更改时才触发autovacuum,但是同时对于有锁的事务比如长事务、大量频繁的更新操作,autovacuum会跳过,由于时间限制不能完成表清理。

    那么我们怎么判断是哪些原因导致表不能清理呢?我们做一个测试说明长事务导致表不能清理的情况:

    为了方便测试我们把所有autovacuum执行过程写入日志。
    postgres=# show log_autovacuum_min_duration ;
    log_autovacuum_min_duration
    -----------------------------
    -1
    (1 row)


    postgres=# alter system set log_autovacuum_min_duration to 0;
    ALTER SYSTEM
    postgres=# select pg_reload_conf();
    pg_reload_conf
    ----------------
    t
    (1 row)


    postgres=# show log_autovacuum_min_duration ;
    log_autovacuum_min_duration
    -----------------------------
    0
    (1 row)


    postgres=
    复制
      创建测试表和测试数据,为了方便测试我把表tbl_test_autovacuum的autovacuum触发条件改为autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=1
      postgres=# create table tbl_test_autovacuum(id int ,name varchar);
      CREATE TABLE
      postgres=# alter table tbl_test_autovacuum set (autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=1);
      ALTER TABLE
      postgres=# insert into tbl_test_autovacuum values (1,'hl'),(2,'hl'),(3,'hl'),(4,'hl'),(5,'hl');
      INSERT 0 5
      postgres=#


      查看数据
      postgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';
      relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples
      ---------------------+----------+-----------+-------------------------+-------------------------
      tbl_test_autovacuum | 1 | 5 | 5 | 0
      (1 row)


      postgres=#


      复制

      session1 开启一个事务,更新一条数据,不提交

        postgres=# begin;
        BEGIN
        postgres=*# update tbl_test_autovacuum set name='test' where id=1;
        UPDATE 1
        postgres=*#
        复制

        session2 删除部分数据触发autovacuum

          postgres=# delete from tbl_test_autovacuum where id > 2;
          DELETE 3
          postgres=#
          复制
            再次查看数据分布情况,产生了三个死元组
            postgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';
            relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples
            ---------------------+----------+-----------+-------------------------+-------------------------
            tbl_test_autovacuum | 1 | 5 | 2 | 3
            (1 row)


            postgres=#
            复制

            查看数据库日志

              autovacuum 触发了,但是有些行无法移除,在查询中说明涉及到的行是会被保护的,提示oldest xmin: 1703。该 xmin 是造成不能 vacuum 的事务 id,根据提示信息,有尚未提交的事务,导致autovacuum 进程不能 vacuum。
              2022-06-24 02:30:42.575 EDT [9425] LOG: automatic vacuum of table "postgres.public.tbl_test_autovacuum": index scans: 0
              pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozen
              tuples: 0 removed, 5 remain, 3 are dead but not yet removable, oldest xmin: 1703
              index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
              avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
              buffer usage: 32 hits, 0 misses, 0 dirtied
              WAL usage: 0 records, 0 full page images, 0 bytes
              system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
              复制

                  查看1703是什么事务,通过pg_stat_activity视图可以看到就是session1执行的未提交事务。

                postgres=# \x
                Expanded display is on.
                postgres=# select * from pg_stat_activity where backend_xid=1703;
                -[ RECORD 1 ]----+-------------------------------------------------------
                datid | 13892
                datname | postgres
                pid | 8926
                leader_pid |
                usesysid | 10
                usename | atlasdb
                application_name | psql
                client_addr |
                client_hostname |
                client_port | -1
                backend_start | 2022-06-24 02:22:43.805632-04
                xact_start | 2022-06-24 02:27:55.184073-04
                query_start | 2022-06-24 02:27:57.200055-04
                state_change | 2022-06-24 02:27:57.200932-04
                wait_event_type | Client
                wait_event | ClientRead
                state | idle in transaction
                backend_xid | 1703
                backend_xmin |
                query_id | 3783348657976902404
                query | update tbl_test_autovacuum set name='test' where id=1;
                backend_type | client backend


                postgres=#
                复制

                此时提交session1中的事务

                  postgres=# begin;
                  BEGIN
                  postgres=*# update tbl_test_autovacuum set name='test' where id=1;
                  UPDATE 1
                  postgres=*# end;
                  COMMIT
                  postgres=
                  查看数据元组情况,由于autovacuum_naptime参数设置不能立刻触发autovacuum
                  postgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';
                  relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples
                  ---------------------+----------+-----------+-------------------------+-------------------------
                  tbl_test_autovacuum | 1 | 2 | 2 | 4
                  (1 row)


                  postgres=#
                  复制

                  继续观察数据库日志,可以看到死元组被清理了。

                    2022-06-24 02:38:42.677 EDT [9841] LOG:  automatic vacuum of table "postgres.public.tbl_test_autovacuum": index scans: 0
                    pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozen
                    tuples: 4 removed, 2 remain, 0 are dead but not yet removable, oldest xmin: 1705
                    index scan not needed: 1 pages from table (100.00% of total) had 3 dead item identifiers removed
                    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
                    buffer usage: 32 hits, 0 misses, 0 dirtied
                    WAL usage: 2 records, 0 full page images, 118 bytes
                    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
                    复制

                    再次查看数据分布情况,已经没有死元组了

                      postgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';
                      relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples
                      ---------------------+----------+-----------+-------------------------+-------------------------
                      tbl_test_autovacuum | 1 | 2 | 2 | 0
                      (1 row)


                      postgres=#


                      复制

                      总结:

                      1. 在存在长事务的时候,触发autovacuum后死元组是不被清理的;

                      2. 当长事务提交,但是条件不够触发autovacuum时,可以使用vacuum手动进行数据清理;

                      3. 如果使用vacuum清理的时候如果长事务未提交,也是不能清理死元组的。


                      文章转载自PostgreSQL数据库工作学习随笔,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                      评论