暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL学习随笔2 clog(xact)损坏

6210

背景

今天在巡检PostgreSQL数据库日志时,发现某个节点日志信息出现了,"ERROR: could not access status of transaction 228942"的错误信息。分析了一下错误信息,定位为clog(xact)事物状态文件损坏,这种情况是没有真正的办法修复该问题的,只能尝试尽可能的抢救数据,不过还好这套系统是一套非核心系统,并且还在备节点发生该错误,直接将备节点重做了。

事后思考2个问题:

1.如何最大程度的规避该问题?

2.如何在没有备份的情况下,快速恢复现场?并抢救尽可能多的抢救数据。

概念

clog 就是 commit log 用来记录事物最终状态的日志,放在数据库目录的pg_clog下面。PostgreSQL10之后修更名为Xact,数据目录变更为pg_xact下面。

一个事物可以有四种状态,0x00、0x01、0x02 、0x03

每个事物占用,2bit,每个byte中存有4个事物,每个数据块(16KB)中有32768个事物

    /*
    * Possible transaction statuses --- note that all-zeroes is the initial
    * state.
    *
    * A "subcommitted" transaction is a committed subtransaction whose parent
    * hasn't committed or aborted yet.
    */
    typedef int XidStatus;

    #define TRANSACTION_STATUS_IN_PROGRESS 0x00
    #define TRANSACTION_STATUS_COMMITTED 0x01
    #define TRANSACTION_STATUS_ABORTED        0x02
    #define TRANSACTION_STATUS_SUB_COMMITTED 0x03
    复制

    实验

    模拟下clog损坏使用dd命令伪造事物状态,将数据导出后重建实例,并导回数据。

      手工模拟clog损坏
      postgres@bogon-> cd opt/pg_root/pg_xact
      postgres@bogon-> ll
      total 12K
      -rw------- 1 postgres postgres 8.1K Jul 19 04:57 0000
      postgres@bogon-> cat''>0000
      postgres@bogon-> ll
      total 0
      -rw------- 1 postgres postgres 0 Jul 19 04:57 0000

      启动DB,报错无法启动DB
      postgres@bogon-> pg_ctl start
      waiting for server to start....2021-07-19 05:01:51.774 EDT [56728] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
      2021-07-19 05:01:51.774 EDT [56728] LOG: listening on IPv4 address "0.0.0.0", port 5432
      2021-07-19 05:01:51.775 EDT [56728] LOG: listening on Unix socket "./.s.PGSQL.5432"
      ...2021-07-19 05:01:54.849 EDT [56728] LOG: redirecting log output to logging collector process
      2021-07-19 05:01:54.849 EDT [56728] HINT: Future log output will appear in directory "pg_log".
      stopped waiting
      pg_ctl: could not start server
      Examine the log output.

      检查下日志信息:
      postgres@bogon-> cat postgresql-2021-07-19_050154.log
      2021-07-19 05:01:54.849 EDT [56728] LOG: ending log output to stderr
      2021-07-19 05:01:54.849 EDT [56728] HINT: Future log output will go to log destination "csvlog".
      postgres@bogon-> cat postgresql-2021-07-19_050154.csv
      2021-07-19 05:01:54.849 EDT,,,56728,,60f53f7f.dd98,1,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
      2021-07-19 05:01:54.853 EDT,,,56733,,60f53f82.dd9d,1,,2021-07-19 05:01:54 EDT,,0,LOG,00000,"database system was shut down at 2021-07-19 04:56:09 EDT",,,,,,,,,""
      2021-07-19 05:01:54.855 EDT,,,56733,,60f53f82.dd9d,2,,2021-07-19 05:01:54 EDT,,0,FATAL,XX000,"could not access status of transaction 1217","Could not read from file ""pg_xact/0000"" at offset 0: read too few bytes.",,,,,,,,""
      2021-07-19 05:01:54.855 EDT,,,56728,,60f53f7f.dd98,2,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"startup process (PID 56733) exited with exit code 1",,,,,,,,,""
      2021-07-19 05:01:54.855 EDT,,,56728,,60f53f7f.dd98,3,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
      2021-07-19 05:01:54.879 EDT,,,56728,,60f53f7f.dd98,4,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"database system is shut down",,,,,,,,,""

      错误信息
      FATAL,XX000,"could not access status of transaction 1217","Could not read from file ""pg_xact/0000"" at offset 0: read too few bytes.",,,,,,,,""
      无法访问事物1217的状态

      尝试使用dd工具修复
      postgres@bogon-> dd if=/dev/zero of=/opt/pg_root/pg_xact/0000 bs=256K count=1
      1+0 records in
      1+0 records out
      262144 bytes (262 kB) copied, 0.000680516 s, 385 MB/s
      postgres@bogon-> ll
      total 256K
      -rw------- 1 postgres postgres 256K Jul 19 05:12 0000

      启动数据库
      postgres@bogon-> pg_ctl start
      waiting for server to start....2021-07-19 05:13:13.879 EDT [57590] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
      2021-07-19 05:13:13.879 EDT [57590] LOG: listening on IPv4 address "0.0.0.0", port 5432
      2021-07-19 05:13:13.881 EDT [57590] LOG: listening on Unix socket "./.s.PGSQL.5432"
      ...2021-07-19 05:13:17.483 EDT [57590] LOG: redirecting log output to logging collector process
      2021-07-19 05:13:17.483 EDT [57590] HINT: Future log output will appear in directory "pg_log".
      done
      server started
      数据库已经启动

      检查数据库日志
      2021-07-19 05:13:17.483 EDT,,,57590,,60f54229.e0f6,1,,2021-07-19 05:13:13 EDT,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
      2021-07-19 05:13:17.495 EDT,,,57595,,60f5422d.e0fb,1,,2021-07-19 05:13:17 EDT,,0,LOG,00000,"database system was shut down at 2021-07-19 04:56:09 EDT",,,,,,,,,""
      2021-07-19 05:13:17.536 EDT,,,57590,,60f54229.e0f6,2,,2021-07-19 05:13:13 EDT,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""

      复制

      clog伪造完成,需要对数据库做一个全量备份,并重新创建数据库实例。


      参考

      《精通PostgreSQL11》

      文章转载自CP的PostgreSQL厨房,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

      评论