背景
今天在巡检PostgreSQL数据库日志时,发现某个节点日志信息出现了,"ERROR: could not access status of transaction 228942"的错误信息。分析了一下错误信息,定位为clog(xact)事物状态文件损坏,这种情况是没有真正的办法修复该问题的,只能尝试尽可能的抢救数据,不过还好这套系统是一套非核心系统,并且还在备节点发生该错误,直接将备节点重做了。
事后思考2个问题:
1.如何最大程度的规避该问题?
2.如何在没有备份的情况下,快速恢复现场?并抢救尽可能多的抢救数据。
概念
/*
* Possible transaction statuses --- note that all-zeroes is the initial
* state.
*
* A "subcommitted" transaction is a committed subtransaction whose parent
* hasn't committed or aborted yet.
*/
typedef int XidStatus;
#define TRANSACTION_STATUS_IN_PROGRESS 0x00
#define TRANSACTION_STATUS_COMMITTED 0x01
#define TRANSACTION_STATUS_ABORTED 0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
复制
实验
模拟下clog损坏使用dd命令伪造事物状态,将数据导出后重建实例,并导回数据。
手工模拟clog损坏
postgres@bogon-> cd opt/pg_root/pg_xact
postgres@bogon-> ll
total 12K
-rw------- 1 postgres postgres 8.1K Jul 19 04:57 0000
postgres@bogon-> cat''>0000
postgres@bogon-> ll
total 0
-rw------- 1 postgres postgres 0 Jul 19 04:57 0000
启动DB,报错无法启动DB
postgres@bogon-> pg_ctl start
waiting for server to start....2021-07-19 05:01:51.774 EDT [56728] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-07-19 05:01:51.774 EDT [56728] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-07-19 05:01:51.775 EDT [56728] LOG: listening on Unix socket "./.s.PGSQL.5432"
...2021-07-19 05:01:54.849 EDT [56728] LOG: redirecting log output to logging collector process
2021-07-19 05:01:54.849 EDT [56728] HINT: Future log output will appear in directory "pg_log".
stopped waiting
pg_ctl: could not start server
Examine the log output.
检查下日志信息:
postgres@bogon-> cat postgresql-2021-07-19_050154.log
2021-07-19 05:01:54.849 EDT [56728] LOG: ending log output to stderr
2021-07-19 05:01:54.849 EDT [56728] HINT: Future log output will go to log destination "csvlog".
postgres@bogon-> cat postgresql-2021-07-19_050154.csv
2021-07-19 05:01:54.849 EDT,,,56728,,60f53f7f.dd98,1,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2021-07-19 05:01:54.853 EDT,,,56733,,60f53f82.dd9d,1,,2021-07-19 05:01:54 EDT,,0,LOG,00000,"database system was shut down at 2021-07-19 04:56:09 EDT",,,,,,,,,""
2021-07-19 05:01:54.855 EDT,,,56733,,60f53f82.dd9d,2,,2021-07-19 05:01:54 EDT,,0,FATAL,XX000,"could not access status of transaction 1217","Could not read from file ""pg_xact/0000"" at offset 0: read too few bytes.",,,,,,,,""
2021-07-19 05:01:54.855 EDT,,,56728,,60f53f7f.dd98,2,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"startup process (PID 56733) exited with exit code 1",,,,,,,,,""
2021-07-19 05:01:54.855 EDT,,,56728,,60f53f7f.dd98,3,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
2021-07-19 05:01:54.879 EDT,,,56728,,60f53f7f.dd98,4,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"database system is shut down",,,,,,,,,""
错误信息
FATAL,XX000,"could not access status of transaction 1217","Could not read from file ""pg_xact/0000"" at offset 0: read too few bytes.",,,,,,,,""
无法访问事物1217的状态
尝试使用dd工具修复
postgres@bogon-> dd if=/dev/zero of=/opt/pg_root/pg_xact/0000 bs=256K count=1
1+0 records in
1+0 records out
262144 bytes (262 kB) copied, 0.000680516 s, 385 MB/s
postgres@bogon-> ll
total 256K
-rw------- 1 postgres postgres 256K Jul 19 05:12 0000
启动数据库
postgres@bogon-> pg_ctl start
waiting for server to start....2021-07-19 05:13:13.879 EDT [57590] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-07-19 05:13:13.879 EDT [57590] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-07-19 05:13:13.881 EDT [57590] LOG: listening on Unix socket "./.s.PGSQL.5432"
...2021-07-19 05:13:17.483 EDT [57590] LOG: redirecting log output to logging collector process
2021-07-19 05:13:17.483 EDT [57590] HINT: Future log output will appear in directory "pg_log".
done
server started
数据库已经启动
检查数据库日志
2021-07-19 05:13:17.483 EDT,,,57590,,60f54229.e0f6,1,,2021-07-19 05:13:13 EDT,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2021-07-19 05:13:17.495 EDT,,,57595,,60f5422d.e0fb,1,,2021-07-19 05:13:17 EDT,,0,LOG,00000,"database system was shut down at 2021-07-19 04:56:09 EDT",,,,,,,,,""
2021-07-19 05:13:17.536 EDT,,,57590,,60f54229.e0f6,2,,2021-07-19 05:13:13 EDT,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
复制
clog伪造完成,需要对数据库做一个全量备份,并重新创建数据库实例。
参考
《精通PostgreSQL11》
文章转载自CP的PostgreSQL厨房,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
明明想执行的SQL是DELETE、UPDATE,但为什么看到的是SELECT(FDW的实现原理解析)
小满未满、
377次阅读
2025-03-19 23:11:26
PostgreSQL初/中/高级认证考试(3.15)通过考生公示
开源软件联盟PostgreSQL分会
351次阅读
2025-03-20 09:50:36
IvorySQL 4.4 发布 - 基于 PostgreSQL 17.4,增强平台支持
通讯员
220次阅读
2025-03-20 15:31:04
openHalo问世,全球首款基于PostgreSQL兼容MySQL协议的国产开源数据库
严少安
205次阅读
2025-04-07 12:14:29
命名不规范,事后泪两行
xiongcc
203次阅读
2025-03-13 14:26:08
PG vs MySQL 执行计划解读的异同点
进击的CJR
142次阅读
2025-03-21 10:50:08
版本发布| IvorySQL 4.4 发布
IvorySQL开源数据库社区
131次阅读
2025-03-13 09:52:33
PostgreSQL分区管理扩展——pg_partman
chirpyli
110次阅读
2025-03-19 15:48:31
手把手教你在 openKylin 上部署 IvorySQL 4.4
严少安
107次阅读
2025-03-27 20:41:28
postgresql+patroni+etcd高可用安装
necessary
106次阅读
2025-03-28 10:11:23