3_PostgreSQL深入解析controlfile
1. 内容概述
postgresql数据库中controlfile为8K大小的二进制文件,主要存储和数据库运行相关的检查点信息、系统信息、初始化信息及恢复信息等,在initdb初始化数据库时生成,损坏时会影响数据库的正常启动。
2. 生成过程详解
执行initdb初始化数据库时调用函数BootStrapXLOG生成controlfile,可以发现pg代码的注释还是比较清晰的,有助于我们理解和学习。
源码路径: ./src/backend/access/transam/xlog.c
/*
* This func must be called ONCE on system install. It creates pg_control
* and the initial XLOG segment.
*/
void
BootStrapXLOG(void)
{
..................详细内容请参见源码
/* Now create pg_control */
InitControlFile(sysidentifier);
ControlFile->time = checkPoint.time;
ControlFile->checkPoint = checkPoint.redo;
ControlFile->checkPointCopy = checkPoint;
/* some additional ControlFile fields are set in WriteControlFile() */
WriteControlFile();
/* Bootstrap the commit log, too */
BootStrapCLOG();
BootStrapCommitTs();
BootStrapSUBTRANS();
BootStrapMultiXact();
pfree(buffer);
/*
* Force control file to be read - in contrast to normal processing we'd
* otherwise never run the checks and GUC related initializations therein.
*/
ReadControlFile();
}
3. 存储位置及大小
[postgres@enmo global]$ ls -ltr $PGDATA/global/pg_control -rwx------. 1 postgres dba 8192 Sep 8 11:22 /u01/pg15/pgdata/global/pg_control [postgres@enmo global]$
4. 查询controlfile信息
### sql查询controlfile信息
postgres=# select * from pg_control_checkpoint();
postgres=# select * from pg_control_system();
postgres=# select * from pg_control_init();
postgres=# select * from pg_control_recovery();
### 命令行查询
[postgres@enmo global]$ pg_controldata
4. 1 pg_control_checkpoint信息
postgres=# select * from pg_control_checkpoint(); -[ RECORD 1 ]--------+------------------------- checkpoint_lsn | A/B64CDA80 redo_lsn | A/B64CDA80 redo_wal_file | 000000010000000A000000B6 timeline_id | 1 prev_timeline_id | 1 full_page_writes | t next_xid | 0:2686985 next_oid | 74487 next_multixact_id | 1 next_multi_offset | 0 oldest_xid | 716 oldest_xid_dbid | 1 oldest_active_xid | 0 oldest_multi_xid | 1 oldest_multi_dbid | 1 oldest_commit_ts_xid | 0 newest_commit_ts_xid | 0 checkpoint_time | 2022-09-21 08:57:28+08 postgres=#
4. 2 pg_control_system信息
postgres=# select * from pg_control_system(); -[ RECORD 1 ]------------+----------------------- pg_control_version | 1300 catalog_version_no | 202205131 system_identifier | 7109983015813941257 pg_control_last_modified | 2022-09-21 09:05:37+08 postgres=#
4.3 pg_control_init信息
postgres=# select * from pg_control_init(); -[ RECORD 1 ]--------------+--------- max_data_alignment | 8 database_block_size | 8192 blocks_per_segment | 131072 wal_block_size | 8192 bytes_per_wal_segment | 16777216 max_identifier_length | 64 max_index_columns | 32 max_toast_chunk_size | 1996 large_object_chunk_size | 2048 float8_pass_by_value | t data_page_checksum_version | 0 postgres=#
4.4 pg_control_recovery信息
postgres=# select * from pg_control_recovery(); -[ RECORD 1 ]-----------------+---- min_recovery_end_lsn | 0/0 min_recovery_end_timeline | 0 backup_start_lsn | 0/0 backup_end_lsn | 0/0 end_of_backup_record_required | f postgres=#
4.5 命令行工具 pg_controldata
[postgres@enmo global]$ pg_controldata pg_control version number: 1300 Catalog version number: 202205131 Database system identifier: 7109983015813941257 Database cluster state: in production pg_control last modified: Wed Sep 21 09:10:37 2022 Latest checkpoint location: A/B64CDB68 Latest checkpoint's REDO location: A/B64CDB30 Latest checkpoint's REDO WAL file: 000000010000000A000000B6 Latest checkpoint's TimeLineID: 1 Latest checkpoint's PrevTimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0:2686985 Latest checkpoint's NextOID: 74487 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 716 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 2686985 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Latest checkpoint's oldestCommitTsXid:0 Latest checkpoint's newestCommitTsXid:0 Time of latest checkpoint: Wed Sep 21 09:10:37 2022 Fake LSN counter for unlogged rels: 0/3E8 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no wal_level setting: replica wal_log_hints setting: off max_connections setting: 100 max_worker_processes setting: 8 max_wal_senders setting: 10 max_prepared_xacts setting: 0 max_locks_per_xact setting: 64 track_commit_timestamp setting: off Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Size of a large-object chunk: 2048 Date/time type storage: 64-bit integers Float8 argument passing: by value Data page checksum version: 0 Mock authentication nonce: 5960f1c6b5e0e9336091dd378a7bd193e5af0df2a2e50be3f78bc93b99a7df7c [postgres@enmo global]$
5. 物理结构二进制解析
dd if=/u01/pg15/pgdata/global/pg_control bs=8192 skip=0 count=1|hexdump -C [postgres@enmo postgresql-15beta1]$ dd if=/u01/pg15/pgdata/global/pg_control bs=8192 skip=0 count=1|hexdump -C 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 3.0193e-05 s, 271 MB/s 00000000 09 88 a6 f0 e1 bb ab 62 14 05 00 00 cb 67 0d 0c |.......b.....g..| 00000010 06 00 00 00 00 00 00 00 8d 64 2a 63 00 00 00 00 |.........d*c....| 00000020 68 db 4c b6 0a 00 00 00 30 db 4c b6 0a 00 00 00 |h.L.....0.L.....| 00000030 01 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 |................| 00000040 09 00 29 00 00 00 00 00 f7 22 01 00 01 00 00 00 |..)......"......| 00000050 00 00 00 00 cc 02 00 00 01 00 00 00 01 00 00 00 |................| 00000060 01 00 00 00 00 00 00 00 8d 64 2a 63 00 00 00 00 |.........d*c....| 00000070 00 00 00 00 00 00 00 00 09 00 29 00 00 00 00 00 |..........).....| 00000080 e8 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000000a0 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................| 000000b0 00 00 00 00 64 00 00 00 08 00 00 00 0a 00 00 00 |....d...........| 000000c0 00 00 00 00 40 00 00 00 00 00 00 00 08 00 00 00 |....@...........| 000000d0 00 00 00 00 87 d6 32 41 00 20 00 00 00 00 02 00 |......2A. ......| 000000e0 00 20 00 00 00 00 00 01 40 00 00 00 20 00 00 00 |. ......@... ...| 000000f0 cc 07 00 00 00 08 00 00 01 00 00 00 00 00 00 00 |................| 00000100 59 60 f1 c6 b5 e0 e9 33 60 91 dd 37 8a 7b d1 93 |Y`.....3`..7.{..| 00000110 e5 af 0d f2 a2 e5 0b e3 f7 8b c9 3b 99 a7 df 7c |...........;...|| 00000120 b7 cf 75 74 00 00 00 00 00 00 00 00 00 00 00 00 |..ut............| 00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00002000 [postgres@enmo postgresql-15beta1]$ 该控制文件共使用0x123(291) bytes字节,其余部分使用0x00占位。
6. 物理结构源码解析
以前研究Oracle时需要猜对象的结构体,然后dump核对验证,pg是开源数据库则方便很多。
源码位置: ./src/include/catalog/pg_control.h
typedef struct ControlFileData
{
/*
* Unique system identifier --- to ensure we match up xlog files with the
* installation that produced them.
*/
uint64 system_identifier; <--09 88 a6 f0 e1 bb ab 62(逆序16字节) == Database system identifier: 7109983015813941257
/*
* Version identifier information. Keep these fields at the same offset,
* especially pg_control_version; they won't be real useful if they move
* around. (For historical reasons they must be 8 bytes into the file
* rather than immediately at the front.)
*
* pg_control_version identifies the format of pg_control itself.
* catalog_version_no identifies the format of the system catalogs.
*
* There are additional version identifiers in individual files; for
* example, WAL logs contain per-page magic numbers that can serve as
* version cues for the WAL log.
*/
uint32 pg_control_version; /* PG_CONTROL_VERSION */
uint32 catalog_version_no; /* see catversion.h */
/*
* System status data
*/
DBState state; /* see enum above */
pg_time_t time; /* time stamp of last pg_control update */
XLogRecPtr checkPoint; /* last check point record ptr */
CheckPoint checkPointCopy; /* copy of last check point record */
XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */
/*
* These two values determine the minimum point we must recover up to
* before starting up:
*
* minRecoveryPoint is updated to the latest replayed LSN whenever we
* flush a data change during archive recovery. That guards against
* starting archive recovery, aborting it, and restarting with an earlier
* stop location. If we've already flushed data changes from WAL record X
* to disk, we mustn't start up until we reach X again. Zero when not
* doing archive recovery.
*
* backupStartPoint is the redo pointer of the backup start checkpoint, if
* we are recovering from an online backup and haven't reached the end of
* backup yet. It is reset to zero when the end of backup is reached, and
* we mustn't start up before that. A boolean would suffice otherwise, but
* we use the redo pointer as a cross-check when we see an end-of-backup
* record, to make sure the end-of-backup record corresponds the base
* backup we're recovering from.
*
* backupEndPoint is the backup end location, if we are recovering from an
* online backup which was taken from the standby and haven't reached the
* end of backup yet. It is initialized to the minimum recovery point in
* pg_control which was backed up last. It is reset to zero when the end
* of backup is reached, and we mustn't start up before that.
*
* If backupEndRequired is true, we know for sure that we're restoring
* from a backup, and must see a backup-end record before we can safely
* start up.
*/
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
XLogRecPtr backupStartPoint;
XLogRecPtr backupEndPoint;
bool backupEndRequired;
/*
* Parameter settings that determine if the WAL can be used for archival
* or hot standby.
*/
int wal_level;
bool wal_log_hints;
int MaxConnections;
int max_worker_processes;
int max_wal_senders;
int max_prepared_xacts;
int max_locks_per_xact;
bool track_commit_timestamp;
/*
* This data is used to check for hardware-architecture compatibility of
* the database and the backend executable. We need not check endianness
* explicitly, since the pg_control version will surely look wrong to a
* machine of different endianness, but we do need to worry about MAXALIGN
* and floating-point format. (Note: storage layout nominally also
* depends on SHORTALIGN and INTALIGN, but in practice these are the same
* on all architectures of interest.)
*
* Testing just one double value is not a very bulletproof test for
* floating-point compatibility, but it will catch most cases.
*/
uint32 maxAlign; /* alignment requirement for tuples */
double floatFormat; /* constant 1234567.0 */
#define FLOATFORMAT_VALUE 1234567.0
/*
* This data is used to make sure that configuration of this database is
* compatible with the backend executable.
*/
uint32 blcksz; /* data block size for this DB */
uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */
uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */
uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
uint32 loblksize; /* chunk size in pg_largeobject */
bool float8ByVal; /* float8, int8, etc pass-by-value? */
/* Are data pages protected by checksums? Zero if no checksum version */
uint32 data_checksum_version;
/*
* Random nonce, used in authentication requests that need to proceed
* based on values that are cluster-unique, like a SASL exchange that
* failed at an early stage.
*/
char mock_authentication_nonce[MOCK_AUTH_NONCE_LEN];
/* CRC of all above ... MUST BE LAST! */
pg_crc32c crc;
} ControlFileData;
7. 缺失controlfile启动pg
### 缺失controlfile启动
[postgres@enmo global]$ mv pg_control bak/
[postgres@enmo global]$ pg_ctl start
waiting for server to start....postgres: could not find the database system
Expected to find it in the directory "/u01/pg15/pgdata",
but could not open file "/u01/pg15/pgdata/global/pg_control": No such file or directory
stopped waiting
pg_ctl: could not start server
Examine the log output.
[postgres@enmo global]$
### 恢复controlfile启动
[postgres@enmo global]$ mv bak/pg_control .
[postgres@enmo global]$ pg_ctl start
waiting for server to start....2022-09-21 10:26:38.141 CST [8296] LOG: redirecting log output to logging collector process
2022-09-21 10:26:38.141 CST [8296] HINT: Future log output will appear in directory "log".
done
server started
[postgres@enmo global]$ ps -fu postgres
UID PID PPID C STIME TTY TIME CMD
postgres 6228 6227 0 Sep20 pts/1 00:00:00 -bash
postgres 8296 1 0 10:26 ? 00:00:00 /u01/pg15/pgsql/bin/postgres
postgres 8297 8296 0 10:26 ? 00:00:00 postgres: logger
postgres 8298 8296 0 10:26 ? 00:00:00 postgres: checkpointer
postgres 8299 8296 0 10:26 ? 00:00:00 postgres: background writer
postgres 8301 8296 0 10:26 ? 00:00:00 postgres: walwriter
postgres 8302 8296 0 10:26 ? 00:00:00 postgres: autovacuum launcher
postgres 8303 8296 0 10:26 ? 00:00:00 postgres: logical replication launcher
postgres 8305 6228 0 10:26 pts/1 00:00:00 ps -fu postgres
[postgres@enmo global]$
8. old version controlfile启动
postgres=# checkpoint;
CHECKPOINT
postgres=# checkpoint;
CHECKPOINT
postgres=# checkpoint;
CHECKPOINT
postgres=# exit
[postgres@enmo global]$ pg_ctl stop
waiting for server to shut down.... done
server stopped
[postgres@enmo global]$
[postgres@enmo global]$ mv pg_control bak/pg_control_new
[postgres@enmo global]$ mv bak/pg_control .
[postgres@enmo global]$ pg_ctl start
waiting for server to start....2022-09-21 10:34:22.136 CST [8333] LOG: redirecting log output to logging collector process
2022-09-21 10:34:22.136 CST [8333] HINT: Future log output will appear in directory "log".
done
server started
[postgres@enmo global]$ ps -fu postgres
UID PID PPID C STIME TTY TIME CMD
postgres 6228 6227 0 Sep20 pts/1 00:00:00 -bash
postgres 8333 1 0 10:34 ? 00:00:00 /u01/pg15/pgsql/bin/postgres
postgres 8334 8333 0 10:34 ? 00:00:00 postgres: logger
postgres 8335 8333 0 10:34 ? 00:00:00 postgres: checkpointer
postgres 8336 8333 0 10:34 ? 00:00:00 postgres: background writer
postgres 8338 8333 0 10:34 ? 00:00:00 postgres: walwriter
postgres 8339 8333 0 10:34 ? 00:00:00 postgres: autovacuum launcher
postgres 8340 8333 0 10:34 ? 00:00:00 postgres: logical replication launcher
postgres 8341 6228 0 10:34 pts/1 00:00:00 ps -fu postgres
[postgres@enmo global]$ pg_ctl status
pg_ctl: server is running (PID: 8333)
/u01/pg15/pgsql/bin/postgres
[postgres@enmo global]$
### 通过以上测试,说明pg没有对controlfile版本过旧的检查,使用old version controlfile可以正常启动pg数据库。
9. miss controlfile 影响
[postgres@enmo global]$ mv pg_control bak/
[postgres@enmo global]$ psql
psql (15beta1)
Type "help" for help.
postgres=# checkpoint;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> exit
[postgres@enmo global]$ pg_ctl status
pg_ctl: no server running
[postgres@enmo global]$
### log信息
2022-09-21 11:28:40.970 CST,,,8362,,632a7977.20aa,3,,2022-09-21 10:39:51 CST,,0,LOG,00000,"checkpoint starting: immediate force wait",,,,,,,,,"","checkpointer",,0
2022-09-21 11:28:40.971 CST,,,8362,,632a7977.20aa,4,,2022-09-21 10:39:51 CST,,0,LOG,00000,"checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB",,,,,,,,,"","checkpointer",,0
2022-09-21 11:29:08.352 CST,,,8362,,632a7977.20aa,5,,2022-09-21 10:39:51 CST,,0,LOG,00000,"checkpoint starting: immediate force wait",,,,,,,,,"","checkpointer",,0
2022-09-21 11:29:08.354 CST,,,8362,,632a7977.20aa,6,,2022-09-21 10:39:51 CST,,0,PANIC,58P01,"could not open file ""/u01/pg15/pgdata/global/pg_control"": No such file or directory",,,,,,,,,"","checkpointer",,0
2022-09-21 11:29:09.651 CST,,,8360,,632a7977.20a8,7,,2022-09-21 10:39:51 CST,,0,LOG,00000,"checkpointer process (PID 8362) was terminated by signal 6: Aborted",,,,,,,,,"","postmaster",,0
2022-09-21 11:29:09.651 CST,,,8360,,632a7977.20a8,8,,2022-09-21 10:39:51 CST,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,"","postmaster",,0
2022-09-21 11:29:09.652 CST,,,8360,,632a7977.20a8,9,,2022-09-21 10:39:51 CST,,0,LOG,00000,"all server processes terminated; reinitializing",,,,,,,,,"","postmaster",,0
2022-09-21 11:29:09.659 CST,,,8360,,632a7977.20a8,10,,2022-09-21 10:39:51 CST,,0,PANIC,58P01,"could not open file ""global/pg_control"": No such file or directory",,,,,,,,,"","postmaster",,0
### 如果controlfile损坏或缺失,pg数据库进行检查点时会terminated。
10. pg_resetwal重置controlfile
10.1 缺失controlfile启动报错
[postgres@enmo pgdata]$ pg_ctl start pg_ctl: another server might be running; trying to start server anyway waiting for server to start....postgres: could not find the database system Expected to find it in the directory "/u01/pg15/pgdata", but could not open file "/u01/pg15/pgdata/global/pg_control": No such file or directory stopped waiting pg_ctl: could not start server Examine the log output.
10.2 pg_resetwal工具介绍
[postgres@enmo pgdata]$ pg_resetwal --help pg_resetwal resets the PostgreSQL write-ahead log. Usage: pg_resetwal [OPTION]... DATADIR Options: -c, --commit-timestamp-ids=XID,XID set oldest and newest transactions bearing commit timestamp (zero means no change) [-D, --pgdata=]DATADIR data directory -e, --epoch=XIDEPOCH set next transaction ID epoch -f, --force force update to be done -l, --next-wal-file=WALFILE set minimum starting location for new WAL -m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID -n, --dry-run no update, just show what would be done -o, --next-oid=OID set next OID -O, --multixact-offset=OFFSET set next multitransaction offset -u, --oldest-transaction-id=XID set oldest transaction ID -V, --version output version information, then exit -x, --next-transaction-id=XID set next transaction ID --wal-segsize=SIZE size of WAL segments, in megabytes -?, --help show this help, then exit Report bugs to <pgsql-bugs@lists.postgresql.org>. PostgreSQL home page: <https://www.postgresql.org/>
10.3 postgresql 缺失controlfile恢复过程
参数确认: -l, --next-wal-file=WALFILE
参数算法:
ls -ltr $PGDATA/pg_wal/ <-- 最后一个wal日志+1如,
000000010000000000000001 +1 -> 000000010000000000000002
-l walfile
--next-wal-file=walfile
Manually set the WAL starting location by specifying the name of the next WAL segment file.
The name of next WAL segment file should be larger than any WAL segment file name currently existing in the directory pg_wal under the data directory. These names are also in hexadecimal and have three parts. The first part is the “timeline ID” and should usually be kept the same. For example, if 00000001000000320000004A is the largest entry in pg_wal, use -l 00000001000000320000004B or higher.
Note that when using nondefault WAL segment sizes, the numbers in the WAL file names are different from the LSNs that are reported by system functions and system views. This option takes a WAL file name, not an LSN.
Note
### step1. -l 参数确认
[postgres@enmo pgdata]$ ls -ltr pg_wal/
total 16384
drwx------ 2 postgres dba 6 Sep 21 15:45 archive_status
-rw------- 1 postgres dba 16777216 Sep 21 16:34 000000010000000000000001
[postgres@enmo pgdata]$
### 确定参数为
-l 000000010000000000000002
参数确认:-m, --multixact-ids=MXID1,MXID2
参数算法:
ls -ltr $PGDATA/pg_multixact/offsets/
result == (0000)?(0x20000,0x10000) : (max(result+1)*0x10000,min(result)*0x10000)
-m mxid,mxid
--multixact-ids=mxid,mxid
Manually set the next and oldest multitransaction ID.
A safe value for the next multitransaction ID (first part) can be determined by looking for the numerically largest file name in the directory pg_multixact/offsets under the data directory, adding one, and then multiplying by 65536 (0x10000). Conversely, a safe value for the oldest multitransaction ID (second part of -m) can be determined by looking for the numerically smallest file name in the same directory and multiplying by 65536. The file names are in hexadecimal, so the easiest way to do this is to specify the option value in hexadecimal and append four zeroes.
### step2. -m 参数确认
[postgres@enmo pgdata]$ ls -ltr pg_multixact/offsets/
total 8
-rw------- 1 postgres dba 8192 Sep 21 16:11 0000
[postgres@enmo pgdata]$
### 确定参数为
-m 0x10000,0x00000
### pg_resetwal: error: oldest multitransaction ID (-m) must not be 0
由于以上报错,所以将参数修改为:
-m 0x20000,0x10000
参数确认:-O, --multixact-offset=OFFSET
ls -ltr $PGDATA/pg_multixact/members/
参数算法: max(result+1)*0xCC80
-O mxoff
--multixact-offset=mxoff
Manually set the next multitransaction offset.
A safe value can be determined by looking for the numerically largest file name in the directory pg_multixact/members under the data directory, adding one, and then multiplying by 52352 (0xCC80). The file names are in hexadecimal. There is no simple recipe such as the ones for other options of appending zeroes.
### step3. -O 参数确认
[postgres@enmo pgdata]$ ls -ltr pg_multixact/members/
total 8
-rw------- 1 postgres dba 8192 Sep 21 15:45 0000
[postgres@enmo pgdata]$
### 确定参数为
-O 0xCC80
参数确认:-x, --next-transaction-id=XID
参数算法:
ls -ltr $PGDATA/pg_xact/
max(result+1)*0x100000
-x xid
--next-transaction-id=xid
Manually set the next transaction ID.
A safe value can be determined by looking for the numerically largest file name in the directory pg_xact under the data directory, adding one, and then multiplying by 1048576 (0x100000). Note that the file names are in hexadecimal. It is usually easiest to specify the option value in hexadecimal too. For example, if 0011 is the largest entry in pg_xact, -x 0x1200000 will work (five trailing zeroes provide the proper multiplier).
### step4. -x 参数确认
[postgres@enmo pgdata]$ ls -ltr pg_xact/
total 8
-rw------- 1 postgres dba 8192 Sep 21 16:11 0000
[postgres@enmo pgdata]$
### 确定参数为
-x 0x100000
恢复controlfile
### step 5. 清除postmaster.pid文件
rm -f $PGDATA/postmaster.pid
### step 6. 创建空controlfile文件
touch $PGDATA/global/pg_control
### stop5. 恢复controlfile
[postgres@enmo pgdata]$ pg_resetwal -l 000000010000000000000002 -m 0x20000,0x10000 -O 0xCC80 -x 0x100000 -f $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Write-ahead log reset
[postgres@enmo pgdata]$
pg登陆测试
[postgres@enmo global]$ psql psql (15beta1) Type "help" for help. postgres=# select * from test; a1 | a2 ----+---- 1 | 2 1 | 2 (2 rows) postgres=#
11. strace跟踪控制文件更新
session 1:
strace -fittT -o /tmp/strace_checkpoint.out -p `ps -ef|grep checkpoint|grep -v grep|awk '{print $2}'`
session 2:
postgres=# checkpoint;
CHECKPOINT
postgres=#
session 1:
Ctrl+c
vim /tmp/strace_checkpoint.out
14720 17:56:30.102130 [00007ff14b796f30] open("/u01/pg15/pgdata/global/pg_control", O_RDWR) = 4 <0.000012>
14720 17:56:30.102165 [00007ff14b796730] write(4, "w\351\246\26\355\335*c\24\5\0\0\313g\r\f\6\0\0\0\0\0\0\0\316\337*c\0\0\0\0"..., 8192) = 8192 <0.000024>
14720 17:56:30.102212 [00007ff14ad9c574] fstat(4, {st_mode=S_IFREG|0644, st_size=8192, ...}) = 0 <0.000008>
14720 17:56:30.102242 [00007ff14b796924] fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) <0.000008>
14720 17:56:30.102285 [00007ff14b796db0] fsync(4) = 0 <0.000112>
14720 17:56:30.102439 [00007ff14b7967f0] close(4) = 0 <0.000010>
### 小结
在执行checkpoint时, postgres: checkpointer 进行更新 controlfile信息。
12. checkpoint 更新controlfile哪些信息?
12.1 测试脚本
checkpoint前后对比pg_controldata输出结果。 [postgres@enmo global]$ pg_controldata [postgres@enmo global]$ psql psql (15beta1) Type "help" for help. postgres=# checkpoint; CHECKPOINT postgres=# \q [postgres@enmo global]$ pg_controldata 通过测试修改字段为: pg_control last modified: Wed Sep 21 17:56:30 2022 Latest checkpoint location: 0/2024E18 Latest checkpoint's REDO location: 0/2024DE0 Time of latest checkpoint: Wed Sep 21 17:56:30 2022
12.2 测试结果
controlfile中其它字段也可能在特定条件下修改,待后续深入分析。
13. 参考资料
源码路径: ./src/backend/access/transam/xlog.c 源码路径: ./src/include/catalog/pg_control.h
最后修改时间:2022-10-31 08:51:20
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。