本文主要讲述磐维数据库的物理备份(全量备份、增量备份)、全量恢复、增量恢复以及pitr恢复数据库集群的过程。
关于数据库全备份和增量备份部署详情请参考:磐维数据库物理备份(全量+增量)部署
环境说明
主机名 | ip地址 | OS版本 | 内存、CPU | 节点角色 | 数据库端口 | 数据目录 |
---|---|---|---|---|---|---|
node1 | 192.*.*.60 | Centos7.9 | 4G 、 1个双核 | 主节点 | 17700 | /data/panweidb/data |
node2 | 192.*.*.62 | Centos7.9 | 4G 、 1个双核 | 从节点 | 17700 | /data/panweidb/data |
node3 | 192.*.*.64 | Centos7.9 | 4G 、 1个双核 | 从节点 | 17700 | /data/panweidb/data |
构造测试表testtab
#创建测试表testtab并插入10000条记录
panweidb=# create table testtab(id int,rq timestamp without time zone);
CREATE TABLE
panweidb=# insert into testtab (select generate_series(1,10000),now());
INSERT 0 10000
panweidb=# select count(*) from testtab;
count
-------
10000
(1 row)
panweidb=# select max(rq) from testtab;
max
----------------------------
2025-01-08 19:32:10.313036
(1 row)
复制
全量+增量备份
全量备份
[omm@node1 panweidb]$ gs_probackup backup --backup-path=/backup/probackup --instance=panweidb --backup-mode=full --delete-expired --pgdatabase=postgres > /backup/probackup/log/backup_week$(date +\%u).log 2>&1 &
复制
检查备份集
[omm@node1 ~]$ gs_probackup show --backup-path=/backup/probackup --instance=panweidb ======================================================================================================================================= Instance Version ID Recovery Time Mode WAL Mode TLI Time Data WAL Zratio Start LSN Stop LSN Type Status ======================================================================================================================================= panweidb 9.2 SPRPNQ 2025-01-08 19:38:25+08 FULL STREAM 1/0 16s 774MB 16MB 0.98 0/F000028 0/F000270 FILE OK <----全量备份
复制
向testtab表中再插入99条数据
panweidb=# insert into testtab (select generate_series(101,199),now());
INSERT 0 99
panweidb=# select count(*) from testtab;
count
-------
10099 <----全量备份后,表中插入99条数据,此时数据量为10099
(1 row)
复制
第一次增量备份
[omm@node1 panweidb]$ gs_probackup backup --backup-path=/backup/probackup --instance=panweidb --backup-mode=PTRACK --delete-expired --pgdatabase=postgres> /backup/probackup/log/backup_week$(date +\%u).log 2>&1 &
复制
检查备份集
[omm@node1 panweidb]$ gs_probackup show --backup-path=/backup/probackup --instance=panweidb =========================================================================================================================================== Instance Version ID Recovery Time Mode WAL Mode TLI Time Data WAL Zratio Start LSN Stop LSN Type Status =========================================================================================================================================== panweidb 9.2 SPRPQ3 2025-01-08 19:39:41+08 PTRACK STREAM 1/1 5s 273MB 16MB 0.94 0/11000028 0/11000270 FILE OK <----第一次增量备份 panweidb 9.2 SPRPNQ 2025-01-08 19:38:25+08 FULL STREAM 1/0 16s 774MB 16MB 0.98 0/F000028 0/F000270 FILE OK <----全量备份
复制
向testtab表中再插入200条数据
panweidb=# insert into testtab (select generate_series(200,399),now());
INSERT 0 200
panweidb=# select count(*) from testtab;
count
-------
10299 <----第一次增量备份后,表中插入200条数据,此时数据量为10299
(1 row)
复制
第二次增量备份
[omm@node1 panweidb]$ gs_probackup backup --backup-path=/backup/probackup --instance=panweidb --backup-mode=PTRACK --delete-expired --pgdatabase=postgres> /backup/probackup/log/backup_week$(date +\%u).log 2>&1 &
复制
检查备份集
[omm@node1 panweidb]$ gs_probackup show --backup-path=/backup/probackup --instance=panweidb =========================================================================================================================================== Instance Version ID Recovery Time Mode WAL Mode TLI Time Data WAL Zratio Start LSN Stop LSN Type Status =========================================================================================================================================== panweidb 9.2 SPRPTY 2025-01-08 19:42:00+08 PTRACK STREAM 1/1 6s 273MB 16MB 0.94 0/13000028 0/13000270 FILE OK <----第二次增量备份 panweidb 9.2 SPRPQ3 2025-01-08 19:39:41+08 PTRACK STREAM 1/1 5s 273MB 16MB 0.94 0/11000028 0/11000270 FILE OK <----第一次增量备份 panweidb 9.2 SPRPNQ 2025-01-08 19:38:25+08 FULL STREAM 1/0 16s 774MB 16MB 0.98 0/F000028 0/F000270 FILE OK <----全量备份
复制
向testtab表中再插入300条数据
panweidb=# insert into testtab (select generate_series(400,699),now());
INSERT 0 300
panweidb=# select max(rq) from testtab;
max
----------------------------
2025-01-08 19:43:13.318706
(1 row)
panweidb=# select count(*) from testtab;
count
-------
10599 <----第二次增量备份后,表中插入300条数据,此时数据量为10599
(1 row)
复制
模拟数据丢失
panweidb=# drop table testtab;
DROP TABLE
复制
使用全量备份+增量备份恢复主节点数据
场景一、使用全量备份+第一次增量备份恢复数据
预期目标:testtab中的数据恢复到总数据量为10099的状态。
1、 停止集群
[omm@node1 panweidb]$ cm_ctl stop cm_ctl: stop cluster. cm_ctl: stop nodeid: 1 cm_ctl: stop nodeid: 2 cm_ctl: stop nodeid: 3 .......... cm_ctl: stop cluster successfully.
复制
2、恢复数据
#使用gs_probackup restore恢复数据库的时候,需保证数据目录是空的。
[omm@node1 panweidb]$ mv /data/panweidb/data /data/panweidb/data.bak.`date +"%Y%m%d-%H%M%S"`
[omm@node1 ~]$ ls -ltrd /data/panweidb/data*
drwx------ 22 omm dbgroup 4096 Jan 8 19:44 /data/panweidb/data.bak.20250108-194459
#使用全量+第一次增量备份恢复数据库
[omm@node1 data]$ gs_probackup restore -B /backup/probackup --instance=panweidb -i SPRPQ3 -D /data/panweidb/data
LOG: Restore begin.
LOG: there is no file tablespace_map
LOG: check tablespace directories of backup SPRPQ3
LOG: check external directories of backup SPRPQ3
INFO: Validating parents for backup SPRPQ3
INFO: Validating backup SPRPNQ
INFO: Backup SPRPNQ data files are valid <----验证全量备份有效性
INFO: Validating backup SPRPQ3
INFO: Backup SPRPQ3 data files are valid <----验证第一次增量备份有效性
LOG: Thread [1]: Opening WAL segment "/backup/probackup/backups/panweidb/SPRPQ3/database/pg_xlog/000000010000000000000011"
INFO: Backup SPRPQ3 WAL segments are valid
INFO: Backup SPRPQ3 is valid.
INFO: Restoring the database from backup at 2025-01-08 19:39:39+08
LOG: there is no file tablespace_map
LOG: Restore directories and symlinks... in /data/panweidb/data
INFO: Start restoring backup files. DATA size: 774MB
LOG: Start thread 1
INFO: Backup files are restored. Transfered bytes: 790MB, time elapsed: 1s
INFO: Restore incremental ratio (less is better): 102% (790MB/774MB)
INFO: Syncing restored files to disk
INFO: Restored backup files are synced, time elapsed: 0
The pw_guc run with the following arguments: [gs_guc -D /data/panweidb/data -c pw_wal_directory='pg_xlog' set ].
expected instance path: [/data/panweidb/data/postgresql.conf]
gs_guc set: pw_wal_directory='pg_xlog': [/data/panweidb/data/postgresql.conf]
Total instances: 1. Failed instances: 0.
Success to perform gs_guc!
INFO: Restore of backup SPRPQ3 completed.
复制
3、启动数据库,验证数据
[omm@node1 data]$ gs_ctl start -D /data/panweidb/data -M primary
[omm@node1 data]$ gsql -d panweidb -p 17700 -c "select count(*) from testtab;"
count
-------
10099
(1 row)
复制
结论:恢复后testtab表中的数据量和预期的数据量一致,数据验证通过。
场景二、使用全量备份+第一次、第二次增量备份恢复数据
预期目标:testtab中的数据恢复到总数据量为10299的状态。
1、 停止数据库
[omm@node1 panweidb]$ gs_ctl stop .......... cm_ctl: stop cluster successfully.
复制
2、恢复数据
#使用gs_probackup restore恢复数据库的时候,需保证数据目录是空的。
[omm@node1 panweidb]$ mv /data/panweidb/data /data/panweidb/data.bak.`date +"%Y%m%d-%H%M%S"`
[omm@node1 panweidb]$ ls -ltrd /data/panweidb/data*
drwx------ 22 omm dbgroup 4096 Jan 8 19:44 /data/panweidb/data.bak.20250108-194459
drwx------ 22 omm dbgroup 4096 Jan 8 19:48 /data/panweidb/data.bak.20250108-194845
#使用全量+第一次、第二次增量备份恢复数据库
[omm@node1 panweidb]$ gs_probackup restore -B /backup/probackup --instance=panweidb -i SPRPTY -D /data/panweidb/data
LOG: Restore begin.
LOG: there is no file tablespace_map
LOG: check tablespace directories of backup SPRPTY
LOG: check external directories of backup SPRPTY
INFO: Validating parents for backup SPRPTY
INFO: Validating backup SPRPNQ <----验证全量备份有效性
INFO: Backup SPRPNQ data files are valid
INFO: Validating backup SPRPQ3 <----验证第一次增量备份有效性
INFO: Backup SPRPQ3 data files are valid
INFO: Validating backup SPRPTY <----验证第二次增量备份有效性
INFO: Backup SPRPTY data files are valid
LOG: Thread [1]: Opening WAL segment "/backup/probackup/backups/panweidb/SPRPTY/database/pg_xlog/000000010000000000000013"
INFO: Backup SPRPTY WAL segments are valid
INFO: Backup SPRPTY is valid.
INFO: Restoring the database from backup at 2025-01-08 19:41:58+08
LOG: there is no file tablespace_map
LOG: Restore directories and symlinks... in /data/panweidb/data
INFO: Start restoring backup files. DATA size: 774MB
LOG: Start thread 1
INFO: Backup files are restored. Transfered bytes: 790MB, time elapsed: 1s
INFO: Restore incremental ratio (less is better): 102% (790MB/774MB)
INFO: Syncing restored files to disk
INFO: Restored backup files are synced, time elapsed: 1s
The pw_guc run with the following arguments: [gs_guc -D /data/panweidb/data -c pw_wal_directory='pg_xlog' set ].
expected instance path: [/data/panweidb/data/postgresql.conf]
gs_guc set: pw_wal_directory='pg_xlog': [/data/panweidb/data/postgresql.conf]
Total instances: 1. Failed instances: 0.
Success to perform gs_guc!
INFO: Restore of backup SPRPTY completed.
复制
3、启动数据库,验证数据
[omm@node1 data]$ gs_ctl start -D /data/panweidb/data -M primary
[omm@node1 data]$ gsql -d panweidb -p 17700 -c "select count(*) from testtab;"
count
-------
10299
(1 row)
复制
结论:恢复后testtab表中的数据量和预期的数据量一致,数据验证通过。
场景三、使用PITR恢复数据(恢复到drop table之前的时间点)
PITR 原理
PITR 恢复过程中,会在全备份+增量备份的基础上重放归档日志中的 WAL 数据,直到恢复到用户想要的位置为止。用户可以在启动数据库前,通过编辑恢复控制文件recovery.conf 中的命令来指定恢复的目标位置,或者在使用 gs_probackup的 restore 命令时指定恢复目标相关参数。pitr恢复支持recovery_target_name、
recovery_target_time、recovery_target_xid、recovery_target_lsn四种配置项,但是同时只能使用一种配置项。本文使用的是recovery_target_time配置项。
预期目标:testtab中的数据恢复到总数据量为10599的状态。
1、 停止数据库
[omm@node1 panweidb]$ gs_ctl stop .......... cm_ctl: stop cluster successfully.
复制
2、恢复数据
#使用gs_probackup restore恢复数据库的时候,需保证数据目录是空的。
[omm@node1 panweidb]$ mv /data/panweidb/data /data/panweidb/data.bak.`date +"%Y%m%d-%H%M%S"`
[omm@node1 data]$ ls -ltrd /data/panweidb/data*
drwx------ 22 omm dbgroup 4096 Jan 8 19:44 /data/panweidb/data.bak.20250108-194459
drwx------ 22 omm dbgroup 4096 Jan 8 19:48 /data/panweidb/data.bak.20250108-194845
drwx------ 22 omm dbgroup 4096 Jan 8 19:51 /data/panweidb/data.bak.20250108-195200
#使用全量+第一次、第二次增量备份+归档+当前xlog日志恢复数据库
[omm@node1 data]$ gs_probackup restore -B /backup/probackup --instance=panweidb -i SPRPTY -D /data/panweidb/data
LOG: Restore begin.
LOG: there is no file tablespace_map
LOG: check tablespace directories of backup SPRPTY
LOG: check external directories of backup SPRPTY
INFO: Validating parents for backup SPRPTY
INFO: Validating backup SPRPNQ
INFO: Backup SPRPNQ data files are valid
INFO: Validating backup SPRPQ3
INFO: Backup SPRPQ3 data files are valid
INFO: Validating backup SPRPTY
INFO: Backup SPRPTY data files are valid
LOG: Thread [1]: Opening WAL segment "/backup/probackup/backups/panweidb/SPRPTY/database/pg_xlog/000000010000000000000013"
INFO: Backup SPRPTY WAL segments are valid
INFO: Backup SPRPTY is valid.
INFO: Restoring the database from backup at 2025-01-08 19:41:58+08
LOG: there is no file tablespace_map
LOG: Restore directories and symlinks... in /data/panweidb/data
INFO: Start restoring backup files. DATA size: 774MB
LOG: Start thread 1
INFO: Backup files are restored. Transfered bytes: 790MB, time elapsed: 1s
INFO: Restore incremental ratio (less is better): 102% (790MB/774MB)
INFO: Syncing restored files to disk
INFO: Restored backup files are synced, time elapsed: 0
The pw_guc run with the following arguments: [gs_guc -D /data/panweidb/data -c pw_wal_directory='pg_xlog' set ].
expected instance path: [/data/panweidb/data/postgresql.conf]
gs_guc set: pw_wal_directory='pg_xlog': [/data/panweidb/data/postgresql.conf]
Total instances: 1. Failed instances: 0.
Success to perform gs_guc!
INFO: Restore of backup SPRPTY completed.
复制
3、配置pitr配置文件recovery.conf
cat >> /data/panweidb/data/recovery.conf << EOF
recovery_target_time = '2025-01-08 19:43:14' #注意:recovery_target_time不能写成2025-01-08 19:43:13,因为19:43:13的时候表中还没有新插入的300条数据
restore_command = 'cp /archive/%f %p'
EOF
复制
4、拷贝drop table之前最新的xlog日志到当前数据目录的pg_xlog下
注意:这一步必须做,因为drop table之前最新的xlog日志没有被归档到/archive目录下。
cp /data/panweidb/data.bak.20250108-194459/pg_xlog/000000010000000000000014 /data/panweidb/data/pg_xlog/
复制
5、启动数据库,验证数据是否回到最新的pitr时间
[omm@node1 data]$ gs_ctl start -D /data/panweidb/data -M primary
#检查数据库是否处于恢复状态
[omm@node1 data]$ gsql -d panweidb -p 17700 -c "SELECT pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)
此时,数据目录中的recovery.conf文件已经被自动重命名为recovery.done。
#停止恢复,对外提供读写服务
[omm@node1 data]$ gsql -d panweidb -p 17700 -c "select pg_xlog_replay_resume();"
pg_xlog_replay_resume
-----------------------
(1 row)
[omm@node1 data]$ gsql -d panweidb -p 17700 -c "select count(*) from testtab;"
count
-------
10599
(1 row)
复制
结论:pitr恢复后testtab表中的数据量和预期的数据量一致,数据验证通过。
以上场景都是在主节点上执行的恢复。
恢复整个集群
在主节点恢复成功后,停止主节点上启动的单实例,在拉起整个集群的过程中会重建备节点实例,完成备库的恢复。
#停止主节点启动的单实例
[omm@node1 panweidb]$ gs_ctl stop
[2025-01-09 00:08:02.715][54978][][gs_ctl]: gs_ctl stopped ,datadir is /data/panweidb/data
waiting for server to shut down...... done
server stopped
#启动数据库集群
[omm@node1 panweidb]$ cm_ctl start
cm_ctl: checking cluster status.
cm_ctl: checking cluster status.
cm_ctl: checking finished in 777 ms.
cm_ctl: start cluster.
cm_ctl: start nodeid: 1
cm_ctl: start nodeid: 2
cm_ctl: start nodeid: 3
...............................................
cm_ctl: start cluster successfully.
#检查集群状态
[omm@node1 panweidb]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
-----------------------------------------------------------------
1 node1 192.*.*.60 1 /data/panweidb/cm/cm_server Primary
2 node2 192.*.*.62 2 /data/panweidb/cm/cm_server Standby
3 node3 192.*.*.64 3 /data/panweidb/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
-------------------------------------------------------------------
1 node1 192.*.*.60 6001 /data/panweidb/data P Primary Normal
2 node2 192.*.*.62 6002 /data/panweidb/data S Standby Normal
3 node3 192.*.*.64 6003 /data/panweidb/data S Standby Normal
复制
至此,整个数据库集群恢复成功。
附录
gs_probackup backup命令语法
gs_probackup backup -B backup-path --instance=instance_name -b backup-mode [-D pgdata-path] [-C] [-S slot-name] [--temp-slot] [-s session-timeout] [--backup-pg-log] [-j threads_num] [--progress] [--no-validate] [--skip-block-validation] [-E external-directories-paths] [--no-sync] [--note=text] [--archive-timeout=timeout] [--log-level-console=log-level-console] [--log-level-file=log-level-file] [--log-filename=log-filename] [--error-log-filename=error-log-filename] [--log-directory=log-directory] [--log-rotation-size=log-rotation-size] [--log-rotation-age=log-rotation-age] [--delete-expired] [--delete-wal] [--merge-expired] [--retention-redundancy=retention-redundancy] [--retention-window=retention-window] [--wal-depth=wal-depth] [--dry-run] [--compress-algorithm=compress-algorithm] [--compress-level=compress-level] [--compress] [-d dbname] [-h host] [-p port] [-U username] [-w] [-W password] [-t rwtimeout] [--remote-proto=protocol] [--remote-host=destination] [--remote-path=path] [--remote-user=username] [--remote-port=port] [--ssh-options=ssh_options] [--remote-libpath=libpath] [--enable-dss] [--instance-id=instance_id] [--vgname="vgdata,vglog"] [--socketpath=socketpath] [--ttl=interval] [--expire-time=time] [--backup-pg-replslot] [--do-enc] [--key-idx [--help]
复制
gs_probackup restore命令语法
gs_probackup restore -B backup-path --instance=instance_name
[-D pgdata-path] [-i backup-id] [-j threads_num] [--progress]
[--force] [--no-sync] [--no-validate] [--skip-block-validation]
[--external-mapping=OLDDIR=NEWDIR] [-T OLDDIR=NEWDIR]
[--skip-external-dirs] [-I incremental_mode]
[--recovery-target-time=time|--recovery-target-xid=xid
|--recovery-target-lsn=lsn|--recovery-target-name=target-name]
[--recovery-target-inclusive=boolean]
[--recovery-target=latest]
[--remote-proto=protocol] [--remote-host=destination]
[--remote-path=path] [--remote-user=username]
[--remote-port=port] [--ssh-options=ssh_options]
[--remote-libpath=libpath]
[--enable-dss] [--instance-id=instance_id]
[--vgname="vgdata,vglog"] [--socketpath=socketpath]
[--log-level-console=log-level-console]
[--log-level-file=log-level-file]
[--log-filename=log-filename]
[--error-log-filename=error-log-filename]
[--log-directory=log-directory]
[--log-rotation-size=log-rotation-size]
[--log-rotation-age=log-rotation-age]
[--archive-host=hostname] [--archive-path=path]
[--archive-port=port] [--archive-user=username]
[--force-overwrite]
[-K wal-directory]
[--do-enc]
[--key-idx
[--help]
复制
gs_probackup show命令语法
gs_probackup show -B backup-path [--instance=instance_name [-i backup-id]] [--archive] [--format=plain|json] [--do-enc] [--key-idx [--help]
复制
总结
数据库备份是保证数据安全和业务连续性的关键措施,定期进行数据库备份恢复演练,以确保备份文件的有效性。
关于作者:
专注于Oracle、MySQL、PG、OpenGauss和国产数据库的研究,热爱生活,热衷于分享数据库技术。
微信公众号:飞天online
墨天轮:https://www.modb.pro/u/15197
如有任何疑问,欢迎大家留言,共同进步~~~