暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

磐维集中式数据库远程备份、恢复实战

原创 飞天 2025-04-13
174

1、背景

数据库备份恢复是数据安全的最后一道防线,直接影响企业生存能力。它不仅是一项技术措施,更是风险管理、合规运营和客户信任的基石。忽视备份等同于将企业置于“数据裸奔”的风险中,一旦灾难发生,后果可能是毁灭性的。生产系统中一定要做好数据库备份!!!
正好项目上同事问磐维集中式数据库怎么实现远程备份,趁周末有空在虚拟机里面做了实验,现在把整个步骤分享出来,希望对大家有所帮助!

2、环境说明

主机名 ip地址 OS版本 内存、CPU 安装软件 用途
node1 192.*.*.60 Centos7.9 4G 、 1个双核 PanWeiDB_V2.0-S3.1.0_B01数据库 数据库主节点
node2 192.*.*.62 Centos7.9 4G 、 1个双核 PanWeiDB_V2.0-S3.1.0_B01数据库 数据库备节点
node3 192.*.*.64 Centos7.9 4G 、 1个双核 PanWeiDB_V2.0-S3.1.0_B01数据库 数据库备节点
pubnode 192.*.*.100 Centos7.9 3G 、 1个双核 PanWeiDB_V2.0-S3.1.0_B01 工具包 备份机

3、数据库参数检查、设置

postgres=# show archive_mode; archive_mode -------------- on (1 row) # 如果archive_mode参数的值不是'on',登录主库用下面命令设置 # alter system set archive_mode=on; postgres=# show archive_dest; archive_dest -------------- /archive (1 row) # 如果archive_dest参数没设置,登录主库用下面命令设置 # alter system set archive_dest = '/archive' ; postgres=# show wal_sender_timeout ; wal_sender_timeout -------------------- 10s (1 row) #如果wal_sender_timeout参数的值不是10s,登录主库用下面命令设置 # alter system set wal_sender_timeout = '10s' ; postgres=# show session_timeout; session_timeout ----------------- 0 (1 row) # 如果session_timeout参数的值不是10s,登录主库用下面命令设置 # alter user omm set session_timeout to 0;
复制

4、远程备份实施过程

4.1、安装tools工具包(备份机上执行)

#root用户 mkdir /tools groupadd -g 1101 dbgrp useradd -g dbgrp -u 1101 -m omm echo "omm密码" | passwd --stdin omm chown -R omm:dbgrp /tools #omm用户 su - omm tar -zxf PanWeiDB_V2.0-S3.1.0_B01-tools-centos_7-x86_64.tar.gz -C /tools cat >>.bashrc<<EOF export PATH=/tools/bin:$PATH export LD_LIBRARY_PATH=/tools/lib:/tools/lib/postgresql:$LD_LIBRARY_PATH EOF
复制

4.2、创建备份目录、初始化备份路径(备份机上执行)

[root@pubnode ~]# mkdir /backup [root@pubnode ~]# chown -R omm:dbgrp /backup [root@pubnode ~]# su - omm Last login: Sat Apr 13 16:12:08 CST 2025 on pts/0 [omm@pubnode ~]$ gs_probackup init -B /backup/probackup INFO: Backup catalog '/backup/probackup' successfully inited # 创建备份日志目录 [omm@pubnode ~]$ mkdir /backup/probackup/log/ [omm@pubnode ~]$
复制

4.3、设置备份机和数据库主机之间的ssh免密(备份机上执行)

#备份机 [omm@pubnode ~]$ mkdir ~/.ssh [omm@pubnode ~]$ chmod 700 ~/.ssh [omm@pubnode ~]$ ssh-keygen -t rsa 将公钥文件上传至同集群所有节点(包括本节点),即可实现免密登录(此操作需输入密码): [omm@pubnode ~]$ ssh-copy-id omm@192.*.*.60 [omm@pubnode ~]$ ssh-copy-id omm@192.*.*.100
复制

4.4、创建备份实例(备份机上执行)

[omm@pubnode ~]$ gs_probackup add-instance -B /backup/probackup --instance=panweidb -D /data/panweidb/data --remote-host=192.*.*.60 --remote-user=omm --remote-port=22 --retention-redundancy=3 --retention-window=0 LOG: Start SSH client process, pid 3411 INFO: Instance 'panweidb' successfully inited
复制

4.5、备份数据库

注意:omm用户不允许远程连接,因此需要创建testusr用户进行备份,并设置白名单。

4.5.1、在数据库主机上创建备份用户、设置白名单:

#创建备份用户testusr [omm@node1 ~]$ gsql -r gsql ((PanWeiDB_V2.0-S3.1.0_B01) compiled at 2024-12-19 11:44:29 commit 5d08dc9 ) Non-SSL connection (SSL connection is recommended when requiring high-security) Type "help" for help. postgres=# create user testusr password '******'; CREATE ROLE postgres=# \q #设置白名单,允许备份机能连接上数据库 [omm@node1 ~]$ gs_guc reload -I all -N all -h "host all all 192.*.*.100/32 sha256"
复制

4.5.2、备份机上进行数据库备份

[omm@pubnode ~]$ gs_probackup backup --backup-path=/backup/probackup --instance=panweidb --backup-mode=full --delete-expired --remote-host=192.*.*.60 --remote-user=omm --remote-port=22 --pgdatabase=postgres -U testusr -W'******' -p 17700 --progress INFO: Backup start, pw_probackup version: 2.4.2, instance: panweidb, backup ID: SUNGJH, backup mode: FULL, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1 LOG: Backup destination is initialized ERROR: could not connect to database postgres: FATAL: Normal user is not allowed to use HA channel! WARNING: Backup SUNGJH is running, setting its status to ERROR
复制

备份报错:ERROR: could not connect to database postgres: FATAL: Normal user is not allowed to use HA channel!

4.5.3、授予备份用户sysadmin权限,并再次备份

#在数据库主机上执行 postgres=# alter user testusr sysadmin; ALTER ROLE postgres=# alter user testusr set session_timeout to 0; ALTER ROLE #在备份机上再次备份: [omm@pubnode ~]$ gs_probackup backup --backup-path=/backup/probackup --instance=panweidb --backup-mode=full --delete-expired --remote-host=192.*.*.60 --remote-user=omm --remote-port=22 --pgdatabase=postgres -U testusr -W'******' -p 17700 INFO: Backup start, pw_probackup version: 2.4.2, instance: panweidb, backup ID: SUNH9P, backup mode: FULL, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1 LOG: Backup destination is initialized LOG: This PanWeiDB instance was initialized with data block checksums. Data block corruption will be detected LOG: Start SSH client process, pid 7298 LOG: Database backup start ERROR: could not connect to database postgres: FATAL: no pg_hba.conf entry for replication connection from host "192.*.*.100", user "testusr", SSL on FATAL: no pg_hba.conf entry for replication connection from host "192.*.*.100", user "testusr", SSL off WARNING: backup in progress, stop backup INFO: wait for pg_stop_backup() INFO: pg_stop backup() successfully executed WARNING: Backup SUNH9P is running, setting its status to ERROR
复制

备份报错:FATAL: no pg_hba.conf entry for replication connection from host “192...100”, user “testusr”, SSL off

4.5.4、数据库主机上添加replication连接的白名单,再次备份

gs_guc reload -I all -N all -h "host replication all 192.*.*.100/32 sha256"
复制

再次备份:

[omm@pubnode wal]$ gs_probackup backup --backup-path=/backup/probackup --instance=panweidb --backup-mode=full --delete-expired --remote-host=192.*.*.60 --remote-user=omm --remote-port=22 --pgdatabase=postgres -U testusr -W'******' -p 17700 INFO: Backup start, pw_probackup version: 2.4.2, instance: panweidb, backup ID: SUNHQA, backup mode: FULL, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1 LOG: Backup destination is initialized LOG: This PanWeiDB instance was initialized with data block checksums. Data block corruption will be detected LOG: Start SSH client process, pid 7394 LOG: Database backup start LOG: started streaming WAL at 0/1F000000 (timeline 1) [2025-04-13 17:44:35]: check identify system success [2025-04-13 17:44:35]: send START_REPLICATION 0/1F000000 success [2025-04-13 17:44:35]: keepalive message is received LOG: SSH process 7394 is terminated with status 0 INFO: PGDATA size: 720MB INFO: Start transferring data files [2025-04-13 17:44:35]: keepalive message is received LOG: Start SSH client process, pid 7398 LOG: Creating page header map "/backup/probackup/backups/panweidb/SUNHQA/page_header_map" [2025-04-13 17:44:40]: keepalive message is received [2025-04-13 17:44:45]: keepalive message is received LOG: SSH process 7398 is terminated with status 0 INFO: Data files are transferred, time elapsed: 13s INFO: wait for pg_stop_backup() INFO: pg_stop backup() successfully executed LOG: stop_lsn: 0/1F000270 LOG: Looking for LSN 0/1F000270 in segment: 00000001000000000000001F LOG: Found WAL segment: /backup/probackup/backups/panweidb/SUNHQA/database/pg_xlog/00000001000000000000001F LOG: Thread [0]: Opening WAL segment "/backup/probackup/backups/panweidb/SUNHQA/database/pg_xlog/00000001000000000000001F" LOG: Found LSN: 0/1F000270 [2025-04-13 17:44:53]:(null): not renaming "/backup/probackup/backups/panweidb/SUNHQA/database/pg_xlog/000000010000000000000020", segment is not complete. LOG: finished streaming WAL at 0/200000B0 (timeline 1) LOG: Getting the Recovery Time from WAL LOG: Thread [0]: Opening WAL segment "/backup/probackup/backups/panweidb/SUNHQA/database/pg_xlog/00000001000000000000001F" INFO: Syncing backup files to disk INFO: Backup files are synced, time elapsed: 0 INFO: Validating backup SUNHQA INFO: Backup SUNHQA data files are valid INFO: Backup SUNHQA resident size: 752MB INFO: Backup SUNHQA completed LOG: REDUNDANCY=3 INFO: Evaluate backups by retention INFO: Backup SUNHQA, mode: FULL, status: OK. Redundancy: 1/3, Time Window: 0d/0d. Active INFO: Backup SUNH9P, mode: FULL, status: ERROR. Redundancy: 2/3, Time Window: 0d/0d. Active INFO: Backup SUNH75, mode: FULL, status: ERROR. Redundancy: 3/3, Time Window: 0d/0d. Expired LOG: Consider backup SUNH75 for purge INFO: Delete: SUNH751970-01-01 08:00:00+08 INFO: There are no backups to merge by retention policy INFO: Purging finished INFO: There is no WAL to purge by retention policy
复制

备份成功!

4.5.5、查看备份(备份机上执行)

[omm@pubnode ~]$ gs_probackup show -B /backup/probackup/ BACKUP INSTANCE 'panweidb' ========================================================================================================================================= Instance Version ID Recovery Time Mode WAL Mode TLI Time Data WAL Zratio Start LSN Stop LSN Type Status ========================================================================================================================================= panweidb 9.2 SUNHU9 2025-04-13 17:47:10+08 FULL STREAM 1/0 18s 736MB 16MB 0.98 0/23000028 0/23000270 FILE OK panweidb 9.2 SUNHRE 2025-04-13 17:45:29+08 FULL STREAM 1/0 20s 736MB 16MB 0.98 0/21000028 0/21000270 FILE OK panweidb 9.2 SUNHQA 2025-04-13 17:44:48+08 FULL STREAM 1/0 19s 736MB 16MB 0.98 0/1F000028 0/1F000270 FILE OK
复制

5、远程恢复实施过程

5.1 数据库主机上,修改数据目录,模拟数据库异常

postgres=# show data_directory ; data_directory --------------------- /data/panweidb/data (1 row) [omm@node1 ~]$ cd /data/panweidb [omm@node1 panweidb]$ mv data data.old
复制

5.2 备份机上执行数据库恢复

[omm@pubnode ~]$ gs_probackup restore --backup-path=/backup/probackup --instance=panweidb --remote-host=192.*.*.60 --remote-user=omm --remote-port=22 LOG: Start SSH client process, pid 8035 LOG: Restore begin. LOG: there is no file tablespace_map LOG: check tablespace directories of backup SUNJ1F LOG: check external directories of backup SUNJ1F INFO: Validating backup SUNJ1F INFO: Backup SUNJ1F data files are valid LOG: Thread [1]: Opening WAL segment "/backup/probackup/backups/panweidb/SUNJ1F/database/pg_xlog/000000010000000000000024" INFO: Backup SUNJ1F WAL segments are valid INFO: Backup SUNJ1F is valid. INFO: Restoring the database from backup at 2025-04-13 18:12:51+08 LOG: there is no file tablespace_map LOG: Restore directories and symlinks... in /data/panweidb/data LOG: SSH process 8035 is terminated with status 0 INFO: Start restoring backup files. DATA size: 744MB LOG: Start thread 1 LOG: Start SSH client process, pid 8039 LOG: SSH process 8039 is terminated with status 0 INFO: Backup files are restored. Transfered bytes: 744MB, time elapsed: 12s INFO: Restore incremental ratio (less is better): 100% (744MB/744MB) INFO: Syncing restored files to disk LOG: Start SSH client process, pid 8040 INFO: Restored backup files are synced, time elapsed: 1s LOG: remote execute command [gs_guc set -D /data/panweidb/data -c "pw_wal_directory='pg_xlog'" &> /dev/null] success LOG: SSH process 8040 is terminated with status 0 INFO: Restore of backup SUNJ1F completed.
复制

由于本地虚拟机里面没有数据写入,所以恢复也比较简单。如果生产环境中有实时业务数据写入,PITR恢复请参考下面的文章:
【干货】磐维数据库物理备份&恢复实战

5.3 数据库主机上检查集群状态

[omm@node1 panweidb]$ gs_om -t status --detail [ CMServer State ] node node_ip instance state ----------------------------------------------------------------- 1 node1 192.*.*.60 1 /data/panweidb/cm/cm_server Primary 2 node2 192.*.*.62 2 /data/panweidb/cm/cm_server Standby 3 node3 192.*.*.64 3 /data/panweidb/cm/cm_server Standby [ Cluster State ] cluster_state : Normal redistributing : No balanced : No current_az : AZ_ALL [ Datanode State ] node node_ip instance state ------------------------------------------------------------------- 1 node1 192.*.*.60 6001 /data/panweidb/data P Standby Normal 2 node2 192.*.*.62 6002 /data/panweidb/data S Primary Normal 3 node3 192.*.*.64 6003 /data/panweidb/data S Standby Normal
复制

数据库集群恢复正常。

6、配置定时任务(备份机上执行)

配置omm用户的crontab定时任务:

  1. 每三天00:30:00执行数据库全备份,备份成功后删除过期全备份。
  2. 每两小时删除十天前的归档。
  3. 每天0:10:00 删除ERROR状态的备份

crontab -e

30 00 */3 * * source /home/omm/.bashrc;gs_probackup backup --backup-path=/backup/probackup --instance=panweidb --backup-mode=full --delete-expired --remote-host=192.*.*.60 --remote-user=omm --remote-port=22 --pgdatabase=postgres -U testusr -W'******' -p 17700 > /backup/probackup/log/backup_week$(date +\%u).log 2>&1 & 0 */2 * * * sh /home/omm/move_and_clear_archive.sh 9 10 00 * * * source /home/omm/.bashrc; gs_probackup delete -B /backup/probackup/ --status=ERROR --instance=panweidb > /dev/null 2>&1 &
复制

cat /home/omm/move_and_clear_archive.sh:

#!/bin/bash fnum=$1 echo "begin to clear archive log..." find /archive -mtime +${fnum} -type f -name "0000*" | xargs rm -f echo "archive files clean successfully"
复制

7、总结

远程备份整体步骤总结:

  1. 数据库主机上检查、确认数据库参数设置
  2. 备份机上安装数据库工具包
  3. 备份机上创建备份目录、初始化备份路径
  4. 备份机上设置备份机和数据库主机之间的ssh免密
  5. 备份机上创建备份实例
  6. 数据库主机上设置白名单(设置远程连接以及replication连接)、创建远程备份用户(需要授予sysadmin权限、session_timeout)
  7. 备份机上执行远程备份
  8. 备份机上配置定时备份任务

远程恢复步骤总结:在本地恢复命令的基础上加了remote相关的参数。

生产环境中一定要做数据库备份,把数据库备份结果纳入监控平台,并定期做数据库恢复验证工作,防患于未然。

关于作者:
网名:飞天,墨天轮2024年度优秀原创作者,拥有 Oracle 10g OCM 认证、PGCE认证以及OBCA、KCP、KCSM、ACP、磐维等众多国产数据库认证证书,目前从事Oracle、Mysql、PostgresSQL、磐维数据库管理运维工作,喜欢结交更多志同道合的朋友,热衷于研究、分享数据库技术。
微信公众号:飞天online
墨天轮:https://www.modb.pro/u/15197
如有任何疑问,欢迎大家留言,共同探讨~~~

最后修改时间:2025-04-14 15:50:37
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

TA的专栏
磐维数据库
收录50篇内容
MySql数据库
收录24篇内容
KINGBASE数据库
收录7篇内容
目录
  • 1、背景
  • 2、环境说明
  • 3、数据库参数检查、设置
  • 4、远程备份实施过程
  • 4.1、安装tools工具包(备份机上执行)
  • 4.2、创建备份目录、初始化备份路径(备份机上执行)
  • 4.3、设置备份机和数据库主机之间的ssh免密(备份机上执行)
  • 4.4、创建备份实例(备份机上执行)
  • 4.5、备份数据库
    • 4.5.1、在数据库主机上创建备份用户、设置白名单:
    • 4.5.2、备份机上进行数据库备份
    • 4.5.3、授予备份用户sysadmin权限,并再次备份
    • 4.5.4、数据库主机上添加replication连接的白名单,再次备份
    • 4.5.5、查看备份(备份机上执行)
  • 5、远程恢复实施过程
    • 5.1 数据库主机上,修改数据目录,模拟数据库异常
    • 5.2 备份机上执行数据库恢复
    • 5.3 数据库主机上检查集群状态
  • 6、配置定时任务(备份机上执行)
  • 7、总结