Environment: openGauss 1.0.0, one primary and one standby
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Normal
[omm@gsdb01 ~]$
Simulate a primary outage by stopping the primary instance; the standby then reports Standby Need repair(Disconnected).
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_ctl stop -D /u01/openGauss/data/db1
[2020-07-15 09:18:53.064][21012][][gs_ctl]: gs_ctl stopped ,datadir is -D "/u01/openGauss/data/db1"
waiting for server to shut down........ done
server stopped
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Down Manually stopped | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(Disconnected)
[omm@gsdb01 ~]$
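After the primary host recovers, start the primary with gs_ctl start -D. Without further options the instance comes up in Normal mode, not Primary mode.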
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_ctl start -D /u01/openGauss/data/db1
[2020-07-15 09:35:12.777][21111][][gs_ctl]: gs_ctl started,datadir is -D "/u01/openGauss/data/db1"
[2020-07-15 09:35:12.829][21111][][gs_ctl]: waiting for server to start...
.0 [BACKEND] LOG: Begin to start openGauss Database.
2020-07-15 09:35:12.928 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
2020-07-15 09:35:12.928 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2020-07-15 09:35:12.928 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Transparent encryption disabled.
2020-07-15 09:35:12.944 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2020-07-15 09:35:12.944 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4213 Mbytes) is larger.
2020-07-15 09:35:13.024 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
2020-07-15 09:35:13.053 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set metadata cache size(268435456)
2020-07-15 09:35:13.302 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/u01/openGauss/data/db1/gaussdb.state.temp" success
2020-07-15 09:35:13.302 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Normal)
2020-07-15 09:35:13.325 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 978, usable_fds = 1000, already_open = 12
2020-07-15 09:35:13.326 5f0e5d50.1 [unknown] 139956031434496 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Success to start openGauss Database, please press any key to exit...
[2020-07-15 09:35:13.843][21111][][gs_ctl]: done
[2020-07-15 09:35:13.843][21111][][gs_ctl]: server started (/u01/openGauss/data/db1)
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Normal Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(Disconnected)
[omm@gsdb01 ~]$
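The startup log already says which mode the instance came up in: it contains the fragment "server mode(Normal)". A minimal sketch for pulling that value out of captured gs_ctl output follows. The function name server_mode_from_start_log is hypothetical, and the pattern is only assumed from the log text shown in this post, not from any documented format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openGauss): read captured `gs_ctl start`
# output on stdin and print the first reported server mode, e.g. "Normal".
server_mode_from_start_log() {
    # In basic regular expressions the parentheses are literal characters,
    # so this matches the text "server mode(<letters or underscores>)".
    grep -o 'server mode([A-Za-z_]*)' \
        | head -n 1 \
        | sed 's/^server mode(\(.*\))$/\1/'
}
```

One possible use is to pipe the start command through it, for example gs_ctl start -D /u01/openGauss/data/db1 2>&1 | server_mode_from_start_log, and treat any value other than Primary on the primary host as a reason to restart with -M primary.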
Cluster status as seen from the standby node: the cluster is Unavailable, the primary instance is in Normal state, and the standby is Standby Need repair(Disconnected).
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Normal Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(Disconnected)
[omm@gsdb02 ~]$
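The two facts read off the status output by eye (the cluster_state value and whether any instance shows "Need repair") can also be extracted with a short script. This is a sketch only: the function names are hypothetical, and the parsing assumes the layout printed by gs_om -t status --detail in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that parse `gs_om -t status --detail` output from stdin.

# Print the value of the cluster_state line, e.g. "Unavailable".
cluster_state_from_status() {
    awk -F':' '/^[[:space:]]*cluster_state[[:space:]]*:/ {
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", $2)  # trim surrounding blanks
        print $2
        exit
    }'
}

# Exit status 0 if any instance line reports "Need repair", non-zero otherwise.
needs_repair() {
    grep -q 'Need repair'
}
```

For example, gs_om -t status --detail | cluster_state_from_status prints Unavailable for the output above.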
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$
pg_log output: the walreceiver reports "server_mode is NORMAL, could not accept HA connection"
[BACKEND] LOG: Connecting to remote server :host=192.168.0.195 port=40001 localhost=192.168.0.96 localport=40001 dbname=replication replication=true fallback_application_name=dn_6002 connect_timeout=2
[BACKEND] FATAL: walreceiver could not connect to the remote server,the connection info :host=192.168.0.195 port=40001 localhost=192.168.0.96 localport=40001 : FATAL: the current t_thrd.postmaster_cxt.server_mode is NORMAL, could not accept HA connection.
FATAL: the current t_thrd.postmaster_cxt.server_mode is NORMAL, could not accept HA connection.
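Cause: an instance started in Normal mode does not hold the Primary role, so it refuses the HA (replication) connection from the standby's walreceiver and the standby stays disconnected.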
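Fix: stop the primary and restart it explicitly in the primary role (-M primary). The primary then reports Primary Normal and the standby reports Standby Need repair(WAL). Rebuild the standby with gs_ctl build, after which the cluster returns to Normal.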
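Before restarting anything, it can help to confirm that the standby is failing for this specific reason. The sketch below looks for the FATAL message quoted above. The function name is hypothetical, and the log file location is deliberately left to the caller because this post does not state it.

```shell
#!/usr/bin/env bash
# Hypothetical check: exit status 0 if the given log files (or stdin when no
# file is given) contain the "server_mode is NORMAL" HA rejection message.
ha_rejected_by_normal_mode() {
    grep -q 'server_mode is NORMAL, could not accept HA connection' "$@"
}
```

If the check succeeds, the remedy described above applies: restart the primary with -M primary and then rebuild the standby.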
[omm@gsdb01 ~]$ gs_ctl stop -D /u01/openGauss/data/db1
[2020-07-15 10:04:28.977][5509][][gs_ctl]: gs_ctl stopped ,datadir is -D "/u01/openGauss/data/db1"
waiting for server to shut down........ done
server stopped
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Down Manually stopped | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(Disconnected)
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_ctl start -D /u01/openGauss/data/db1 -M primary
[2020-07-15 10:04:49.145][5860][][gs_ctl]: gs_ctl started,datadir is -D "/u01/openGauss/data/db1"
[2020-07-15 10:04:49.197][5860][][gs_ctl]: waiting for server to start...
.0 [BACKEND] LOG: Begin to start openGauss Database.
2020-07-15 10:04:49.298 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
2020-07-15 10:04:49.298 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2020-07-15 10:04:49.298 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Transparent encryption disabled.
2020-07-15 10:04:49.315 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2020-07-15 10:04:49.315 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4213 Mbytes) is larger.
2020-07-15 10:04:49.396 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
2020-07-15 10:04:49.425 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set metadata cache size(268435456)
2020-07-15 10:04:49.679 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/u01/openGauss/data/db1/gaussdb.state.temp" success
2020-07-15 10:04:49.679 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Primary)
2020-07-15 10:04:49.702 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 978, usable_fds = 1000, already_open = 12
2020-07-15 10:04:49.703 5f0e6441.1 [unknown] 140154620174080 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Success to start openGauss Database, please press any key to exit...
[2020-07-15 10:04:50.210][5860][][gs_ctl]: done
[2020-07-15 10:04:50.210][5860][][gs_ctl]: server started (/u01/openGauss/data/db1)
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(WAL)
[omm@gsdb01 ~]$
Cluster status as seen from the standby: the standby is in Standby Need repair(WAL).
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(WAL)
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$
Rebuild the standby with gs_ctl build; the cluster state returns to Normal.
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gs_ctl build -D /u01/openGauss/data/db1
[2020-07-15 10:06:07.024][5550][][gs_ctl]: gs_ctl incremental build ,datadir is -D "/u01/openGauss/data/db1"
waiting for server to shut down.... done
server stopped
[2020-07-15 10:06:08.054][5550][dn_6001_6002][gs_rewind]: set gaussdb state file when rewind:db state(BUILDING_STATE), server mode(STANDBY_MODE), build mode(INC_BUILD).
[2020-07-15 10:06:08.079][5550][dn_6001_6002][gs_rewind]: connected to server: host=192.168.0.195 port=40001 dbname=postgres application_name=gs_rewind connect_timeout=5 rw_timeout=10
[2020-07-15 10:06:08.082][5550][dn_6001_6002][gs_rewind]: connect to primary success
[2020-07-15 10:06:08.082][5550][dn_6001_6002][gs_rewind]: get pg_control success
[2020-07-15 10:06:08.082][5550][dn_6001_6002][gs_rewind]: target server was interrupted in mode 2.
[2020-07-15 10:06:08.082][5550][dn_6001_6002][gs_rewind]: sanityChecks success
[2020-07-15 10:06:08.082][5550][dn_6001_6002][gs_rewind]: find last checkpoint at 0/321ADA0 on timeline 1 from control file
[2020-07-15 10:06:08.083][5550][dn_6001_6002][gs_rewind]: The source slot restart_lsn at WAL position 0/321AA08.
[2020-07-15 10:06:08.084][5550][dn_6001_6002][gs_rewind]: The target slot restart_lsn at WAL position 0/300C838.
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: FindMaxLSN success find max lsn rec (0/321ADA0) success.
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: servers diverged at WAL position 0/321AA08.
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: the local diverge xlogfile is 000000010000000000000003, older xlog files will not be copied or removed.
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: find last common checkpoint at 0/321A980 on timeline 1, cooresponding redo point at 0/321A900
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: find diverge point success
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: read checkpoint redo (0/321A900) success before rewinding.
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: rewinding from checkpoint redo point at 0/321A900 on timeline 1
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: the CommonAncestor checkpoint xlogfile is 000000010000000000000003,older xlog files will not copy
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: targetFileStatThread success pid 140402068141824.
[2020-07-15 10:06:08.086][5550][dn_6001_6002][gs_rewind]: reading source file list
[2020-07-15 10:06:08.093][5550][dn_6001_6002][gs_rewind]: targetFileStatThread return success.
[2020-07-15 10:06:08.101][5550][dn_6001_6002][gs_rewind]: reading target file list
[2020-07-15 10:06:08.102][5550][dn_6001_6002][gs_rewind]: traverse target datadir success
[2020-07-15 10:06:08.102][5550][dn_6001_6002][gs_rewind]: reading WAL in target
[2020-07-15 10:06:08.102][5550][dn_6001_6002][gs_rewind]: could not read WAL record at 0/321AE28: invalid record length at 0/321AE28: wanted 32, got 0
[2020-07-15 10:06:08.103][5550][dn_6001_6002][gs_rewind]: calculate totals rewind success
[2020-07-15 10:06:08.103][5550][dn_6001_6002][gs_rewind]: need to copy 283MB (total source directory size is 348MB)
[2020-07-15 10:06:09.873][5550][dn_6001_6002][gs_rewind]: backup target files success
[2020-07-15 10:06:09.905][5550][dn_6001_6002][gs_rewind]: pg_xlog type 1.
[2020-07-15 10:06:09.905][5550][dn_6001_6002][gs_rewind]: remove file pg_tblspc/16407/PG_9.2_201611171_dn_6001_6002/pgsql_tmp, type 1
[2020-07-15 10:06:09.905][5550][dn_6001_6002][gs_rewind]: remove file global/pgstat.stat, type 0
[2020-07-15 10:06:09.905][5550][dn_6001_6002][gs_rewind]: remove file full_backup_label, type 0
[2020-07-15 10:06:09.905][5550][dn_6001_6002][gs_rewind]: remove file build_completed.done, type 0
[2020-07-15 10:06:09.908][5550][dn_6001_6002][gs_rewind]: receiving and unpacking files...
[2020-07-15 10:06:11.680][5550][dn_6001_6002][gs_rewind]: execute file map success
[2020-07-15 10:06:11.680][5550][dn_6001_6002][gs_rewind]: read checkpoint redo (0/321A900) success.
[2020-07-15 10:06:11.680][5550][dn_6001_6002][gs_rewind]: read checkpoint rec (0/321A980) success.
[2020-07-15 10:06:11.681][5550][dn_6001_6002][gs_rewind]: update pg_control file success
[2020-07-15 10:06:11.726][5550][dn_6001_6002][gs_rewind]: update pg_dw file success
[2020-07-15 10:06:11.726][5550][dn_6001_6002][gs_rewind]: creating backup label and updating control file
[2020-07-15 10:06:11.726][5550][dn_6001_6002][gs_rewind]: create backup label success
[2020-07-15 10:06:11.726][5550][dn_6001_6002][gs_rewind]: dn incremental build completed.
[2020-07-15 10:06:11.726][5550][dn_6001_6002][gs_rewind]: fetch MOT checkpoint
[2020-07-15 10:06:11.779][5550][dn_6001_6002][gs_ctl]: waiting for server to start...
.0 [BACKEND] LOG: Begin to start openGauss Database.
2020-07-15 10:06:11.880 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
2020-07-15 10:06:11.880 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2020-07-15 10:06:11.880 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Transparent encryption disabled.
2020-07-15 10:06:11.898 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2020-07-15 10:06:11.898 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4213 Mbytes) is larger.
2020-07-15 10:06:11.977 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
2020-07-15 10:06:12.007 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set metadata cache size(268435456)
2020-07-15 10:06:12.258 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/u01/openGauss/data/db1/gaussdb.state.temp" success
2020-07-15 10:06:12.259 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Standby)
2020-07-15 10:06:12.281 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 978, usable_fds = 1000, already_open = 12
2020-07-15 10:06:12.283 5f0e6493.1 [unknown] 139835654893312 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Success to start openGauss Database, please press any key to exit....
[2020-07-15 10:06:13.798][5550][dn_6001_6002][gs_ctl]: done
[2020-07-15 10:06:13.798][5550][dn_6001_6002][gs_ctl]: server started (/u01/openGauss/data/db1)
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Normal
[omm@gsdb02 ~]$
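The recovery sequence used in this walk-through can be collected into two small functions, one per host. This is a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, only the gs_ctl invocations that appear in this post are used, and by default the commands are printed rather than executed so the sequence can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the gs_ctl commands shown in this post.

# Print the command when DRY_RUN is 1 (the default here); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# On the primary host: stop the instance, then start it explicitly as primary.
restart_as_primary() {
    local datadir=$1
    run gs_ctl stop -D "$datadir" &&
        run gs_ctl start -D "$datadir" -M primary
}

# On the standby host: rebuild the standby from the primary.
rebuild_standby() {
    local datadir=$1
    run gs_ctl build -D "$datadir"
}
```

With DRY_RUN left at its default, restart_as_primary /u01/openGauss/data/db1 only prints the stop and start commands. Setting DRY_RUN=0 executes them, after which gs_om -t status --detail should be checked on both nodes as was done above.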
Last modified: 2020-07-16 09:16:22