0. 概述
主要检查一下三方面:
- 容灾状态
- 集群状态
- 流复制状态
1. 常用指令
gs_sdr -t query gs_om -t status --detail gs_ctl query select * from pg_stat_replication; select * from pg_replication_slots; cm_ctl query -Cvidp gs_ctl status
复制
2. 未建立容灾关系时
- hadr_cluster_stat 为 normal:
[omm@panweidb01 ~]$ gs_sdr -t query -------------------------------------------------------------------------------- Streaming disaster recovery query cdf29f001da511efbb2f000c29a96163 -------------------------------------------------------------------------------- Start streaming disaster query. Start check archive. Start check recovery. Start check RPO & RTO. Successfully executed streaming disaster recovery query, result: {'hadr_cluster_stat': 'normal', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '', 'RTO': ''}
复制
3. 建立容灾关系后的正常状态
3.1 主集群主节点
3.1.1 容灾状态
- hadr_cluster_stat 为 archive:
[omm@apanweidb01 ~]$ gs_sdr -t query -------------------------------------------------------------------------------- Streaming disaster recovery query d6b666261da511efa53c000c29fd99eb -------------------------------------------------------------------------------- Start streaming disaster query. Start check archive. Start check recovery. Start check RPO & RTO. Successfully executed streaming disaster recovery query, result: {'hadr_cluster_stat': 'archive', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '0', 'RTO': '0'}
复制
3.1.2 集群状态
[omm@apanweidb02 ~]$ gs_om -t status --detail [ CMServer State ] node node_ip instance state --------------------------------------------------------------------------- 1 apanweidb01 <主集群1-ip> 1 /database/panweidb/cm/cm_server Standby 2 apanweidb02 <主集群2-ip> 2 /database/panweidb/cm/cm_server Standby 3 apanweidb03 <主集群3-ip> 3 /database/panweidb/cm/cm_server Primary [ Cluster State ] cluster_state : Normal redistributing : No balanced : Yes current_az : AZ_ALL [ Datanode State ] node node_ip instance state ----------------------------------------------------------------------------- 1 apanweidb01 <主集群1-ip> 6001 /database/panweidb/data P Primary Normal 2 apanweidb02 <主集群2-ip> 6002 /database/panweidb/data S Standby Normal 3 apanweidb03 <主集群3-ip> 6003 /database/panweidb/data S Standby Normal
复制
3.1.3 流复制状态
- 流复制会包含主集群备节点(out)和备集群主节点(out):
[omm@apanweidb01 ~]$ gs_ctl query [2024-07-15 20:23:10.078][2670696][][gs_ctl]: gs_ctl query ,datadir is /database/panweidb/data HA state: local_role : Primary static_connections : 5 db_state : Normal detail_information : Normal Senders info: sender_pid : 1868144 local_role : Primary peer_role : Standby peer_state : Normal state : Streaming sender_sent_location : 0/2CCEE3F8 sender_write_location : 0/2CCEE3F8 sender_flush_location : 0/2CCEE3F8 sender_replay_location : 0/2CCEE3F8 receiver_received_location : 0/2CCEE3F8 receiver_write_location : 0/2CCEE3F8 receiver_flush_location : 0/2CCEE3F8 receiver_replay_location : 0/2CCEE3F8 sync_percent : 100% sync_state : Quorum sync_priority : 1 sync_most_available : Off channel : <主集群1-ip>:15401--><主集群2-ip>:40332 sender_pid : 1868143 local_role : Primary peer_role : Standby peer_state : Normal state : Streaming sender_sent_location : 0/2CCEE3F8 sender_write_location : 0/2CCEE3F8 sender_flush_location : 0/2CCEE3F8 sender_replay_location : 0/2CCEE3F8 receiver_received_location : 0/2CCEE3F8 receiver_write_location : 0/2CCEE3F8 receiver_flush_location : 0/2CCEE3F8 receiver_replay_location : 0/2CCEE3F8 sync_percent : 100% sync_state : Quorum sync_priority : 1 sync_most_available : Off channel : <主集群1-ip>:15401--><主集群3-ip>:65376 sender_pid : 1868145 local_role : Primary peer_role : Standby peer_state : Normal state : Streaming sender_sent_location : 0/2CCEE3F8 sender_write_location : 0/2CCEE3F8 sender_flush_location : 0/2CCEE3F8 sender_replay_location : 0/2CCEE3F8 receiver_received_location : 0/2CCEE3F8 receiver_write_location : 0/2CCEE3F8 receiver_flush_location : 0/2CCEE3F8 receiver_replay_location : 0/2CCEE3F8 sync_percent : 100% sync_state : Async sync_priority : 0 sync_most_available : Off channel : <主集群1-ip>:15401--><备集群1-ip>:64174 Receiver info: No information
复制
3.1.4 复制槽状态
- 注意 dn_6001_hadr 是指向备集群主节点的:
postgres=# select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn | dummy_standby | confirmed_flush --------------+--------+-----------+--------+----------+--------+------+--------------+-------------+---------------+----------------- dn_6002 | | physical | 0 | | t | | | 0/2CCF8430 | f | dn_6003 | | physical | 0 | | t | | | 0/2CCF8430 | f | dn_6001_hadr | | physical | 0 | | t | | | 0/2CCF8430 | f | (3 rows) postgres=# select * from pg_stat_replication; pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sender_sent_location | receiver_write_location | receiver_flush_location | receiver_replay_location | sync_priority | sync_state ----------------+----------+---------+------------------------------------+---------------+-----------------+-------------+-------------------------------+-----------+--------------------- -+-------------------------+-------------------------+--------------------------+---------------+------------ 23369422993152 | 10 | omm | WalSender to Standby[hadr_dn_6001] | <备集群1-ip> | | 64174 | 2024-07-11 22:04:30.19872+08 | Streaming | 0/2CCF8430 | 0/2CCF8430 | 0/2CCF8430 | 0/2CCF8430 | 0 | Async 23369553135360 | 10 | omm | WalSender to Standby[dn_6003] | <主集群3-ip> | | 65376 | 2024-07-11 22:04:29.656365+08 | Streaming | 0/2CCF8430 | 0/2CCF8430 | 0/2CCF8430 | 0/2CCF8430 | 1 | Quorum 23369406211840 | 10 | omm | WalSender to Standby[dn_6002] | <主集群2-ip> | | 40332 | 2024-07-11 22:04:29.726514+08 | Streaming | 0/2CCF8430 | 0/2CCF8430 | 0/2CCF8430 | 0/2CCF8430 | 1 | Quorum (3 rows)
复制
3.2 备集群主节点
3.2.1 容灾状态
- hadr_cluster_stat 为 recovery:
[omm@bpanweidb02 ~]$ gs_sdr -t query -------------------------------------------------------------------------------- Streaming disaster recovery query 55694ade1dc311efbd28000c29a8e394 -------------------------------------------------------------------------------- Start streaming disaster query. Start check archive. Start check recovery. Start check RPO & RTO. Successfully executed streaming disaster recovery query, result: {'hadr_cluster_stat': 'recovery', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '', 'RTO': ''}
复制
3.2.2 集群状态
- 注意dn的角色:
[omm@bpanweidb01 ~]$ gs_om -t status --detail [ CMServer State ] node node_ip instance state --------------------------------------------------------------------------- 1 bpanweidb01 <备集群1-ip> 1 /database/panweidb/cm/cm_server Standby 2 bpanweidb02 <备集群2-ip> 2 /database/panweidb/cm/cm_server Primary 3 bpanweidb03 <备集群3-ip> 3 /database/panweidb/cm/cm_server Standby [ Cluster State ] cluster_state : Normal redistributing : No balanced : Yes current_az : AZ_ALL [ Datanode State ] node node_ip instance state ----------------------------------------------------------------------------- 1 bpanweidb01 <备集群1-ip> 6001 /database/panweidb/data P Main Standby Normal 2 bpanweidb02 <备集群2-ip> 6002 /database/panweidb/data S Cascade Standby Normal 3 bpanweidb03 <备集群3-ip> 6003 /database/panweidb/data S Cascade Standby Normal
复制
3.2.3 流复制状态
- 流复制会包含主集群主节点(in)备集群备节点(out):
[omm@bpanweidb01 ~]$ gs_ctl query [2024-07-15 20:25:36.254][3706201][][gs_ctl]: gs_ctl query ,datadir is /database/panweidb/data HA state: local_role : Main Standby static_connections : 5 db_state : Normal detail_information : Normal Senders info: sender_pid : 3250685 local_role : Main Standby peer_role : Cascade Standby peer_state : Normal state : Streaming sender_sent_location : 0/2CCEE638 sender_write_location : 0/2CCEE638 sender_flush_location : 0/2CCEE638 sender_replay_location : 0/2CCEE638 receiver_received_location : 0/2CCEE638 receiver_write_location : 0/2CCEE638 receiver_flush_location : 0/2CCEE638 receiver_replay_location : 0/2CCEE638 sync_percent : 100% sync_state : Async sync_priority : 0 sync_most_available : Off channel : <备集群1-ip>:15401--><备集群3-ip>:40300 sender_pid : 3250692 local_role : Main Standby peer_role : Cascade Standby peer_state : Normal state : Streaming sender_sent_location : 0/2CCEE638 sender_write_location : 0/2CCEE638 sender_flush_location : 0/2CCEE638 sender_replay_location : 0/2CCEE638 receiver_received_location : 0/2CCEE638 receiver_write_location : 0/2CCEE638 receiver_flush_location : 0/2CCEE638 receiver_replay_location : 0/2CCEE638 sync_percent : 100% sync_state : Async sync_priority : 0 sync_most_available : Off channel : <备集群1-ip>:15401--><备集群2-ip>:63784 Receiver info: receiver_pid : 3250885 local_role : Main Standby peer_role : Primary peer_state : Normal state : Normal sender_sent_location : 0/2CCEE638 sender_write_location : 0/2CCEE638 sender_flush_location : 0/2CCEE638 sender_replay_location : 0/2CCEE638 receiver_received_location : 0/2CCEE638 receiver_write_location : 0/2CCEE638 receiver_flush_location : 0/2CCEE638 receiver_replay_location : 0/2CCEE638 sync_percent : 100% channel : <备集群1-ip>:64174<--<主集群1-ip>:15401
复制
3.2.4 复制槽状态
postgres=# select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn | dummy_standby | confirmed_flush -----------+--------+-----------+--------+----------+--------+------+--------------+-------------+---------------+----------------- dn_6002 | | physical | 0 | | t | | | 0/2CCF8670 | f | dn_6003 | | physical | 0 | | t | | | 0/2CCF8670 | f | (2 rows) postgres=# select * from pg_stat_replication; pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sender_sent_location | re ceiver_write_location | receiver_flush_location | receiver_replay_location | sync_priority | sync_state ----------------+----------+---------+-------------------------------+---------------+-----------------+-------------+-------------------------------+-----------+----------------------+--- ----------------------+-------------------------+--------------------------+---------------+------------ 23145466558208 | 10 | omm | WalSender to Standby[dn_6002] | <备集群2-ip> | | 63784 | 2024-07-11 22:04:16.878049+08 | Streaming | 0/2CCF8670 | 0/ 2CCF8670 | 0/2CCF8670 | 0/2CCF8670 | 0 | Async 23145405675264 | 10 | omm | WalSender to Standby[dn_6003] | <备集群3-ip> | | 40300 | 2024-07-11 22:04:16.712418+08 | Streaming | 0/2CCF8670 | 0/ 2CCF8670 | 0/2CCF8670 | 0/2CCF8670 | 0 | Async (2 rows)
复制
参考
- hadr_cluster_stat 含义请参考 openGauss官方文档
最后修改时间:2024-07-15 23:16:37
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
磐维集中式数据库远程备份、恢复实战
飞天
168次阅读
2025-04-13 19:24:59
磐维运维管理平台(panwei_dbops)部署及使用(2025.03.30)
飞天
146次阅读
2025-03-30 23:47:45
磐维数据库运维合集(一):日常运维相关命令
钟一
99次阅读
2025-04-02 17:20:32
磐维分布式数据库备份部署
飞天
92次阅读
2025-03-28 23:46:28
磐维数据库备份与恢复
飞天
83次阅读
2025-04-11 23:50:35
【实操记录】磐维数据库-a模式下的一个别名大小写问题,及相关参数测试
磐维数据库
81次阅读
2025-03-31 17:41:46
【数据安全管理】磐维数据库-关于审计的经验分享
磐维数据库
75次阅读
2025-03-31 17:37:00
【干货分享】磐维数据库分布式升级v1.4经验分享
磐维数据库
68次阅读
2025-03-31 17:02:10
磐维分布式数据库升级实战
飞天
64次阅读
2025-04-12 16:54:56
磐维数据库出现锁超时如何处理“ERROR: Lock wait timeout”
王贝
62次阅读
2025-04-01 10:28:33