暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

【磐维数据库】如何检查容灾集群状态?

原创 Darcy 2024-07-15
3949

0. 概述

主要检查一下三方面:

  • 容灾状态
  • 集群状态
  • 流复制状态

1. 常用指令

gs_sdr -t query
gs_om -t status --detail
gs_ctl query

select * from pg_stat_replication;
select * from pg_replication_slots;

cm_ctl query -Cvidp
gs_ctl status
复制

2. 未建立容灾关系时

  • hadr_cluster_stat 为 normal:
[omm@panweidb01 ~]$ gs_sdr -t query
--------------------------------------------------------------------------------
Streaming disaster recovery query cdf29f001da511efbb2f000c29a96163
--------------------------------------------------------------------------------
Start streaming disaster query.
Start check archive.
Start check recovery.
Start check RPO & RTO.
Successfully executed streaming disaster recovery query, result:
{'hadr_cluster_stat': 'normal', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '', 'RTO': ''}
复制

3. 建立容灾关系后的正常状态

3.1 主集群主节点

3.1.1 容灾状态

  • hadr_cluster_stat 为 archive:
[omm@apanweidb01 ~]$ gs_sdr -t query
--------------------------------------------------------------------------------
Streaming disaster recovery query d6b666261da511efa53c000c29fd99eb
--------------------------------------------------------------------------------
Start streaming disaster query.
Start check archive.
Start check recovery.
Start check RPO & RTO.
Successfully executed streaming disaster recovery query, result:
{'hadr_cluster_stat': 'archive', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '0', 'RTO': '0'}
复制

3.1.2 集群状态

[omm@apanweidb02 ~]$ gs_om -t status --detail
[  CMServer State   ]

node           node_ip         instance                             state
---------------------------------------------------------------------------
1  apanweidb01 <主集群1-ip>   1    /database/panweidb/cm/cm_server Standby
2  apanweidb02 <主集群2-ip>   2    /database/panweidb/cm/cm_server Standby
3  apanweidb03 <主集群3-ip>   3    /database/panweidb/cm/cm_server Primary

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node           node_ip         instance                     state            
-----------------------------------------------------------------------------
1  apanweidb01 <主集群1-ip>   6001 /database/panweidb/data P Primary Normal
2  apanweidb02 <主集群2-ip>   6002 /database/panweidb/data S Standby Normal
3  apanweidb03 <主集群3-ip>   6003 /database/panweidb/data S Standby Normal
复制

3.1.3 流复制状态

  • 流复制会包含主集群备节点(out)和备集群主节点(out):
[omm@apanweidb01 ~]$ gs_ctl query
[2024-07-15 20:23:10.078][2670696][][gs_ctl]: gs_ctl query ,datadir is /database/panweidb/data 
 HA state:           
	local_role                     : Primary
	static_connections             : 5
	db_state                       : Normal
	detail_information             : Normal

 Senders info:       
	sender_pid                     : 1868144
	local_role                     : Primary
	peer_role                      : Standby
	peer_state                     : Normal
	state                          : Streaming
	sender_sent_location           : 0/2CCEE3F8
	sender_write_location          : 0/2CCEE3F8
	sender_flush_location          : 0/2CCEE3F8
	sender_replay_location         : 0/2CCEE3F8
	receiver_received_location     : 0/2CCEE3F8
	receiver_write_location        : 0/2CCEE3F8
	receiver_flush_location        : 0/2CCEE3F8
	receiver_replay_location       : 0/2CCEE3F8
	sync_percent                   : 100%
	sync_state                     : Quorum
	sync_priority                  : 1
	sync_most_available            : Off
	channel                        : <主集群1-ip>:15401--><主集群2-ip>:40332

	sender_pid                     : 1868143
	local_role                     : Primary
	peer_role                      : Standby
	peer_state                     : Normal
	state                          : Streaming
	sender_sent_location           : 0/2CCEE3F8
	sender_write_location          : 0/2CCEE3F8
	sender_flush_location          : 0/2CCEE3F8
	sender_replay_location         : 0/2CCEE3F8
	receiver_received_location     : 0/2CCEE3F8
	receiver_write_location        : 0/2CCEE3F8
	receiver_flush_location        : 0/2CCEE3F8
	receiver_replay_location       : 0/2CCEE3F8
	sync_percent                   : 100%
	sync_state                     : Quorum
	sync_priority                  : 1
	sync_most_available            : Off
	channel                        : <主集群1-ip>:15401--><主集群3-ip>:65376

	sender_pid                     : 1868145
	local_role                     : Primary
	peer_role                      : Standby
	peer_state                     : Normal
	state                          : Streaming
	sender_sent_location           : 0/2CCEE3F8
	sender_write_location          : 0/2CCEE3F8
	sender_flush_location          : 0/2CCEE3F8
	sender_replay_location         : 0/2CCEE3F8
	receiver_received_location     : 0/2CCEE3F8
	receiver_write_location        : 0/2CCEE3F8
	receiver_flush_location        : 0/2CCEE3F8
	receiver_replay_location       : 0/2CCEE3F8
	sync_percent                   : 100%
	sync_state                     : Async
	sync_priority                  : 0
	sync_most_available            : Off
	channel                        : <主集群1-ip>:15401--><备集群1-ip>:64174

 Receiver info:      
No information
复制

3.1.4 复制槽状态

  • 注意 dn_6001_hadr 是指向备集群主节点的:
postgres=# select * from pg_replication_slots;
  slot_name   | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn | dummy_standby | confirmed_flush 
--------------+--------+-----------+--------+----------+--------+------+--------------+-------------+---------------+-----------------
 dn_6002      |        | physical  |      0 |          | t      |      |              | 0/2CCF8430  | f             | 
 dn_6003      |        | physical  |      0 |          | t      |      |              | 0/2CCF8430  | f             | 
 dn_6001_hadr |        | physical  |      0 |          | t      |      |              | 0/2CCF8430  | f             | 
(3 rows)

postgres=# select * from pg_stat_replication;
      pid       | usesysid | usename |          application_name          |  client_addr  | client_hostname | client_port |         backend_start         |   state   | sender_sent_location
 | receiver_write_location | receiver_flush_location | receiver_replay_location | sync_priority | sync_state 
----------------+----------+---------+------------------------------------+---------------+-----------------+-------------+-------------------------------+-----------+---------------------
-+-------------------------+-------------------------+--------------------------+---------------+------------
 23369422993152 |       10 | omm     | WalSender to Standby[hadr_dn_6001] | <备集群1-ip> |                 |       64174 | 2024-07-11 22:04:30.19872+08  | Streaming | 0/2CCF8430          
 | 0/2CCF8430              | 0/2CCF8430              | 0/2CCF8430               |             0 | Async
 23369553135360 |       10 | omm     | WalSender to Standby[dn_6003]      | <主集群3-ip> |                 |       65376 | 2024-07-11 22:04:29.656365+08 | Streaming | 0/2CCF8430          
 | 0/2CCF8430              | 0/2CCF8430              | 0/2CCF8430               |             1 | Quorum
 23369406211840 |       10 | omm     | WalSender to Standby[dn_6002]      | <主集群2-ip> |                 |       40332 | 2024-07-11 22:04:29.726514+08 | Streaming | 0/2CCF8430          
 | 0/2CCF8430              | 0/2CCF8430              | 0/2CCF8430               |             1 | Quorum
(3 rows)
复制

3.2 备集群主节点

3.2.1 容灾状态

  • hadr_cluster_stat 为 recovery:
[omm@bpanweidb02 ~]$ gs_sdr -t query
--------------------------------------------------------------------------------
Streaming disaster recovery query 55694ade1dc311efbd28000c29a8e394
--------------------------------------------------------------------------------
Start streaming disaster query.
Start check archive.
Start check recovery.
Start check RPO & RTO.
Successfully executed streaming disaster recovery query, result:
{'hadr_cluster_stat': 'recovery', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '', 'RTO': ''}
复制

3.2.2 集群状态

  • 注意dn的角色:
[omm@bpanweidb01 ~]$ gs_om -t status --detail
[  CMServer State   ]

node           node_ip         instance                             state
---------------------------------------------------------------------------
1  bpanweidb01 <备集群1-ip>   1    /database/panweidb/cm/cm_server Standby
2  bpanweidb02 <备集群2-ip>   2    /database/panweidb/cm/cm_server Primary
3  bpanweidb03 <备集群3-ip>   3    /database/panweidb/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node           node_ip         instance                     state            
-----------------------------------------------------------------------------
1  bpanweidb01 <备集群1-ip>   6001 /database/panweidb/data P Main Standby Normal
2  bpanweidb02 <备集群2-ip>   6002 /database/panweidb/data S Cascade Standby Normal
3  bpanweidb03 <备集群3-ip>   6003 /database/panweidb/data S Cascade Standby Normal
复制

3.2.3 流复制状态

  • 流复制会包含主集群主节点(in)备集群备节点(out):
[omm@bpanweidb01 ~]$ gs_ctl query
[2024-07-15 20:25:36.254][3706201][][gs_ctl]: gs_ctl query ,datadir is /database/panweidb/data 
 HA state:           
	local_role                     : Main Standby
	static_connections             : 5
	db_state                       : Normal
	detail_information             : Normal

 Senders info:       
	sender_pid                     : 3250685
	local_role                     : Main Standby
	peer_role                      : Cascade Standby
	peer_state                     : Normal
	state                          : Streaming
	sender_sent_location           : 0/2CCEE638
	sender_write_location          : 0/2CCEE638
	sender_flush_location          : 0/2CCEE638
	sender_replay_location         : 0/2CCEE638
	receiver_received_location     : 0/2CCEE638
	receiver_write_location        : 0/2CCEE638
	receiver_flush_location        : 0/2CCEE638
	receiver_replay_location       : 0/2CCEE638
	sync_percent                   : 100%
	sync_state                     : Async
	sync_priority                  : 0
	sync_most_available            : Off
	channel                        : <备集群1-ip>:15401--><备集群3-ip>:40300

	sender_pid                     : 3250692
	local_role                     : Main Standby
	peer_role                      : Cascade Standby
	peer_state                     : Normal
	state                          : Streaming
	sender_sent_location           : 0/2CCEE638
	sender_write_location          : 0/2CCEE638
	sender_flush_location          : 0/2CCEE638
	sender_replay_location         : 0/2CCEE638
	receiver_received_location     : 0/2CCEE638
	receiver_write_location        : 0/2CCEE638
	receiver_flush_location        : 0/2CCEE638
	receiver_replay_location       : 0/2CCEE638
	sync_percent                   : 100%
	sync_state                     : Async
	sync_priority                  : 0
	sync_most_available            : Off
	channel                        : <备集群1-ip>:15401--><备集群2-ip>:63784

 Receiver info:      
	receiver_pid                   : 3250885
	local_role                     : Main Standby
	peer_role                      : Primary
	peer_state                     : Normal
	state                          : Normal
	sender_sent_location           : 0/2CCEE638
	sender_write_location          : 0/2CCEE638
	sender_flush_location          : 0/2CCEE638
	sender_replay_location         : 0/2CCEE638
	receiver_received_location     : 0/2CCEE638
	receiver_write_location        : 0/2CCEE638
	receiver_flush_location        : 0/2CCEE638
	receiver_replay_location       : 0/2CCEE638
	sync_percent                   : 100%
	channel                        : <备集群1-ip>:64174<--<主集群1-ip>:15401
复制

3.2.4 复制槽状态

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn | dummy_standby | confirmed_flush 
-----------+--------+-----------+--------+----------+--------+------+--------------+-------------+---------------+-----------------
 dn_6002   |        | physical  |      0 |          | t      |      |              | 0/2CCF8670  | f             | 
 dn_6003   |        | physical  |      0 |          | t      |      |              | 0/2CCF8670  | f             | 
(2 rows)

postgres=# select * from pg_stat_replication;
      pid       | usesysid | usename |       application_name        |  client_addr  | client_hostname | client_port |         backend_start         |   state   | sender_sent_location | re
ceiver_write_location | receiver_flush_location | receiver_replay_location | sync_priority | sync_state 
----------------+----------+---------+-------------------------------+---------------+-----------------+-------------+-------------------------------+-----------+----------------------+---
----------------------+-------------------------+--------------------------+---------------+------------
 23145466558208 |       10 | omm     | WalSender to Standby[dn_6002] | <备集群2-ip> |                 |       63784 | 2024-07-11 22:04:16.878049+08 | Streaming | 0/2CCF8670           | 0/
2CCF8670              | 0/2CCF8670              | 0/2CCF8670               |             0 | Async
 23145405675264 |       10 | omm     | WalSender to Standby[dn_6003] | <备集群3-ip> |                 |       40300 | 2024-07-11 22:04:16.712418+08 | Streaming | 0/2CCF8670           | 0/
2CCF8670              | 0/2CCF8670              | 0/2CCF8670               |             0 | Async
(2 rows)
复制

参考

最后修改时间:2024-07-15 23:16:37
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论