openGauss 6.0: Analyzing and Resolving Primary and Standby Nodes That Both Report Primary

Original · 董小姐 · 2024-10-27

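Because this was done in a personal virtual machine environment, the primary/standby switchover in this document did not hit any errors. If you do run into errors, you can troubleshoot with the steps below.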

Environment

Role   Hostname   IPADDR           OS Version          DB version
-      opendb01   192.168.40.160   CentOS 7.9 x86_64   openGauss 6.0.0
-      opendb02   192.168.40.161   CentOS 7.9 x86_64   openGauss 6.0.0

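Why both the primary and standby nodes report Primary

Both nodes can report Primary for the following reasons:

  • Under heavy service load, a primary/standby switchover takes a long time. No action is needed in this case.
  • While another standby is being built, the primary must finish sending logs to that standby before it can be demoted, so the switchover takes a long time. No action is needed in this case, but avoid switchovers during a build whenever possible.
  • During a switchover, a network fault, a full disk, or a similar problem breaks the connection between the primary and standby instances, which produces a dual-primary state.

Note: Once a dual-primary state occurs, follow the steps below to restore a normal primary/standby state. Otherwise, data may be lost.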

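Resolution steps

Check the primary/standby status

This can be run on either node. If the result shows both instances in the Primary state, the cluster is in an abnormal state.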

su - omm
gs_om -t status --detail

The output is as follows:

[omm@opendb01 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node    node_ip         port      instance                            state
-----------------------------------------------------------------------------------------------
1  opendb01 192.168.40.160  5432       6001 /opt/huawei/install/data/dn   P Primary Normal
2  opendb02 192.168.40.161  5432       6002 /opt/huawei/install/data/dn   P Primary Normal
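
The check above can also be scripted. The following is a minimal sketch, not part of the original procedure: it counts the datanode rows reported as Primary in the `gs_om -t status --detail` output so that more than one can be flagged. The function name `count_primaries` is hypothetical, and the parsing assumes the rows look like the sample output shown above.

```shell
# Hypothetical helper: count datanode rows whose state column reports Primary.
# Assumes data rows start with a numeric node index, as in the sample above:
#   1  opendb01 192.168.40.160  5432  6001 /path/to/dn   P Primary Normal
count_primaries() {
    awk '$1 ~ /^[0-9]+$/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "Primary") { n++; break }
         }
         END { print n + 0 }'
}

# Usage (as omm, on either node):
#   primaries=$(gs_om -t status --detail | count_primaries)
#   [ "$primaries" -gt 1 ] && echo "WARNING: $primaries Primary instances"
```

Matching the whole field rather than a substring keeps the initial-role letter (`P`) and header text from being counted.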

Decide which node to demote to standby, then run the following command on that node to stop the service

su - omm
gs_ctl stop -D /opt/huawei/install/data/dn

Parameter: -D /opt/huawei/install/data/dn, that is, -D followed by the data directory of the standby node

The output is as follows:

[omm@opendb01 ~]$ gs_ctl stop -D /opt/huawei/install/data/dn
[2024-10-25 05:10:19.526][8045][][gs_ctl]: gs_ctl stopped ,datadir is /opt/huawei/install/data/dn
waiting for server to shut down.... done
server stopped

Start the standby node in standby mode

su - omm
gs_ctl start -D /opt/huawei/install/data/dn -M standby

Parameters: -D /opt/huawei/install/data/dn  that is, -D followed by the data directory of the standby node
            -M standby  the startup mode

The output is as follows:

[omm@opendb01 ~]$ gs_ctl start -D /opt/huawei/install/data/dn -M standby
[2024-10-25 05:11:53.989][8093][][gs_ctl]: gs_ctl started,datadir is /opt/huawei/install/data/dn
[2024-10-25 05:11:54.030][8093][][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

0 LOG:  [Alarm Module]Host Name: opendb01

0 LOG:  [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>

0 LOG:  [Alarm Module]Cluster Name: cluster_dxj

0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
2024-10-25 05:11:54.111 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  base_page_saved_interval is 400, ori is 400.
2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010  0 [REDO] LOG:  Recovery parallelism, cpu count = 1, max = 4, actual = 1
2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010  0 [REDO] LOG:  ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  [Alarm Module]Host Name: opendb01

2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>

2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: cluster_dxj

2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58

2024-10-25 05:11:54.125 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  loaded library "security_plugin"
2024-10-25 05:11:54.128 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (3630 Mbytes) is larger.
2024-10-25 05:11:54.192 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [CACHE] LOG:  set data cache  size(805306368)
2024-10-25 05:11:54.532 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [SEGMENT_PAGE] LOG:  Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  gaussdb: fsync file "/opt/huawei/install/data/dn/gaussdb.state.temp" success
2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Standby), connection index(1)
2024-10-25 05:11:54.572 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000  0 [BACKEND] LOG:  max_safe_fds = 974, usable_fds = 1000, already_open = 16

[2024-10-25 05:11:55.037][8093][][gs_ctl]:  done
[2024-10-25 05:11:55.037][8093][][gs_ctl]: server started (/opt/huawei/install/data/dn)
[omm@opendb01 ~]$
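
The stop and restart steps above can be wrapped into one function so that the standby start only runs if the stop succeeded. This is a sketch under assumptions: the function name `demote_to_standby` and the `DRY_RUN` variable are hypothetical, and the default data directory is simply the one used in this article. Only the two `gs_ctl` commands shown above are used.

```shell
# Hypothetical wrapper around the two gs_ctl commands used in this article.
# $1: data directory of the node being demoted (defaults to this article's path).
# With DRY_RUN=1 the commands are printed instead of executed.
demote_to_standby() {
    data_dir=${1:-/opt/huawei/install/data/dn}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "gs_ctl stop -D $data_dir"
        echo "gs_ctl start -D $data_dir -M standby"
        return 0
    fi
    # Restart in standby mode only if the stop succeeded.
    gs_ctl stop -D "$data_dir" && gs_ctl start -D "$data_dir" -M standby
}

# Usage (as omm, on the node chosen for demotion):
#   DRY_RUN=1; demote_to_standby    # preview the commands first
#   DRY_RUN=0; demote_to_standby    # then run them
```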

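Save the primary/standby node information

This only needs to be run on one node; it dynamically saves the machine information of all nodes.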

su - omm
gs_om -t refreshconf

The output is as follows:

[omm@opendb01 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.

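Check the primary/standby status

This can be run on either node. Confirm that the instance states have recovered: 161 is now the primary and 160 is the standby.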

su - omm
gs_om -t status --detail

The output is as follows:

[omm@opendb01 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node    node_ip         port      instance                            state
-----------------------------------------------------------------------------------------------
1  opendb01 192.168.40.160  5432       6001 /opt/huawei/install/data/dn   P Primary Normal
2  opendb02 192.168.40.161  5432       6002 /opt/huawei/install/data/dn   S Standby Normal
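
To confirm the role of one specific node without reading the whole table, the status output can be filtered by hostname. This is a minimal sketch: the function name `node_role` is hypothetical, it assumes the hostname is the second column as in the output above, and it only recognizes the Primary and Standby states that appear in this article.

```shell
# Hypothetical helper: print the role (Primary or Standby) reported for one host.
# $1: hostname as it appears in the second column of the status table.
# Prints nothing if the host is absent or in a state other than these two.
node_role() {
    awk -v host="$1" '$2 == host {
        for (i = 3; i <= NF; i++)
            if ($i == "Primary" || $i == "Standby") { print $i; exit }
    }'
}

# Usage (as omm, on either node):
#   gs_om -t status --detail | node_role opendb01
#   gs_om -t status --detail | node_role opendb02
```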


Reference: 实例主备切换 (Instance primary/standby switchover) (osinfra.cn)
