openGauss 6.0: Analyzing and Handling a Cluster Where Both the Primary and Standby Nodes Are Primary

openGauss 2024-11-13

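This walkthrough was run in a personal virtual machine environment, where the primary/standby switchover completed without errors. If you do run into errors, you can troubleshoot them with the steps below.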

Environment

Role  Hostname  IPADDR          OS Version         DB version
-     opendb01  192.168.40.160  CentOS 7.9 x86_64  openGauss 6.0.0
-     opendb02  192.168.40.161  CentOS 7.9 x86_64  openGauss 6.0.0

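Why both nodes can end up as Primary

Both the primary and the standby node can report Primary for the following reasons:

  • Under heavy business load, a primary/standby switchover takes a long time. No action is needed in this case.

  • When another standby is in the middle of a build, the primary can only be demoted after it has sent its logs to that standby, so the switchover takes a long time. No action is needed, but avoid switching over while a build is in progress.

  • During the switchover, the connection between the primary and standby instances is broken by a network failure, a full disk, or a similar fault, which leaves two primaries.

Note: Once a dual-primary state occurs, follow the steps below to restore a normal primary/standby state. Otherwise data may be lost.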
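Because a full disk is one of the causes listed above, it can be worth checking the data directory's file system before a planned switchover. The sketch below is only an illustration: the helper names `df_usage_pct` and `disk_has_headroom` and the 90% default limit are assumptions, not part of openGauss.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the
# Capacity column of the first file system row as a bare number.
# Assumes the file system name contains no spaces.
df_usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed only if the usage percentage ($1)
# is below the limit ($2, assumed default: 90).
disk_has_headroom() {
    [ "$1" -lt "${2:-90}" ]
}
```

A possible use on either node, assuming the data directory from this article: `disk_has_headroom "$(df -P /opt/huawei/install/data/dn | df_usage_pct)" || echo "disk nearly full"`.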

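Procedure

Check the primary/standby status

Run this on either node. If the query shows both instances in the Primary state, the cluster is in an abnormal (dual-primary) state.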

    su - omm
    gs_om -t status --detail

Output:

    [omm@opendb01 ~]$ gs_om -t status --detail
    [ Cluster State ]

    cluster_state : Normal
    redistributing : No
    current_az : AZ_ALL

    [ Datanode State ]

    node node_ip port instance state
    -----------------------------------------------------------------------------------------------
    1  opendb01 192.168.40.160  5432       6001 /opt/huawei/install/data/dn   P Primary Normal
    2  opendb02 192.168.40.161  5432       6002 /opt/huawei/install/data/dn   P Primary Normal
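The dual-primary check can also be scripted. The following is a minimal sketch, assuming the row layout shown in the output above (node number first, current role in the eighth whitespace-separated field); `count_primaries` is a hypothetical helper name.

```shell
# Hypothetical helper: read `gs_om -t status --detail` output on stdin
# and print how many datanode rows currently report the Primary role.
# Assumes rows start with a node number and that field 8 is the role.
count_primaries() {
    awk '$1 ~ /^[0-9]+$/ && $8 == "Primary" { n++ } END { print n + 0 }'
}
```

For example, `gs_om -t status --detail | count_primaries` printing a value greater than 1 would indicate the abnormal state described here.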

Decide which node to demote to standby, then run the following on that node to stop the service

    su - omm
    gs_ctl stop -D /opt/huawei/install/data/dn

Parameter: -D /opt/huawei/install/data/dn is the data directory of the node that will become the standby.

Output:

    [omm@opendb01 ~]$ gs_ctl stop -D /opt/huawei/install/data/dn
    [2024-10-25 05:10:19.526][8045][][gs_ctl]: gs_ctl stopped ,datadir is /opt/huawei/install/data/dn
    waiting for server to shut down.... done
    server stopped

Start the standby node in standby mode

    su - omm
    gs_ctl start -D /opt/huawei/install/data/dn -M standby

Parameters: -D /opt/huawei/install/data/dn is the data directory of the standby node.
            -M standby sets the startup mode to standby.

Output:

    [omm@opendb01 ~]$ gs_ctl start -D /opt/huawei/install/data/dn -M standby
    [2024-10-25 05:11:53.989][8093][][gs_ctl]: gs_ctl started,datadir is /opt/huawei/install/data/dn
    [2024-10-25 05:11:54.030][8093][][gs_ctl]: waiting for server to start...
    .0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
    0 LOG: [Alarm Module]Host Name: opendb01
    0 LOG: [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
    0 LOG: [Alarm Module]Cluster Name: cluster_dxj
    0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
    0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
    0 WARNING: failed to parse feature control file: gaussdb.version.
    0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
    2024-10-25 05:11:54.111 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: base_page_saved_interval is 400, ori is 400.
    2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 1, max = 4, actual = 1
    2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
    2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
    2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: opendb01
    2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
    2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: cluster_dxj
    2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
    2024-10-25 05:11:54.125 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: loaded library "security_plugin"
    2024-10-25 05:11:54.128 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
    2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
    2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (3630 Mbytes) is larger.
    2024-10-25 05:11:54.192 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
    2024-10-25 05:11:54.532 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [SEGMENT_PAGE] LOG: Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
    2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/opt/huawei/install/data/dn/gaussdb.state.temp" success
    2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Standby), connection index(1)
    2024-10-25 05:11:54.572 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 974, usable_fds = 1000, already_open = 16
    [2024-10-25 05:11:55.037][8093][][gs_ctl]: done
    [2024-10-25 05:11:55.037][8093][][gs_ctl]: server started (/opt/huawei/install/data/dn)
    [omm@opendb01 ~]$
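After the restart, the local role can be confirmed on the node itself. The sketch below assumes that `gs_ctl query -D <datadir>` prints a line of the form `local_role : Standby` in its HA state section; that line format is an assumption here, and `local_role` is a hypothetical helper name.

```shell
# Hypothetical helper: read `gs_ctl query` output on stdin and print
# the value of the first local_role line, with whitespace stripped.
# Assumes a "local_role : <Role>" line format.
local_role() {
    awk -F: '/local_role/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}
```

A possible use on the demoted node: `gs_ctl query -D /opt/huawei/install/data/dn | local_role`, which should print Standby if the start in standby mode took effect.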

Save the primary/standby host information

Run this on either node. It dynamically regenerates and saves the host information for all nodes.

    su - omm
    gs_om -t refreshconf

Output:

    [omm@opendb01 ~]$ gs_om -t refreshconf
    Generating dynamic configuration file for all nodes.
    Successfully generated dynamic configuration file.
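The stop, standby start, and refreshconf steps above can be collected into one helper. This is only a sketch under stated assumptions: the function name `demote_to_standby` and the `DRY_RUN` variable are hypothetical, and it must be run as the omm user on the node being demoted. Only the `gs_ctl` and `gs_om` invocations come from this article.

```shell
# Hypothetical helper: demote the local instance to standby by running
# the three commands from this article in order.
# $1 = data directory of the node being demoted.
# With DRY_RUN=1 the commands are printed instead of executed.
demote_to_standby() {
    datadir="$1"
    if [ -z "$datadir" ]; then
        echo "usage: demote_to_standby <datadir>" >&2
        return 2
    fi
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Stop at the first failing step so a half-finished demotion is noticed.
    $run gs_ctl stop -D "$datadir" || return 1
    $run gs_ctl start -D "$datadir" -M standby || return 1
    $run gs_om -t refreshconf || return 1
}
```

Setting `DRY_RUN=1` before calling `demote_to_standby /opt/huawei/install/data/dn` prints the three commands without touching the instance, which is a cheap way to review them first.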

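Check the primary/standby status again

Run this on either node and confirm that the instance states have recovered: 161 is now the primary and 160 is the standby.

    su - omm
    gs_om -t status --detail

Output: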

    [omm@opendb01 ~]$ gs_om -t status --detail
    [ Cluster State ]

    cluster_state : Normal
    redistributing : No
    current_az : AZ_ALL

    [ Datanode State ]

    node node_ip port instance state
    -----------------------------------------------------------------------------------------------
    1  opendb01 192.168.40.160  5432       6001 /opt/huawei/install/data/dn   P Standby Normal
    2  opendb02 192.168.40.161  5432       6002 /opt/huawei/install/data/dn   S Primary Normal
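To read the result at a glance, the status rows can be reduced to hostname and current role. This is a minimal sketch that assumes the nine-field row layout shown in the output above; `list_roles` is a hypothetical helper name.

```shell
# Hypothetical helper: read `gs_om -t status --detail` output on stdin
# and print "hostname role" for every datanode row.
# Assumes field 2 is the hostname and field 8 is the current role.
list_roles() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 9 { print $2, $8 }'
}
```

Running `gs_om -t status --detail | list_roles` prints one pair per line, so a healthy two-node cluster shows exactly one Primary and one Standby.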
