暂无图片
暂无图片
4
暂无图片
暂无图片
暂无图片

openGauss训练营学习心得-手动安装主备HA环境,并测试switchover和failover效果(2)

原创 jieyancai 2021-09-13
3948

接上篇openGauss安装初体验
https://www.modb.pro/db/108362

这篇手动安装主备HA环境,并测试switchover和failover效果。

主库:omm02
192.168.52.143
备库:omm03
192.168.52.144

克隆虚机omm02为omm03修改主机名和ip地址。
克隆后:目录结构一致,omm用户和组已OK,并且openGauss的数据库软件已经安装完成。

基本思路:
1. 主库操作:
	初始化主库
	修改主库postgresql.conf和pg_hba.conf
	主库模式启动

2.备库操作:
	备库做恢复
	修改主库postgresql.conf和pg_hba.conf

3.主备切换测试:
	switchover测试
	failover测试
	build重建主备关系

重新开始搭建:
1.主库操作:
环境变量:
/etc/profile添加

export GAUSSHOME=/gaussdb/app
export PGDATA=/gaussdb/data/db1

停止原先数据库:

su - omm
gs_ctl stop -D /gaussdb/data/db1

删除/gaussdb/data/db1目录下的文件:

rm -rf /gaussdb/data/db1/*

初始化库:
gs_initdb -D /gaussdb/data/db1 --nodename=omm02

[omm@omm02 ~]$ rm -rf /gaussdb/data/db1/*
[omm@omm02 ~]$ gs_initdb -D /gaussdb/data/db1 --nodename=omm02
The files belonging to this database system will be owned by user "omm".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

fixing permissions on existing directory /gaussdb/data/db1 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /gaussdb/data/db1/base/1 ... ok
initializing pg_authid ... ok
setting password ... ok
initializing dependencies ... ok
loading PL/pgSQL server-side language ... ok
creating system views ... ok
creating performance views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
initialize global configure for bucketmap length ... ok
creating information schema ... ok
loading foreign-data wrapper for distfs access ... ok
loading foreign-data wrapper for hdfs access ... ok
loading foreign-data wrapper for log access ... ok
loading hstore extension ... ok
loading foreign-data wrapper for MOT access ... ok
loading security plugin ... ok
update system tables ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
freezing database template0 ... ok
freezing database template1 ... ok
freezing database postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run gs_initdb.

Success. You can now start the database server of single node using:

    gaussdb -D /gaussdb/data/db1 --single_node
or
gs_ctl start -D /gaussdb/data/db1 -Z single_node -l logfile

启动数据库:gs_ctl start -D /gaussdb/data/db1 -Z single_node -l logfile



[omm@omm02 ~]$ gs_ctl start -D /gaussdb/data/db1 -Z single_node -l logfile
[2021-09-13 10:15:55.071][25537][][gs_ctl]: gs_ctl started,datadir is /gaussdb/data/db1 
[2021-09-13 10:15:55.163][25537][][gs_ctl]: waiting for server to start...
.
[2021-09-13 10:15:56.185][25537][][gs_ctl]:  done
[2021-09-13 10:15:56.185][25537][][gs_ctl]: server started (/gaussdb/data/db1)

检查端口,默认为5432:
[omm@omm02 ~]$ netstat -tunlp|grep LISTEN
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      25540/gaussdb       
tcp        0      0 127.0.0.1:5433          0.0.0.0:*               LISTEN      25540/gaussdb       
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -                   
tcp6       0      0 ::1:5432                :::*                    LISTEN      25540/gaussdb       
tcp6       0      0 ::1:5433                :::*                    LISTEN      25540/gaussdb       
tcp6       0      0 ::1:25                  :::*                    LISTEN      -                   
tcp6       0      0 :::22                   :::*                    LISTEN      -                   
连接数据库正常:
[omm@omm02 db1]$ gsql -d postgres -r
gsql ((openGauss 2.0.1 build d97c0e8a) compiled at 2021-06-02 19:37:17 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=# \l
ERROR:  Please use "ALTER ROLE user_name PASSWORD 'password';" to set the password of user omm before other operation!
postgres=# ALTER ROLE omm PASSWORD 'Jyczxm666';
ALTER ROLE
postgres=# \l
                              List of databases
   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges 
-----------+-------+----------+-------------+-------------+-------------------
 postgres  | omm   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | omm   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/omm           +
           |       |          |             |             | omm=CTc/omm
 template1 | omm   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/omm           +
           |       |          |             |             | omm=CTc/omm
(3 rows)

postgres=# \q

修改postgresql.conf文件(简单测试主备,所以没有修改归档等参数,生产上需启用归档模式并设置保留归档策略):

replconninfo1 = 'localhost=192.168.52.143 localport=5433 localheartbeatport=5434 remotehost=192.168.52.144 remoteport=5433 remoteheartbeatport=5434'

修改pg_hba.conf文件,保证两台直接可互信:

host    all             omm             192.168.52.0/24         trust
host    replication     omm        192.168.52.0/24            trust

启动主库-M primary:gs_ctl start -D /gaussdb/data/db1 -M primary


[omm@omm02 db1]$ gs_ctl stop -D /gaussdb/data/db1
[2021-09-13 10:23:50.254][25632][][gs_ctl]: gs_ctl stopped ,datadir is /gaussdb/data/db1 
waiting for server to shut down......... done
server stopped
[omm@omm02 db1]$ gs_ctl start -D /gaussdb/data/db1 -M primary 
[2021-09-13 10:24:11.565][25633][][gs_ctl]: gs_ctl started,datadir is /gaussdb/data/db1 
[2021-09-13 10:24:11.657][25633][][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

0 LOG:  [Alarm Module]Host Name: omm02 

0 LOG:  [Alarm Module]Host IP: 192.168.52.143 

0 LOG:  [Alarm Module]Cluster Name: dbCluster 

0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
0 LOG:  Failed to initialze environment for codegen.
The core dump path is an invalid directory
2021-09-13 10:24:11.845 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  when starting as multi_standby mode, we couldn't support data replicaton.
2021-09-13 10:24:11.845 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2021-09-13 10:24:11.845 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  [Alarm Module]Host Name: omm02 

2021-09-13 10:24:11.845 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  [Alarm Module]Host IP: 192.168.52.143 

2021-09-13 10:24:11.845 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: dbCluster 

2021-09-13 10:24:11.845 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

2021-09-13 10:24:11.845 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  Transparent encryption disabled.

2021-09-13 10:24:11.848 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  loaded library "security_plugin"
2021-09-13 10:24:11.851 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2021-09-13 10:24:11.851 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  reserved memory for backend threads is: 220 MB
2021-09-13 10:24:11.851 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  reserved memory for WAL buffers is: 128 MB
2021-09-13 10:24:11.851 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  Set max backend reserve memory is: 348 MB, max dynamic memory is: 11097 MB
2021-09-13 10:24:11.851 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  shared memory 330 Mbytes, memory context 11445 Mbytes, max process memory 12288 Mbytes
2021-09-13 10:24:11.870 [unknown] [unknown] localhost 139986958436096 0  0 [CACHE] LOG:  set data cache  size(402653184)
2021-09-13 10:24:11.882 [unknown] [unknown] localhost 139986958436096 0  0 [CACHE] LOG:  set metadata cache  size(134217728)
2021-09-13 10:24:11.917 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  gaussdb: fsync file "/gaussdb/data/db1/gaussdb.state.temp" success
2021-09-13 10:24:11.917 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Primary)
2021-09-13 10:24:11.940 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  max_safe_fds = 977, usable_fds = 1000, already_open = 13
The core dump path is an invalid directory
2021-09-13 10:24:11.942 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  the configure file /gaussdb/app/etc/gscgroup_omm.cfg doesn't exist or the size of configure file has changed. Please create it by root user!
2021-09-13 10:24:11.942 [unknown] [unknown] localhost 139986958436096 0  0 [BACKEND] LOG:  Failed to parse cgroup config file.

[2021-09-13 10:24:12.685][25633][][gs_ctl]:  done
[2021-09-13 10:24:12.685][25633][][gs_ctl]: server started (/gaussdb/data/db1)

由于没有使用gs_om管理部署,所以下面的gs_om命令无作用。

[omm@omm02 db1]$ gs_om -t status --all
-----------------------------------------------------------------------

cluster_state             : Unavailable
redistributing            : No

-----------------------------------------------------------------------

node                      : 1
node_name                 : omm02
instance_id               : 6001
node_ip                   : 192.168.52.143
data_path                 : /gaussdb/data/db1
type                      : Datanode
instance_state            : Manually stopped
az_name                   : AZ1
instance_role             : Down

2.备库操作:
环境变量:
/etc/profile添加
export GAUSSHOME=/gaussdb/app
export PGDATA=/gaussdb/data/db1

恢复备库:gs_basebackup -D /gaussdb/data/db1 -h 192.168.52.143 -p 5432


[omm@omm03 log]$ rm -rf /gaussdb/data/db1/*
[omm@omm03 log]$ gs_basebackup -D /gaussdb/data/db1 -h 192.168.52.143 -p 5432
INFO:  The starting position of the xlog copy of the full build is: 0/3000028. The slot minimum LSN is: 0/0.
[2021-09-13 10:32:24]:begin build tablespace list
[2021-09-13 10:32:24]:finish build tablespace list
[2021-09-13 10:32:24]:begin get xlog by xlogstream
[2021-09-13 10:32:24]: check identify system success                                                
[2021-09-13 10:32:24]: send START_REPLICATION 0/3000000 success                                     
[2021-09-13 10:32:24]: keepalive message is received                                                
[2021-09-13 10:32:24]: keepalive message is received                                                
[2021-09-13 10:32:29]:gs_basebackup: base backup successfully

修改postgresql.conf文件(调整本地和远程IP):

replconninfo1 = 'localhost=192.168.52.144 localport=5433 localheartbeatport=5434 remotehost=192.168.52.143 remoteport=5433 remoteheartbeatport=5434'

同样检查pg_hba.conf文件,保持互信。

启动备库-M standby: gs_ctl start -D /gaussdb/data/db1 -M standby

[omm@omm03 db1]$ gs_ctl start -D /gaussdb/data/db1 -M standby
[2021-09-13 10:34:32.972][20469][][gs_ctl]: gs_ctl started,datadir is /gaussdb/data/db1 
[2021-09-13 10:34:33.093][20469][][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

0 LOG:  [Alarm Module]Host Name: omm03 

0 LOG:  [Alarm Module]Host IP: 192.168.52.144 

0 LOG:  [Alarm Module]Cluster Name: dbCluster 

0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
0 LOG:  Failed to initialze environment for codegen.
0 WARNING:  bbox_dump_path is set to /gaussdb/corefile/

2021-09-13 10:34:33.339 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  when starting as multi_standby mode, we couldn't support data replicaton.
gaussdb.state does not exist, and skipt setting since it is optional.2021-09-13 10:34:33.339 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2021-09-13 10:34:33.339 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  [Alarm Module]Host Name: omm03 

2021-09-13 10:34:33.339 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  [Alarm Module]Host IP: 192.168.52.144 

2021-09-13 10:34:33.339 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: dbCluster 

2021-09-13 10:34:33.339 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

2021-09-13 10:34:33.339 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  Transparent encryption disabled.

2021-09-13 10:34:33.341 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  loaded library "security_plugin"
2021-09-13 10:34:33.344 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2021-09-13 10:34:33.344 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  reserved memory for backend threads is: 220 MB
2021-09-13 10:34:33.344 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  reserved memory for WAL buffers is: 128 MB
2021-09-13 10:34:33.344 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  Set max backend reserve memory is: 348 MB, max dynamic memory is: 11097 MB
2021-09-13 10:34:33.344 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  shared memory 330 Mbytes, memory context 11445 Mbytes, max process memory 12288 Mbytes
2021-09-13 10:34:33.369 [unknown] [unknown] localhost 139715480606464 0  0 [CACHE] LOG:  set data cache  size(402653184)
2021-09-13 10:34:33.382 [unknown] [unknown] localhost 139715480606464 0  0 [CACHE] LOG:  set metadata cache  size(134217728)
2021-09-13 10:34:33.427 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  gaussdb: fsync file "/gaussdb/data/db1/gaussdb.state.temp" success
2021-09-13 10:34:33.427 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Standby)
2021-09-13 10:34:33.450 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  max_safe_fds = 977, usable_fds = 1000, already_open = 13
bbox_dump_path is set to /gaussdb/corefile/
2021-09-13 10:34:33.453 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  the configure file /gaussdb/app/etc/gscgroup_omm.cfg doesn't exist or the size of configure file has changed. Please create it by root user!
2021-09-13 10:34:33.453 [unknown] [unknown] localhost 139715480606464 0  0 [BACKEND] LOG:  Failed to parse cgroup config file.

[2021-09-13 10:34:34.128][20469][][gs_ctl]:  done
[2021-09-13 10:34:34.128][20469][][gs_ctl]: server started (/gaussdb/data/db1)

执行检查感觉遇到命令的bug,语法没错,但报错了,很奇怪。
image.png
image.png
添加PGDATA环境变量后,执行gs_ctl query正常。
检查备库状态正常:

[omm@omm03 db1]$  gs_ctl query
[2021-09-13 10:46:21.394][48172][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Standby
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
No information 
 Receiver info:      
        receiver_pid                   : 20574
        local_role                     : Standby
        peer_role                      : Primary
        peer_state                     : Normal
        state                          : Normal
        sender_sent_location           : 0/4002920
        sender_write_location          : 0/4002920
        sender_flush_location          : 0/4002920
        sender_replay_location         : 0/4002920
        receiver_received_location     : 0/4002920
        receiver_write_location        : 0/4002920
        receiver_flush_location        : 0/4002920
        receiver_replay_location       : 0/4002920
        sync_percent                   : 100%
        channel                        : 192.168.52.144:41396<--192.168.52.143:5433

检查主库状态正常:

[omm@omm02 ~]$ gs_ctl query
[2021-09-13 10:46:10.314][85603][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 3713
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/4002920
        sender_write_location          : 0/4002920
        sender_flush_location          : 0/4002920
        sender_replay_location         : 0/4002920
        receiver_received_location     : 0/4002920
        receiver_write_location        : 0/4002920
        receiver_flush_location        : 0/4002920
        receiver_replay_location       : 0/4002920
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.52.143:5433-->192.168.52.144:41396

 Receiver info:      
No information

3.主备库切换测试:
3.1备库执行切换switchover

[omm@omm03 db1]$ gs_ctl switchover
[2021-09-13 10:49:25.008][25109][][gs_ctl]: gs_ctl switchover ,datadir is /gaussdb/data/db1 
[2021-09-13 10:49:25.008][25109][][gs_ctl]: switchover term (1)
[2021-09-13 10:49:25.015][25109][][gs_ctl]: waiting for server to switchover...............
[2021-09-13 10:49:37.106][25109][][gs_ctl]: done
[2021-09-13 10:49:37.106][25109][][gs_ctl]: switchover completed (/gaussdb/data/db1)
[omm@omm03 db1]$ gs_ctl query
[2021-09-13 10:50:10.682][46300][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Primary 
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 31573
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/4002E98
        sender_write_location          : 0/4002E98
        sender_flush_location          : 0/4002E98
        sender_replay_location         : 0/4002E98
        receiver_received_location     : 0/4002E98
        receiver_write_location        : 0/4002E98
        receiver_flush_location        : 0/4002E98
        receiver_replay_location       : 0/4002E98
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.52.144:5433-->192.168.52.143:43076

 Receiver info:      
No information

3.1.1测试插入,主库可插入正常:

[omm@omm03 db1]$ gsql -d postgres
gsql ((openGauss 2.0.1 build d97c0e8a) compiled at 2021-06-02 19:37:17 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=# \d
                        List of relations
 Schema | Name | Type  | Owner |             Storage              
--------+------+-------+-------+----------------------------------
 public | test | table | omm   | {orientation=row,compression=no}
(1 row)

postgres=# insert into test values(2);
INSERT 0 1
postgres=# select * from test;
 id 
----
  1
  2
(2 rows)

postgres=# insert into test values(3);
INSERT 0 1
postgres=# select * from test;
 id 
----
  1
  2
  3
(3 rows)

postgres=# \q

3.1.2检查主库自动变为备库:

[omm@omm02 ~]$ gs_ctl query
[2021-09-13 10:50:04.313][67581][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Standby
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
No information 
 Receiver info:      
        receiver_pid                   : 49375
        local_role                     : Standby
        peer_role                      : Primary
        peer_state                     : Normal
        state                          : Normal
        sender_sent_location           : 0/4002E98
        sender_write_location          : 0/4002E98
        sender_flush_location          : 0/4002E98
        sender_replay_location         : 0/4002E98
        receiver_received_location     : 0/4002E98
        receiver_write_location        : 0/4002E98
        receiver_flush_location        : 0/4002E98
        receiver_replay_location       : 0/4002E98
        sync_percent                   : 100%
        channel                        : 192.168.52.143:43076<--192.168.52.144:5433

3.1.3测试插入报只读,并且自动同步备库记录:

[omm@omm02 ~]$ gsql -d postgres
gsql ((openGauss 2.0.1 build d97c0e8a) compiled at 2021-06-02 19:37:17 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=# \d
                        List of relations
 Schema | Name | Type  | Owner |             Storage              
--------+------+-------+-------+----------------------------------
 public | test | table | omm   | {orientation=row,compression=no}
(1 row)

postgres=# select * from test;
 id 
----
  1
  2
(2 rows)

postgres=# insert into test values(3);
ERROR:  cannot execute INSERT in a read-only transaction
postgres=# select * from test;
 id 
----
  1
  2
  3  从主库自动同步过来
(3 rows)

postgres=# \q

多次主备切换均正常。注意点是在备库主机上做gs_ctl switchover

3.2备库执行failover切换
3.2.1执行failover后主备关系破坏

[omm@omm03 db1]$ gs_ctl failover
[2021-09-13 10:53:42.390][22834][][gs_ctl]: gs_ctl failover ,datadir is /gaussdb/data/db1 
[2021-09-13 10:53:42.390][22834][][gs_ctl]: failover term (1)
[2021-09-13 10:53:42.398][22834][][gs_ctl]:  waiting for server to failover...
.[2021-09-13 10:53:43.414][22834][][gs_ctl]:  done
[2021-09-13 10:53:43.414][22834][][gs_ctl]:  failover completed (/gaussdb/data/db1)
[omm@omm03 db1]$ gs_ctl query
[2021-09-13 10:53:45.568][24306][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
No information 
 Receiver info:      
No information

此时检查主库状态,发现都是Primary了,failover作为故障切换,使得原先的主备关系被破坏了:

[omm@omm02 ~]$ gs_ctl query
[2021-09-13 10:53:49.538][51634][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
No information 
 Receiver info:      
No information

3.2.2尝试将原主库启动为备库模式,提示需要修复:

[omm@omm02 ~]$ gs_ctl start -D /gaussdb/data/db1 -M standby
[2021-09-13 10:55:39.121][91485][][gs_ctl]: gs_ctl started,datadir is /gaussdb/data/db1 
[2021-09-13 10:55:39.126][91485][][gs_ctl]:  another server might be running; Please use the restart command
[omm@omm02 ~]$ gs_ctl restart -D /gaussdb/data/db1 -M standby
[2021-09-13 10:55:45.753][93714][][gs_ctl]: gs_ctl restarted ,datadir is /gaussdb/data/db1 
waiting for server to shut down... done
server stopped
[2021-09-13 10:55:46.760][93714][][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

0 LOG:  [Alarm Module]Host Name: omm02 

0 LOG:  [Alarm Module]Host IP: 192.168.52.143 

0 LOG:  [Alarm Module]Cluster Name: dbCluster 

0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
0 LOG:  Failed to initialze environment for codegen.
The core dump path is an invalid directory
2021-09-13 10:55:47.003 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  when starting as multi_standby mode, we couldn't support data replicaton.
2021-09-13 10:55:47.003 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2021-09-13 10:55:47.003 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  [Alarm Module]Host Name: omm02 

2021-09-13 10:55:47.003 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  [Alarm Module]Host IP: 192.168.52.143 

2021-09-13 10:55:47.003 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: dbCluster 

2021-09-13 10:55:47.003 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

2021-09-13 10:55:47.003 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  Transparent encryption disabled.

2021-09-13 10:55:47.006 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  loaded library "security_plugin"
2021-09-13 10:55:47.008 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2021-09-13 10:55:47.008 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  reserved memory for backend threads is: 220 MB
2021-09-13 10:55:47.008 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  reserved memory for WAL buffers is: 128 MB
2021-09-13 10:55:47.008 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  Set max backend reserve memory is: 348 MB, max dynamic memory is: 11097 MB
2021-09-13 10:55:47.008 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  shared memory 330 Mbytes, memory context 11445 Mbytes, max process memory 12288 Mbytes
2021-09-13 10:55:47.033 [unknown] [unknown] localhost 140477628577536 0  0 [CACHE] LOG:  set data cache  size(402653184)
2021-09-13 10:55:47.047 [unknown] [unknown] localhost 140477628577536 0  0 [CACHE] LOG:  set metadata cache  size(134217728)
2021-09-13 10:55:47.095 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  gaussdb: fsync file "/gaussdb/data/db1/gaussdb.state.temp" success
2021-09-13 10:55:47.096 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Standby)
2021-09-13 10:55:47.118 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  max_safe_fds = 977, usable_fds = 1000, already_open = 13
The core dump path is an invalid directory
2021-09-13 10:55:47.121 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  the configure file /gaussdb/app/etc/gscgroup_omm.cfg doesn't exist or the size of configure file has changed. Please create it by root user!
2021-09-13 10:55:47.121 [unknown] [unknown] localhost 140477628577536 0  0 [BACKEND] LOG:  Failed to parse cgroup config file.

[2021-09-13 10:55:47.773][93714][][gs_ctl]:  done
[2021-09-13 10:55:47.773][93714][][gs_ctl]: server started (/gaussdb/data/db1)
[omm@omm02 ~]$ gs_ctl query
[2021-09-13 10:55:50.840][95773][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Standby
        static_connections             : 1
        db_state                       : Need repair
        detail_information             : WAL segment removed

 Senders info:       
No information 
 Receiver info:      
No information


3.2.3此时采用build方式一般可恢复主备关系:

[omm@omm02 ~]$ gs_ctl -D /gaussdb/data/db1 build
[2021-09-13 10:56:37.999][122004][][gs_ctl]: gs_ctl incremental build ,datadir is /gaussdb/data/db1
waiting for server to shut down............. done
server stopped
[2021-09-13 10:56:48.025][122004][][gs_ctl]:  fopen build pid file "/gaussdb/data/db1/gs_build.pid" success
[2021-09-13 10:56:48.025][122004][][gs_ctl]:  fprintf build pid file "/gaussdb/data/db1/gs_build.pid" success
[2021-09-13 10:56:48.025][122004][][gs_ctl]:  fsync build pid file "/gaussdb/data/db1/gs_build.pid" success
[2021-09-13 10:56:48.034][122004][omm02][gs_rewind]: set gaussdb state file when rewind:db state(BUILDING_STATE), server mode(STANDBY_MODE), build mode(INC_BUILD).
[2021-09-13 10:56:48.048][122004][omm02][gs_rewind]: connected to server: host=192.168.52.144 port=5433 dbname=postgres application_name=gs_rewind connect_timeout=5
[2021-09-13 10:56:48.053][122004][omm02][gs_rewind]: connect to primary success
[2021-09-13 10:56:48.054][122004][omm02][gs_rewind]: get pg_control success
[2021-09-13 10:56:48.054][122004][omm02][gs_rewind]: target server was interrupted in mode 2.
[2021-09-13 10:56:48.054][122004][omm02][gs_rewind]: sanityChecks success
[2021-09-13 10:56:48.054][122004][omm02][gs_rewind]: find last checkpoint at 0/4004FB0 and checkpoint redo at 0/4004F30 from source control file
[2021-09-13 10:56:48.054][122004][omm02][gs_rewind]: find last checkpoint at 0/4003758 and checkpoint redo at 0/40036D8 from target control file
[2021-09-13 10:56:48.055][122004][omm02][gs_rewind]: find max lsn success, find max lsn rec (0/4003758) success.

[2021-09-13 10:56:48.079][122004][omm02][gs_rewind]: request lsn is 0/4003758 and its crc(source, target):[0, 2405112170]
[2021-09-13 10:56:48.085][122004][omm02][gs_rewind]: request lsn is 0/4003640 and its crc(source, target):[1633193886, 3048580869]
[2021-09-13 10:56:48.093][122004][omm02][gs_rewind]: request lsn is 0/4003528 and its crc(source, target):[4203906556, 780119399]
[2021-09-13 10:56:48.099][122004][omm02][gs_rewind]: request lsn is 0/4003410 and its crc(source, target):[1661964706, 1661964706]
[2021-09-13 10:56:48.099][122004][omm02][gs_rewind]: find common checkpoint 0/4003410
[2021-09-13 10:56:48.099][122004][omm02][gs_rewind]: find diverge point success
[2021-09-13 10:56:48.099][122004][omm02][gs_rewind]: read checkpoint redo (0/4003390) success before rewinding.
[2021-09-13 10:56:48.099][122004][omm02][gs_rewind]: rewinding from checkpoint redo point at 0/4003390 on timeline 1
[2021-09-13 10:56:48.099][122004][omm02][gs_rewind]: diverge xlogfile is 000000010000000000000004, older ones will not be copied or removed.
[2021-09-13 10:56:48.101][122004][omm02][gs_rewind]: targetFileStatThread success pid 140385912338176.
[2021-09-13 10:56:48.101][122004][omm02][gs_rewind]: reading source file list
[2021-09-13 10:56:48.101][122004][omm02][gs_rewind]: traverse_datadir start.
[2021-09-13 10:56:48.104][122004][omm02][gs_rewind]: filemap_list_to_array start.
[2021-09-13 10:56:48.104][122004][omm02][gs_rewind]: filemap_list_to_array end sort start. length is 1273 
[2021-09-13 10:56:48.104][122004][omm02][gs_rewind]: sort end.
[2021-09-13 10:56:48.111][122004][omm02][gs_rewind]: targetFileStatThread return success.
[2021-09-13 10:56:48.116][122004][omm02][gs_rewind]: reading target file list
[2021-09-13 10:56:48.117][122004][omm02][gs_rewind]: traverse target datadir success
[2021-09-13 10:56:48.117][122004][omm02][gs_rewind]: reading WAL in target
[2021-09-13 10:56:48.117][122004][omm02][gs_rewind]: could not read WAL record at 0/40037F0: invalid record length at 0/40037F0: wanted 32, got 0
[2021-09-13 10:56:48.118][122004][omm02][gs_rewind]: calculate totals rewind success
[2021-09-13 10:56:48.118][122004][omm02][gs_rewind]: need to copy 203MB (total source directory size is 231MB)
[2021-09-13 10:56:48.118][122004][omm02][gs_rewind]: starting background WAL receiver
[2021-09-13 10:56:48.118][122004][omm02][gs_rewind]: Starting copy xlog, start point: 0/4003390
[2021-09-13 10:56:48.118][122004][omm02][gs_rewind]: in gs_rewind proecess,so no need remove.
[2021-09-13 10:56:48.126][122004][omm02][gs_rewind]:  check identify system success
[2021-09-13 10:56:48.126][122004][omm02][gs_rewind]:  send START_REPLICATION 0/4000000 success
[2021-09-13 10:56:48.171][122004][omm02][gs_rewind]: receiving and unpacking files...
[2021-09-13 10:56:49.171][122004][omm02][gs_rewind]: execute file map success
[2021-09-13 10:56:49.173][122004][omm02][gs_rewind]: find minRecoveryPoint success from xlog insert location 0/4009438
[2021-09-13 10:56:49.173][122004][omm02][gs_rewind]: update pg_control file success, minRecoveryPoint: 0/4009438, ckpLoc:0/4004FB0, ckpRedo:0/4004F30, preCkp:0/4003640
[2021-09-13 10:56:49.192][122004][omm02][gs_rewind]: update pg_dw file success
[2021-09-13 10:56:49.193][122004][omm02][gs_rewind]: xlog end point: 0/4009438
[2021-09-13 10:56:49.193][122004][omm02][gs_rewind]: waiting for background process to finish streaming...
[2021-09-13 10:56:53.146][122004][omm02][gs_rewind]: creating backup label and updating control file
[2021-09-13 10:56:53.146][122004][omm02][gs_rewind]: create backup label success
[2021-09-13 10:56:53.147][122004][omm02][gs_rewind]: read checkpoint redo (0/4003390) success.
[2021-09-13 10:56:53.147][122004][omm02][gs_rewind]: read checkpoint rec (0/4003410) success.
[2021-09-13 10:56:53.147][122004][omm02][gs_rewind]: dn incremental build completed.
[2021-09-13 10:56:53.153][122004][omm02][gs_rewind]: fetching MOT checkpoint
[2021-09-13 10:56:53.264][122004][omm02][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

0 LOG:  [Alarm Module]Host Name: omm02 

0 LOG:  [Alarm Module]Host IP: 192.168.52.143 

0 LOG:  [Alarm Module]Cluster Name: dbCluster 

0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
0 LOG:  Failed to initialze environment for codegen.
The core dump path is an invalid directory
2021-09-13 10:56:53.499 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  when starting as multi_standby mode, we couldn't support data replicaton.
2021-09-13 10:56:53.499 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2021-09-13 10:56:53.499 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  [Alarm Module]Host Name: omm02 

2021-09-13 10:56:53.499 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  [Alarm Module]Host IP: 192.168.52.143 

2021-09-13 10:56:53.499 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: dbCluster 

2021-09-13 10:56:53.499 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 52

2021-09-13 10:56:53.499 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  Transparent encryption disabled.

2021-09-13 10:56:53.502 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  loaded library "security_plugin"
2021-09-13 10:56:53.503 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2021-09-13 10:56:53.503 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  reserved memory for backend threads is: 220 MB
2021-09-13 10:56:53.503 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  reserved memory for WAL buffers is: 128 MB
2021-09-13 10:56:53.503 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  Set max backend reserve memory is: 348 MB, max dynamic memory is: 11097 MB
2021-09-13 10:56:53.503 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  shared memory 330 Mbytes, memory context 11445 Mbytes, max process memory 12288 Mbytes
2021-09-13 10:56:53.525 [unknown] [unknown] localhost 140648515995392 0  0 [CACHE] LOG:  set data cache  size(402653184)
2021-09-13 10:56:53.539 [unknown] [unknown] localhost 140648515995392 0  0 [CACHE] LOG:  set metadata cache  size(134217728)
2021-09-13 10:56:53.585 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  gaussdb: fsync file "/gaussdb/data/db1/gaussdb.state.temp" success
2021-09-13 10:56:53.585 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Standby)
2021-09-13 10:56:53.609 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  max_safe_fds = 975, usable_fds = 1000, already_open = 15
The core dump path is an invalid directory
2021-09-13 10:56:53.611 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  the configure file /gaussdb/app/etc/gscgroup_omm.cfg doesn't exist or the size of configure file has changed. Please create it by root user!
2021-09-13 10:56:53.612 [unknown] [unknown] localhost 140648515995392 0  0 [BACKEND] LOG:  Failed to parse cgroup config file.

[2021-09-13 10:56:54.276][122004][omm02][gs_ctl]:  done
[2021-09-13 10:56:54.276][122004][omm02][gs_ctl]: server started (/gaussdb/data/db1)
[2021-09-13 10:56:54.276][122004][omm02][gs_ctl]:  fopen build pid file "/gaussdb/data/db1/gs_build.pid" success
[2021-09-13 10:56:54.276][122004][omm02][gs_ctl]:  fprintf build pid file "/gaussdb/data/db1/gs_build.pid" success
[2021-09-13 10:56:54.277][122004][omm02][gs_ctl]:  fsync build pid file "/gaussdb/data/db1/gs_build.pid" success

3.2.4检查备库状态,发现已恢复:

[omm@omm02 ~]$ gs_ctl query
[2021-09-13 10:56:58.929][124649][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Standby
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
No information 
 Receiver info:      
        receiver_pid                   : 122726
        local_role                     : Standby
        peer_role                      : Primary
        peer_state                     : Normal
        state                          : Normal
        sender_sent_location           : 0/400A678
        sender_write_location          : 0/400A678
        sender_flush_location          : 0/400A678
        sender_replay_location         : 0/400A678
        receiver_received_location     : 0/400A678
        receiver_write_location        : 0/400A678
        receiver_flush_location        : 0/400A678
        receiver_replay_location       : 0/400A678
        sync_percent                   : 100%
        channel                        : 192.168.52.143:43736<--192.168.52.144:5433

3.2.5主库状态也恢复主备正常关系:

[omm@omm03 db1]$ gs_ctl query
[2021-09-13 10:57:04.337][116412][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 112663
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/400A678
        sender_write_location          : 0/400A678
        sender_flush_location          : 0/400A678
        sender_replay_location         : 0/400A678
        receiver_received_location     : 0/400A678
        receiver_write_location        : 0/400A678
        receiver_flush_location        : 0/400A678
        receiver_replay_location       : 0/400A678
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.52.144:5433-->192.168.52.143:43736

 Receiver info:      
No information 
[omm@omm03 db1]$ gsql -d postgres -c "\d"
                         List of relations
 Schema | Name  | Type  | Owner |             Storage              
--------+-------+-------+-------+----------------------------------
 public | test  | table | omm   | {orientation=row,compression=no}
 public | test2 | table | omm   | {orientation=row,compression=no}
(2 rows)

[omm@omm03 db1]$ gsql -d postgres -c "insert into test2 values(3);"
INSERT 0 1

4.遇到问题:FATAL: no pg_hba.conf entry for host “192.168.52.144”, user “omm”, database “postgres”, SSL off

[omm@omm03 db1]$ gs_ctl query
[2021-09-13 10:42:19.338][37450][][gs_ctl]: gs_ctl query ,datadir is /gaussdb/data/db1 
 HA state:
        local_role                    : Standby
        static_connections            : 1
        db_state                      : Normal
        detail_information            : FATAL:  no pg_hba.conf entry for host "192.168.52.144", user "omm", database "postgres", SSL off
FATAL:  no pg_hba.conf entry for host "192.168.52.144", user "omm", database "postgres", SSL off

        sync_mode                     : Async

 Senders info:
        No information
 Receiver info:
        No information

解决办法:前面有说了,修改两节点pg_hba.conf,添加相应配置即可。

host    all             omm             192.168.52.0/24         trust
host    replication     omm        192.168.52.0/24            trust

一点体会:openGauss基于postgresql开发,数据库管理命令基本一致,但其主备切换做得非常不错,两条简单命令就搞定,很方便切换,明显优于postgresql本身。

最后修改时间:2021-09-13 14:53:56
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
1人已赞赏
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论