KingbaseES (KES) V9 RWC Cluster: Online Scale-Out and Scale-In

Original article by 飞天, 2025-02-27

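I. Introduction to Online Scale-Out and Scale-In

KingbaseES provides a scaling tool for expanding and shrinking a database cluster online. For servers without GUI support, KingbaseES also offers a command-line procedure. This article describes how to use the command line to scale a KES V9 RWC cluster out and in online.

For deploying a one-primary, one-standby RWC cluster, see: KingbaseES (KES) V9 RWC Cluster Deployment in Practice (KingbaseES（KES）V9 RWC集群部署实战)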

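II. KES V9 RWC Cluster Environment

The following two-node RWC cluster (one primary, one standby) already exists:

Hostname	IP address	OS version	Memory / CPU	Node role	Database port	Cluster software install directory	Data directory
node1	192.*.*.60	CentOS 7.9	4 GB, 1 dual-core CPU	primary	54321	/opt/kes/v9	/data/cluster
node2	192.*.*.62	CentOS 7.9	4 GB, 1 dual-core CPU	standby	54321	/opt/kes/v9	/data/cluster

Cluster VIP address: 192.*.*.64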

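III. KES V9 RWC Cluster Scale-Out

Requirement: add node3 as a standby to the existing two-node RWC cluster (one primary, one standby):

Hostname	IP address	OS version	Memory / CPU	Node role	Database port	Cluster software install directory	Data directory
node3	192.*.*.66	CentOS 7.9	4 GB, 1 dual-core CPU	standby	54321	/opt/kes/v9	/data/cluster

Detailed scale-out steps

1. Prepare the operating system environment on node3, the node to be added
See the "Pre-installation environment preparation" section of KingbaseES (KES) V9 RWC Cluster Deployment in Practice.

[Note] Add the node3 entry to /etc/hosts on all three hosts (node1, node2 and node3):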

192.*.*.66 node3
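
Because the same entry has to be added on three hosts, it can help to add it idempotently. The following is only a minimal sketch: `add_host_entry` is a hypothetical helper (not part of KingbaseES), and 192.0.2.x are placeholder addresses standing in for the masked ones in this article. The demo works on a scratch file rather than /etc/hosts.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file only if no
# uncommented line already lists NAME as a host name.
add_host_entry() {
    local file="$1"
    local ip="$2"
    local name="$3"
    if awk -v n="$name" '
            /^[ \t]*#/ { next }                       # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }' "$file"; then
        echo "exists"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added"
    fi
}

# Demo on a scratch file; 192.0.2.x are placeholder addresses.
demo_hosts="$(mktemp)"
printf '192.0.2.60 node1\n192.0.2.62 node2\n' > "$demo_hosts"
add_host_entry "$demo_hosts" 192.0.2.66 node3   # prints "added"
add_host_entry "$demo_hosts" 192.0.2.66 node3   # prints "exists"
rm -f "$demo_hosts"
```

On a real host the first argument would be /etc/hosts and the command would need root privileges.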

2. Check the existing cluster status on node1 or node2

[kingbase@node1 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago
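
Before changing the cluster it is worth confirming that every row of this table is healthy. A minimal sketch, assuming the column layout shown above: `check_service_status` is a hypothetical helper that reads the table on standard input and reports any node whose Status is not "running" (or "* running" for the primary) or whose repmgrd column is not "running".

```shell
# Hypothetical helper: validate a "repmgr service status" table read from stdin.
# Exit codes: 0 = all nodes healthy, 1 = at least one unhealthy, 2 = no rows.
check_service_status() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($1) ~ /^[0-9]+$/ {                 # data rows start with a node ID
            rows++
            name = trim($2); status = trim($4); daemon = trim($6)
            if ((status != "running" && status != "* running") || daemon != "running") {
                printf "node %s: status=\"%s\" repmgrd=\"%s\"\n", name, status, daemon
                bad++
            }
        }
        END {
            if (rows == 0) { print "no node rows found"; exit 2 }
            exit (bad ? 1 : 0)
        }'
}

# Demo on the table shown above. On a cluster node one would run:
#   repmgr service status | check_service_status
check_service_status <<'EOF' && echo "cluster OK"
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago
EOF
```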

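2. Prepare the files required for scale-out
From node1, the primary node of the existing cluster, obtain db.zip, license.dat, install.conf, cluster_install.sh and trust_cluster.sh, and copy them to node3, the host to be added.

# Go to the {KES install directory}/KESRealPro/V009R001C002B0014/ClientTools/guitools/DeployTools/zip directory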
[root@node1 ~]# cd /opt/kes/v9/KESRealPro/V009R001C002B0014/ClientTools/guitools/DeployTools/zip/
[root@node1 zip]# ll
total 322412
-rwxrwxr-x 1 kingbase kingbase    252402 Sep 23 18:41 cluster_install.sh
-rw-rw-r-- 1 kingbase kingbase 327258132 Sep 23 18:41 db.zip
-rw-rw-r-- 1 kingbase kingbase     19580 Jan 18 12:24 install.conf
-rw-rw-r-- 1 kingbase kingbase      3676 Jan 18 11:47 license.dat
-rw-rw-r-- 1 kingbase kingbase   2595145 Sep 23 18:41 securecmdd.zip
-rwxrwxr-x 1 kingbase kingbase      9677 Sep 23 18:41 trust_cluster.sh
[root@node1 zip]# 
# Copy the files to node3, the node to be added
[root@node1 zip]# scp * node3:/soft
# Log in to node3 and change the ownership of the files needed for scale-out:
[root@node3 ~]# chown -R kingbase:kingbase /soft/*
[root@node3 ~]# ll /soft/*
total 322432
-rwxr-xr-x 1 kingbase kingbase    252402 Feb 27 17:59 cluster_install.sh
-rw-r--r-- 1 kingbase kingbase 327258132 Feb 27 17:59 db.zip
-rw-r--r-- 1 kingbase kingbase     19678 Feb 27 20:40 install.conf
-rw-r--r-- 1 kingbase kingbase      3676 Feb 27 17:59 license.dat
-rw-r--r-- 1 kingbase kingbase   2595145 Feb 27 17:59 securecmdd.zip
-rwxr-xr-x 1 kingbase kingbase      9677 Feb 27 17:59 trust_cluster.sh
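
A quick pre-flight check can catch a missing or non-executable file before the scale-out script is started. This is a sketch only: `check_expand_files` is a hypothetical helper, and the file list is simply the five files named in this step.

```shell
# Hypothetical helper: verify that the files named in this step are present
# in a directory and that the two scripts are executable.
check_expand_files() {
    local dir="$1"
    local f
    local problems=0
    for f in db.zip license.dat install.conf cluster_install.sh trust_cluster.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            problems=1
        fi
    done
    for f in cluster_install.sh trust_cluster.sh; do
        if [ -f "$dir/$f" ] && [ ! -x "$dir/$f" ]; then
            echo "not executable: $f"
            problems=1
        fi
    done
    return "$problems"
}

# Usage on node3 (not executed here):
#   check_expand_files /soft && echo "all files in place"
```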

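All of the following operations are performed on node3, the node to be added.

3. Configure the install.conf file
3.1 Edit the parameters under the [install] section of install.conf

Add the IP address of node3 to the line that sets all_ip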

[root@node3 ~]# cd /soft
[root@node3 soft]# vi install.conf
# On the all_ip line, add the IP address of node3, the host to be added:
192.*.*.66

3.2 Edit the parameters under the [expand] section of install.conf

[expand]
expand_type="0"                   # The node type of standby/witness node, which would be add to cluster. 0:standby  1:witness
primary_ip="192.*.*.60"                    # The ip addr of cluster primary node, which need to expand a standby/witness node.
expand_ip="192.*.*.66"                     # The ip addr of standby/witness node, which would be add to cluster.
node_id="3"                       # The node_id of standby/witness node, which would be add to cluster. It does not the same with any one in  cluster node
                                 # for example: node_id="3"
sync_type=""                     # the sync_type parameter is used to specify the sync type for expand node. 0:sync 1:potential 2:async
                                 # this parameter is only valid when expand_type="0" and the synchronous parameter of the cluster is set to custom mode.

## Specific instructions ,see it under [install]
install_dir="/opt/kes/v9"                   # the last layer of directory could not add '/'
zip_package="/soft/db.zip"
net_device=(ens33)                    # if virtual_ip set,it must be set
net_device_ip=(192.*.*.66)                 # if virtual_ip set,it must be set
license_file=(license.dat)
deploy_by_sshd="1"
ssh_port="22"
scmd_port="8890"
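
The node_id chosen for the new node must not already be used in the cluster. A minimal sketch of that check, assuming the table layout printed by `repmgr service status` or `repmgr cluster show` (node ID in the first column); `node_id_in_use` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given node ID appears in the first
# column of a repmgr status table read from stdin.
node_id_in_use() {
    awk -F'|' -v id="$1" '
        { key = $1; gsub(/[ \t]/, "", key) }    # strip padding around the ID
        key == id { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Demo on the two-node table from the status check earlier in this article.
status_table=' ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago'

for candidate in 2 3; do
    if printf '%s\n' "$status_table" | node_id_in_use "$candidate"; then
        echo "node_id $candidate is already used"
    else
        echo "node_id $candidate is free"
    fi
done
```

With the table above, 2 is reported as used and 3 as free, which matches the node_id="3" set in the [expand] section.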

[Note] To change the SSH port, first change the value of ssh_port in install.conf, then change the value of Port in /etc/ssh/sshd_config, and finally restart the sshd service.
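
Since the port has to agree in two files, a small consistency check may be useful. The sketch below rests on assumptions: `conf_get` and `sshd_listens_on` are hypothetical helpers, install.conf is treated as a simple sectioned key=value file, and sshd_config is read without following Include or Match directives. The demo uses scratch files and placeholder values.

```shell
# Hypothetical helper: print the value of KEY inside [SECTION] of a
# key=value file, without surrounding quotes or parentheses.
conf_get() {
    awk -v sec="[$2]" -v key="$3" '
        /^[ \t]*\[[^]]+\][ \t]*$/ { gsub(/[ \t]/, ""); cur = $0; next }
        cur == sec {
            line = $0
            sub(/#.*/, "", line)                 # drop trailing comment
            eq = index(line, "=")
            if (eq == 0) next
            k = substr(line, 1, eq - 1); v = substr(line, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            gsub(/^["(]|[")]$/, "", v)          # strip "..." or (...)
            if (k == key) { print v; exit }
        }' "$1"
}

# Hypothetical helper: succeed if sshd_config lists the port, or lists no
# Port line at all and the port is the OpenSSH default 22.
sshd_listens_on() {
    awk -v p="$2" '
        tolower($1) == "port" { seen = 1; if ($2 == p) ok = 1 }
        END { if (!seen && p == 22) ok = 1; exit (ok ? 0 : 1) }' "$1"
}

# Demo on scratch files with placeholder values.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/install.conf" <<'EOF'
[install]
ssh_port="22"
[expand]
ssh_port="2222"                  # placeholder port for the demo
EOF
printf '#Port 22\nPort 2222\n' > "$demo_dir/sshd_config"

port="$(conf_get "$demo_dir/install.conf" expand ssh_port)"
if sshd_listens_on "$demo_dir/sshd_config" "$port"; then
    echo "ssh_port=$port matches sshd_config"
else
    echo "ssh_port=$port is not configured in sshd_config"
fi
rm -rf "$demo_dir"
```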

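4. Configure passwordless SSH
On node3, configure passwordless SSH between all nodes for both the root and kingbase users, as follows:

# Configure passwordless SSH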
[root@node3 soft]# ./trust_cluster.sh
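
After the trust script has run, the result can be verified without risking a password prompt. A minimal sketch, assuming the standard OpenSSH client options `BatchMode` and `ConnectTimeout`; `check_trust` is a hypothetical helper, and the `SSH_BIN` variable exists only so the client can be swapped out.

```shell
# Hypothetical helper: try a non-interactive login as USER on each host and
# report which ones still require a password.
SSH_BIN="${SSH_BIN:-ssh}"

check_trust() {
    local user="$1"
    local host
    local failed=0
    shift
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "ok: $user@$host"
        else
            echo "FAILED: $user@$host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage on node3 (not executed here), once per user:
#   check_trust root node1 node2 node3
#   check_trust kingbase node1 node2 node3
```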

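5. Scale out the cluster
Scale-out succeeds when run as either root or kingbase. This article runs "cluster_install.sh expand" as the kingbase user; the script then completes the scale-out automatically according to the configuration.
[Note] Scale-out automatically creates the cluster install directory /opt/kes/v9, but by default the kingbase user cannot create files under /opt. Create /opt/kes beforehand and set its owner to kingbase:kingbase. If you scale out as root, /opt/kes does not need to be created in advance.

As root, create the directory and set its owner: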

[root@node3 ~]# mkdir /opt/kes
[root@node3 ~]# chown -R kingbase:kingbase /opt/kes
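
Whether the preparation above took effect can be checked as the user who will run the scale-out. A minimal sketch: `install_parent_ready` is a hypothetical helper that tests whether the parent of install_dir exists and is writable by the current user.

```shell
# Hypothetical helper: check that the parent directory of the install
# directory exists and is writable by the user running this function.
install_parent_ready() {
    local parent
    parent="$(dirname "$1")"
    if [ -d "$parent" ] && [ -w "$parent" ] && [ -x "$parent" ]; then
        echo "ok: $parent is writable by $(id -un)"
    else
        echo "not ready: $parent must exist and be writable by $(id -un)"
        return 1
    fi
}

# Usage as the kingbase user on node3 (not executed here):
#   install_parent_ready /opt/kes/v9
```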

Run the scale-out as the kingbase user:

[kingbase@node3 soft]$ ./cluster_install.sh expand

The scale-out log is as follows:

[kingbase@node3 soft]$ ./cluster_install.sh expand
[CONFIG_CHECK] will deploy the cluster of 
[RUNNING] success connect to the target "192.*.*.66" ..... OK
[RUNNING] success connect to "192.*.*.66" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.*.*.60 ...
[RUNNING] Primary node ip is 192.*.*.60 ... OK
[CONFIG_CHECK] set install_with_root=1
[RUNNING] success connect to the target "192.*.*.66" ..... OK
[RUNNING] success connect to "192.*.*.66" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[INSTALL] load config from cluster.....
 [INFO] db_user=system
 [INFO] db_port=54321
 [INFO] use_scmd=1
 [INFO] data_directory=/data/cluster
 [INFO] scmd_port=8890
 [INFO] recovery=standby
 [INFO] use_check_disk=off
./cluster_install.sh: line 4981: 192.*.*.62: command not found
 [INFO] trusted_servers=192.*.*.60 192.*.*.62
 [INFO] virtual_ip=192.*.*.64/24
 [INFO] ipaddr_path=/usr/sbin
 [INFO] ping_path=/usr/bin
 [INFO] arping_path=/opt/kes/bin
 [INFO] reconnect_attempts=10
 [INFO] reconnect_interval=6
 [INFO] auto_cluster_recovery_level=1
 [INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] success to access license_file: /soft/license.dat
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] check database connection ... 
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] expand_ip[192.*.*.66] is not used in the cluster ...
[CONFIG_CHECK] expand_ip[192.*.*.66] is not used in the cluster ...ok
[CONFIG_CHECK] The localhost is expand_ip:[192.*.*.66] ...
[CONFIG_CHECK] The localhost is expand_ip:[192.*.*.66] ...ok
[CONFIG_CHECK] check node_id is in cluster ... 
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] check the db is running or not...
[RUNNING] the db is not running on "192.*.*.66:54321" ..... OK
[RUNNING] the install dir is not exist on "192.*.*.66" ..... OK
[RUNNING] check the sys_securecmdd is running or not...
[RUNNING] the sys_securecmdd is not running on "192.*.*.66:8890" ..... OK
[CONFIG_CHECK] The virtual ip [192.*.*.64] exists on primary host [192.*.*.60].....
[CONFIG_CHECK] The virtual ip [192.*.*.64] exists on primary host [192.*.*.60].....OK
[CONFIG_CHECK] The net_device_ip:[192.*.*.66] exists on dev ens33 on [192.*.*.66].....
[CONFIG_CHECK] The net_device_ip:[192.*.*.66] exists on host "192.*.*.66" on dev ens33 .....OK
 [INFO] use_ssl=0
2025-02-27 21:12:21 [INFO] start to check system parameters on 192.*.*.66 ...
2025-02-27 21:12:21 [WARNING] [GSSAPIAuthentication] yes (should be: no) on 192.*.*.66
2025-02-27 21:12:21 [INFO] [UseDNS] is null on 192.*.*.66
2025-02-27 21:12:22 [INFO] [UsePAM] yes  on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.open files] 65536 on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.open proc] 65536 on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.core size] unlimited on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.mem lock] 50000000 on 192.*.*.66
2025-02-27 21:12:23 [INFO] [kernel.sem] 5010 641280 5010 256 on 192.*.*.66
2025-02-27 21:12:23 [INFO] [RemoveIPC] no on 192.*.*.66
2025-02-27 21:12:23 [INFO] [DefaultTasksAccounting] no on 192.*.*.66
2025-02-27 21:12:23 [INFO] write file "/etc/udev/rules.d/kingbase.rules" on 192.*.*.66
2025-02-27 21:12:24 [INFO] [crontab] chmod /usr/bin/crontab ...
2025-02-27 21:12:24 [INFO] [crontab] chmod /usr/bin/crontab ... Done
2025-02-27 21:12:24 [INFO] [crontab access] OK
2025-02-27 21:12:25 [INFO] [cron.deny] kingbase not exists in cron.deny
2025-02-27 21:12:25 [INFO] [crontab auth] crontab is accessible by kingbase now on 192.*.*.66
2025-02-27 21:12:25 [INFO] [SELINUX] disabled on 192.*.*.66
2025-02-27 21:12:26 [INFO] [firewall] down on 192.*.*.66
2025-02-27 21:12:26 [INFO] [The memory] OK on 192.*.*.66
2025-02-27 21:12:26 [INFO] [The hard disk] OK on 192.*.*.66
2025-02-27 21:12:26 [INFO] [ping] chmod /usr/bin/ping ...
2025-02-27 21:12:26 [INFO] [ping] chmod /usr/bin/ping ... Done
2025-02-27 21:12:27 [INFO] [ping access] OK
2025-02-27 21:12:27 [INFO] [/bin/cp --version] on 192.*.*.66 OK
2025-02-27 21:12:27 [INFO] [ip command path] on 192.*.*.66 OK
[INSTALL] create the install dir "/opt/kes/v9/kingbase" on 192.*.*.66 ...
[INSTALL] success to create the install dir "/opt/kes/v9/kingbase" on "192.*.*.66" ..... OK
[INSTALL] try to copy the zip package "/soft/db.zip" to /opt/kes/v9/kingbase of "192.*.*.66" .....
[INSTALL] success to scp the zip package "/soft/db.zip" /opt/kes/v9/kingbase of to "192.*.*.66" ..... OK
[INSTALL] decompress the "/opt/kes/v9/kingbase" to "/opt/kes/v9/kingbase" on 192.*.*.66
[INSTALL] success to decompress the "/opt/kes/v9/kingbase/db.zip" to "/opt/kes/v9/kingbase" on "192.*.*.66"..... OK
[RUNNING] chmod u+s and a+x for "/usr/sbin" and "/opt/kes/bin" on 192.*.*.66
[RUNNING] chmod u+s and a+x /usr/sbin/ip on "192.*.*.66" ..... OK
[RUNNING] chmod u+s and a+x /opt/kes/bin/arping on "192.*.*.66" ..... OK
[INSTALL] check license_file "license.dat"
[INSTALL] Scp license to /opt/kes/v9/kingbase/../license.dat on 192.*.*.66
[INSTALL] success to copy /soft/license.dat to /opt/kes/v9/kingbase/../ on 192.*.*.66
[RUNNING] config sys_securecmdd and start it ...
[RUNNING] config the sys_securecmdd port to 8890 ...
[RUNNING] success to config the sys_securecmdd port on 192.*.*.66 ... OK
successfully initialized the sys_securecmdd, please use "/opt/kes/v9/kingbase/bin/sys_HAscmdd.sh start" to start the sys_securecmdd
[RUNNING] success to config sys_securecmdd on 192.*.*.66 ... OK
Created symlink from /etc/systemd/system/multi-user.target.wants/securecmdd.service to /etc/systemd/system/securecmdd.service.
[RUNNING] success to start sys_securecmdd on 192.*.*.66 ... OK
[INSTALL] success to access file: /opt/kes/v9/kingbase/etc/all_nodes_tools.conf
[INSTALL] success to scp the /opt/kes/v9/kingbase/etc/repmgr.conf from 192.*.*.60 to "192.*.*.66"..... ok
[INSTALL] success to scp the ~/.encpwd from 192.*.*.60 to "192.*.*.66"..... ok
[INSTALL] success to scp /opt/kes/v9/kingbase/etc/all_nodes_tools.conf from "192.*.*.60" to "192.*.*.66" ...ok
[INSTALL] success to chmod 600 the ~/.encpwd on 192.*.*.66..... ok
 [INFO] parameter_name=node_id
 [INFO] parameter_values='3'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*node_id[ ]*=/cnode_id='3'" /opt/kes/v9/kingbase/etc/repmgr.conf
 [INFO] parameter_name=node_name
 [INFO] parameter_values='node3'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*node_name[ ]*=/cnode_name='node3'" /opt/kes/v9/kingbase/etc/repmgr.conf
 [INFO] parameter_name=conninfo
 [INFO] parameter_values='host
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*conninfo[ ]*=/cconninfo='host=192.*.*.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'" /opt/kes/v9/kingbase/etc/repmgr.conf
 [INFO] parameter_name=ping_path
 [INFO] parameter_values='/usr/bin'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*ping_path[ ]*=/cping_path='/usr/bin'" /opt/kes/v9/kingbase/etc/repmgr.conf
 [INFO] parameter_name=net_device
 [INFO] parameter_values='ens33'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*net_device[ ]*=/cnet_device='ens33'" /opt/kes/v9/kingbase/etc/repmgr.conf
 [INFO] parameter_name=net_device_ip
 [INFO] parameter_values='192.*.*.66'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*net_device_ip[ ]*=/cnet_device_ip='192.*.*.66'" /opt/kes/v9/kingbase/etc/repmgr.conf
 [INFO] parameter_name=arping_path
 [INFO] parameter_values='/opt/kes/bin'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*arping_path[ ]*=/carping_path='/opt/kes/bin'" /opt/kes/v9/kingbase/etc/repmgr.conf
 [INFO] parameter_name=ipaddr_path
 [INFO] parameter_values='/usr/sbin'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*ipaddr_path[ ]*=/cipaddr_path='/usr/sbin'" /opt/kes/v9/kingbase/etc/repmgr.conf
[RUNNING] standby clone ...
[WARNING] following problems with command line parameters detected:
  -D/--sysdata will be ignored if a repmgr configuration file is provided
[NOTICE] destination directory "/data/cluster" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=192.*.*.60 user=esrep port=54321 dbname=esrep
[DETAIL] current installation size is 87 MB
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] checking and correcting permissions on existing directory "/data/cluster"
[INFO] creating replication slot as user "esrep"
[NOTICE] starting backup (using sys_basebackup)...
[INFO] executing:
  /opt/kes/v9/kingbase/bin/sys_basebackup -l "repmgr base backup"  -D /data/cluster -h 192.*.*.60 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_3 
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /data/cluster start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"
[RUNNING] standby clone ...OK
[RUNNING] db start ...
waiting for server to start.... done
server started
[RUNNING] db start ...OK
[INFO] connecting to local node "node3" (ID: 3)
[INFO] connecting to primary database
[WARNING] --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
[INFO] standby registration complete
[NOTICE] standby node "node3" (ID: 3) successfully registered
2025-02-27 21:12:52 begin to start DB on "[localhost]".
2025-02-27 21:12:53 DB on "[localhost]" already started, connect to check it.
2025-02-27 21:12:54 DB on "[localhost]" start success.
2025-02-27 21:12:54 Ready to start local kbha daemon and repmgrd daemon ...
2025-02-27 21:12:54 begin to start repmgrd on "[localhost]".
[2025-02-27 21:12:55] [NOTICE] using provided configuration file "/opt/kes/v9/kingbase/bin/../etc/repmgr.conf"
[2025-02-27 21:12:55] [INFO] creating directory "/opt/kes/v9/kingbase/log"...
[2025-02-27 21:12:55] [NOTICE] redirecting logging output to "/opt/kes/v9/kingbase/log/hamgr.log"

2025-02-27 21:12:56 repmgrd on "[localhost]" start success.
[2025-02-27 21:12:58] [NOTICE] redirecting logging output to "/opt/kes/v9/kingbase/log/kbha.log"

2025-02-27 21:12:59 Done.
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.*.*.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] query archive command at 192.*.*.60 ...
[RUNNING] current cluster not config sys_rman,return.
[root@node3 soft]#

6. After scale-out completes, check the cluster status

[kingbase@node3 soft]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago    
 3  | node3 | standby |   running | node1    | running | 15704 | no      | 0 second(s) ago  

[kingbase@node3 soft]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.100.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.100.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.100.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@node3 soft]$

At this point, node3 has joined the cluster and the cluster status is normal.
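
The `repmgr cluster show` table is wide, so a condensed view of the standbys can make the result easier to read. A minimal sketch, assuming the column order printed above (Role in column 3, Upstream in column 5, LSN_Lag in column 9); `standby_report` is a hypothetical helper, and the connection strings in the demo are abbreviated.

```shell
# Hypothetical helper: print name, upstream and LSN lag for every standby
# row of a "repmgr cluster show" table read from stdin.
standby_report() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($3) == "standby" {
            printf "%s upstream=%s lag=%s\n", trim($2), trim($5), trim($9)
        }'
}

# Demo on the rows shown above (connection strings abbreviated).
# On a cluster node one would run:  repmgr cluster show | standby_report
standby_report <<'EOF'
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | (omitted)
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | (omitted)
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | (omitted)
EOF
# prints:
#   node2 upstream=node1 lag=0 bytes
#   node3 upstream=node1 lag=0 bytes
```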

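IV. KES V9 RWC Cluster Scale-In

Requirement: remove node3 from the following RWC cluster:

Hostname	IP address	OS version	Memory / CPU	Node role	Database port	Cluster software install directory	Data directory
node1	192.*.*.60	CentOS 7.9	4 GB, 1 dual-core CPU	primary	54321	/opt/kes/v9	/data/cluster
node2	192.*.*.62	CentOS 7.9	4 GB, 1 dual-core CPU	standby	54321	/opt/kes/v9	/data/cluster
node3	192.*.*.66	CentOS 7.9	4 GB, 1 dual-core CPU	standby	54321	/opt/kes/v9	/data/cluster

Cluster VIP address: 192.*.*.64

Detailed scale-in steps

1. Check the existing cluster status on any node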

[kingbase@node3 soft]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago    
 3  | node3 | standby |   running | node1    | running | 15704 | no      | 1 second(s) ago    
[kingbase@node3 soft]$

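All of the following operations are performed on node3, the node to be removed.
2. Configure the install.conf file
2.1 Edit the parameters under the [shrink] section of install.conf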

[shrink]
shrink_type="0"                   # The node type of standby/witness node, which would be delete from cluster. 0:standby  1:witness
primary_ip="192.168.100.60"                    # The ip addr of cluster primary node, which need to shrink a standby/witness node.
shrink_ip="192.168.100.66"                     # The ip addr of standby/witness node, which would be delete from cluster.
node_id="3"                       # The node_id of standby/witness node, which would be delete from cluster. It does not the same with any one in  cluster node
                                 # for example: node_id="3"
## Specific instructions ,see it under [install]
install_dir="/opt/kes/v9"                   # the last layer of directory could not add '/'
ssh_port="22"                    # the port of ssh, default is 22
scmd_port="8890"                 # the port of sys_securecmd, default is 8890
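
The [shrink] section targets a standby or witness node, so it is worth confirming the role of the chosen node_id before running the script. A minimal sketch, assuming the repmgr table layout used throughout this article (ID in column 1, Role in column 3); `node_role` and `can_shrink` are hypothetical helpers.

```shell
# Hypothetical helper: print the role of the node with the given ID from a
# repmgr status table read on stdin; fail if the ID is not present.
node_role() {
    awk -F'|' -v id="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($1) == id { print trim($3); found = 1; exit }
        END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: accept only standby or witness nodes for removal.
can_shrink() {
    local role
    role="$(node_role "$1")" || { echo "node_id $1 not found"; return 1; }
    case "$role" in
        standby|witness) echo "node_id $1 is a $role: ok" ;;
        *) echo "node_id $1 is a $role: refusing"; return 1 ;;
    esac
}

# Usage on a cluster node (not executed here):
#   repmgr service status | can_shrink 3
```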

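3. Scale in the cluster
Scale-in succeeds when run as either root or kingbase. This article runs "cluster_install.sh shrink" as the kingbase user; the script then completes the scale-in automatically according to the configuration.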

[kingbase@node3 soft]$  ./cluster_install.sh shrink
[CONFIG_CHECK] will deploy the cluster of 
[RUNNING] success connect to the target "192.*.*.66" ..... OK
[RUNNING] success connect to "192.*.*.66" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.*.*.60 ...
[RUNNING] Primary node ip is 192.*.*.60 ... OK
[CONFIG_CHECK] set install_with_root=1
[RUNNING] success connect to "" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[INSTALL] load config from cluster.....
 [INFO] db_user=system
 [INFO] db_port=54321
 [INFO] use_scmd=1
 [INFO] auto_cluster_recovery_level=1
 [INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] check database connection ... 
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] shrink_ip[192.*.*.66] is a standby node IP in the cluster ...
[CONFIG_CHECK] shrink_ip[192.*.*.66] is a standby node IP in the cluster ...ok 
[CONFIG_CHECK] The localhost is shrink_ip:[192.*.*.66] or primary_ip:[192.*.*.60]...
[CONFIG_CHECK] The localhost is shrink_ip:[192.*.*.66] or primary_ip:[192.*.*.60]...ok
[RUNNING] Primary node ip is 192.*.*.60 ...
[RUNNING] Primary node ip is 192.*.*.60 ... OK
[CONFIG_CHECK] check node_id is in cluster ... 
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] The /opt/kes/v9/kingbase/bin dir exist on "192.*.*.66" ... 
[RUNNING] The /opt/kes/v9/kingbase/bin dir exist on "192.*.*.66" ... OK
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.*.*.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] Del node is standby ...
[INFO] node:192.*.*.66 can be deleted ... OK
[RUNNING] query archive command at 192.*.*.60 ...
[RUNNING] current cluster not config sys_rman,return.
[2025年 02月 27日 星期四 22:42:06 CST] [INFO] /opt/kes/v9/kingbase/bin/repmgr standby unregister --node-id=3 ...
[INFO] connecting to local standby
[INFO] connecting to primary database
[NOTICE] unregistering node 3
[INFO] SET synchronous TO "quorum" on primary host 
[INFO] change synchronous_standby_names from "ANY 1( node2,node3)" to "ANY 1( node2)"
[INFO] try to drop slot "repmgr_slot_3" of node 3 on primary node
[WARNING] replication slot "repmgr_slot_3" is still active on node 3
[INFO] standby unregistration complete
[2025年 02月 27日 星期四 22:42:07 CST] [INFO] /opt/kes/v9/kingbase/bin/repmgr standby unregister --node-id=3 ...OK
[2025年 02月 27日 星期四 22:42:07 CST] [INFO] check db connection ...
[2025年 02月 27日 星期四 22:42:07 CST] [INFO] check db connection ...ok
2025-02-27 22:42:07 Ready to stop local kbha daemon and repmgrd daemon ...
2025-02-27 22:42:11 begin to stop repmgrd on "[localhost]".
2025-02-27 22:42:12 repmgrd on "[localhost]" stop success.
2025-02-27 22:42:12 Done.
2025-02-27 22:42:12 begin to stop DB on "[localhost]".
waiting for server to shut down.... done
server stopped
2025-02-27 22:42:12 DB on "[localhost]" stop success.
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[2025年 02月 27日 星期四 22:42:12 CST] [INFO] drop replication slot:repmgr_slot_3...
 pg_drop_replication_slot 
--------------------------
 
(1 row)

[2025年 02月 27日 星期四 22:42:13 CST] [INFO] drop replication slot:repmgr_slot_3...OK
[2025年 02月 27日 星期四 22:42:13 CST] [INFO] modify synchronous parameter configuration...
[2025年 02月 27日 星期四 22:42:14 CST] [INFO] modify synchronous parameter configuration...ok
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@node3 soft]$

4. After scale-in completes, check the cluster status
Log in to node1 or node2 and check the cluster status:

[kingbase@node2 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 0 second(s) ago    
[kingbase@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                      
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.100.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.100.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@node2 ~]$

At this point, node3 has been removed from the cluster and the cluster status is normal.
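
The scale-in log also shows synchronous_standby_names changing from "ANY 1( node2,node3)" to "ANY 1( node2)". A minimal sketch for checking such a value: `sync_names_contains` is a hypothetical helper that tests whether a node name is still listed. How the current value is fetched from the primary (for example with a `SHOW synchronous_standby_names` query) is left out here.

```shell
# Hypothetical helper: succeed if NODE is listed in a
# synchronous_standby_names value such as "ANY 1( node2,node3)".
sync_names_contains() {
    printf '%s\n' "$1" |
        sed -e 's/^[^(]*(//' -e 's/)[^)]*$//' |      # keep the list inside (...)
        tr ',' '\n' |                                 # one name per line
        sed -e 's/^[[:space:]"]*//' -e 's/[[:space:]"]*$//' |
        grep -Fxq -- "$2"
}

# Demo with the two values taken from the scale-in log above.
for value in 'ANY 1( node2,node3)' 'ANY 1( node2)'; do
    if sync_names_contains "$value" node3; then
        echo "node3 still listed in: $value"
    else
        echo "node3 not listed in: $value"
    fi
done
```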

V. References

https://bbs.kingbase.com.cn/docHtml?recId=d16e9a1be637c8fe4644c2c82fe16444&url=aHR0cHM6Ly9iYnMua2luZ2Jhc2UuY29tLmNuL2tpbmdiYXNlLWRvYy92OS9oaWdobHkvYXZhaWxhYmlsaXR5L2luZGV4Lmh0bWw
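Path in the documentation: KingbaseES > High Availability > KingbaseES data guard cluster and read/write splitting cluster user manual > Chapter 7, Routine operations and maintenance > Section 7.5, Online scale-out and scale-in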

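VI. Summary

Online scale-out and scale-in of the KingbaseES (KES) V9 RWC cluster went smoothly in this walkthrough: each operation was a single cluster_install.sh run driven by install.conf. Give it a try.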

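About the author:
Handle: 飞天. Named an outstanding original author of 2024 on Modb (墨天轮). Holds the Oracle 10g OCM and PGCE certifications as well as OBCA, KCP, ACP, PanWeiDB (磐维) and other Chinese database certifications. Currently works on the administration and operation of Oracle, MySQL, PostgreSQL and PanWeiDB databases, enjoys meeting like-minded people, and is keen on researching and sharing database technology.
WeChat official account: 飞天online
Modb (墨天轮): https://www.modb.pro/u/15197
If you have any questions, feel free to leave a comment and discuss.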

Last modified: 2025-02-28 09:41:52
