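I. Introduction to Online Scale-Out and Scale-In
KingbaseES provides a scaling tool for expanding and shrinking a database cluster online. For servers without a GUI, KingbaseES also supports scaling a cluster from the command line. This article shows how to scale a KES V9 RWC cluster online from the command line.
For deploying the one-primary, one-standby RWC cluster used here, see: KingbaseES (KES) V9 RWC Cluster Deployment in Practice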
II. KES V9 RWC Cluster Environment
The existing RWC cluster has two nodes, one primary and one standby:
Hostname | IP address | OS version | Memory, CPU | Node role | DB port | Cluster software install dir | Data directory |
---|---|---|---|---|---|---|---|
node1 | 192.*.*.60 | CentOS 7.9 | 4 GB, 1 dual-core CPU | Primary | 54321 | /opt/kes/v9 | /data/cluster |
node2 | 192.*.*.62 | CentOS 7.9 | 4 GB, 1 dual-core CPU | Standby | 54321 | /opt/kes/v9 | /data/cluster |
Cluster VIP: 192.*.*.64
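III. Scaling Out the KES V9 RWC Cluster
Requirement: add node3 to the existing one-primary, one-standby RWC cluster:
Hostname | IP address | OS version | Memory, CPU | Node role | DB port | Cluster software install dir | Data directory |
---|---|---|---|---|---|---|---|
node3 | 192.*.*.66 | CentOS 7.9 | 4 GB, 1 dual-core CPU | Standby (to be added) | 54321 | /opt/kes/v9 | /data/cluster |
Detailed scale-out steps
1. Prepare the operating system on node3, the node to be added
Follow the <Pre-installation environment preparation> section of KingbaseES (KES) V9 RWC Cluster Deployment in Practice.
[Note] Add node3 to the /etc/hosts file on all three hosts (node1, node2 and node3):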
192.*.*.66 node3
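The hosts entry has to be added on three machines. An idempotent helper avoids duplicate lines if this step is repeated. The bash function below is a minimal sketch and is not part of the KingbaseES tooling. The name add_host_entry is hypothetical. The hosts-file path is a parameter so the function can be tried on a copy first.

```shell
# Hypothetical helper: append "IP HOSTNAME" to a hosts file only if HOSTNAME
# is not already mapped on a non-comment line. Run as root for /etc/hosts.
add_host_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  # The hostname must appear as a whole word after the address column.
  if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file"; then
    echo "already present: ${name}"
  else
    # If the file does not end with a newline, add one before appending.
    [ -n "$(tail -c1 "$file")" ] && echo >> "$file"
    printf '%s %s\n' "$ip" "$name" >> "$file"
    echo "added: ${ip} ${name}"
  fi
}
```

Run it as root on node1, node2 and node3, for example "add_host_entry <node3-ip> node3", using the real address of node3.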
Before continuing, check the current cluster status on node1 or node2:
[kingbase@node1 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago
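This health check can also be scripted, which is handy before and after each change. The sketch below is a hypothetical helper; check_service_status is not a KingbaseES command. It assumes the pipe-separated layout shown above, where column 4 is Status and column 6 is repmgrd. It flags any node for which either column is not "running".

```shell
# Hypothetical helper: read "repmgr service status" output on stdin and report
# every node whose Status (column 4) or repmgrd (column 6) is not "running".
# Returns 0 when all nodes are healthy, 1 otherwise.
check_service_status() {
  awk -F'|' '
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {        # data rows start with a numeric ID
      name = $2; status = $4; daemon = $6
      gsub(/[ \t]/, "", name)
      gsub(/[ \t*]/, "", status)          # "* running" marks the primary
      gsub(/[ \t]/, "", daemon)
      if (status != "running" || daemon != "running") {
        print "not healthy: " name
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Usage: repmgr service status | check_service_status && echo "all nodes healthy"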
2. Prepare the files needed for the scale-out
From the existing primary node1, copy db.zip, license.dat, install.conf, cluster_install.sh and trust_cluster.sh to node3.
# Go to the {KES install directory}/KESRealPro/V009R001C002B0014/ClientTools/guitools/DeployTools/zip directory
[root@node1 ~]# cd /opt/kes/v9/KESRealPro/V009R001C002B0014/ClientTools/guitools/DeployTools/zip/
[root@node1 zip]# ll
total 322412
-rwxrwxr-x 1 kingbase kingbase 252402 Sep 23 18:41 cluster_install.sh
-rw-rw-r-- 1 kingbase kingbase 327258132 Sep 23 18:41 db.zip
-rw-rw-r-- 1 kingbase kingbase 19580 Jan 18 12:24 install.conf
-rw-rw-r-- 1 kingbase kingbase 3676 Jan 18 11:47 license.dat
-rw-rw-r-- 1 kingbase kingbase 2595145 Sep 23 18:41 securecmdd.zip
-rwxrwxr-x 1 kingbase kingbase 9677 Sep 23 18:41 trust_cluster.sh
[root@node1 zip]#
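# Copy the files to node3 (with several source files, scp needs the target directory /soft to already exist on node3)
[root@node1 zip]# scp * node3:/soft
# Log in to node3 and change the owner of the copied files:
[root@node3 ~]# chown -R kingbase:kingbase /soft/*
[root@node3 ~]# ll /soft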
total 322432
-rwxr-xr-x 1 kingbase kingbase 252402 Feb 27 17:59 cluster_install.sh
-rw-r--r-- 1 kingbase kingbase 327258132 Feb 27 17:59 db.zip
-rw-r--r-- 1 kingbase kingbase 19678 Feb 27 20:40 install.conf
-rw-r--r-- 1 kingbase kingbase 3676 Feb 27 17:59 license.dat
-rw-r--r-- 1 kingbase kingbase 2595145 Feb 27 17:59 securecmdd.zip
-rwxr-xr-x 1 kingbase kingbase 9677 Feb 27 17:59 trust_cluster.sh
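Before moving on, it can help to confirm that the copy and chown steps left every required file in /soft with the right owner. This is a minimal sketch. The function name check_expand_files is hypothetical. It checks only the five files listed above, plus the execute bit on the two scripts. It relies on GNU stat -c, which is available on CentOS 7.9.

```shell
# Hypothetical helper: confirm that the files copied from node1 are present in
# DIR (default /soft), owned by OWNER (default kingbase), and that the two
# scripts are executable. Uses GNU stat, as shipped with CentOS 7.9.
check_expand_files() {
  local dir="${1:-/soft}" owner="${2:-kingbase}" rc=0 f
  for f in db.zip license.dat install.conf cluster_install.sh trust_cluster.sh; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"; rc=1
    elif [ "$(stat -c %U "$dir/$f")" != "$owner" ]; then
      echo "wrong owner: $f"; rc=1
    fi
  done
  # Both scripts are executed directly (./trust_cluster.sh, ./cluster_install.sh).
  for f in cluster_install.sh trust_cluster.sh; do
    if [ -f "$dir/$f" ] && [ ! -x "$dir/$f" ]; then
      echo "not executable: $f"; rc=1
    fi
  done
  return "$rc"
}
```

Usage on node3: check_expand_files /soft kingbase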
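All of the following steps are performed on node3, the node being added.
3. Configure the install.conf file
3.1 Edit the parameters under the [install] section of install.conf
On the all_ip line, add node3's IP address: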
[root@node3 ~]# cd /soft
[root@node3 soft]# vi install.conf
# The all_ip line lists every node; after the edit it also contains node3's IP:
all_ip=(192.*.*.60 192.*.*.62 192.*.*.66)
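The all_ip edit can also be made without opening vi. This sketch assumes the all_ip line uses the same bash-array style as net_device=(ens33) in the [expand] section, that is, all_ip=(ip1 ip2 ...) with space-separated entries. add_to_all_ip is a hypothetical helper. It adds the IP only if it is not already present, so running it twice is harmless.

```shell
# Hypothetical helper: add IP to the all_ip=(...) line of install.conf.
# Assumes bash-array syntax with space-separated entries, e.g.
# all_ip=(192.0.2.60 192.0.2.62); requires GNU sed for -i.
add_to_all_ip() {
  local ip="$1" conf="${2:-/soft/install.conf}" current
  if ! grep -q '^all_ip=(' "$conf"; then
    echo "no all_ip line in $conf"
    return 1
  fi
  # Capture the text between the parentheses on the all_ip line.
  current=$(sed -n -E 's/^all_ip=\(([^)]*)\).*/\1/p' "$conf")
  case " $current " in
    *" $ip "*) echo "all_ip already contains $ip"; return 0 ;;
  esac
  # On the all_ip line only, turn the first ")" into " IP)".
  sed -i -E "/^all_ip=\(/ s/\)/ ${ip})/" "$conf"
  echo "added $ip to all_ip"
}
```

Usage: add_to_all_ip <node3-ip> /soft/install.conf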
3.2 Edit the parameters under the [expand] section of install.conf
[expand]
expand_type="0" # The node type of standby/witness node, which would be add to cluster. 0:standby 1:witness
primary_ip="192.*.*.60" # The ip addr of cluster primary node, which need to expand a standby/witness node.
expand_ip="192.*.*.66" # The ip addr of standby/witness node, which would be add to cluster.
node_id="3" # The node_id of standby/witness node, which would be add to cluster. It does not the same with any one in cluster node
# for example: node_id="3"
sync_type="" # the sync_type parameter is used to specify the sync type for expand node. 0:sync 1:potential 2:async
# this parameter is only valid when expand_type="0" and the synchronous parameter of the cluster is set to custom mode.
## Specific instructions ,see it under [install]
install_dir="/opt/kes/v9" # the last layer of directory could not add '/'
zip_package="/soft/db.zip"
net_device=(ens33) # if virtual_ip set,it must be set
net_device_ip=(192.*.*.66) # if virtual_ip set,it must be set
license_file=(license.dat)
deploy_by_sshd="1"
ssh_port="22"
scmd_port="8890"
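cluster_install.sh already checks that expand_ip and node_id are not in use; see the "is not used in the cluster" and "check node_id is in cluster" lines in the log below. The same check can be done by hand before running the script. The hypothetical helper below reads repmgr cluster show output. It assumes ID is the first column and the connection string (host=...) is the last. It reports any clash with the planned node_id or expand_ip.

```shell
# Hypothetical helper: read "repmgr cluster show" on stdin and fail if the
# planned NODE_ID or EXPAND_IP is already used by a registered node.
check_expand_params() {
  awk -F'|' -v id="$1" -v ip="$2" '
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {        # data rows start with a numeric ID
      cur = $1
      gsub(/[ \t]/, "", cur)
      if (cur == id) { print "node_id " id " already used"; bad = 1 }
      # The trailing space stops 192.0.2.6 from matching 192.0.2.66.
      if (index($NF, "host=" ip " ") > 0) { print "ip " ip " already used"; bad = 1 }
    }
    END { exit bad }
  '
}
```

repmgr is not installed on node3 yet, so run this on node1 or node2, for example: repmgr cluster show | check_expand_params 3 <node3-ip>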
[Note] To change the SSH port: first change ssh_port in install.conf, then change Port in the system file /etc/ssh/sshd_config, and finally restart the sshd service.
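The first two steps of the note above can be combined into one helper, so that install.conf and sshd_config do not drift apart. This is a sketch under several assumptions:
- set_ssh_port is a hypothetical name.
- It uses GNU sed (-i and the 0,/regex/ address).
- It rewrites every ssh_port="..." line in install.conf.
- It replaces only the first Port line, commented out or not, in sshd_config.

Restarting sshd is left as a separate, deliberate step.

```shell
# Hypothetical helper: set the SSH port in install.conf (ssh_port="N") and in
# sshd_config (Port N). Requires GNU sed. Does not restart sshd.
set_ssh_port() {
  local port="$1" conf="${2:-/soft/install.conf}" sshd="${3:-/etc/ssh/sshd_config}"
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  # Every ssh_port="..." line in the file gets the new value.
  sed -i -E "s/^ssh_port=\"[0-9]*\"/ssh_port=\"${port}\"/" "$conf"
  if grep -Eq '^[#[:space:]]*Port[[:space:]]' "$sshd"; then
    # Replace only the first Port line, commented out or not.
    sed -i -E "0,/^[#[:space:]]*Port[[:space:]]/s/^[#[:space:]]*Port[[:space:]].*/Port ${port}/" "$sshd"
  else
    echo "Port ${port}" >> "$sshd"
  fi
}
```

Usage as root: set_ssh_port 2222 /soft/install.conf /etc/ssh/sshd_config && systemctl restart sshd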
4. Configure passwordless SSH
On node3, configure passwordless SSH among all nodes for both the root and kingbase users:
# Configure passwordless SSH
[root@node3 soft]# ./trust_cluster.sh
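After trust_cluster.sh finishes, passwordless login can be verified before the scale-out. That way an SSH problem does not surface halfway through cluster_install.sh. In this sketch, verify_ssh_trust is a hypothetical helper. ssh runs with BatchMode=yes, which makes it fail instead of prompting for a password. SSH_BIN is an assumed override hook that defaults to ssh and allows a dry run.

```shell
# Hypothetical helper: check that USER can log in to each HOST without a
# password. BatchMode=yes makes ssh fail instead of prompting; -n keeps ssh
# from reading stdin. SSH_BIN (default "ssh") can be swapped for a dry run.
verify_ssh_trust() {
  local user="$1" host rc=0
  shift
  for host in "$@"; do
    if "${SSH_BIN:-ssh}" -n -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true; then
      echo "ok: ${user}@${host}"
    else
      echo "FAILED: ${user}@${host}"
      rc=1
    fi
  done
  return "$rc"
}
```

ssh uses the keys of the local user who runs it. So run "verify_ssh_trust root node1 node2 node3" as root, and "verify_ssh_trust kingbase node1 node2 node3" as kingbase.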
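5. Scale out the cluster
The scale-out works whether it is run as root or as kingbase. This article runs "cluster_install.sh expand" as kingbase, and the script then completes the scale-out automatically according to install.conf.
[Note] The scale-out creates the cluster installation directory /opt/kes/v9 automatically. By default, however, the kingbase user cannot create files under /opt, so create /opt/kes in advance and set its owner to kingbase:kingbase. If you run the scale-out as root, you do not need to create /opt/kes beforehand.
As root, create the directory and set its owner: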
[root@node3 ~]# mkdir /opt/kes
[root@node3 ~]# chown -R kingbase:kingbase /opt/kes
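You can check whether the kingbase user can create the install directory before running the script. The sketch below uses a hypothetical helper, can_create_install_dir; run it as kingbase. It walks up from install_dir to the first path that exists, then tests whether that directory is writable by the current user. With /opt/kes created and owned by kingbase, the first existing ancestor of /opt/kes/v9 is /opt/kes, so the check passes. Without it, the check lands on /opt and fails for kingbase.

```shell
# Hypothetical helper: run as kingbase before "cluster_install.sh expand".
# Walks up from INSTALL_DIR to the first path that exists and checks that it
# is a directory the current user can write to (the script creates the rest).
can_create_install_dir() {
  local dir="${1:-/opt/kes/v9}"
  while [ ! -e "$dir" ]; do
    dir=$(dirname "$dir")
  done
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "ok: $(id -un) can create directories under $dir"
  else
    echo "cannot create directories under $dir as $(id -un)"
    return 1
  fi
}
```

Usage as kingbase: can_create_install_dir /opt/kes/v9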
Run the scale-out as kingbase. The log is as follows:
[kingbase@node3 soft]$ ./cluster_install.sh expand
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "192.*.*.66" ..... OK
[RUNNING] success connect to "192.*.*.66" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.*.*.60 ...
[RUNNING] Primary node ip is 192.*.*.60 ... OK
[CONFIG_CHECK] set install_with_root=1
[RUNNING] success connect to the target "192.*.*.66" ..... OK
[RUNNING] success connect to "192.*.*.66" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[INSTALL] load config from cluster.....
[INFO] db_user=system
[INFO] db_port=54321
[INFO] use_scmd=1
[INFO] data_directory=/data/cluster
[INFO] scmd_port=8890
[INFO] recovery=standby
[INFO] use_check_disk=off
./cluster_install.sh: line 4981: 192.*.*.62: command not found
[INFO] trusted_servers=192.*.*.60 192.*.*.62
[INFO] virtual_ip=192.*.*.64/24
[INFO] ipaddr_path=/usr/sbin
[INFO] ping_path=/usr/bin
[INFO] arping_path=/opt/kes/bin
[INFO] reconnect_attempts=10
[INFO] reconnect_interval=6
[INFO] auto_cluster_recovery_level=1
[INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] success to access license_file: /soft/license.dat
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] check database connection ...
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] expand_ip[192.*.*.66] is not used in the cluster ...
[CONFIG_CHECK] expand_ip[192.*.*.66] is not used in the cluster ...ok
[CONFIG_CHECK] The localhost is expand_ip:[192.*.*.66] ...
[CONFIG_CHECK] The localhost is expand_ip:[192.*.*.66] ...ok
[CONFIG_CHECK] check node_id is in cluster ...
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] check the db is running or not...
[RUNNING] the db is not running on "192.*.*.66:54321" ..... OK
[RUNNING] the install dir is not exist on "192.*.*.66" ..... OK
[RUNNING] check the sys_securecmdd is running or not...
[RUNNING] the sys_securecmdd is not running on "192.*.*.66:8890" ..... OK
[CONFIG_CHECK] The virtual ip [192.*.*.64] exists on primary host [192.*.*.60].....
[CONFIG_CHECK] The virtual ip [192.*.*.64] exists on primary host [192.*.*.60].....OK
[CONFIG_CHECK] The net_device_ip:[192.*.*.66] exists on dev ens33 on [192.*.*.66].....
[CONFIG_CHECK] The net_device_ip:[192.*.*.66] exists on host "192.*.*.66" on dev ens33 .....OK
[INFO] use_ssl=0
2025-02-27 21:12:21 [INFO] start to check system parameters on 192.*.*.66 ...
2025-02-27 21:12:21 [WARNING] [GSSAPIAuthentication] yes (should be: no) on 192.*.*.66
2025-02-27 21:12:21 [INFO] [UseDNS] is null on 192.*.*.66
2025-02-27 21:12:22 [INFO] [UsePAM] yes on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.open files] 65536 on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.open proc] 65536 on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.core size] unlimited on 192.*.*.66
2025-02-27 21:12:22 [INFO] [ulimit.mem lock] 50000000 on 192.*.*.66
2025-02-27 21:12:23 [INFO] [kernel.sem] 5010 641280 5010 256 on 192.*.*.66
2025-02-27 21:12:23 [INFO] [RemoveIPC] no on 192.*.*.66
2025-02-27 21:12:23 [INFO] [DefaultTasksAccounting] no on 192.*.*.66
2025-02-27 21:12:23 [INFO] write file "/etc/udev/rules.d/kingbase.rules" on 192.*.*.66
2025-02-27 21:12:24 [INFO] [crontab] chmod /usr/bin/crontab ...
2025-02-27 21:12:24 [INFO] [crontab] chmod /usr/bin/crontab ... Done
2025-02-27 21:12:24 [INFO] [crontab access] OK
2025-02-27 21:12:25 [INFO] [cron.deny] kingbase not exists in cron.deny
2025-02-27 21:12:25 [INFO] [crontab auth] crontab is accessible by kingbase now on 192.*.*.66
2025-02-27 21:12:25 [INFO] [SELINUX] disabled on 192.*.*.66
2025-02-27 21:12:26 [INFO] [firewall] down on 192.*.*.66
2025-02-27 21:12:26 [INFO] [The memory] OK on 192.*.*.66
2025-02-27 21:12:26 [INFO] [The hard disk] OK on 192.*.*.66
2025-02-27 21:12:26 [INFO] [ping] chmod /usr/bin/ping ...
2025-02-27 21:12:26 [INFO] [ping] chmod /usr/bin/ping ... Done
2025-02-27 21:12:27 [INFO] [ping access] OK
2025-02-27 21:12:27 [INFO] [/bin/cp --version] on 192.*.*.66 OK
2025-02-27 21:12:27 [INFO] [ip command path] on 192.*.*.66 OK
[INSTALL] create the install dir "/opt/kes/v9/kingbase" on 192.*.*.66 ...
[INSTALL] success to create the install dir "/opt/kes/v9/kingbase" on "192.*.*.66" ..... OK
[INSTALL] try to copy the zip package "/soft/db.zip" to /opt/kes/v9/kingbase of "192.*.*.66" .....
[INSTALL] success to scp the zip package "/soft/db.zip" /opt/kes/v9/kingbase of to "192.*.*.66" ..... OK
[INSTALL] decompress the "/opt/kes/v9/kingbase" to "/opt/kes/v9/kingbase" on 192.*.*.66
[INSTALL] success to decompress the "/opt/kes/v9/kingbase/db.zip" to "/opt/kes/v9/kingbase" on "192.*.*.66"..... OK
[RUNNING] chmod u+s and a+x for "/usr/sbin" and "/opt/kes/bin" on 192.*.*.66
[RUNNING] chmod u+s and a+x /usr/sbin/ip on "192.*.*.66" ..... OK
[RUNNING] chmod u+s and a+x /opt/kes/bin/arping on "192.*.*.66" ..... OK
[INSTALL] check license_file "license.dat"
[INSTALL] Scp license to /opt/kes/v9/kingbase/../license.dat on 192.*.*.66
[INSTALL] success to copy /soft/license.dat to /opt/kes/v9/kingbase/../ on 192.*.*.66
[RUNNING] config sys_securecmdd and start it ...
[RUNNING] config the sys_securecmdd port to 8890 ...
[RUNNING] success to config the sys_securecmdd port on 192.*.*.66 ... OK
successfully initialized the sys_securecmdd, please use "/opt/kes/v9/kingbase/bin/sys_HAscmdd.sh start" to start the sys_securecmdd
[RUNNING] success to config sys_securecmdd on 192.*.*.66 ... OK
Created symlink from /etc/systemd/system/multi-user.target.wants/securecmdd.service to /etc/systemd/system/securecmdd.service.
[RUNNING] success to start sys_securecmdd on 192.*.*.66 ... OK
[INSTALL] success to access file: /opt/kes/v9/kingbase/etc/all_nodes_tools.conf
[INSTALL] success to scp the /opt/kes/v9/kingbase/etc/repmgr.conf from 192.*.*.60 to "192.*.*.66"..... ok
[INSTALL] success to scp the ~/.encpwd from 192.*.*.60 to "192.*.*.66"..... ok
[INSTALL] success to scp /opt/kes/v9/kingbase/etc/all_nodes_tools.conf from "192.*.*.60" to "192.*.*.66" ...ok
[INSTALL] success to chmod 600 the ~/.encpwd on 192.*.*.66..... ok
[INFO] parameter_name=node_id
[INFO] parameter_values='3'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*node_id[ ]*=/cnode_id='3'" /opt/kes/v9/kingbase/etc/repmgr.conf
[INFO] parameter_name=node_name
[INFO] parameter_values='node3'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*node_name[ ]*=/cnode_name='node3'" /opt/kes/v9/kingbase/etc/repmgr.conf
[INFO] parameter_name=conninfo
[INFO] parameter_values='host
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*conninfo[ ]*=/cconninfo='host=192.*.*.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'" /opt/kes/v9/kingbase/etc/repmgr.conf
[INFO] parameter_name=ping_path
[INFO] parameter_values='/usr/bin'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*ping_path[ ]*=/cping_path='/usr/bin'" /opt/kes/v9/kingbase/etc/repmgr.conf
[INFO] parameter_name=net_device
[INFO] parameter_values='ens33'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*net_device[ ]*=/cnet_device='ens33'" /opt/kes/v9/kingbase/etc/repmgr.conf
[INFO] parameter_name=net_device_ip
[INFO] parameter_values='192.*.*.66'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*net_device_ip[ ]*=/cnet_device_ip='192.*.*.66'" /opt/kes/v9/kingbase/etc/repmgr.conf
[INFO] parameter_name=arping_path
[INFO] parameter_values='/opt/kes/bin'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*arping_path[ ]*=/carping_path='/opt/kes/bin'" /opt/kes/v9/kingbase/etc/repmgr.conf
[INFO] parameter_name=ipaddr_path
[INFO] parameter_values='/usr/sbin'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*ipaddr_path[ ]*=/cipaddr_path='/usr/sbin'" /opt/kes/v9/kingbase/etc/repmgr.conf
[RUNNING] standby clone ...
[WARNING] following problems with command line parameters detected:
-D/--sysdata will be ignored if a repmgr configuration file is provided
[NOTICE] destination directory "/data/cluster" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=192.*.*.60 user=esrep port=54321 dbname=esrep
[DETAIL] current installation size is 87 MB
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] checking and correcting permissions on existing directory "/data/cluster"
[INFO] creating replication slot as user "esrep"
[NOTICE] starting backup (using sys_basebackup)...
[INFO] executing:
/opt/kes/v9/kingbase/bin/sys_basebackup -l "repmgr base backup" -D /data/cluster -h 192.*.*.60 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_3
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /data/cluster start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"
[RUNNING] standby clone ...OK
[RUNNING] db start ...
waiting for server to start.... done
server started
[RUNNING] db start ...OK
[INFO] connecting to local node "node3" (ID: 3)
[INFO] connecting to primary database
[WARNING] --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
[INFO] standby registration complete
[NOTICE] standby node "node3" (ID: 3) successfully registered
2025-02-27 21:12:52 begin to start DB on "[localhost]".
2025-02-27 21:12:53 DB on "[localhost]" already started, connect to check it.
2025-02-27 21:12:54 DB on "[localhost]" start success.
2025-02-27 21:12:54 Ready to start local kbha daemon and repmgrd daemon ...
2025-02-27 21:12:54 begin to start repmgrd on "[localhost]".
[2025-02-27 21:12:55] [NOTICE] using provided configuration file "/opt/kes/v9/kingbase/bin/../etc/repmgr.conf"
[2025-02-27 21:12:55] [INFO] creating directory "/opt/kes/v9/kingbase/log"...
[2025-02-27 21:12:55] [NOTICE] redirecting logging output to "/opt/kes/v9/kingbase/log/hamgr.log"
2025-02-27 21:12:56 repmgrd on "[localhost]" start success.
[2025-02-27 21:12:58] [NOTICE] redirecting logging output to "/opt/kes/v9/kingbase/log/kbha.log"
2025-02-27 21:12:59 Done.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.*.*.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] query archive command at 192.*.*.60 ...
[RUNNING] current cluster not config sys_rman,return.
[root@node3 soft]#
6. Check the cluster status after the scale-out
[kingbase@node3 soft]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 15704 | no      | 0 second(s) ago
[kingbase@node3 soft]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.100.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.100.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.100.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@node3 soft]$
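The new node's row in repmgr cluster show can also be checked by a script, for example in a post-change runbook. This hypothetical helper assumes the column order shown above: Name in column 2, Role in 3, Status in 4, Upstream in 5 and LSN_Lag in 9. It reports whether the named node is a running standby.

```shell
# Hypothetical helper: read "repmgr cluster show" on stdin and confirm that
# node NAME is registered as a running standby; print its upstream and lag.
check_new_standby() {
  awk -F'|' -v want="$1" '
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {        # data rows start with a numeric ID
      name = $2; role = $3; status = $4; up = $5; lag = $9
      gsub(/[ \t]/, "", name); gsub(/[ \t]/, "", role)
      gsub(/[ \t*]/, "", status); gsub(/[ \t]/, "", up)
      gsub(/^[ \t]+|[ \t]+$/, "", lag)   # keep the inner space in "0 bytes"
      if (name == want) {
        found = 1
        if (role == "standby" && status == "running") {
          printf "%s: standby, upstream=%s, lag=%s\n", name, up, lag
        } else {
          printf "%s: role=%s status=%s\n", name, role, status
          bad = 1
        }
      }
    }
    END {
      if (!found) { print want ": not registered"; exit 1 }
      exit bad
    }
  '
}
```

With the repmgr cluster show output above, "repmgr cluster show | check_new_standby node3" should print "node3: standby, upstream=node1, lag=0 bytes".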
node3 has now joined the cluster, and the cluster status is normal.
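IV. Scaling In the KES V9 RWC Cluster
Requirement: remove node3 from the following RWC cluster:
Hostname | IP address | OS version | Memory, CPU | Node role | DB port | Cluster software install dir | Data directory |
---|---|---|---|---|---|---|---|
node1 | 192.*.*.60 | CentOS 7.9 | 4 GB, 1 dual-core CPU | Primary | 54321 | /opt/kes/v9 | /data/cluster |
node2 | 192.*.*.62 | CentOS 7.9 | 4 GB, 1 dual-core CPU | Standby | 54321 | /opt/kes/v9 | /data/cluster |
node3 | 192.*.*.66 | CentOS 7.9 | 4 GB, 1 dual-core CPU | Standby (to be removed) | 54321 | /opt/kes/v9 | /data/cluster |
Cluster VIP: 192.*.*.64
Detailed scale-in steps
1. Check the current cluster status on any node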
[kingbase@node3 soft]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 15704 | no      | 1 second(s) ago
[kingbase@node3 soft]$
All of the following steps are performed on node3, the node being removed.
2. Configure the install.conf file
2.1 Edit the parameters under the [shrink] section of install.conf
[shrink]
shrink_type="0" # The node type of standby/witness node, which would be delete from cluster. 0:standby 1:witness
primary_ip="192.168.100.60" # The ip addr of cluster primary node, which need to shrink a standby/witness node.
shrink_ip="192.168.100.66" # The ip addr of standby/witness node, which would be delete from cluster.
node_id="3" # The node_id of standby/witness node, which would be delete from cluster. It does not the same with any one in cluster node
# for example: node_id="3"
## Specific instructions ,see it under [install]
install_dir="/opt/kes/v9" # the last layer of directory could not add '/'
ssh_port="22" # the port of ssh, default is 22
scmd_port="8890" # the port of sys_securecmd, default is 8890
[Note] To change the SSH port: first change ssh_port in install.conf, then change Port in the system file /etc/ssh/sshd_config, and finally restart the sshd service.
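The script itself verifies that shrink_ip is a standby in the cluster; see the "is a standby node IP in the cluster" line in the log below. A similar pre-check can be run by hand. In this hypothetical sketch, node_id and shrink_ip from the [shrink] section are checked against repmgr cluster show. The check confirms that both point at the same node and that the node is a standby. It covers only removing a standby, not a witness, and assumes the column layout shown in this article.

```shell
# Hypothetical helper: read "repmgr cluster show" on stdin and confirm that
# NODE_ID and SHRINK_IP refer to the same registered standby.
check_shrink_params() {
  awk -F'|' -v id="$1" -v ip="$2" '
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {        # data rows start with a numeric ID
      cur = $1; role = $3
      gsub(/[ \t]/, "", cur); gsub(/[ \t]/, "", role)
      if (cur == id) {
        found = 1
        if (index($NF, "host=" ip " ") == 0) {
          print "node " id " does not have host " ip; bad = 1
        } else if (role != "standby") {
          print "node " id " is a " role ", not a standby"; bad = 1
        } else {
          print "node " id " (" ip ") can be removed"
        }
      }
    }
    END {
      if (!found) { print "node " id " not registered"; exit 1 }
      exit bad
    }
  '
}
```

Usage: repmgr cluster show | check_shrink_params 3 <node3-ip>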
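3. Scale in the cluster
The scale-in works whether it is run as root or as kingbase. This article runs "cluster_install.sh shrink" as kingbase, and the script then completes the scale-in automatically according to install.conf.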
[kingbase@node3 soft]$ ./cluster_install.sh shrink
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "192.*.*.66" ..... OK
[RUNNING] success connect to "192.*.*.66" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.*.*.60 ...
[RUNNING] Primary node ip is 192.*.*.60 ... OK
[CONFIG_CHECK] set install_with_root=1
[RUNNING] success connect to "" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.*.*.60" ..... OK
[RUNNING] success connect to "192.*.*.60" from current node by 'ssh' ..... OK
[INSTALL] load config from cluster.....
[INFO] db_user=system
[INFO] db_port=54321
[INFO] use_scmd=1
[INFO] auto_cluster_recovery_level=1
[INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] check database connection ...
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] shrink_ip[192.*.*.66] is a standby node IP in the cluster ...
[CONFIG_CHECK] shrink_ip[192.*.*.66] is a standby node IP in the cluster ...ok
[CONFIG_CHECK] The localhost is shrink_ip:[192.*.*.66] or primary_ip:[192.*.*.60]...
[CONFIG_CHECK] The localhost is shrink_ip:[192.*.*.66] or primary_ip:[192.*.*.60]...ok
[RUNNING] Primary node ip is 192.*.*.60 ...
[RUNNING] Primary node ip is 192.*.*.60 ... OK
[CONFIG_CHECK] check node_id is in cluster ...
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] The /opt/kes/v9/kingbase/bin dir exist on "192.*.*.66" ...
[RUNNING] The /opt/kes/v9/kingbase/bin dir exist on "192.*.*.66" ... OK
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.*.*.66 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] Del node is standby ...
[INFO] node:192.*.*.66 can be deleted ... OK
[RUNNING] query archive command at 192.*.*.60 ...
[RUNNING] current cluster not config sys_rman,return.
[2025年 02月 27日 星期四 22:42:06 CST] [INFO] /opt/kes/v9/kingbase/bin/repmgr standby unregister --node-id=3 ...
[INFO] connecting to local standby
[INFO] connecting to primary database
[NOTICE] unregistering node 3
[INFO] SET synchronous TO "quorum" on primary host
[INFO] change synchronous_standby_names from "ANY 1( node2,node3)" to "ANY 1( node2)"
[INFO] try to drop slot "repmgr_slot_3" of node 3 on primary node
[WARNING] replication slot "repmgr_slot_3" is still active on node 3
[INFO] standby unregistration complete
[2025年 02月 27日 星期四 22:42:07 CST] [INFO] /opt/kes/v9/kingbase/bin/repmgr standby unregister --node-id=3 ...OK
[2025年 02月 27日 星期四 22:42:07 CST] [INFO] check db connection ...
[2025年 02月 27日 星期四 22:42:07 CST] [INFO] check db connection ...ok
2025-02-27 22:42:07 Ready to stop local kbha daemon and repmgrd daemon ...
2025-02-27 22:42:11 begin to stop repmgrd on "[localhost]".
2025-02-27 22:42:12 repmgrd on "[localhost]" stop success.
2025-02-27 22:42:12 Done.
2025-02-27 22:42:12 begin to stop DB on "[localhost]".
waiting for server to shut down.... done
server stopped
2025-02-27 22:42:12 DB on "[localhost]" stop success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[2025年 02月 27日 星期四 22:42:12 CST] [INFO] drop replication slot:repmgr_slot_3...
pg_drop_replication_slot
--------------------------
(1 row)
[2025年 02月 27日 星期四 22:42:13 CST] [INFO] drop replication slot:repmgr_slot_3...OK
[2025年 02月 27日 星期四 22:42:13 CST] [INFO] modify synchronous parameter configuration...
[2025年 02月 27日 星期四 22:42:14 CST] [INFO] modify synchronous parameter configuration...ok
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.*.*.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.*.*.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@node3 soft]$
4. Check the cluster status after the scale-in
Log in to node1 or node2 and check the cluster status:
[kingbase@node2 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 77003 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 43158 | no      | 0 second(s) ago
[kingbase@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.100.60 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.100.62 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@node2 ~]$
node3 has now been removed from the cluster, and the cluster status is normal.
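V. References
https://bbs.kingbase.com.cn/docHtml?recId=d16e9a1be637c8fe4644c2c82fe16444&url=aHR0cHM6Ly9iYnMua2luZ2Jhc2UuY29tLmNuL2tpbmdiYXNlLWRvYy92OS9oaWdobHkvYXZhaWxhYmlsaXR5L2luZGV4Lmh0bWw
Full path: KingbaseES > High Availability > Kingbase Data Guard Cluster and Read/Write Splitting Cluster User Manual > Chapter 7 Routine Operations and Maintenance > 7.5 Online Scale-Out and Scale-In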
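VI. Summary
Online scale-out and scale-in of a KingbaseES (KES) V9 RWC cluster is smooth. After install.conf is edited, a single "cluster_install.sh expand" or "cluster_install.sh shrink" command completes the change while the primary keeps running. Give it a try!
About the author:
Online name: 飞天 (Feitian). 2024 Outstanding Original Author on 墨天轮 (modb.pro). Holds the Oracle 10g OCM and PGCE certifications, as well as OBCA, KCP, ACP, Panwei (磐维) and many other Chinese database certifications. Currently works on the administration and operation of Oracle, MySQL, PostgreSQL and Panwei databases. Enjoys meeting like-minded people and is passionate about researching and sharing database technology.
WeChat Official Account: 飞天online
墨天轮 (modb.pro): https://www.modb.pro/u/15197
If you have any questions, feel free to leave a comment.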