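1. What is RWC?
RWC is the Kingbase database read/write-splitting cluster software (KingbaseRWC).
It builds on the Kingbase data-guard cluster software and adds read/write load balancing that is transparent to applications. Unlike a data-guard cluster, every standby in this type of cluster can serve queries, which takes read load off the primary and allows higher transaction throughput. The software also balances read load across multiple standbys.
Its members may include a primary node, standby nodes, a witness node, and a backup (repo) node.
Read/write-splitting architecture
Officially recommended deployment architecture
Installation package download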
https://download.kingbase.com.cn/xzzx/index.htm
2. Environment
OS | Spec | Data disk | System disk | IP | Role |
---|---|---|---|---|---|
CentOS 7.8 | 4 cores / 16 GB | 100 GB | > 8 GB | 10.10.100.236 | Primary |
CentOS 7.8 | 4 cores / 16 GB | 100 GB | > 8 GB | 10.10.100.235 | Standby |
3. Prerequisites
3.1 Kernel parameters
cat /etc/sysctl.conf
fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.shmall = 2097152
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
After editing the file, apply the settings and list them with the commands below (rebooting the system also works, if that is an option).
/sbin/sysctl -p
/sbin/sysctl -a
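To confirm that a particular setting really is in the file before comparing it with the running kernel, a small lookup helper can be handy. The following is only a sketch: the function name `sysctl_conf_get` is made up for this example, and it handles only plain `key = value` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KingbaseES): print the value configured
# for KEY in a sysctl.conf-style FILE. If KEY appears more than once, the
# last occurrence wins.
sysctl_conf_get() {
    local file=$1 key=$2
    awk -v k="$key" '
        /^[ \t]*[#;]/ || !index($0, "=") { next }   # skip comments and lines without "="
        {
            name = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", name)
            if (name == k) {
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim both ends
                gsub(/[ \t]+/, " ", val)            # collapse inner whitespace
                found = 1
            }
        }
        END { if (found) print val }' "$file"
}
```

For example, `sysctl_conf_get /etc/sysctl.conf kernel.shmmax` prints the configured value, which can then be compared by eye with `sysctl -n kernel.shmmax`.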
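3.2 Resource limits
cat /etc/security/limits.conf
# * matches all users; you can restrict this to the root and kingbase users instead
* soft nofile 65536
# Note: the nofile hard limit must not exceed /proc/sys/fs/nr_open, otherwise you will be unable to log in after logging out
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
# unlimited means no limit
* soft core unlimited
* hard core unlimited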
Apply the changes
# limits.conf takes effect at the next login, so log out and back in, then check the limits:
ulimit -a
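The nofile constraints mentioned in the comments (soft no larger than hard, hard no larger than `/proc/sys/fs/nr_open`) can be checked with a few lines of shell. This is a minimal sketch; `nofile_limits_ok` is a hypothetical name, and it expects numeric arguments (a value of `unlimited` is treated as a failure).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if SOFT <= HARD <= NR_OPEN.
# All three arguments must be integers.
nofile_limits_ok() {
    local soft=$1 hard=$2 nr_open=$3
    [ "$soft" -le "$hard" ] 2>/dev/null && [ "$hard" -le "$nr_open" ] 2>/dev/null
}

# Usage after logging in again:
#   nofile_limits_ok "$(ulimit -Sn)" "$(ulimit -Hn)" "$(cat /proc/sys/fs/nr_open)" \
#       && echo "nofile limits look consistent"
```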
3.3 User and directories
# Create the user that will install Kingbase
useradd -m kingbase
passwd kingbase
# Create the directories for the installation and the data
mkdir -p /data/Kingbase/{kdb,data}
# Create the mount-point directory
mkdir -p /data/kbinstall
# Change ownership (important)
cd /data/
chown -R kingbase:kingbase Kingbase
chown -R kingbase:kingbase kbinstall
chown -R kingbase:kingbase KingbaseES_V009R001C002B0014_Lin64_install.iso
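Because wrong ownership is an easy mistake to make here, it may be worth verifying it before moving on. The sketch below assumes GNU `stat` (as shipped with CentOS); `check_owner` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every path that is missing or not owned by USER.
# Returns 0 only if all paths exist and have the expected owner.
check_owner() {
    local user=$1 path owner rc=0
    shift
    for path in "$@"; do
        owner=$(stat -c '%U' "$path" 2>/dev/null) || owner="(missing)"
        if [ "$owner" != "$user" ]; then
            echo "owner of $path is $owner, expected $user" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_owner kingbase /data/Kingbase /data/kbinstall
```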
3.4 Mount the installation package (as root)
cd /data/
mount KingbaseES_V009R001C002B0014_Lin64_install.iso ./kbinstall
PS: for installing the single-node database on the primary node, see:
https://bbs.kingbase.com.cn/blogDetail?postsId=9a0cfd3b13dad2b395ee0e2df4a1b3dd
4. Cluster deployment
4.1 Database auto-start service
Service name: kingbased
Stop the database auto-start service:
service kingbased stop    # or: systemctl stop kingbased
Disable the service at boot:
# Red Hat or CentOS family
chkconfig --del kingbased    # or: systemctl disable kingbased
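4.2 After the single-node installation completes, locate the deployment files under ${install_dir}/ClientTools/guitools/DeployTools/zip
cluster_install.sh # deployment script
db.zip # database deployment archive
install.conf # deployment configuration file
trust_cluster.sh # script that sets up passwordless SSH
4.3 Edit the installation configuration file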
[kingbase@dba236 install]$ diff install.conf ../kbs/ClientTools/guitools/DeployTools/zip/install.conf
23c23
< all_ip=(10.10.100.236 10.10.100.235)
---
> all_ip=()
51c51
< install_dir="/data/Kingbase/cluster"
---
> install_dir="/home/kingbase/cluster/install"
55c55
< zip_package="/data/Kingbase/install/db.zip"
---
> zip_package=""
128c128
< trusted_servers="10.10.100.1"
---
> trusted_servers=""
143c143
< data_directory="/data/Kingbase/kbs/data"
---
> data_directory=""
152c152
< virtual_ip="10.10.100.237/24"
---
> virtual_ip=""
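Before running the deployment script, it can help to check that none of the settings edited above were left at their empty defaults. The sketch below is an assumption-laden pre-flight check, not part of the Kingbase tooling: `conf_value` and `check_install_conf` are hypothetical names, only the keys shown in the diff are checked, and the file is read as plain `key=value` lines, taking the first occurrence of each key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the right-hand side of the first "KEY=..." line in FILE.
conf_value() {
    awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1"
}

# Hypothetical pre-flight check: fail if any key edited in the diff is still
# missing, empty (""), or an empty array (()).
check_install_conf() {
    local conf=$1 key val rc=0
    for key in all_ip install_dir zip_package trusted_servers data_directory; do
        val=$(conf_value "$conf" "$key")
        case $val in
            ''|'""'|'()')
                echo "$key is not set in $conf" >&2
                rc=1
                ;;
        esac
    done
    return "$rc"
}

# Usage:
#   check_install_conf ./install.conf && echo "install.conf looks filled in"
```

`virtual_ip` is deliberately left out of the list, since the installation log later in this article reports "[Virtual IP] Not configured", which suggests a deployment can proceed without one.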
4.4 Set up SSH trust (run from the install directory)
[root@dba236 install]# ./trust_cluster.sh
[INFO] set password-free between root and kingbase
known_hosts 100% 521 1.0MB/s 00:00
id_rsa 100% 1675 9.2MB/s 00:00
id_rsa.pub 100% 393 2.1MB/s 00:00
authorized_keys 100% 393 2.8MB/s 00:00
known_hosts 100% 521 35.0KB/s 00:00
id_rsa 100% 1675 2.5MB/s 00:00
id_rsa.pub 100% 393 704.0KB/s 00:00
authorized_keys 100% 393 1.2MB/s 00:00
connect to "10.10.100.236" from current node by 'ssh' kingbase:0..... OK
connect to "10.10.100.236" from current node by 'ssh' root:0..... OK
connect to "10.10.100.235" from "10.10.100.236" by 'ssh' kingbase->kingbase:0 .... OK
connect to "10.10.100.235" from "10.10.100.236" by 'ssh' root->root:0 root->kingbase:0 kingbase->root:0.... OK
connect to "10.10.100.235" from current node by 'ssh' kingbase:0..... OK
connect to "10.10.100.235" from current node by 'ssh' root:0..... OK
connect to "10.10.100.236" from "10.10.100.235" by 'ssh' kingbase->kingbase:0 .... OK
connect to "10.10.100.236" from "10.10.100.235" by 'ssh' root->root:0 root->kingbase:0 kingbase->root:0.... OK
check ssh connection success!
4.5 Install the cluster (as the kingbase user)
[kingbase@dba236 install]$ ./cluster_install.sh
[CONFIG_CHECK] will deploy the cluster of DG
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] encoding: UTF8 OK
[CONFIG_CHECK] locale: zh_CN.UTF-8 OK
[CONFIG_CHECK] the number of license_num matches the length of all_ip or the number of license_num is 1 ... OK
[RUNNING] check if the host can be reached from current node and between all nodes by ssh ...
[RUNNING] success connect to "10.10.100.236" from current node by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.236" from "10.10.100.236" by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.235" from "10.10.100.236" by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.235" from current node by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.236" from "10.10.100.235" by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.235" from "10.10.100.235" by 'ssh' ... OK
[RUNNING] chmod /bin/ping ...
[RUNNING] chmod /bin/ping ... Done
[RUNNING] ping access rights OK
[RUNNING] check the db is running or not...
[RUNNING] the db is not running on "10.10.100.236:54321" ..... OK
[RUNNING] the db is not running on "10.10.100.235:54321" ..... OK
[RUNNING] check the sys_securecmdd is running or not...
[RUNNING] the sys_securecmdd is not running on "10.10.100.236:8890" ..... OK
[RUNNING] the sys_securecmdd is not running on "10.10.100.235:8890" ..... OK
[RUNNING] check if the install dir (create dir and check it's owner/permission) ...
[RUNNING] check if the install dir (create dir and check it's owner/permission) on "10.10.100.236" ... OK
[RUNNING] check if the install dir (create dir and check it's owner/permission) on "10.10.100.235" ... OK
[RUNNING] check if the dir "/data/Kingbase/cluster/kingbase" is already exist ...
[RUNNING] the dir "/data/Kingbase/cluster/kingbase" is not exist on "10.10.100.236" ..... OK
[RUNNING] the dir "/data/Kingbase/cluster/kingbase" is not exist on "10.10.100.235" ..... OK
[RUNNING] check the data directory (create it and check whether it is empty) ...
[RUNNING] when use_exist_data=0, create the empty data directory on "10.10.100.236" ..... OK
[RUNNING] when use_exist_data=0, create the empty data directory on "10.10.100.235" ..... OK
2025-02-17 10:58:21 [INFO] start to check system parameters on 10.10.100.236 ...
2025-02-17 10:58:21 [INFO] [GSSAPIAuthentication] no on 10.10.100.236
2025-02-17 10:58:22 [INFO] [UseDNS] no on 10.10.100.236
2025-02-17 10:58:22 [INFO] [UsePAM] yes on 10.10.100.236
2025-02-17 10:58:22 [INFO] [ulimit.open files] 655360 on 10.10.100.236
2025-02-17 10:58:22 [INFO] [ulimit.open proc] 655360 on 10.10.100.236
2025-02-17 10:58:22 [INFO] [ulimit.core size] unlimited on 10.10.100.236
2025-02-17 10:58:22 [INFO] [ulimit.mem lock] 50000000 on 10.10.100.236
2025-02-17 10:58:23 [INFO] [kernel.sem] 5010 641280 5010 256 on 10.10.100.236
2025-02-17 10:58:23 [INFO] [RemoveIPC] no on 10.10.100.236
2025-02-17 10:58:23 [INFO] [DefaultTasksAccounting] is null on 10.10.100.236
2025-02-17 10:58:23 [INFO] file "/etc/udev/rules.d/kingbase.rules" exists on 10.10.100.236
2025-02-17 10:58:23 [INFO] [crontab] chmod /usr/bin/crontab ...
2025-02-17 10:58:24 [INFO] [crontab] chmod /usr/bin/crontab ... Done
2025-02-17 10:58:24 [INFO] [crontab access] OK
2025-02-17 10:58:24 [INFO] [cron.deny] kingbase not exists in cron.deny
2025-02-17 10:58:24 [INFO] [cron.allow] kingbase already exists in cron.allow
2025-02-17 10:58:24 [INFO] [crontab auth] crontab is accessible by kingbase now on 10.10.100.236
2025-02-17 10:58:24 [INFO] [SELINUX] disabled on 10.10.100.236
2025-02-17 10:58:25 [WARNING] [firewall] up (should be: down or add port rules) on 10.10.100.236
2025-02-17 10:58:25 [INFO] [The memory] OK on 10.10.100.236
2025-02-17 10:58:25 [INFO] [The hard disk] OK on 10.10.100.236
2025-02-17 10:58:25 [INFO] [ping] chmod /bin/ping ...
2025-02-17 10:58:25 [INFO] [ping] chmod /bin/ping ... Done
2025-02-17 10:58:25 [INFO] [ping access] OK
2025-02-17 10:58:26 [INFO] [/bin/cp --version] on 10.10.100.236 OK
2025-02-17 10:58:26 [INFO] [Virtual IP] Not configured on 10.10.100.236
2025-02-17 10:58:26 [INFO] start to check system parameters on 10.10.100.235 ...
2025-02-17 10:58:26 [INFO] [GSSAPIAuthentication] no on 10.10.100.235
2025-02-17 10:58:26 [INFO] [UseDNS] no on 10.10.100.235
2025-02-17 10:58:26 [INFO] [UsePAM] yes on 10.10.100.235
2025-02-17 10:58:27 [INFO] [ulimit.open files] 655360 on 10.10.100.235
2025-02-17 10:58:27 [INFO] [ulimit.open proc] 65536 on 10.10.100.235
2025-02-17 10:58:27 [INFO] [ulimit.core size] unlimited on 10.10.100.235
2025-02-17 10:58:28 [INFO] [ulimit.mem lock] 50000000 on 10.10.100.235
2025-02-17 10:58:29 [INFO] [kernel.sem] 5010 641280 5010 256 on 10.10.100.235
2025-02-17 10:58:29 [INFO] [RemoveIPC] no on 10.10.100.235
2025-02-17 10:58:29 [INFO] [DefaultTasksAccounting] is null on 10.10.100.235
2025-02-17 10:58:30 [INFO] file "/etc/udev/rules.d/kingbase.rules" exists on 10.10.100.235
2025-02-17 10:58:31 [INFO] [crontab] chmod /usr/bin/crontab ...
2025-02-17 10:58:31 [INFO] [crontab] chmod /usr/bin/crontab ... Done
2025-02-17 10:58:31 [INFO] [crontab access] OK
2025-02-17 10:58:32 [INFO] [cron.deny] kingbase not exists in cron.deny
2025-02-17 10:58:33 [INFO] [crontab auth] crontab is accessible by kingbase now on 10.10.100.235
2025-02-17 10:58:33 [INFO] [SELINUX] permissive on 10.10.100.235
2025-02-17 10:58:34 [WARNING] [firewall] up (should be: down or add port rules) on 10.10.100.235
2025-02-17 10:58:34 [INFO] [The memory] OK on 10.10.100.235
2025-02-17 10:58:34 [INFO] [The hard disk] OK on 10.10.100.235
2025-02-17 10:58:35 [INFO] [ping] chmod /bin/ping ...
2025-02-17 10:58:35 [INFO] [ping] chmod /bin/ping ... Done
2025-02-17 10:58:35 [INFO] [ping access] OK
2025-02-17 10:58:35 [INFO] [/bin/cp --version] on 10.10.100.235 OK
2025-02-17 10:58:35 [INFO] [Virtual IP] Not configured on 10.10.100.235
[INSTALL] create the install dir "/data/Kingbase/cluster/kingbase" on every host ...
[INSTALL] success to create the install dir "/data/Kingbase/cluster/kingbase" on "10.10.100.236" ..... OK
[INSTALL] success to create the install dir "/data/Kingbase/cluster/kingbase" on "10.10.100.235" ..... OK
[INSTALL] success to access the zip_package "/data/Kingbase/install/db.zip" on "10.10.100.236" ..... OK
[INSTALL] decompress the "/data/Kingbase/install/db.zip" to "/data/Kingbase/cluster/kingbase/__tmp_decompress__"
[INSTALL] success to recreate the tmp dir "/data/Kingbase/cluster/kingbase/__tmp_decompress__" on "10.10.100.236" ..... OK
[INSTALL] success to decompress the "/data/Kingbase/install/db.zip" to "/data/Kingbase/cluster/kingbase/__tmp_decompress__" on "10.10.100.236"..... OK
[INSTALL] scp the dir "/data/Kingbase/cluster/kingbase/__tmp_decompress__" to "/data/Kingbase/cluster/kingbase" on all host
[INSTALL] try to copy the install dir "/data/Kingbase/cluster/kingbase" to "10.10.100.236" .....
[INSTALL] success to scp the install dir "/data/Kingbase/cluster/kingbase" to "10.10.100.236" ..... OK
[INSTALL] try to copy the install dir "/data/Kingbase/cluster/kingbase" to "10.10.100.235" .....
[INSTALL] success to scp the install dir "/data/Kingbase/cluster/kingbase" to "10.10.100.235" ..... OK
[INSTALL] remove the dir "/data/Kingbase/cluster/kingbase/__tmp_decompress__"
[INSTALL] change the auth of bin directory on 10.10.100.236 ...
[INSTALL] change the auth of bin directory on 10.10.100.235 ...
[INSTALL] check license_file ...
[INSTALL] success to access license_file on 10.10.100.236: /data/Kingbase/cluster/kingbase/bin/license.dat
[INSTALL] check license_file ...
[INSTALL] success to access license_file on 10.10.100.235: /data/Kingbase/cluster/kingbase/bin/license.dat
[INSTALL] set the archive_command to "exit 0" and the archive dir is NULL
[INSTALL] the archive dir is NULL, not do archive ...
[INSTALL] create the dir "etc" "log" on all host
[RUNNING] config sys_securecmdd and start it ...
[RUNNING] config the sys_securecmdd port to 8890 ...
[RUNNING] success to config the sys_securecmdd port on 10.10.100.236 ... OK
successfully initialized the sys_securecmdd, please use "/data/Kingbase/cluster/kingbase/bin/sys_HAscmdd.sh start" to start the sys_securecmdd
[RUNNING] success to config sys_securecmdd on 10.10.100.236 ... OK
[RUNNING] success to start sys_securecmdd on 10.10.100.236 ... OK
[RUNNING] config sys_securecmdd and start it ...
[RUNNING] config the sys_securecmdd port to 8890 ...
[RUNNING] success to config the sys_securecmdd port on 10.10.100.235 ... OK
successfully initialized the sys_securecmdd, please use "/data/Kingbase/cluster/kingbase/bin/sys_HAscmdd.sh start" to start the sys_securecmdd
[RUNNING] success to config sys_securecmdd on 10.10.100.235 ... OK
[RUNNING] success to start sys_securecmdd on 10.10.100.235 ... OK
[RUNNING] check if the host can be reached between all nodes by scmd ...
[RUNNING] success connect to "10.10.100.236" from "10.10.100.236" by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' ... OK
[RUNNING] success connect to "10.10.100.235" from "10.10.100.236" by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' ... OK
[RUNNING] success connect to "10.10.100.236" from "10.10.100.235" by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' ... OK
[RUNNING] success connect to "10.10.100.235" from "10.10.100.235" by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' ... OK
[INSTALL] begin to init the database on "10.10.100.236" ...
The database cluster will be initialized with locales
COLLATE: zh_CN.UTF-8
CTYPE: zh_CN.UTF-8
MESSAGES: C
MONETARY: zh_CN.UTF-8
NUMERIC: zh_CN.UTF-8
TIME: zh_CN.UTF-8
The files belonging to this database system will be owned by user "kingbase".
This user must also own the server process.
The default text search configuration will be set to "simple".
The comparision of strings is case-sensitive.
Data page checksums are enabled.
fixing permissions on existing directory /data/Kingbase/V9/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... initdb: could not find suitable text search configuration for locale "zh_CN.UTF-8"
100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Shanghai
creating configuration files ... ok
Begin setup encrypt device
initializing the encrypt device ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
create security database ... ok
load security database ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
/data/Kingbase/cluster/kingbase/bin/sys_ctl -D /data/Kingbase/V9/data -l logfile start
[INSTALL] end to init the database on "10.10.100.236" ... OK
[INSTALL] wirte the kingbase.conf on "10.10.100.236" ...
[INSTALL] wirte the kingbase.conf on "10.10.100.236" ... OK
[INSTALL] wirte the es_rep.conf on "10.10.100.236" ...
[INSTALL] wirte the es_rep.conf on "10.10.100.236" ... OK
[INSTALL] wirte the sys_hba.conf on "10.10.100.236" ...
[INSTALL] wirte the sys_hba.conf on "10.10.100.236" ... OK
[INSTALL] wirte the .encpwd on every host
[INSTALL] write the repmgr.conf on every host
[INSTALL] write the repmgr.conf on "10.10.100.236" ...
[INSTALL] write the repmgr.conf on "10.10.100.236" ... OK
[INSTALL] write the repmgr.conf on "10.10.100.235" ...
[INSTALL] write the repmgr.conf on "10.10.100.235" ... OK
[INSTALL] start up the database on "10.10.100.236" ...
[INSTALL] /data/Kingbase/cluster/kingbase/bin/sys_ctl -w -t 60 -l /data/Kingbase/cluster/kingbase/logfile -D /data/Kingbase/V9/data start
waiting for server to start.... done
server started
[INSTALL] start up the database on "10.10.100.236" ... OK
[INSTALL] create the database "esrep" and user "esrep" for repmgr ...
CREATE DATABASE
CREATE ROLE
GRANT
GRANT ROLE
[INSTALL] create the database "esrep" and user "esrep" for repmgr ... OK
[INSTALL] register the primary on "10.10.100.236" ...
[INFO] connecting to primary database...
[NOTICE] attempting to install extension "repmgr"
[NOTICE] "repmgr" extension successfully installed
[NOTICE] primary node record (ID: 1) registered
[INSTALL] register the primary on "10.10.100.236" ... OK
[INSTALL] clone and start up the standby ...
clone the standby on "10.10.100.235" ...
/data/Kingbase/cluster/kingbase/bin/repmgr -h 10.10.100.236 -U esrep -d esrep -p 54321 --fast-checkpoint --upstream-node-id 1 standby clone
[NOTICE] destination directory "/data/Kingbase/V9/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=10.10.100.236 user=esrep port=54321 dbname=esrep
[DETAIL] current installation size is 87 MB
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] checking and correcting permissions on existing directory "/data/Kingbase/V9/data"
[INFO] creating replication slot as user "esrep"
[NOTICE] starting backup (using sys_basebackup)...
[INFO] executing:
/data/Kingbase/cluster/kingbase/bin/sys_basebackup -l "repmgr base backup" -D /data/Kingbase/V9/data -h 10.10.100.236 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_2
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /data/Kingbase/V9/data start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"
clone the standby on "10.10.100.235" ... OK
start up the standby on "10.10.100.235" ...
/data/Kingbase/cluster/kingbase/bin/sys_ctl -w -t 60 -l /data/Kingbase/cluster/kingbase/logfile -D /data/Kingbase/V9/data start
waiting for server to start.... done
server started
start up the standby on "10.10.100.235" ... OK
register the standby on "10.10.100.235" ...
[INFO] connecting to local node "node2" (ID: 2)
[INFO] connecting to primary database
[INFO] standby registration complete
[NOTICE] standby node "node2" (ID: 2) successfully registered
[INSTALL] register the standby on "10.10.100.235" ... OK
[INSTALL] start up the whole cluster ...
2025-02-17 10:59:40 Ready to start all DB ...
2025-02-17 10:59:40 begin to start DB on "[10.10.100.236]".
2025-02-17 10:59:41 DB on "[10.10.100.236]" already started, connect to check it.
2025-02-17 10:59:42 DB on "[10.10.100.236]" start success.
2025-02-17 10:59:42 Try to ping trusted_servers on host 10.10.100.236 ...
2025-02-17 10:59:45 Try to ping trusted_servers on host 10.10.100.235 ...
2025-02-17 10:59:48 begin to start DB on "[10.10.100.235]".
2025-02-17 10:59:48 DB on "[10.10.100.235]" already started, connect to check it.
2025-02-17 10:59:50 DB on "[10.10.100.235]" start success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=10.10.100.236 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=10.10.100.235 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2025-02-17 10:59:50 The primary DB is started.
2025-02-17 10:59:50 begin to start repmgrd on "[10.10.100.236]".
[2025-02-17 10:59:51] [NOTICE] using provided configuration file "/data/Kingbase/cluster/kingbase/bin/../etc/repmgr.conf"
[2025-02-17 10:59:51] [NOTICE] redirecting logging output to "/data/Kingbase/cluster/kingbase/log/hamgr.log"
2025-02-17 10:59:52 repmgrd on "[10.10.100.236]" start success.
2025-02-17 10:59:52 begin to start repmgrd on "[10.10.100.235]".
[2025-02-17 10:59:53] [NOTICE] using provided configuration file "/data/Kingbase/cluster/kingbase/bin/../etc/repmgr.conf"
[2025-02-17 10:59:53] [NOTICE] redirecting logging output to "/data/Kingbase/cluster/kingbase/log/hamgr.log"
2025-02-17 10:59:55 repmgrd on "[10.10.100.235]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 24456 | no | n/a
2 | node2 | standby | running | node1 | running | 1193 | no | 1 second(s) ago
[2025-02-17 10:59:57] [NOTICE] redirecting logging output to "/data/Kingbase/cluster/kingbase/log/kbha.log"
2025-02-17 11:00:03 Done.
[INSTALL] start up the whole cluster ... OK
4.6 Verify the cluster
[kingbase@dba236 install]$ cd /data/Kingbase/cluster/kingbase/bin
[kingbase@dba236 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=10.10.100.236 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=10.10.100.235 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
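Reading the table by eye works for two nodes, but the same check can be scripted. The sketch below parses the output format shown above and succeeds only if there is exactly one running primary and every node is in the `running` state. `cluster_healthy` is a hypothetical name, and the parsing relies on the column order in this output (ID, Name, Role, Status), which may differ in other versions.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `repmgr cluster show` output on stdin.
# Exit 0 only if at least one node is listed, all nodes are "running",
# and exactly one of them is the primary.
cluster_healthy() {
    awk -F'|' '
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {            # data rows start with a numeric ID
            role = $3; status = $4
            gsub(/[ \t*]/, "", role)             # strip padding and the "*" marker
            gsub(/[ \t*]/, "", status)
            nodes++
            if (status != "running") bad++
            if (role == "primary" && status == "running") primaries++
        }
        END { exit !(nodes > 0 && bad == 0 && primaries == 1) }'
}

# Usage:
#   ./repmgr cluster show | cluster_healthy && echo "cluster looks healthy"
```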
5. Problems encountered
Port problem: the firewall was enabled, so the port was unreachable.
[RUNNING] success connect to "10.10.100.236" from "10.10.100.236" by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' ... OK
[ERROR] can not connect to "10.10.100.235" from "10.10.100.236" (with super_user 'root', execute_user 'kingbase') by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' on port '8890'
[ERROR] can not connect to "10.10.100.236" from "10.10.100.235" (with super_user 'root', execute_user 'kingbase') by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' on port '8890'
[RUNNING] success connect to "10.10.100.235" from "10.10.100.235" by '/data/Kingbase/cluster/kingbase/bin/sys_securecmd' ... OK
[ERROR] could not access some host (10.10.100.235 10.10.100.236) by scmd, please check it
Fix (run as root or with sudo):
[root@ocdb235 data]# firewall-cmd --add-port=8890/tcp --permanent --zone=public
success
[root@ocdb235 data]# firewall-cmd --reload
success
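Since the same rule has to be added on every node, and the database port 54321 seen in the installation log needs the same treatment as 8890, the commands can be wrapped in a small function. This is a sketch only: `open_cluster_ports` and the `DRY_RUN` switch are inventions for this example, while the `firewall-cmd` options are the ones used above.

```shell
#!/usr/bin/env bash
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical wrapper: permanently open each given TCP port in the public
# zone, then reload the firewall so the rules take effect.
open_cluster_ports() {
    local port
    for port in "$@"; do
        run firewall-cmd --add-port="${port}/tcp" --permanent --zone=public || return 1
    done
    run firewall-cmd --reload
}

# Usage (as root, on each node):
#   DRY_RUN=1 open_cluster_ports 8890 54321   # preview the commands
#   open_cluster_ports 8890 54321             # apply them
```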
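After fixing the port problem, rerunning the cluster installation reported that sys_securecmdd was already running on the port: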
[kingbase@dba236 install]$ ./cluster_install.sh
[CONFIG_CHECK] will deploy the cluster of DG
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] encoding: UTF8 OK
[CONFIG_CHECK] locale: zh_CN.UTF-8 OK
[CONFIG_CHECK] the number of license_num matches the length of all_ip or the number of license_num is 1 ... OK
[RUNNING] check if the host can be reached from current node and between all nodes by ssh ...
[RUNNING] success connect to "10.10.100.236" from current node by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.236" from "10.10.100.236" by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.235" from "10.10.100.236" by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.235" from current node by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.236" from "10.10.100.235" by 'ssh' ... OK
[RUNNING] success connect to "10.10.100.235" from "10.10.100.235" by 'ssh' ... OK
[RUNNING] chmod /bin/ping ...
[RUNNING] chmod /bin/ping ... Done
[RUNNING] ping access rights OK
[RUNNING] check the db is running or not...
[RUNNING] the db is not running on "10.10.100.236:54321" ..... OK
[RUNNING] the db is not running on "10.10.100.235:54321" ..... OK
[RUNNING] check the sys_securecmdd is running or not...
[ERROR] the sys_securecmdd on "10.10.100.236:8890" is running, please stop it first.
[ERROR] the sys_securecmdd on "10.10.100.235:8890" is running, please stop it first.
Fix: stop the securecmdd service.
[root@ocdb235 data]# systemctl status securecmdd
● securecmdd.service - KingbaseES - sys_securecmdd daemon
Loaded: loaded (/etc/systemd/system/securecmdd.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2025-02-17 10:18:38 CST; 17min ago
Main PID: 29684 (sys_securecmdd)
CGroup: /system.slice/securecmdd.service
└─29684 sys_securecmdd: /data/Kingbase/cluster/kingbase/bin/sys_securecmdd -f /opt/kes/etc/securecmdd_config [listener] 0 of 1...
Feb 17 10:18:38 ocdb235 systemd[1]: Started KingbaseES - sys_securecmdd daemon.
[root@ocdb235 data]# systemctl stop securecmdd
After rerunning cluster_install.sh, another port problem came up:
[root@ocdb235 data]# firewall-cmd --add-port=54321/tcp --permanent --zone=public
success
[root@ocdb235 data]# firewall-cmd --reload
success
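Fix:
Stop the services and clean up the related directories (on both the primary and the standby)
# Clean up the related directories
rm -rf archive/* cluster/* V9/*
The reinstall took a few more twists, but it finally succeeded.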
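A relative `rm -rf` with globs is easy to run from the wrong directory, so a slightly more defensive variant may be preferable. The sketch below takes the base directory explicitly; `clean_cluster_dirs` is a hypothetical name, and `/data/Kingbase` in the usage line is an assumption based on the paths that appear in the installation log. Unlike the `*` glob, it also removes hidden files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: empty the given subdirectories of BASE but keep the
# directories themselves. ${base:?} aborts if BASE is empty, so the paths
# can never resolve relative to "/".
clean_cluster_dirs() {
    local base=$1 sub
    shift
    for sub in "$@"; do
        [ -d "${base:?}/$sub" ] || continue      # skip directories that do not exist
        find "${base:?}/$sub" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
    done
}

# Usage (on both primary and standby, after stopping the services):
#   clean_cluster_dirs /data/Kingbase archive cluster V9
```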
Last modified: 2025-02-19 16:01:09