
Kingbase Adventures: Zhao Jinmai's Curious Journey Through KES RWC Cluster Expansion and Shrinking

Original · Yan Shaoan (严少安) · 2025-03-07


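Previously: A First Look Into the RWC Secret Realm

Last time, Kingbase's Zhao Jinmai, guided by her master, successfully built a three-node KES RWC cluster. Watching the data streams pulse across the monitoring dashboard, she seemed to see the lifeblood of the digital world surging among the three nodes. But her master's words set her thinking: "A cluster is like a living thing; it must learn the way of breathing in and out. Today I will teach you the cluster's 'art of growth' and its 'bone-shrinking skill'."

Back to reality. The Kingbase database ships by default with a graphical tool for cluster management. But in environments with strict permission controls, or when the operating system boots in command-line mode, the cluster can only be managed and maintained with commands.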

deploy.png

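To cope with business growth and rising data volume, or to build out multi-datacenter and off-site disaster recovery, we often need to expand the database cluster.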

20250307_161118.png

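Part 1. The Secret Ritual of Cluster Growth

The Call of the Mysterious Altar

Zhao Jinmai tapped the terminal, and the running state of the three nodes unfolded like a star chart: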

[kingbase@kes1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes3 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes1 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node1 | primary | * running |          | running | 2970 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 2167 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 2163 | no      | 0 second(s) ago
[kingbase@kes1 ~]$
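
A status table like the one above can also be checked from a script before any expansion begins. The following is only a minimal sketch, assuming the pipe-separated layout shown in the `repmgr cluster show` output (status in the fourth column); the function name `count_not_running` is hypothetical and not part of KES or repmgr.

```shell
# Hypothetical helper: reads `repmgr cluster show` output on stdin and prints
# the number of nodes whose Status column is anything other than "running".
# Assumes the layout shown above: ID | Name | Role | Status | ...
count_not_running() {
  awk -F'|' '
    # Data rows start with a numeric node ID in the first column;
    # the header, the separator line and shell prompts are skipped.
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      status = $4
      gsub(/[* \t]/, "", status)   # drop the primary marker "*" and padding
      if (status != "running") bad++
    }
    END { print bad + 0 }
  '
}
```

It could be used as `repmgr cluster show | count_not_running`, proceeding with the expansion only when the printed count is 0.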
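
"It is like grafting a new branch onto a great tree," her master's voice sounded in her ear. "First you must find fertile soil: prepare the kes4 server, born of the same source."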

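The Ritual of Shared Bloodlines

As she configured the passwordless channels, Zhao Jinmai felt as if she were building invisible bridges between the nodes: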

[kingbase@kes1 zip]$ sudo ./trust_cluster.sh
...
connect to "kes4" from current node by 'ssh' root:0 kingbase:0..... OK
check ssh connection success!
[kingbase@kes1 zip]$

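It reminded her of the opening of the meridians in wuxia novels: the trust channels between nodes are the cluster's Ren and Du meridians.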
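
The trust channel can be re-verified at any time without rerunning the setup script. Below is a minimal sketch, assuming only the standard OpenSSH client options `BatchMode` and `ConnectTimeout`; the function `check_trust` and the `SSH_BIN` override are hypothetical names introduced for illustration, and the sketch checks the current user only (the `trust_cluster.sh` output above reports both root and kingbase).

```shell
# Hypothetical helper: checks that the current user can reach every listed
# host over SSH without a password prompt. SSH_BIN may be overridden (for
# example with a stub in tests) and defaults to the real ssh client.
check_trust() {
  local ssh_bin="${SSH_BIN:-ssh}" host failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
        </dev/null >/dev/null 2>&1; then
      echo "$host: OK"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

For the cluster in this article the call would be `check_trust kes1 kes2 kes3 kes4`, run as the kingbase user on each node.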

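The Magic of Replicating Life

Through the crystal ball, we could see that the new soil held enough nutrients to support the transplanted tree.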

# Copy the client tools and the license file from kes1 to the kes4 server
[kingbase@kes1 V009R004C010]$ pwd
/opt/Kingbase/ES/V9/KESRealPro/V009R004C010
[kingbase@kes1 V009R004C010]$ scp -r ClientTools license.dat kes4:/opt/Kingbase/ES/V9/
...
# Edit the configuration file
[kingbase@kes4 zip]$ diff install.conf install.conf.bak | grep '^<'
< all_ip=(kes1 kes2 kes3 kes4)
< net_device=(ens160 ens160 ens160 ens160)
< net_device_ip=(192.168.43.91 192.168.43.92 192.168.43.93 192.168.43.94)
< expand_type="0"
< primary_ip="kes1"
< expand_ip="192.168.43.94"
< node_id="4"
< sync_type="0"
< install_dir="/home/kingbase/cluster/install"
< zip_package="/opt/Kingbase/ES/V9/ClientTools/guitools/DeployTools/zip/db.zip"
< net_device=(ens160)
< net_device_ip=(192.168.43.94)
[kingbase@kes4 zip]$
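
The three node arrays in the diff above (`all_ip`, `net_device`, `net_device_ip`) each carry one entry per node, so a mismatch in length is an easy mistake to catch before running the expansion. The following is a minimal sketch of such a check. It assumes the `key=(a b c)` single-line syntax shown in the diff and that the first occurrence of each key is the cluster-wide one, as the ordering of the diff suggests; the function names are hypothetical and not part of the deployment tool.

```shell
# Hypothetical helper: prints the item count of the first `key=(...)` array
# assignment found for the given key in a config file.
conf_array_len() {
  local key="$1" file="$2"
  awk -v key="$key" '
    index($0, key "=(") == 1 {
      line = $0
      sub(/^[^(]*\(/, "", line)            # strip everything up to "("
      sub(/\).*$/, "", line)               # strip ")" and anything after it
      gsub(/^[ \t]+|[ \t]+$/, "", line)    # trim padding inside the parentheses
      print split(line, items, /[ \t]+/)
      exit
    }
  ' "$file"
}

# Hypothetical helper: succeeds only if the three node arrays have equal length.
check_node_arrays() {
  local file="$1" n_ip n_dev n_dev_ip
  n_ip=$(conf_array_len all_ip "$file")
  n_dev=$(conf_array_len net_device "$file")
  n_dev_ip=$(conf_array_len net_device_ip "$file")
  if [ "$n_ip" = "$n_dev" ] && [ "$n_ip" = "$n_dev_ip" ]; then
    echo "consistent: $n_ip nodes"
  else
    echo "mismatch: all_ip=$n_ip net_device=$n_dev net_device_ip=$n_dev_ip"
    return 1
  fi
}
```

Running `check_node_arrays install.conf` before `./cluster_install.sh expand` would flag a forgotten entry for the new node.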

As the expansion ran, the code flashing across the screen looked like dancing runes:

[kingbase@kes4 zip]$ pwd
/opt/Kingbase/ES/V9/ClientTools/guitools/DeployTools/zip
[kingbase@kes4 zip]$ ./cluster_install.sh expand
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "192.168.43.94" ..... OK
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] file format is correct ... OK
[NOTICE] starting backup (using sys_basebackup)...
[INFO] executing:
  /home/kingbase/cluster/install/kingbase/bin/sys_basebackup -l "repmgr base backup"  -D /home/kingbase/cluster/install/kingbase/data -h kes1 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_4
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[NOTICE] standby node "node4" (ID: 4) successfully registered
[2025-03-07 23:56:01] [NOTICE] redirecting logging output to "/home/kingbase/cluster/install/kingbase/log/kbha.log"

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes3 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 4  | node4 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes4 zip]$ 

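Zhao Jinmai held her breath and watched until the new node emerged in the cluster like a butterfly breaking out of its cocoon: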

4.png

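On the monitoring dashboard, the flood of data now branched off automatically into the new node, as naturally as a river cutting a new channel.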

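Part 2. The Precise Surgery of Shrinking

Diagnosing the Ailing Node

When node3 needed to go offline, Zhao Jinmai first examined its state with the classic "look, listen, ask, feel" method of diagnosis: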

[kingbase@kes3 ~]$ repmgr node check
Node "node3":
    Server role: OK (node is standby)
    Replication lag: OK (0 seconds)
    WAL archiving: OK (0 pending archive ready files)
    Upstream connection: OK (node "node3" (ID: 3) is attached to expected upstream node "node1" (ID: 1))
    Downstream servers: OK (this node has no downstream nodes)
    Replication slots: OK (node has no physical replication slots)
    Missing physical replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/home/kingbase/cluster/install/kingbase/data")

Having confirmed that the node was healthy, she began preparing for this "painless excision."
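
The health check can be turned into a gate for the shrink step. The sketch below assumes the one-check-per-line, indented layout that `repmgr node check` prints on a terminal ("check name: STATE (details)"); the function name `failed_checks` is hypothetical.

```shell
# Hypothetical helper: reads `repmgr node check` output on stdin, prints every
# check line that does not report OK, and returns non-zero if any were found.
failed_checks() {
  awk '
    # Check lines are indented and look like "<check name>: <STATE> (<details>)".
    /^[ \t]+[^:]+: / {
      state = $0
      sub(/^[ \t]+[^:]+: /, "", state)   # keep only "<STATE> (<details>)"
      if (state !~ /^OK/) { print; bad = 1 }
    }
    END { exit bad + 0 }
  '
}
```

A cautious operator might run `repmgr node check | failed_checks` on the node to be removed and continue only when it exits with status 0.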

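The Art of the Precise Cut

Editing the configuration file is like adjusting the surgical plan; take care to locate the [shrink] section of the file: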

[kingbase@kes1 zip]$ diff install.conf install.conf.bak | grep '^<'
< shrink_type="0"
< primary_ip="kes1"
< shrink_ip="kes3"
< node_id="3"
< install_dir="/home/kingbase/cluster/install"
[kingbase@kes1 zip]$

As the shrink command ran, she seemed to see the data streams being gracefully redirected:

[kingbase@kes1 zip]$ ./cluster_install.sh shrink
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "kes3" ..... OK
[RUNNING] The /home/kingbase/cluster/install/kingbase/bin dir exist on "kes3" ... OK
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes3 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 4  | node4 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[INFO] node:kes3 can be deleted ... OK
[NOTICE] unregistering node 3
[INFO] standby unregistration complete
2025-03-08 00:06:44 DB on "[localhost]" stop success.

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 4  | node4 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes1 zip]$

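The whole process was like Cook Ding carving an ox: node3 went offline smoothly, leaving no hidden trouble behind.

Post-Operative Observation

Checking the cluster state confirms that, after shrinking from four nodes to three, the cluster has automatically regrouped into a new stable triangle: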

[kingbase@kes1 zip]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 4  | node4 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes1 zip]$
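
The same post-shrink check can be scripted. This is a minimal sketch, again assuming the pipe-separated `repmgr cluster show` layout shown above with the node name in the second column; `node_absent` is a hypothetical name.

```shell
# Hypothetical helper: reads `repmgr cluster show` output on stdin and succeeds
# only if no row carries the given node name in the Name column.
node_absent() {
  awk -F'|' -v name="$1" '
    NF >= 2 {
      n = $2
      gsub(/[ \t]/, "", n)          # strip column padding
      if (n == name) found = 1
    }
    END { exit found + 0 }
  '
}
```

After the shrink in this article, `repmgr cluster show | node_absent node3` would be expected to succeed, while the same call with `node4` would fail because that node is still registered.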

s.png

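Epilogue: An Epiphany on the Cluster Lifecycle

"This is the breathing of digital life," her master said slowly. "An excellent DBA is not a machine operator but a healer who grasps the rhythm of a cluster's life. Expanding and shrinking KES is not simple addition and subtraction; it is the art of keeping the system at its 'golden balance point' at all times."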

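In the moonlit machine room, Zhao Jinmai gazed at the indicator lights as they glowed and faded in rhythm.

She suddenly understood:

"The true meaning of expansion and shrinking is to let the database live like an organism: growing in spring, settling in autumn. As the Zhuangzi says: '其生若浮,其死若休' (life is like drifting, death is like rest). Every node that joins injects new vitality into the cluster, and a graceful exit is the reincarnation of digital life."

But looking at the cluster installation script, she found herself wondering:

"If a dry-run could foresee the outcome of an operation, like a crystal ball that sees the future, how good would that be?"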

[kingbase@kes4 zip]$ ./cluster_install.sh help
Do not choose any method, install/expand/shrink!

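Outside the machine room, dawn was breaking.

Zhao Jinmai knew that this exploration of the mysteries of database life had only just begun…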



Last modified: 2025-03-10 17:22:54
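[Copyright notice] This article is original content by a 墨天轮 (modb.pro) user. When reposting, you must cite the source (墨天轮), the article link, the author, and other basic information; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 that appears to be plagiarized or infringing, please report it by email to contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.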
