Introduction to a new feature in MogDB 5.0.6 (enable_async_standby_promotion)

Original, by 天成, 2024-04-21

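On March 30, 2024, MogDB released version 5.0.6. It introduces an interesting small feature: a new CM parameter, enable_async_standby_promotion, which controls whether an asynchronous standby may be promoted to primary in a two-node deployment and so gives users more control over HA behavior. The requirement came from real needs observed in user scenarios.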

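This parameter is required in the CM two-node deployment mode. It determines whether an asynchronous standby may be promoted to primary in special situations (for example, when a synchronous standby has become an asynchronous standby). For the sake of cluster service availability, the parameter is on by default.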
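
The rule described above can be written down as a tiny decision function. This is only a simplified model of the behavior described in this article, not CM's actual arbitration logic: the function name may_promote is hypothetical, and every other condition CM checks before a failover is ignored.

```shell
# may_promote <sync_state> <enable_async_standby_promotion>
# Hypothetical helper: prints "no" only when the standby is asynchronous
# and the parameter is off; prints "yes" in every other case.
may_promote() {
    sync_state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    param=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    if [ "$sync_state" = "async" ] && [ "$param" = "off" ]; then
        echo no
    else
        echo yes
    fi
}

may_promote Async on    # prints: yes
may_promote Async off   # prints: no
```

The two calls correspond to the two situations exercised in the test that follows: an asynchronous standby with the parameter on, then with it off.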

Below, a hands-on test shows how the feature behaves in practice.

Environment preparation

1. Create an alias for convenience: alias c="cm_ctl query -Cvid"
2. The database architecture is one primary and one standby, and the standby is asynchronous.

[omm@test01 ~]$ c
[ CMServer State ]

node      node_ip        instance                       state
------------------------------------------------------------------
1  test01 192.168.1.206  1 /data/mogdb5.0/cm/cm_server  Standby
2  test02 192.168.1.207  2 /data/mogdb5.0/cm/cm_server  Primary

[ Cluster State ]

cluster_state  : Normal
redistributing : No
balanced       : Yes
current_az     : AZ_ALL

[ Datanode State ]

node      node_ip        instance                   state            | node      node_ip        instance                   state
--------------------------------------------------------------------------------------------------------------------------------------------
1  test01 192.168.1.206  6001 /data/mogdb5.0/data P Primary Normal   | 2  test02 192.168.1.207  6002 /data/mogdb5.0/data S Standby Normal
[omm@test01 ~]$
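
Since the same query is repeated throughout the test, it can help to reduce its output to the one fact of interest: which datanode currently holds the Primary role. The sketch below does that with awk. The function name current_primary is hypothetical and not part of CM, and the parsing assumes the column layout shown above (node name in the second field, instance id in the fourth, initial role P or S in the sixth, current role in the seventh).

```shell
# current_primary: read `cm_ctl query -Cvid` output on stdin and print
# "<node_name> <instance_id>" for each datanode whose current role is Primary.
# CMServer rows are skipped because they have fewer fields and no P/S column.
current_primary() {
    awk -F'|' '{
        for (i = 1; i <= NF; i++) {
            n = split($i, w, " ")
            if (n >= 8 && (w[6] == "P" || w[6] == "S") && w[7] == "Primary")
                print w[2], w[4]
        }
    }'
}

# Sample rows copied from the output above. On a real cluster the call would be:
#   cm_ctl query -Cvid | current_primary
printf '%s\n' \
    '2 test02 192.168.1.207 2 /data/mogdb5.0/cm/cm_server Primary' \
    '1 test01 192.168.1.206 6001 /data/mogdb5.0/data P Primary Normal | 2 test02 192.168.1.207 6002 /data/mogdb5.0/data S Standby Normal' |
    current_primary    # prints: test01 6001
```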

The standby is asynchronous:

MogDB=# select node_name,sync_priority,sync_state,usename,client_addr,state,sender_sent_location,receiver_replay_location from dbe_perf.get_global_replication_stat();
  node_name   | sync_priority | sync_state | usename |  client_addr  |   state   | sender_sent_location | receiver_replay_location
--------------+---------------+------------+---------+---------------+-----------+----------------------+--------------------------
 dn_6001_6002 |             0 | Async      | omm     | 192.168.1.207 | Streaming | 1B6/E179E268         | 1B6/E179E268
(1 row)

MogDB=#
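
For scripted checks, the sync_state column can be pulled out of this table without counting columns by hand. The following is a minimal sketch, assuming the aligned, pipe-separated table format shown above; the function name sync_states is hypothetical, and how the query output reaches the function (for example through the gsql client) is left to the reader.

```shell
# sync_states: read an aligned, pipe-separated result table on stdin and
# print "<node_name> <sync_state>" for each data row. Columns are located
# by header name, so the column order in the query does not matter.
sync_states() {
    awk -F'|' '
        !found {
            # Look for the header row and remember the two column positions.
            for (i = 1; i <= NF; i++) {
                h = $i
                gsub(/^[ \t]+|[ \t]+$/, "", h)
                if (h == "node_name") nn = i
                if (h == "sync_state") ss = i
            }
            if (nn && ss) found = 1
            next
        }
        /^[-+ ]+$/ || /^\(/ { next }    # skip the separator and "(1 row)" footer
        NF >= ss {
            n = $nn; s = $ss
            gsub(/^[ \t]+|[ \t]+$/, "", n)
            gsub(/^[ \t]+|[ \t]+$/, "", s)
            print n, s
        }
    '
}

# Sample rows shortened from the result above.
printf '%s\n' \
    '  node_name   | sync_priority | sync_state | usename' \
    '--------------+---------------+------------+---------' \
    ' dn_6001_6002 |             0 | Async      | omm' \
    '(1 row)' |
    sync_states    # prints: dn_6001_6002 Async
```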

enable_async_standby_promotion at its default value (on)

[omm@test01 ~]$ c
[ CMServer State ]

node      node_ip        instance                       state
------------------------------------------------------------------
1  test01 192.168.1.206  1 /data/mogdb5.0/cm/cm_server  Standby
2  test02 192.168.1.207  2 /data/mogdb5.0/cm/cm_server  Primary

[ Cluster State ]

cluster_state  : Normal
redistributing : No
balanced       : Yes
current_az     : AZ_ALL

[ Datanode State ]

node      node_ip        instance                   state            | node      node_ip        instance                   state
--------------------------------------------------------------------------------------------------------------------------------------------
1  test01 192.168.1.206  6001 /data/mogdb5.0/data P Primary Normal   | 2  test02 192.168.1.207  6002 /data/mogdb5.0/data S Standby Normal
[omm@test01 ~]$
[omm@test01 ~]$ ps -fu omm
UID          PID    PPID  C STIME TTY          TIME CMD
omm         6756       1  0 Apr15 ?        00:00:00 /usr/lib/systemd/systemd --user
omm         6759    6756  0 Apr15 ?        00:00:00 (sd-pam)
omm         6806       1  2 Apr15 ?        00:36:11 /data/mogdb5.0/app/5.0.5/bin/om_monitor -L /opt/mogdb5.0/log/cm/om_monitor
omm         7739    7098  0 Apr15 ?        00:00:49 sshd: omm@pts/0
omm         7744    7739  0 Apr15 pts/0    00:00:00 -bash
omm        15883   15878  0 Apr15 pts/0    00:00:00 -bash
omm      1621240    6806 17 14:33 ?        00:01:26 /data/mogdb5.0/app/5.0.5/bin/cm_agent
omm      1621272       1 52 14:33 ?        00:04:27 /data/mogdb5.0/app/5.0.5/bin/mogdb -D /data/mogdb5.0/data -M pending
omm      1621281       1  0 14:33 ?        00:00:00 mogdb fenced UDF master process
omm      1621296       1  6 14:33 ?        00:00:30 /data/mogdb5.0/app/5.0.5/bin/cm_server
omm      1623618 1623613  0 14:34 pts/0    00:00:00 -bash
omm      1633444 1621240  0 14:42 ?        00:00:00 arping -D -f -w 1 -I bond0 192.168.1.208
omm      1633452 1623618  0 14:42 pts/0    00:00:00 ps -fu omm
[omm@test01 ~]$ kill -9 1621272      <<<<<<<<<<<<< kill the primary database process
[omm@test01 ~]$
[omm@test01 ~]$

Check whether the database node has changed, using the log of a JDBC client that keeps inserting rows:

[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:26.088, RTO: Unknown, Insert with id 147 , els: 2(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:27.092, RTO: Unknown, Insert with id 148 , els: 1(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:28.097, RTO: Unknown, Insert with id 149 , els: 1(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:29.101, RTO: Unknown, Insert with id 150 , els: 1(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:30.105, RTO: Unknown, Insert with id 151 , els: 2(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:31.109, RTO: Unknown, Insert with id 152 , els: 2(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:32.114, RTO: Unknown, Insert with id 153 , els: 1(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:33.118, RTO: Unknown, Insert with id 154 , els: 1(ms)
[Thd#001, dn_6001@14:33:52] 2024-04-16 14:42:34.123, RTO: Unknown, Insert with id 155 , els: 2(ms)
[Thd#001, ****ERROR***** ] 2024-04-16 14:42:35.128, RTO: Unknown, Error in SQL:insert into jdbc_reconnect_test1 values (156,1713249755123), els: 4(ms)
[Thd#001, ****ERROR***** ] 2024-04-16 14:42:35.128, RTO: Unknown, Insert failed org.opengauss.util.PSQLException: [192.168.1.207:45992/192.168.1.208:26000] socket is not closed; Sending Urgent packet failed, detail: Broken pipe (Write failed).
An I/O error occured while sending to the backend.detail:EOF Exception; , els: 0(ms)
[Thd#001, ] 2024-04-16 14:42:35.128, RTO: Unknown, Thread timeout, exit this thread, els: 0(ms)
[WatchDog thread ] 2024-04-16 14:42:37.087, Other threads are stuck, starting new thread: #002
[Thd#002, ] 2024-04-16 14:42:37.088, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 0(ms)
[WatchDog thread ] 2024-04-16 14:42:39.922, Other threads are stuck, starting new thread: #003
[Thd#003, ] 2024-04-16 14:42:39.923, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 1(ms)
[Thd#002, ] 2024-04-16 14:42:42.095, RTO: Unknown, Thread timeout, exit this thread, els: 5007(ms)
[WatchDog thread ] 2024-04-16 14:42:42.757, Other threads are stuck, starting new thread: #004
[Thd#004, ] 2024-04-16 14:42:42.757, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 0(ms)
[Thd#004, ] 2024-04-16 14:42:43.790, RTO: Unknown, Connected to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 1033(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:43.797, RTO: Unknown, Insert with id 157 , els: 2(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:43.799, RTO: Unknown, Reconnect success, last data in table is:id=155 lastUpdate=1713249754118, els: 0(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:43.802, RTO: 9681 ms, RTO is :9681ms , els: 1(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:44.809, RTO: 9681 ms, Insert with id 158 , els: 4(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:45.816, RTO: 9681 ms, Insert with id 159 , els: 3(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:46.823, RTO: 9681 ms, Insert with id 160 , els: 4(ms)
[Thd#003, ] 2024-04-16 14:42:47.149, RTO: 9681 ms, current thread ID: 3, last thread ID:4, els: 7225(ms)
[Thd#003, ] 2024-04-16 14:42:47.149, RTO: 9681 ms, Thread timeout, exit this thread, els: 0(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:47.829, RTO: 9681 ms, Insert with id 161 , els: 3(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:48.836, RTO: 9681 ms, Insert with id 162 , els: 3(ms)
[Thd#004, dn_6002@14:33:55] 2024-04-16 14:42:49.843, RTO: 9681 ms, Insert with id 163 , els: 3(ms)

The primary node changed from dn_6001 to dn_6002. Verify as follows:

[omm@test01 ~]$ c
[ CMServer State ]

node      node_ip        instance                       state
------------------------------------------------------------------
1  test01 192.168.1.206  1 /data/mogdb5.0/cm/cm_server  Standby
2  test02 192.168.1.207  2 /data/mogdb5.0/cm/cm_server  Primary

[ Cluster State ]

cluster_state  : Normal
redistributing : No
balanced       : No
current_az     : AZ_ALL

[ Datanode State ]

node      node_ip        instance                   state            | node      node_ip        instance                   state
--------------------------------------------------------------------------------------------------------------------------------------------
1  test01 192.168.1.206  6001 /data/mogdb5.0/data P Standby Normal   | 2  test02 192.168.1.207  6002 /data/mogdb5.0/data S Primary Normal
[omm@test01 ~]$

Set enable_async_standby_promotion to off

[omm@test01 ~]$ cm_ctl set --param --server -k enable_async_standby_promotion=off
cm_ctl: set cm_server.conf success.
[omm@test01 ~]$ cm_ctl reload --param --server
cm_ctl: reload cm_server.conf success.
[omm@test01 ~]$
[omm@test01 ~]$ cm_ctl list --param --server|egrep -i "cms_enable_failover_on2nodes|third_party_gateway_ip|protect_standby|enable_async_standby_promotion"
third_party_gateway_ip = 192.168.1.1
cms_enable_failover_on2nodes = on
enable_async_standby_promotion = off
third_party_gateway_ip = 192.168.1.1
cms_enable_failover_on2nodes = on
enable_async_standby_promotion = off
[omm@test01 ~]$
[omm@test01 ~]$ c
[ CMServer State ]

node      node_ip        instance                       state
------------------------------------------------------------------
1  test01 192.168.1.206  1 /data/mogdb5.0/cm/cm_server  Standby
2  test02 192.168.1.207  2 /data/mogdb5.0/cm/cm_server  Primary

[ Cluster State ]

cluster_state  : Normal
redistributing : No
balanced       : Yes
current_az     : AZ_ALL

[ Datanode State ]

node      node_ip        instance                   state            | node      node_ip        instance                   state
--------------------------------------------------------------------------------------------------------------------------------------------
1  test01 192.168.1.206  6001 /data/mogdb5.0/data P Primary Normal   | 2  test02 192.168.1.207  6002 /data/mogdb5.0/data S Standby Normal
[omm@test01 ~]$
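
The parameter listing above shows each setting twice, presumably once per cm_server instance. Before running the failover test it is worth confirming that every copy carries the new value. The sketch below collapses the listing to the distinct values of one parameter; a single output line means all copies agree. The function name param_values is hypothetical, and the "key = value" line format is assumed from the listing above.

```shell
# param_values <name>: read "key = value" lines on stdin and print the
# distinct values found for the given parameter name.
param_values() {
    awk -F'=' -v key="$1" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); print v }
    ' | sort -u
}

# Sample lines copied from the listing above. On a real cluster the call would be:
#   cm_ctl list --param --server | param_values enable_async_standby_promotion
printf '%s\n' \
    'third_party_gateway_ip = 192.168.1.1' \
    'cms_enable_failover_on2nodes = on' \
    'enable_async_standby_promotion = off' \
    'third_party_gateway_ip = 192.168.1.1' \
    'cms_enable_failover_on2nodes = on' \
    'enable_async_standby_promotion = off' |
    param_values enable_async_standby_promotion    # prints: off
```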

Kill the primary database process:

[omm@test01 ~]$ ps -fu omm
UID          PID    PPID  C STIME TTY          TIME CMD
omm         6756       1  0 Apr15 ?        00:00:00 /usr/lib/systemd/systemd --user
omm         6759    6756  0 Apr15 ?        00:00:00 (sd-pam)
omm         6806       1  2 Apr15 ?        00:36:22 /data/mogdb5.0/app/5.0.5/bin/om_monitor -L /opt/mogdb5.0/log/cm/om_monitor
omm         7739    7098  0 Apr15 ?        00:00:49 sshd: omm@pts/0
omm         7744    7739  0 Apr15 pts/0    00:00:00 -bash
omm        15883   15878  0 Apr15 pts/0    00:00:00 -bash
omm      1621240    6806 17 14:33 ?        00:02:32 /data/mogdb5.0/app/5.0.5/bin/cm_agent
omm      1621281       1  0 14:33 ?        00:00:00 mogdb fenced UDF master process
omm      1621296       1  5 14:33 ?        00:00:52 /data/mogdb5.0/app/5.0.5/bin/cm_server
omm      1623618 1623613  0 14:34 pts/0    00:00:00 -bash
omm      1633890       1 51 14:42 ?        00:03:09 /data/mogdb5.0/app/5.0.5/bin/mogdb -D /data/mogdb5.0/data -M pending
omm      1643092 1623618  0 14:48 pts/0    00:00:00 ps -fu omm
[omm@test01 ~]$
[omm@test01 ~]$ kill -9 1633890
[omm@test01 ~]$
[omm@test01 ~]$
[omm@test01 ~]$ c
[ CMServer State ]

node      node_ip        instance                       state
------------------------------------------------------------------
1  test01 192.168.1.206  1 /data/mogdb5.0/cm/cm_server  Standby
2  test02 192.168.1.207  2 /data/mogdb5.0/cm/cm_server  Primary

[ Cluster State ]

cluster_state  : Unavailable
redistributing : No
balanced       : No
current_az     : AZ_ALL

[ Datanode State ]

node      node_ip        instance                   state            | node      node_ip        instance                   state
--------------------------------------------------------------------------------------------------------------------------------------------
1  test01 192.168.1.206  6001 /data/mogdb5.0/data P Unknown Unknown  | 2  test02 192.168.1.207  6002 /data/mogdb5.0/data S Standby Need repair(Connecting)
[omm@test01 ~]$ c
[ CMServer State ]

node      node_ip        instance                       state
------------------------------------------------------------------
1  test01 192.168.1.206  1 /data/mogdb5.0/cm/cm_server  Standby
2  test02 192.168.1.207  2 /data/mogdb5.0/cm/cm_server  Primary

[ Cluster State ]

cluster_state  : Normal
redistributing : No
balanced       : Yes
current_az     : AZ_ALL

[ Datanode State ]

node      node_ip        instance                   state            | node      node_ip        instance                   state
--------------------------------------------------------------------------------------------------------------------------------------------
1  test01 192.168.1.206  6001 /data/mogdb5.0/data P Primary Normal   | 2  test02 192.168.1.207  6002 /data/mogdb5.0/data S Standby Normal
[omm@test01 ~]$

The standby was not promoted. Check the data inserts:

[Thd#001, dn_6001@14:42:45] 2024-04-16 14:48:54.407, RTO: Unknown, Insert with id 26 , els: 2(ms)
[Thd#001, dn_6001@14:42:45] 2024-04-16 14:48:55.412, RTO: Unknown, Insert with id 27 , els: 2(ms)
[Thd#001, dn_6001@14:42:45] 2024-04-16 14:48:56.417, RTO: Unknown, Insert with id 28 , els: 1(ms)
[Thd#001, dn_6001@14:42:45] 2024-04-16 14:48:57.422, RTO: Unknown, Insert with id 29 , els: 2(ms)
[Thd#001, dn_6001@14:42:45] 2024-04-16 14:48:58.427, RTO: Unknown, Insert with id 30 , els: 2(ms)
[Thd#001, dn_6001@14:42:45] 2024-04-16 14:48:59.433, RTO: Unknown, Insert with id 31 , els: 3(ms)
[Thd#001, dn_6001@14:42:45] 2024-04-16 14:49:00.438, RTO: Unknown, Insert with id 32 , els: 1(ms)
[Thd#001, ****ERROR***** ] 2024-04-16 14:49:01.443, RTO: Unknown, Error in SQL:insert into jdbc_reconnect_test1 values (33,1713250141438), els: 5(ms)
[Thd#001, ****ERROR***** ] 2024-04-16 14:49:01.444, RTO: Unknown, Insert failed org.opengauss.util.PSQLException: [192.168.1.207:41664/192.168.1.208:26000] socket is not closed; Sending Urgent packet failed, detail: Broken pipe (Write failed).
An I/O error occured while sending to the backend.detail:EOF Exception; , els: 0(ms)
[Thd#001, ] 2024-04-16 14:49:01.444, RTO: Unknown, Thread timeout, exit this thread, els: 0(ms)
[WatchDog thread ] 2024-04-16 14:49:03.925, Other threads are stuck, starting new thread: #002
[Thd#002, ] 2024-04-16 14:49:03.925, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 0(ms)
[WatchDog thread ] 2024-04-16 14:49:06.760, Other threads are stuck, starting new thread: #003
[Thd#003, ] 2024-04-16 14:49:06.760, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 0(ms)
[WatchDog thread ] 2024-04-16 14:49:09.594, Other threads are stuck, starting new thread: #004
[Thd#004, ] 2024-04-16 14:49:09.595, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 0(ms)
[WatchDog thread ] 2024-04-16 14:49:12.429, Other threads are stuck, starting new thread: #005
[Thd#005, ] 2024-04-16 14:49:12.430, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 1(ms)
[WatchDog thread ] 2024-04-16 14:49:15.264, Other threads are stuck, starting new thread: #006
[Thd#006, ] 2024-04-16 14:49:15.264, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 0(ms)
[WatchDog thread ] 2024-04-16 14:49:18.099, Other threads are stuck, starting new thread: #007
[Thd#007, ] 2024-04-16 14:49:18.099, RTO: Unknown, Attempt connect to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 0(ms)
[Thd#007, ] 2024-04-16 14:49:18.117, RTO: Unknown, Connected to database: jdbc:opengauss://192.168.1.208:26000/postgres, els: 18(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:18.124, RTO: Unknown, Insert with id 34 , els: 2(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:18.127, RTO: Unknown, Reconnect success, last data in table is:id=32 lastUpdate=1713250140434, els: 0(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:18.129, RTO:17693 ms, RTO is :17693ms , els: 0(ms)
[Thd#006, ] 2024-04-16 14:49:18.429, RTO:17693 ms, current thread ID: 6, last thread ID:7, els: 3163(ms)
[Thd#006, ] 2024-04-16 14:49:18.429, RTO:17693 ms, Thread timeout, exit this thread, els: 0(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:19.136, RTO:17693 ms, Insert with id 35 , els: 3(ms)
[Thd#005, ] 2024-04-16 14:49:19.629, RTO:17693 ms, current thread ID: 5, last thread ID:7, els: 7199(ms)
[Thd#005, ] 2024-04-16 14:49:19.629, RTO:17693 ms, Thread timeout, exit this thread, els: 0(ms)
[Thd#002, ] 2024-04-16 14:49:19.790, RTO:17693 ms, current thread ID: 2, last thread ID:7, els: 15864(ms)
[Thd#002, ] 2024-04-16 14:49:19.790, RTO:17693 ms, Thread timeout, exit this thread, els: 0(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:20.142, RTO:17693 ms, Insert with id 36 , els: 3(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:21.148, RTO:17693 ms, Insert with id 37 , els: 3(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:22.155, RTO:17693 ms, Insert with id 38 , els: 3(ms)
[Thd#003, ] 2024-04-16 14:49:22.351, RTO:17693 ms, current thread ID: 3, last thread ID:7, els: 15590(ms)
[Thd#003, ] 2024-04-16 14:49:22.351, RTO:17693 ms, Thread timeout, exit this thread, els: 0(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:23.161, RTO:17693 ms, Insert with id 39 , els: 3(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:24.167, RTO:17693 ms, Insert with id 40 , els: 3(ms)
[Thd#004, ] 2024-04-16 14:49:24.911, RTO:17693 ms, current thread ID: 4, last thread ID:7, els: 15316(ms)
[Thd#004, ] 2024-04-16 14:49:24.911, RTO:17693 ms, Thread timeout, exit this thread, els: 0(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:25.172, RTO:17693 ms, Insert with id 41 , els: 2(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:26.177, RTO:17693 ms, Insert with id 42 , els: 2(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:27.183, RTO:17693 ms, Insert with id 43 , els: 2(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:28.189, RTO:17693 ms, Insert with id 44 , els: 3(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:29.195, RTO:17693 ms, Insert with id 45 , els: 3(ms)
[Thd#007, dn_6001@14:49:08] 2024-04-16 14:49:30.202, RTO:17693 ms, Insert with id 46 , els: 4(ms)

The primary node did not change; it is still on dn_6001.
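
The RTO figures in the two client logs can be cross-checked by hand. In both runs the reported value equals the timestamp of the "Reconnect success" line minus the lastUpdate value (epoch milliseconds) of the last row written before the failure. The sketch below redoes that arithmetic. The function name rto_ms is hypothetical, and the UTC+8 offset is an assumption about the test machine's time zone (it is the offset under which the numbers line up); how the test client computes RTO internally is not shown here.

```shell
# rto_ms <reconnect HH:MM:SS.mmm> <lastUpdate epoch ms> [utc offset hours]
# Converts both instants to milliseconds since local midnight and subtracts.
# Assumes both fall on the same local day; the offset defaults to +8 (assumed).
rto_ms() {
    awk -v t="$1" -v last="$2" -v off="${3:-8}" 'BEGIN {
        split(t, p, /[:.]/)
        reconnect = ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
        last_tod = (last + off * 3600000) % 86400000
        print reconnect - last_tod
    }'
}

rto_ms 14:42:43.799 1713249754118   # promotion on:  prints 9681
rto_ms 14:49:18.127 1713250140434   # promotion off: prints 17693
```

With promotion enabled, writes resumed after about 9.7 s on dn_6002. With it disabled, writes resumed after about 17.7 s on dn_6001, once the original primary was back in service.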

Use cases

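In certain scenarios, users can control whether an asynchronous standby may be promoted to primary and, when necessary, perform the switchover manually instead of leaving the choice to the cluster management software.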

References

MogDB 5.0.6 new features
