Halo Shield: A High-Availability Solution for the Halo Database

Original article by 贾桂军, 2023-12-05

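1 Introduction to Halo Shield

Halo Shield is a high-availability solution that automatically manages Halo primary/standby databases. It consists of the following components.

Etcd: a distributed key-value store

Patroni: a process that creates, manages, and monitors streaming replication and performs automatic failover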

 

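Etcd needs at least three nodes, and an odd number of them, to elect a leader. It can generally be deployed on the same servers as the Halo database. Etcd cluster members exchange heartbeats; when a network failure partitions the cluster, the side holding fewer than a majority of the nodes becomes unavailable, which prevents split-brain.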
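The majority rule behind this can be made concrete with a little arithmetic. The following is a minimal sketch and not part of Halo Shield; the function names `quorum` and `fault_tolerance` are illustrative only.

```shell
# quorum N: votes needed for a majority in an N-member cluster.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that can fail while a majority survives.
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print the figures for a few cluster sizes.
for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

With three members the quorum is 2, so a partition holding a single node cannot elect a leader. A fourth member raises the quorum to 3 without tolerating any additional failure, which is why odd cluster sizes are preferred.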

 

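Patroni uses the etcd cluster to store and monitor the state and configuration of the Halo primary and standby nodes. When a failure stops a node from working, Patroni detects it automatically and updates the node information in etcd, which notifies all other nodes. If the failed node is the primary, Patroni also fails over automatically and rebinds the VIP to the new primary, reducing the impact on applications.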

 

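Patroni performs heartbeat checks on the other nodes through its connection to etcd. When the primary can no longer update the leader lock in etcd, Patroni stops the Halo database on that node to avoid split-brain.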


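2 Installing Halo Shield

Software installation

The Halo product installation package already includes the full Halo Shield functionality, so once the Halo database has been installed correctly, Halo Shield is installed as well. For the Halo installation procedure, see Chapter 2.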

 

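3 Configuring Halo Shield

Planning and setting up the physical replication environment

Halo Shield can be configured on top of an existing physical replication environment. It is recommended to set up the physical replication environment manually as described in Chapter 10.

Halo Shield depends on the etcd distributed key-value store. If the primary/standby databases span more than three servers, it is recommended to configure and run the etcd service on three of them.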

 

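ETCD configuration

Make sure the HALO_BASE environment variable is set. Then, on every server that will run the etcd service, run etcd_config.sh as the halo user and enter the IP addresses of all servers on which etcd will run.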

vi /home/halo/.bash_profile

export HALO_BASE=/u01/app/halo
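Before running the setup scripts, it can help to confirm that the required variables are actually visible in the current shell. A minimal sketch follows; `require_env` is a hypothetical helper and is not shipped with Halo Shield.

```shell
# require_env NAME...: report every listed variable that is unset or empty.
# Returns 0 only when all of them are set.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup through eval keeps this usable in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage (left commented out so the block has no side effects):
# require_env HALO_BASE && "$HALO_BASE/product/shield/etcd/v3.5.2/conf/etcd_config.sh"
# require_env HALO_BASE HALO_HOME PGDATA   # variables the Patroni setup asks for
```

Remember to log in again or source .bash_profile after editing it, otherwise the new export is not yet in effect.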

 

$HALO_BASE/product/shield/etcd/v3.5.2/conf/etcd_config.sh

Starting to run etcd configuration setup

Please set HALO_BASE environment variables before proceed

Press y/Y to continue, any other key to cancel

y

 

Please input IP list where etcd cluster will be running

e.g. 192.168.1.1,192.168.1.2,192.168.1.3

10.16.16.155,10.16.16.156,10.16.16.157

 

10.16.16.155 will be used as node IP

 

Initializing done.

 

Following steps to be done manually.

Run following commands as root

ln -s /u01/app/halo/product/shield/etcd/v3.5.2/conf/etcd.service /usr/lib/systemd/system/etcd.service

 

Following the output above, create the service unit link as root on every server that runs the etcd service:

ln -s /u01/app/halo/product/shield/etcd/v3.5.2/conf/etcd.service /usr/lib/systemd/system/etcd.service
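The IP list typed at the prompt is easy to get wrong. The sketch below checks a comma-separated list against the rules stated earlier (at least three members, an odd count) plus basic IPv4 syntax. `valid_etcd_list` is a hypothetical helper; whether etcd_config.sh performs similar checks itself is not stated here.

```shell
# valid_etcd_list "ip1,ip2,...": succeed if the list has an odd number of
# entries, at least three, and each entry is a dotted quad with octets 0-255.
valid_etcd_list() {
  printf '%s\n' "$1" | awk -F, '
    {
      if (NF < 3 || NF % 2 == 0) exit 1
      for (i = 1; i <= NF; i++) {
        n = split($i, octet, ".")
        if (n != 4) exit 1
        for (j = 1; j <= 4; j++)
          if (octet[j] !~ /^[0-9]+$/ || octet[j] + 0 > 255) exit 1
      }
    }
  '
}

# Possible usage:
# valid_etcd_list "10.16.16.155,10.16.16.156,10.16.16.157" && echo "list looks fine"
```

The same list has to be entered on every etcd server, so validating it once and pasting the identical string everywhere avoids members with mismatched views of the cluster.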

 

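Patroni configuration

Make sure the HALO_BASE, HALO_HOME, and PGDATA environment variables are set. Then, on every primary and standby server, run patroni_config.sh as the halo user and enter the IP addresses of all servers on which etcd will run, the name of the current server, and the VIP details.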

$HALO_BASE/product/shield/patroni/conf/patroni_config.sh

Starting to run patroni configuration setup

Please set HALO_BASE, HALO_HOME, PGDATA environment variables before proceed

Press y/Y to continue, any other key to cancel

y

 

Please input IP list where etcd cluster will be running

e.g. 192.168.1.1,192.168.1.2,192.168.1.3

10.16.16.155,10.16.16.156,10.16.16.157

 

Input current node name

Default: node1

 

 

Input VIP for the HA cluster

10.16.16.199

10.16.16.199 will be used as VIP

 

Input network interface to bind the VIP

Default: eth0 

 

eth0 will be used as network interface

 

Input VIP netmask

Default: 255.255.255.0 

 

255.255.255.0 will be used as VIP netmask

 

Input VIP broadcast address

Default: 10.16.16.255

 

10.16.16.255 will be used as VIP broadcast address

 

Initialize python ...

 

Python initializing done.

 

 

Initializing done.

Following steps to be done manually.

1. Add the following line into the end of .bash_profile of user halo

export PATH=/u01/app/halo/product/shield/patroni/python/bin:$PATH

export PATRONICTL_CONFIG_FILE=/u01/app/halo/product/shield/patroni/conf/patroni_halo.yml

 

2. Run following commands as root

ln -s /u01/app/halo/product/shield/patroni/conf/patroni.service /usr/lib/systemd/system/patroni.service

 

3. Add the following line to /etc/sudoers

halo    ALL=(ALL)       NOPASSWD: /usr/sbin/ip, /usr/bin/arping, /usr/sbin/iptables

 

Following the output above, perform these steps on every primary and standby server:

1. Add the following to the halo user's .bash_profile

export PATH=/u01/app/halo/product/shield/patroni/python/bin:$PATH

export PATRONICTL_CONFIG_FILE=/u01/app/halo/product/shield/patroni/conf/patroni_halo.yml

 

2. As root, create the service unit link

ln -s /u01/app/halo/product/shield/patroni/conf/patroni.service /usr/lib/systemd/system/patroni.service

 

3. As root, add the following line to /etc/sudoers

halo    ALL=(ALL)       NOPASSWD: /usr/sbin/ip, /usr/bin/arping, /usr/sbin/iptables

 

Creating the database administration user

Create the following user on the primary:

psql

create user patroni SUPERUSER password 'patroni';

 

Creating the watchdog service (optional)

Run the following commands as root:

yum install -y watchdog

modprobe softdog

chown halo /dev/watchdog

systemctl start watchdog

systemctl enable watchdog
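After these commands, the halo user should be able to open the watchdog device. A minimal check is sketched below; `watchdog_usable` is a hypothetical helper, and the default path /dev/watchdog is taken from the chown command above.

```shell
# watchdog_usable [DEVICE]: succeed if DEVICE (default /dev/watchdog) is a
# character device that the current user can write to.
watchdog_usable() {
  dev=${1:-/dev/watchdog}
  [ -c "$dev" ] && [ -w "$dev" ]
}

# Possible usage, run as the halo user:
# watchdog_usable || echo "watchdog device missing or not writable" >&2
```

Device nodes are typically recreated at boot, so the ownership change and the softdog module load may need to be reapplied after a restart; verify this on your system.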

 

4 Using Halo Shield

Starting the ETCD service

As root, start the service on every server configured for etcd:

systemctl start etcd

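Make sure the etcd service is started on at least two servers at about the same time. If it is started on only one server, etcd fails to start because it cannot reach the other members.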
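Once the services are up, one way to confirm that enough members have joined is to count healthy endpoints. This is a sketch under an assumption: that `etcdctl endpoint health --cluster` prints one line per endpoint containing the phrase `is healthy`, which should be checked against your etcdctl version. `healthy_count` and `has_quorum` are hypothetical names.

```shell
# healthy_count: read `etcdctl endpoint health` output on stdin and print
# how many lines report "is healthy" (assumed output wording).
healthy_count() {
  grep -c ' is healthy' || true
}

# has_quorum HEALTHY TOTAL: succeed if HEALTHY members form a majority of TOTAL.
has_quorum() {
  [ "$1" -ge $(( $2 / 2 + 1 )) ]
}

# Possible usage for the three-member cluster in this article:
# n=$("$HALO_BASE/product/shield/etcd/v3.5.2/etcdctl" endpoint health --cluster 2>&1 | healthy_count)
# has_quorum "$n" 3 && echo "etcd has quorum"
```

The `2>&1` is there because some etcdctl versions may write health results to standard error.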

 

To start the etcd service automatically at boot, run the following command as root:

systemctl enable etcd

 

Checking etcd status

$HALO_BASE/product/shield/etcd/v3.5.2/etcdctl endpoint status --cluster=true -w table

+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://10.16.16.155:2379 | 1c3f7702e434565f |   3.5.2 |   37 kB |      true |      false |         2 |         54 |                 54 |        |
| http://10.16.16.157:2379 | 698a77fbbe16ea8a |   3.5.2 |   37 kB |     false |      false |         2 |         54 |                 54 |        |
| http://10.16.16.156:2379 | c67e61e43ed4f1d4 |   3.5.2 |   37 kB |     false |      false |         2 |         54 |                 54 |        |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
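For scripting, the leader can be picked out of this table by reading the IS LEADER column. The following is a minimal sketch that assumes the column order shown above; `etcd_leader` is a hypothetical helper.

```shell
# etcd_leader: read `etcdctl endpoint status -w table` output on stdin and
# print the ENDPOINT of every row whose IS LEADER column is "true".
etcd_leader() {
  awk -F'|' '
    NF >= 7 {
      endpoint = $2; leader = $6
      gsub(/[ \t]/, "", endpoint); gsub(/[ \t]/, "", leader)
      if (leader == "true") print endpoint
    }
  '
}

# Possible usage:
# "$HALO_BASE/product/shield/etcd/v3.5.2/etcdctl" endpoint status --cluster=true -w table | etcd_leader
```

Exactly one endpoint should be printed for a healthy cluster; border lines and the header row are skipped because they carry no `true` value in that column.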

Starting the patroni service

As root, start the patroni service on every primary and standby server:

systemctl start patroni

 

To start the patroni service automatically at boot, run the following command as root:

systemctl enable patroni

 

Listing primary and standby nodes

patronictl list

+--------+-------------------+---------+---------+----+-----------+
| Member | Host              | Role    | State   | TL | Lag in MB |
+ Cluster: halo-cluster (7078872291747273152) ---+----+-----------+
| node1  | 10.16.16.155:1921 | Leader  | running |  2 |           |
| node2  | 10.16.16.156:1921 | Replica | running |  2 |         0 |
| node3  | 10.16.16.157:1921 | Replica | running |  2 |         0 |
+--------+-------------------+---------+---------+----+-----------+
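Scripts that need to know the current primary can read it from the Role column of this listing. A minimal sketch follows, assuming the column layout shown above; `patroni_leader` is a hypothetical helper.

```shell
# patroni_leader: read `patronictl list` output on stdin and print the
# Member name of the row whose Role column is "Leader".
patroni_leader() {
  awk -F'|' '
    NF >= 5 {
      member = $2; role = $4
      gsub(/[ \t]/, "", member); gsub(/[ \t]/, "", role)
      if (role == "Leader") print member
    }
  '
}

# Possible usage:
# patronictl list | patroni_leader
```

For the listing above this yields node1. An empty result means no member currently holds the leader role.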

 

Pausing automatic failover

patronictl pause

Success: cluster management is paused

[halo@node1 ~]$ patronictl list

+--------+-------------------+---------+---------+----+-----------+
| Member | Host              | Role    | State   | TL | Lag in MB |
+ Cluster: halo-cluster (7078872291747273152) ---+----+-----------+
| node1  | 10.16.16.155:1921 | Leader  | running |  2 |           |
| node2  | 10.16.16.156:1921 | Replica | running |  2 |         0 |
| node3  | 10.16.16.157:1921 | Replica | running |  2 |         0 |
+--------+-------------------+---------+---------+----+-----------+
 Maintenance mode: on
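While the listing reports `Maintenance mode: on`, automatic failover stays disabled until cluster management is resumed with `patronictl resume`. To avoid leaving a cluster paused by accident, a maintenance task can be wrapped as sketched below; `with_cluster_paused` is a hypothetical helper.

```shell
# with_cluster_paused CMD [ARGS...]: pause automatic failover, run CMD,
# then resume cluster management even if CMD fails.
# Returns CMD's exit status, or 1 if pausing or resuming fails.
with_cluster_paused() {
  patronictl pause || return 1
  "$@"
  status=$?
  patronictl resume || return 1
  return "$status"
}

# Possible usage (the script path is a placeholder):
# with_cluster_paused /path/to/maintenance_task.sh
```

The wrapper relies on PATRONICTL_CONFIG_FILE being set as described in the Patroni configuration steps, so that patronictl finds the cluster without extra arguments.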

 

Manual switchover

patronictl switchover

Master [node1]:

Candidate ['node2', 'node3'] []: node2

When should the switchover take place (e.g. 2022-05-10T13:57 )  [now]:

Current cluster topology

+--------+-------------------+---------+---------+----+-----------+-----------------+
| Member | Host              | Role    | State   | TL | Lag in MB | Pending restart |
+ Cluster: halo-cluster (7078872291747273152) ---+----+-----------+-----------------+
| node1  | 10.16.16.155:1921 | Leader  | running |  2 |           |                 |
| node2  | 10.16.16.156:1921 | Replica | running |  2 |         0 |                 |
| node3  | 10.16.16.157:1921 | Replica | running |  2 |         0 | *               |
+--------+-------------------+---------+---------+----+-----------+-----------------+
 Maintenance mode: on

Are you sure you want to switchover cluster halo-cluster, demoting current master node1? [y/N]: y

2022-05-10 12:57:57.44645 Successfully switched over to "node2"

+--------+-------------------+---------+---------+----+-----------+-----------------+
| Member | Host              | Role    | State   | TL | Lag in MB | Pending restart |
+ Cluster: halo-cluster (7078872291747273152) ---+----+-----------+-----------------+
| node1  | 10.16.16.155:1921 | Replica | stopped |    |   unknown |                 |
| node2  | 10.16.16.156:1921 | Leader  | running |  2 |           |                 |
| node3  | 10.16.16.157:1921 | Replica | running |  2 |         0 | *               |
+--------+-------------------+---------+---------+----+-----------+-----------------+
 Maintenance mode: on
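Right after the switchover, the former primary node1 is still shown as stopped with an unknown lag; it appears as a running replica again in a later listing. A script can wait for that by checking the State column, as in this minimal sketch. `not_running_members` is a hypothetical helper and assumes the column layout shown above.

```shell
# not_running_members: read `patronictl list` output on stdin and print
# every Member whose State column is anything other than "running".
not_running_members() {
  awk -F'|' '
    NF >= 6 {
      member = $2; state = $5
      gsub(/[ \t]/, "", member); gsub(/[ \t]/, "", state)
      if (member != "Member" && state != "running") print member
    }
  '
}

# Possible usage: poll until every member is running again.
# while [ -n "$(patronictl list | not_running_members)" ]; do sleep 2; done
```

For the listing above the helper prints node1; once the old primary has rejoined as a replica, the output is empty.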

 

Restarting a single node

patronictl restart halo-cluster node3

+--------+-------------------+---------+---------+----+-----------+-----------------+
| Member | Host              | Role    | State   | TL | Lag in MB | Pending restart |
+ Cluster: halo-cluster (7078872291747273152) ---+----+-----------+-----------------+
| node1  | 10.16.16.155:1921 | Replica | running |  3 |         0 |                 |
| node2  | 10.16.16.156:1921 | Leader  | running |  3 |           |                 |
| node3  | 10.16.16.157:1921 | Replica | running |  2 |         0 | *               |
+--------+-------------------+---------+---------+----+-----------+-----------------+
 Maintenance mode: on

When should the restart take place (e.g. 2022-05-10T13:58)  [now]:

Are you sure you want to restart members node3? [y/N]: y

Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)  []:

Success: restart on member node3

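[Copyright notice] This article is original content by a 墨天轮 user. Reposts must credit the source (墨天轮), the article link, and the author; otherwise the author and 墨天轮 reserve the right to pursue liability. Suspected plagiarism or infringing content on 墨天轮 can be reported, with supporting evidence, to contact@modb.pro; once verified, the content will be removed immediately.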
