说明:本系列文章仅用于共享我的学习成果,未经过生产系统考验,对于知识点和一些组件的使用会不定时更新,仅供参考,如有错误的地方,欢迎留言共同学习。
本高可用系列测试不说理论知识,如有需要自行百度,因生产环境大多数是内网环境,无法连接互联网,为模拟生产环境安装,PostgreSQL高可用测试均采用离线部署。
所需软件包均以打包上传百度网盘,如有需要自行下载:https://pan.baidu.com/s/1Tb7GPMvj4kfKEIh8iyvdbA 提取码:n9w2 文件名:PostgreSQL_HA.tar.gz
第一章: 介绍测试环境
第二章: PostgreSQL + replication 部署
第三章: Etcd 部署和管理
第四章: Patroni 部署和管理
第五章: HAProxy + Keepalived 部署和管理
第六章: 高可用模拟故障测试用例
第七章: Prometheus + Grafana 监控部署
第八章: 高可用管理
第八章: 高可用管理
[root@pgtest1 ~]# etcdctl endpoint status --cluster -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 6 | 489460 | 489460 | |
| http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | true | false | 6 | 489460 | 489460 | |
| http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | false | false | 6 | 489460 | 489460 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pgtest1 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Leader | running | 18 | |
| pgtest2 | 192.168.58.11 | Replica | running | 18 | 0 |
| pgtest3 | 192.168.58.12 | Replica | running | 18 | 0 |
+---------+---------------+---------+---------+----+-----------+
复制
1. 主动维护重启服务器时软件的关闭和启动顺序
1. 计划内只重启PostgreSQL主节点(pgtest1)
需要考虑是否将数据库switchover到备节点上,在这个架构上,PostgreSQL主节点同时也是 haproxy 和 keepalived 的主节点,VIP在此节点上运行
# 1. 数据库从 pgtest1 switchover 到 pgtest2
[root@pgtest1 ~]# curl -s http://192.168.58.10:8008/switchover -XPOST -d '{"leader":"pgtest1","candidate":"pgtest2"}'
Successfully switched over to "pgtest2"
# 也可以使用命令 patronictl switchover 进行数据库切换
[root@pgtest1 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Replica | running | 19 | 0 |
| pgtest2 | 192.168.58.11 | Leader | running | 19 | |
| pgtest3 | 192.168.58.12 | Replica | running | 19 | 0 |
+---------+---------------+---------+---------+----+-----------+
# 2. pgtest1关闭 keepalived,自动将 VIP 切换到 keepalived 备节点(pgtest2),检查VIP测试业务连接正常
[root@pgtest1 ~]# systemctl stop keepalived
# 3. pgtest1关闭 haproxy
[root@pgtest1 ~]# systemctl stop haproxy
# 4. pgtest1关闭 patroni 和 etcd,patroni 会把 PostgreSQL 自动关闭
[root@pgtest1 ~]# systemctl stop patroni
[root@pgtest1 ~]# systemctl stop etcd
# 5. pgtest1重启服务器
[root@pgtest1 ~]# reboot
# 6. 服务器重启后自动启动 etcd patroni haproxy keepalived exporter(all),启动 patroni 后发现数据库没有启动,会自动拉起数据库,启动 keepalived 后 VIP 会自动漂移回来,检查VIP测试业务连接正常
[root@pgtest1 ~]# etcdctl endpoint status --cluster -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 6 | 473973 | 473973 | |
| http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | true | false | 6 | 473973 | 473973 | |
| http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | false | false | 6 | 473973 | 473973 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pgtest1 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Replica | running | 19 | 0 |
| pgtest2 | 192.168.58.11 | Leader | running | 19 | |
| pgtest3 | 192.168.58.12 | Replica | running | 19 | 0 |
+---------+---------------+---------+---------+----+-----------+
[root@pgtest1 ~]# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 192.168.58.10/24 brd 192.168.58.255 scope global ens33
valid_lft forever preferred_lft forever
inet 192.168.58.20/24 scope global secondary ens33:1
valid_lft forever preferred_lft forever
# 7. 考虑是否将 PostgreSQL 主库切回 pgtest1
[root@pgtest1 ~]# curl -s http://192.168.58.10:8008/switchover -XPOST -d '{"leader":"pgtest2","candidate":"pgtest1"}'
复制
2. 计划内只关闭PostgreSQL备节点(pgtest2)
在这个架构上,PostgreSQL备节点(pgtest2)同时也是 haproxy 和 keepalived 的备节点
# 1. pgtest2关闭keepalived和haproxy
[root@pgtest2 ~]# systemctl stop keepalived
[root@pgtest2 ~]# systemctl stop haproxy
# 2. pgtest2关闭 patroni 和 etcd,patroni 会把PostgreSQL自动关闭
[root@pgtest2 ~]# systemctl stop patroni
[root@pgtest2 ~]# systemctl stop etcd
# 3. pgtest2重启服务器
[root@pgtest2 ~]# reboot
# 4. 服务器重启后自动启动 etcd patroni haproxy keepalived exporter(all),启动 patroni 后发现数据库没有启动,会自动拉起数据库,检查数据库
[root@pgtest2 ~]# etcdctl endpoint status --cluster -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 6 | 489460 | 489460 | |
| http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | true | false | 6 | 489460 | 489460 | |
| http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | false | false | 6 | 489460 | 489460 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pgtest2 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Leader | running | 20 | |
| pgtest2 | 192.168.58.11 | Replica | running | 20 | 0 |
| pgtest3 | 192.168.58.12 | Replica | running | 20 | 0 |
+---------+---------------+---------+---------+----+-----------+
复制
3. 计划内只关闭PostgreSQL备节点(pgtest3)
# 1. pgtest3关闭 patroni 和 etcd,patroni 会把PostgreSQL自动关闭
[root@pgtest3 ~]# systemctl stop patroni
[root@pgtest3 ~]# systemctl stop etcd
# 2. pgtest3重启服务器
[root@pgtest3 ~]# reboot
# 3. 服务器重启后自动启动 etcd patroni prometheus grafana exporter(all),启动 patroni 后发现数据库没有启动,会自动拉起数据库,检查数据库
[root@pgtest3 ~]# etcdctl endpoint status --cluster -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 7 | 489858 | 489858 | |
| http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | false | false | 7 | 489858 | 489858 | |
| http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | true | false | 7 | 489858 | 489858 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@pgtest3 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Leader | running | 20 | |
| pgtest2 | 192.168.58.11 | Replica | running | 20 | 0 |
| pgtest3 | 192.168.58.12 | Replica | running | 20 | 0 |
+---------+---------------+---------+---------+----+-----------+
复制
4. 计划内重启其中2个服务器
因为 etcd 在3个节点的情况下,最多允许一个服务器故障,当停止2个服务器的etcd时,数据库将以只读模式运行。
5. 计划内重启所有服务器
# 1. 先关闭 Keepalived VIP,断开所有业务连接
Keepalived 备机:systemctl stop keepalived
Keepalived 主机:systemctl stop keepalived
# 2. 所有服务器 systemctl stop haproxy
# 3. patroni 设置 Disable auto failover
[root@pgtest1 ~]# patronictl pause
# 4. 所有服务器 systemctl stop patroni
[root@pgtest2 ~]# patronictl list
+--------+------+------+-------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) ----+
+--------+------+------+-------+----+-----------+
Maintenance mode: on
# 5. 所有服务器 systemctl stop etcd
# 6. 关闭 PostgreSQL 主库:
[root@pgtest1 ~]# su - postgres
Last login: Sun Oct 31 14:17:52 CST 2021 on pts/4
[postgres@pgtest1 ~]$ pg_ctl stop -m fast -w -D $PGDATA
waiting for server to shut down.... done
server stopped
[postgres@pgtest1 ~]$
# 7. 关闭 PostgreSQL 备库:pg_ctl stop -m fast -w -D $PGDATA
# 8. 关闭所有服务器 init 0
复制
启动所有服务器
# 1. 先启动主库服务器,自动启动 etcd patroni haproxy keepalived(VIP) exporter(all),数据库没有被拉起
# 2. 再启动两个备库服务器,节点2自动启动 etcd patroni haproxy keepalived exporter(all),节点3自动启动 etcd patroni prometheus grafana exporter(all),数据库没有被拉起
# 3. 如果不想数据库起来后直接对外服务,可以考虑关闭所有keepalived(VIP),等需要时再启用
# 4. Resume auto failover,会自动拉起所有数据库,但是 Leader 不一定起到哪个节点上了,好像是有节点权重能控制,不知道是啥
[root@pgtest1 ~]# patronictl resume
Success: cluster management is resumed
[root@pgtest1 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Replica | running | 23 | 0 |
| pgtest2 | 192.168.58.11 | Leader | running | 23 | |
| pgtest3 | 192.168.58.12 | Replica | running | 23 | 0 |
+---------+---------------+---------+---------+----+-----------+
复制
--------------------------未完待更----------------------------
最后修改时间:2021-11-04 09:21:05
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
文章被以下合辑收录
评论
相关阅读
外国CTO也感兴趣的开源数据库项目——openHalo
小满未满、
1391次阅读
2025-04-21 16:58:09
9.9 分高危漏洞,尽快升级到 pgAdmin 4 v9.2 进行修复
严少安
383次阅读
2025-04-11 10:43:23
3月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
368次阅读
2025-04-15 14:48:05
转发有奖 | PostgreSQL 16 PGCM高级认证课程直播班招生中!
墨天轮小教习
176次阅读
2025-04-14 15:58:34
中国PostgreSQL培训认证体系新增PGAI应用工程师方向
开源软件联盟PostgreSQL分会
168次阅读
2025-05-06 10:21:13
SQL 优化之 OR 子句改写
xiongcc
133次阅读
2025-04-21 00:08:06
告别老旧mysql_fdw,升级正当时
NickYoung
113次阅读
2025-04-29 11:15:18
融合Redis缓存的PostgreSQL高可用架构
梧桐
112次阅读
2025-04-08 06:35:40
PG生态赢得资本市场青睐:Databricks收购Neon,Supabase融资两亿美元,微软财报点名PG
老冯云数
105次阅读
2025-05-07 10:06:22
PostgreSQL的dblink扩展模块使用方法
szrsu
98次阅读
2025-04-24 17:39:30
TA的专栏
目录