大家好,这次大表哥带来技术分享是 PGPOOL II 这个连接池自身的HA方案: Watch dog.
本文参考官网的 configuration Examples:
https://tatsuo-ishii.github.io/pgpool-II/current/example-cluster.html
Watch dog 是PGPool 的内置的监控进程不需要独立安装,打开相关的参数开关即可。
具体的配置步骤如下:
1)官方建议配置至少3台(奇数个)几点配置Watch dog 进程。
IP | pg pool role | pg instance role |
---|---|---|
10.67.38.50 | LEADER | primary database |
10.67.39.149 | STANDBY | standby database |
10.67.39.49 | STANDBY | standy database |
2)手动创建 pgpool_node_id 文件, 在路径下:${PGPOOL_HOME}/etc
这一步是必须的,否则启动会报错:If watchdog is enable, pgpool_node_id file is required
INFRA [root@wqdcsrv3352 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool 2022-06-30 14:53:25.867: main pid 69061: 2022-06-30 14:53:25FATAL: Pgpool node id file /opt/postgreSQL/pgpool4.3/etc/pgpool_node_id does not exist 2022-06-30 14:53:25.867: main pid 69061: 2022-06-30 14:53:25DETAIL: If watchdog is enable, pgpool_node_id file is required
INFRA [postgres@wqdcsrv3352 pgpool4.3]# vi pgpool_node_id INFRA [postgres@wqdcsrv3352 pgpool4.3]# cat pgpool_node_id 0 INFRA [postgres@wqdcsrv3354 pgpool]# vi pgpool_node_id INFRA [postgres@wqdcsrv3354 pgpool]# cat pgpool_node_id 1 INFRA [postgres@wqdcsrv3353 pgpool]# vi pgpool_node_id INFRA [postgres@wqdcsrv3353 pgpool]# cat pgpool_node_id 2
3)关于watch dog 核心参数的配置: 配置文件 ${PGPOOL_HOME}/etc/pgpool.conf
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
## watch dog的开关
use_watchdog = on
## watch 的节点 IP,PORT 信息
hostname0 = '10.67.38.50'
wd_port0 = 9000
pgpool_port0 = 9999
hostname1 = '10.67.39.149'
wd_port1 = 9000
pgpool_port1 = 9999
hostname2 = '10.67.39.49'
wd_port2 = 9000
pgpool_port2 = 9999
### 优先级的设置 这里我们设置成一致的 为1 ,
### 没有设置加密方式
### 进程文件设置在了 /tmp下面
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
###VIP相关的设置 需要根据实际的 网卡设备名称 eth0
### 这个我们PGPOOL 是通过 PG POOL 启动的,所以 不需要 sudo 可以直接执行, 如果是 postgres等普通权限的用户,需要添加 /sbin/sudo
delegate_IP = '10.67.39.200'
if_up_cmd = '/sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev eth0'
arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
wd_escalation_command = '/opt/postgreSQL/pgpool4.3/etc/escalation.sh' ### 这个脚本的执行发生在挂上VIP之前,需要把所有的节点VIP 卸载一遍,以防止脑裂的发生
###watch dog 的心跳方式配置:
###通过网卡端口监控, 10秒为超时
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
heartbeat_hostname0 = '10.67.38.50'
heartbeat_port0 = 9694
heartbeat_device0 = 'eth0'
heartbeat_hostname1 = '10.67.39.149'
heartbeat_port1 = 9694
heartbeat_device1 = 'eth0'
heartbeat_hostname2 = '10.67.39.49'
heartbeat_port2 = 9694
heartbeat_device2 = 'eth0'
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
- 需要修改一下 escalation.sh 这个shell 文件
修改成实际节点的IP和VIP
#!/bin/bash
# This script is run by wd_escalation_command to bring down the virtual IP on other pgpool nodes
# before bringing up the virtual IP on the new active pgpool node.
set -o xtrace
PGPOOLS=(10.67.38.50 10.67.39.149 10.67.39.49)
HOSTNAME=`hostname -i`
VIP=10.67.39.200
DEVICE=eth0
for pgpool in "${PGPOOLS[@]}"; do
[ "$HOSTNAME" = "$pgpool" ] && continue
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$pgpool -i ~/.ssh/id_rsa_pgpool "
/usr/bin/sudo /sbin/ip addr del $VIP/24 dev $DEVICE
"
done
exit 0
5)依次启动3个节点的 watch dog (这里我们用大权限的账户 root ), 观察 watch dog的 节点状态
INFRA [root@wqdcsrv3352 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool INFRA [root@wqdcsrv3353 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool INFRA [root@wqdcsrv3354 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool INFRA [postgres@wqdcsrv3352 etc]# /opt/postgreSQL/pgpool4.3/bin/pcp_watchdog_info -h localhost -p 9898 -U postgres Password: 3 3 YES 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 9999 9000 4 LEADER 0 MEMBER 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 9999 9000 7 STANDBY 0 MEMBER 10.67.39.49:9999 Linux wqdcsrv3353.cn.infra 10.67.39.49 9999 9000 7 STANDBY 0 MEMBER
6)手动测试VIP 漂移
我们尝试关闭 LEADER 节点上的PGPOOL : 10.67.38.50
INFRA [root@wqdcsrv3352 ~]# /opt/postgreSQL/pgpool4.3/bin/pgpool -m fast stop 2022-06-29 17:34:33.552: main pid 76158: 2022-06-29 17:34:33LOG: stop request sent to pgpool (pid: 73720). waiting for termination... .done.
我们可以从pgpool 的日志中,看到VIP已经被移除了
2022-06-29 17:34:33.552: main pid 73720: 2022-06-29 17:34:33LOG: terminating all child processes 2022-06-29 17:34:33.556: watchdog pid 73723: 2022-06-29 17:34:33LOG: Watchdog is shutting down 2022-06-29 17:34:33.556: watchdog_utility pid 76159: 2022-06-29 17:34:33LOG: watchdog: de-escalation started 2022-06-29 17:34:33.558: watchdog_utility pid 76159: 2022-06-29 17:34:33LOG: successfully released the delegate IP:"10.67.39.200" 2022-06-29 17:34:33.558: watchdog_utility pid 76159: 2022-06-29 17:34:33DETAIL: 'if_down_cmd' returned with success 2022-06-29 17:34:33.559: main pid 73720: 2022-06-29 17:34:33LOG: Pgpool-II system is shutdown
INFRA [root@wqdcsrv3352 ~]# ip addr | grep 10.67.39.200
我们这个时候发现VIP 10.67.39.200 已经漂移到了节点 149:
INFRA [root@wqdcsrv3354 ~]# ip addr |grep 10.67.39.200 inet 10.67.39.200/24 scope global eth0:0
此时我们再次观察PGPOOL 集群的状态:
INFRA [root@wqdcsrv3354 ~]# /opt/postgreSQL/pgpool4.3/bin/pcp_watchdog_info -h localhost -p 9898 -U postgres Password: 3 3 YES 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 9999 9000 4 LEADER 0 MEMBER 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 9999 9000 10 SHUTDOWN 0 MEMBER 10.67.39.49:9999 Linux wqdcsrv3353.cn.infra 10.67.39.49 9999 9000 7 STANDBY 0 MEMBER
我们手动起来 10.67.38.50 节点上的 pgpool:
INFRA [root@wqdcsrv3352 ~]# /opt/postgreSQL/pgpool4.3/bin/pgpool -f /opt/postgreSQL/pgpool4.3/etc/pgpool.conf -F /opt/postgreSQL/pgpool4.3/etc/pcp.conf > /tmp/pgpool.log
这个时候10.67.38.50 的节点 恢复成了 standby 的状态:
3 3 YES 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 9999 9000 4 LEADER 0 MEMBER 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 9999 9000 7 STANDBY 0 MEMBER 10.67.39.49:9999 Linux wqdcsrv3353.cn.infra 10.67.39.49 9999 9000 7 STANDBY 0 MEMBER
下面我们来用springboot的小程序测试一下 PGPOOL 的HA :
我们的程序的数据库配置是 这次是连接到 VIP 10.67.39.200 上面
连接池的配置是 最大允许 200 个连接,最小空闲的连接数是10
//数据库的地址以及端口号 spring.datasource.url=jdbc:postgresql://10.67.39.200:9999/postgres spring.datasource.username=app_ha_user spring.datasource.password=app_ha_user spring.datasource.driverClassName=org.postgresql.Driver spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect spring.jpa.properties.hibernate.hbm2ddl.auto=update spring.datasource.maximum-pool-size=200 spring.datasource.min-idle=10
测试生成很简单,就是网页输出一下数据库的IP地址:
String sql = "select inet_server_addr()"; // 通过jdbcTemplate查询数据库 String hostname = (String)jdbcTemplate.queryForObject( sql, String.class); return "Hello ,you are connecting to " + hostname;
启动测试程序后,观察初始化的连接池:
postgres=> show pool_processes; pool_pid | start_time | client_connection_count | database | username | backend_connection_time | pool_counter | status ----------+------------------------------------------------------+-------------------------+----------+-------------+-------------------------+--------------+--------------------- 80665 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80666 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 64166 | 2022-06-30 11:11:51 | 2 | | | | | Wait for connection 80668 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80669 | 2022-06-29 17:30:21 | 1 | | | | | Wait for connection 65854 | 2022-06-30 11:27:54 | 0 | | | | | Wait for connection 80671 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80672 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80673 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80674 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80675 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80676 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80677 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80678 | 2022-06-29 17:30:21 | 1 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80679 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80680 | 2022-06-29 17:30:21 (4:15 before process restarting) | 1 | | | | | Wait for connection 80681 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80682 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80683 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80684 | 2022-06-29 17:30:21 | 2 | | | | | Wait for connection 80685 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80686 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80687 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80688 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80689 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80690 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80691 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80692 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80693 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80694 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80695 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:30:30 | 1 | Execute command 80696 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection
尝试访问web界面: http://127.0.0.1:8066/ 我们可以看到 50这个节点是主库
关闭VIP所在节点的PGPOOL :
INFRA [root@wqdcsrv3354 ~]# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 00:50:56:ae:f7:e7 brd ff:ff:ff:ff:ff:ff inet 10.67.39.149/22 brd 10.67.39.255 scope global eth0 valid_lft forever preferred_lft forever inet 10.67.39.200/24 scope global eth0:0 valid_lft forever preferred_lft forever INFRA [root@wqdcsrv3354 ~]# /opt/postgreSQL/pgpool4.3/bin/pgpool -m fast stop 2022-06-30 11:34:08.821: main pid 66716: 2022-06-30 11:34:08LOG: stop request sent to pgpool (pid: 80655). waiting for termination... .done.
这个时候我们看到VIP 漂移到了 49这个节点:
INFRA [postgres@wqdcsrv3353 ~]# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 00:50:56:ae:99:07 brd ff:ff:ff:ff:ff:ff inet 10.67.39.49/22 brd 10.67.39.255 scope global eth0 valid_lft forever preferred_lft forever inet 10.67.39.200/24 scope global eth0:0 valid_lft forever preferred_lft forever
再次访问网页: http://127.0.0.1:8066/ 我们可以看到通过VIP 依然可以连接到 主库 50 这个节点.
我们可以观察到 测试程序的连接池进行了重连
2022-06-30 11:39:01,161 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@51849c83 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,164 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@46432d85 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,168 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@33d6c14a (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,169 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@64116a2c (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,169 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@6641e508 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,170 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@3931c340 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,170 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@3ce889bd (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,171 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@5347c406 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,172 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@3a4712c3 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,172 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@ddbed08 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.