使用逻辑复制时,会在主库创建复制槽来记录逻辑复制的相关信息,但是复制槽的信息不会被同步到备库,当数据库发生主备切换时,原备库没有复制槽信息,复制槽只能重建,导致故障期间的数据无法通过逻辑复制同步给订阅端。
有一个办法是在创建复制槽后,把主库$PGDATA/pg_replslot目录中的文件拷贝到备库,并重启备库,即可在备库中看到该复制槽
但是这又会产生一个新的问题,wal日志的清理依赖复制槽中的restart_lsn,备库的复制槽又不会更新,就导致备库的wal日志一直无法清理,借鉴patroni中使用函数pg_replication_slot_advance来推进备库复制槽的lsn,编写了以下脚本
脚本功能
读取主库复制槽的restart_lsn和pg_stat_replication中备库信息,使用函数pg_replication_slot_advance推进备库复制槽的lsn
脚本使用示例
执行脚本前,备库复制槽restart_lsn为从主库拷贝时的值,且restart_lsn不会更新,wal日志无法清理
[postgres@db pg_wal]$ ls -ltr |head -10 total 11763788 -rw------- 1 postgres postgres 16777216 Jun 6 11:08 000000010000000000000005.partial -rw------- 1 postgres postgres 41 Jun 6 11:08 00000002.history -rw------- 1 postgres postgres 83 Jun 6 14:20 00000003.history -rw------- 1 postgres postgres 16777216 Jun 6 14:20 000000020000000000000006 -rw------- 1 postgres postgres 16777216 Jun 6 16:18 000000030000000000000006 -rw------- 1 postgres postgres 16777216 Jun 6 16:18 000000030000000000000007.partial -rw------- 1 postgres postgres 125 Jun 6 16:18 00000004.history -rw------- 1 postgres postgres 16777216 Jun 6 16:35 000000040000000000000007 -rw------- 1 postgres postgres 167 Jun 6 16:35 00000005.history [postgres@db pg_wal]$ psql psql (12.7) Type "help" for help. postgres=# checkpoint ; CHECKPOINT postgres=# select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn -----------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- idcsync_0f99_26 | pgoutput | logical | 16546 | idc_test | f | f | | | 2284 | 0/6050080 | 0/60500B9
复制
执行脚本后,备库复制槽restart_lsn更新,wal日志可以正常清理
[postgres@db pg_wal]$ ls -ltr |head -10 total 18874472 -rw------- 1 postgres postgres 41 Jun 6 11:08 00000002.history -rw------- 1 postgres postgres 83 Jun 6 14:20 00000003.history -rw------- 1 postgres postgres 125 Jun 6 16:18 00000004.history -rw------- 1 postgres postgres 167 Jun 6 16:35 00000005.history -rw------- 1 postgres postgres 209 Jun 6 16:39 00000006.history -rw------- 1 postgres postgres 251 Jun 6 16:45 00000007.history -rw------- 1 postgres postgres 16777216 Jun 12 09:06 0000000700000002000000C7 -rw------- 1 postgres postgres 16777216 Jun 12 09:07 0000000700000002000000C8 -rw------- 1 postgres postgres 16777216 Jun 12 09:07 0000000700000002000000C9 [postgres@db-19-65 pg_wal]$ du -sh 19G . [postgres@db pg_wal]$ psql psql (12.7) Type "help" for help. postgres=# select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn -----------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- idcsync_0f99_26 | pgoutput | logical | 16546 | idc_test | f | f | | | 4622 | 2/C7E10468 | 2/C7E11770 (1 row)
复制
脚本运行日志
2023-06-12__11:06:15-- begin ...,vip_list_file /home/postgres/pg_replication_slot_advance_vip.list 2023-06-12__11:06:15--main-- get slots_list,ip 192.168.1.11:5433,sql [select string_agg(concat_ws('|',slot_name,restart_lsn,database),' ') from pg_replication_slots;] 2023-06-12__11:06:15--main-- slots_list [[idcsync_0f99_26|2/C7E11770|idc_test]] 2023-06-12__11:06:15--main-- get standby_list,ip 192.168.1.11:5433,sql [select string_agg(replace(application_name,'_',':'),' ') from pg_stat_replication where usename='repl';] 2023-06-12__11:06:15--main-- standby_list [[192.168.1.13:5432]] 2023-06-12__11:06:15--slot_advance-- set restart_lsn,ip 192.168.1.13:5432: slot_name idcsync_0f99_26,lsn 2/C7E11770; 2023-06-12__11:06:15--slot_advance-- sql [select case pg_is_in_recovery() when true then pg_replication_slot_advance('idcsync_0f99_26','2/C7E11770') end] (idcsync_0f99_26,2/C7E11770) 2023-06-12__11:06:15--clear_log clear log /home/postgres/pg_replication_slot_advance_log.20230605 rm: cannot remove ‘/home/postgres/pg_replication_slot_advance_log.20230605’: No such file or directory 2023-06-12__11:06:15-- end ...
复制
测试运行正常后添加定时任务,每20分钟执行一次该脚本
[postgres@db ~]$ crontab -l */20 * * * * . ~/.bash_profile; sh ~/pg_replication_slot_advance.sh
复制
查看运行日志
2023-06-14__09:40:01-- begin ...,vip_list_file /home/postgres/pg_replication_slot_advance_vip.list 2023-06-14__09:40:01--main-- get slots_list,ip 192.168.1.11:5433,sql [select string_agg(concat_ws('|',slot_name,restart_lsn,database),' ') from pg_replication_slots;] 2023-06-14__09:40:01--main-- slots_list [[idcsync_0f99_26|7/478DFEE0|idc_test]] 2023-06-14__09:40:01--main-- get standby_list,ip 192.168.1.11:5433,sql [select string_agg(replace(application_name,'_',':'),' ') from pg_stat_replication where usename='repl';] 2023-06-14__09:40:01--main-- standby_list [[192.168.1.13:5432]] 2023-06-14__09:40:01--slot_advance-- set restart_lsn,ip 192.168.1.13:5432: slot_name idcsync_0f99_26,lsn 7/478DFEE0; 2023-06-14__09:40:01--slot_advance-- sql [select case pg_is_in_recovery() when true then pg_replication_slot_advance('idcsync_0f99_26','7/478DFEE0') end] (idcsync_0f99_26,7/478DFEE0) 2023-06-14__09:40:02--clear_log clear log /home/postgres/pg_replication_slot_advance_log/pg_replication_slot_advance_log.20230607 rm: cannot remove ‘/home/postgres/pg_replication_slot_advance_log/pg_replication_slot_advance_log.20230607’: No such file or directory 2023-06-14__09:40:02-- end ...
复制
查看备库复制槽已更新,且wal日志正常清理
[postgres@db ~]$ psql psql (12.7) Type "help" for help. postgres=# select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn -----------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- idcsync_0f99_26 | pgoutput | logical | 16546 | idc_test | f | f | | | 18627 | 7/46B02B50 | 7/478DFEE0 (1 row) [postgres@db pg_wal]$ ls -ltr |head -10 total 901156 -rw------- 1 postgres postgres 41 Jun 6 11:08 00000002.history -rw------- 1 postgres postgres 83 Jun 6 14:20 00000003.history -rw------- 1 postgres postgres 125 Jun 6 16:18 00000004.history -rw------- 1 postgres postgres 167 Jun 6 16:35 00000005.history -rw------- 1 postgres postgres 209 Jun 6 16:39 00000006.history -rw------- 1 postgres postgres 251 Jun 6 16:45 00000007.history -rw------- 1 postgres postgres 16777216 Jun 12 21:02 000000070000000700000046 -rw------- 1 postgres postgres 16777216 Jun 13 15:00 000000070000000700000047 -rw------- 1 postgres postgres 16777216 Jun 13 15:00 000000070000000700000048
复制
脚本如下
pg_replication_slot_advance.sh
#!/bin/bash
#set standby pg_replication_slots restart_lsn follow master
path=`pwd`
vip_list=$path/pg_replication_slot_advance_vip.list
log_name=$path/pg_replication_slot_advance_log
write_log(){
echo "`date +%Y-%m-%d__%H:%M:%S`--$*"
}
#connect to pg
pg_con(){
user=$1
host=$2
dbname=$3
passwd="$4"
sql="$5"
psql "postgresql://$user:$passwd@$host/$dbname" -Atc "$sql"
}
#set standby pg_replication_slots restart_lsn ues function "pg_replication_slot_advance"
slot_advance(){
user=$1
host=$2
dbname=$3
passwd="$4"
slot_name="$5"
lsn="$6"
sql="select case pg_is_in_recovery() when true then pg_replication_slot_advance('$slot_name','$lsn') end"
write_log "${FUNCNAME}-- set restart_lsn,ip $host:$port slot_name $slot_name,lsn $lsn;"
write_log "${FUNCNAME}-- sql [$sql]"
pg_con $user $host $dbname $passwd "$sql"
}
clear_log(){
clear_log_time=`date -d '7 day ago' +%Y%m%d`
write_log "${FUNCNAME} clear log $log_name.$clear_log_time"
rm $log_name.$clear_log_time
}
main(){
#get slot list(slot_name,restart_lsn,database)
sql="select string_agg(concat_ws('|',slot_name,restart_lsn,database),' ') from pg_replication_slots where slot_type='logical';"
write_log "${FUNCNAME}-- get slots_list,ip $host,sql [$sql]"
slots_list=`pg_con $user $host $dbname $passwd "$sql"`
write_log "${FUNCNAME}-- slots_list [[$slots_list]]"
if [ "$slots_list" ];then
#get standby list(ip,port)
sql="select string_agg(replace(application_name,'_',':'),' ') from pg_stat_replication where usename='repl' and application_name<>'pg_basebackup';"
write_log "${FUNCNAME}-- get standby_list,ip $host,sql [$sql]"
standby_list=`pg_con $user $host $dbname $passwd "$sql"`
write_log "${FUNCNAME}-- standby_list [[$standby_list]]"
for i in $slots_list
do
slot_name=`echo $i|awk -F '|' '{print $1}'`
lsn=`echo $i|awk -F '|' '{print $2}'`
dbname=`echo $i|awk -F '|' '{print $3}'`
for host in $standby_list
do
slot_advance $user $host $dbname $passwd $slot_name $lsn
done
done
else
write_log "${FUNCNAME} $vip slots count is 0"
fi
}
log_time=`date +%Y%m%d`
write_log " begin ...,vip_list_file $vip_list" >>$log_name.$log_time
for vip in `cat $vip_list`
do
host=$vip
user=username
dbname=database
passwd="password"
main >>$log_name.$log_time 2>&1
done
clear_log >>$log_name.$log_time 2>&1
write_log " end ..." >>$log_name.$log_time
复制
参考文档
Patroni 如何解决 PostgreSQL 集群中的逻辑复制槽故障转移问题 (percona.com)
patroni/patroni/postgresql/slots.py at master · zalando/patroni · GitHub
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
外国CTO也感兴趣的开源数据库项目——openHalo
小满未满、
507次阅读
2025-04-21 16:58:09
9.9 分高危漏洞,尽快升级到 pgAdmin 4 v9.2 进行修复
严少安
357次阅读
2025-04-11 10:43:23
3月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
337次阅读
2025-04-15 14:48:05
openHalo问世,全球首款基于PostgreSQL兼容MySQL协议的国产开源数据库
严少安
311次阅读
2025-04-07 12:14:29
转发有奖 | PostgreSQL 16 PGCM高级认证课程直播班招生中!
墨天轮小教习
152次阅读
2025-04-14 15:58:34
墨天轮PostgreSQL认证证书快递已发(2025年3月批)
墨天轮小教习
131次阅读
2025-04-03 11:43:25
SQL 优化之 OR 子句改写
xiongcc
95次阅读
2025-04-21 00:08:06
融合Redis缓存的PostgreSQL高可用架构
梧桐
90次阅读
2025-04-08 06:35:40
PostgreSQL拓展PGQ实现解析
chirpyli
87次阅读
2025-04-07 11:23:17
Mysql/Oracle/Postgresql快速批量生成百万级测试数据sql
hongg
75次阅读
2025-04-07 15:32:54