大家好,今天我们聊一聊PG16的新特性: logical decoding on standby server.
官方文档:
业界著名厂商crunchydata 早在今年6月份的blog中也公布这个PG16的新特性:
1.可以在只读的standby server上进行逻辑解码了
2.减少了primary server的压力
3.为下游数据同步和系统审计提供了新的高可用的方式
我们来手动做下面的一个实验:
Server IP | port | PG version | DB role |
---|---|---|---|
10.67.38.170 | 6100 | 16.1 | Primary |
10.67.38.171 | 6100 | 16.1 | Standby (streaming replication) |
10.67.38.173 | 6100 | 16.1 | Primary (logical replication from standby) |
我们选用2节点170和171作为1主1从的stream replication: wal_level是logical
postgres=# select version();
version
--------------------------------------------------------------------------------------------------------
PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), 64-bit
(1 row)
postgres=# show wal_level;
wal_level
-----------
logical
(1 row)
postgres=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backen
d_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag
| flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+--------------+-----------------+-------------+---------------
----------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------
+-----------------+-----------------+---------------+------------+-------------------------------
56985 | 16384 | repmgr | pref171 | 10.67.38.171 | | 47208 | 2023-12-10 17:
17:43.222998+08 | | streaming | 0/E004B38 | 0/E004B38 | 0/E004B38 | 0/E004B38 | 00:00:00.000321
| 00:00:00.000743 | 00:00:00.000809 | 0 | async | 2023-12-10 17:18:17.885619+08
(1 row)
我们先登录主库: 创建一个全新的数据库,同步的表,以及同步的账户
postgres=# create database db_pg16_local_decode template template0 encoding UTF8 lc_collate 'C' lc_ctype 'C.UTF-8';
CREATE DATABASE
postgres=# \c db_pg16_local_decode
You are now connected to database "db_pg16_local_decode" as user "postgres".
postgres=# create user logical_user replication login connection limit 10 encrypted password 'logical_user';
CREATE ROLE
postgres=# create table t_order(id int not null primary key , good_name varchar(200), location varchar(20));
CREATE TABLE
postgres=# grant select on t_order to logical_user;
GRANT
我们继续在主库创建一个发布者 publication:
db_pg16_local_decode=# CREATE PUBLICATION pg16_pub FOR TABLE t_order;
CREATE PUBLICATION
db_pg16_local_decode=# select * from pg_publication;
oid | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot
-------+----------+----------+--------------+-----------+-----------+-----------+-------------+------------
16627 | pg16_pub | 10 | f | t | t | t | t | f
(1 row)
我们登陆streaming replication的从库,设置一下 hot_standby_feedback = on
(这个参数的打开,会产生一些负面的影响,比如影响主库的vacuum , 可以参考 https://www.cybertec-postgresql.com/en/what-hot_standby_feedback-in-postgresql-really-does/ )
postgres=# alter system set hot_standby_feedback = 'on';
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# show hot_standby_feedback;
hot_standby_feedback
----------------------
on
(1 row)
我们登录reporting的PG实例,创建目标数据库和同步的目标表
postgres=# create database pg16_target_report_db template template0 encoding UTF8 LC_Collate 'C' LC_CTYPE 'C.UTF-8';
CREATE DATABASE
postgres=# \c pg16_target_report_db
You are now connected to database "pg16_target_report_db" as user "postgres".
pg16_target_report_db=# create table t_order(id int not null primary key , good_name varchar(200), location varchar(20));
CREATE TABLE
接下来,我们创建一个订阅者: 这里我们指定的host的地址是standby server
pg16_target_report_db=# CREATE SUBSCRIPTION pg16_sub CONNECTION 'host=10.67.38.171 port=6100 dbname=db_pg16_local_decode application_name=logical_replication user=logical_user password=logical_user' publication pg16_pub;
...
这个命令会卡主,我们需要在主库上执行:
db_pg16_local_decode=# SELECT pg_log_standby_snapshot();
pg_log_standby_snapshot
-------------------------
0/10AF1340
(1 row)
这时reporting的PG,会返回创建订阅者成功的消息:
NOTICE: created replication slot "pg16_sub" on publisher
CREATE SUBSCRIPTION
这个时候,我们登陆standby database , 查看复制流的信息:
postgres=# SELECT pid, application_name, state, sync_state FROM pg_stat_replication;
pid | application_name | state | sync_state
-------+---------------------+-----------+------------
27777 | logical_replication | streaming | async
(1 row)
查看standby database上的逻辑复制槽的信息
postgres=# select slot_name,slot_type,database ,confirmed_flush_lsn from pg_replication_slots ;
slot_name | slot_type | database | confirmed_flush_lsn
-----------+-----------+----------------------+---------------------
pg16_sub | logical | db_pg16_local_decode | 0/10B1A950
(1 row)
我们来插入数据测试一下:
登陆主库
db_pg16_local_decode=# insert into t_order (id, good_name,location) values (1,'豆腐脑','TJ');
INSERT 0 1
db_pg16_local_decode=# insert into t_order (id, good_name,location) values (2,'煎饼果子','TJ');
INSERT 0 1
db_pg16_local_decode=# insert into t_order (id, good_name,location) values (3,'卤煮','BJ');
INSERT 0 1
db_pg16_local_decode=# insert into t_order (id, good_name,location) values (4,'小笼包','SH');
INSERT 0 1
登陆reporting 的PG: 我们发现数据已经同步到位
pg16_target_report_db=# select * from t_order;
id | good_name | location
----+-----------+----------
1 | 豆腐脑 | TJ
2 | 煎饼果子 | TJ
3 | 卤煮 | BJ
4 | 小笼包 | SH
(4 rows)
下面我们测试另外一个场景:
PG16之前的版本只能在主库创建逻辑槽,当数据库发生switchover或者failover的时候,新promote的主库上之前逻辑槽并不存在,需要DBA重新创建。
下面我们测试一下,数据库一旦发生主从切换,standby database promote成为主库之后,逻辑复制槽依然正常工作的场景。
我们手动进行一个switchover: 这里我们采用repmgr的swithover的命令
INFRA [postgres@ljzdccapp007 repmgr]# /opt/pgsql-16/bin/repmgr standby switchover -f /data/prod6100/repmgr/repmgr.conf --siblings-follow
NOTICE: executing switchover on node "pref171" (ID: 2)
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "pref171" (ID: 2) will be promoted to primary; current primary "pref170" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "pref170" (ID: 1)
NOTICE: issuing CHECKPOINT on node "pref170" (ID: 1)
DETAIL: executing server command "/opt/pgsql-16/bin/pg_ctl -D '/data/prod6100/data' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/11000028
NOTICE: promoting standby to primary
DETAIL: promoting server "pref171" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pref171" (ID: 2) was successfully promoted to primary
NOTICE: node "pref171" (ID: 2) promoted to primary, node "pref170" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "pref171" is now primary and node "pref170" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
我们查看复制集的状态:我们看到主从已经切换成功
INFRA [postgres@ljzdccapp007 repmgr]# /opt/pgsql-16/bin/repmgr -f /data/prod6100/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------
1 | pref170 | standby | running | pref171 | default | 100 | 3 | host=10.67.38.170 port=6100 user=repmgr dbname=repmgr connect_timeout=2
2 | pref171 | primary | * running | | default | 100 | 4 | host=10.67.38.171 port=6100 user=repmgr dbname=repmgr connect_timeout=2
我们去新的主库观察逻辑复制槽:confirmed_flush_lsn 一直在不停的变化说明同步一直在进行
postgres=# select slot_name,slot_type,database ,confirmed_flush_lsn from pg_replication_slots ;
slot_name | slot_type | database | confirmed_flush_lsn
-----------+-----------+----------------------+---------------------
pg16_sub | logical | db_pg16_local_decode | 0/1100A3D8
(1 row)
postgres=# select slot_name,slot_type,database ,confirmed_flush_lsn from pg_replication_slots ;
slot_name | slot_type | database | confirmed_flush_lsn
-----------+-----------+----------------------+---------------------
pg16_sub | logical | db_pg16_local_decode | 0/1100AE58
(1 row)
我们新插入一些数据进行测试:
postgres=# \c db_pg16_local_decode
You are now connected to database "db_pg16_local_decode" as user "postgres".
db_pg16_local_decode=# insert into t_order (id, good_name,location) values (5,'车仔面','HK');
INSERT 0 1
db_pg16_local_decode=# insert into t_order (id, good_name,location) values (6,'蚵仔煎','TW');
INSERT 0 1
我们查看目标reporting的PG实例,车仔面和蚵仔煎已经同步到位 🙂
pg16_target_report_db=# select * from t_order;
id | good_name | location
----+-----------+----------
1 | 豆腐脑 | TJ
2 | 煎饼果子 | TJ
3 | 卤煮 | BJ
4 | 小笼包 | SH
5 | 车仔面 | HK
6 | 蚵仔煎 | TW
(6 rows)
这个时候可能有的同学会问,同步的架构又变成了 主库 logical decoding 到 reporting 数据库:
那么想要继续恢复 standby logical decoding 到 reporting 数据库的架构该如何做呢?
大致需要如下步骤:
1.我们创建一下disable状态的订阅者指向new standby WITH (enabled=false,copy_data=false)
pg16_target_report_db=# CREATE SUBSCRIPTION pg16_sub_new CONNECTION 'host=10.67.38.170 port=6100 dbname=db_pg16_local_decode application_name=logical_replication user=logical_user password=logical_user' publication pg16_pub WITH (enabled=false,copy_data=false);
NOTICE: created replication slot "pg16_sub_new" on publisher
CREATE SUBSCRIPTION
- 我们暂停指向主库的订阅者
pg16_target_report_db=# alter SUBSCRIPTION pg16_sub disable;
ALTER SUBSCRIPTION
3.查看同步的LSN的位置: 记录下LSN = 0/113449D8
confirmed_flush_lsn:表示订阅者消费wal中的LSN的位置点
db_pg16_local_decode=# select slot_name,slot_type,database ,confirmed_flush_lsn from pg_replication_slots ;
slot_name | slot_type | database | confirmed_flush_lsn
-----------+-----------+----------------------+---------------------
pg16_sub | logical | db_pg16_local_decode | 0/113449D8
(1 row)
我们可以在这个时候,模拟一些数据插入
db_pg16_local_decode=# insert into t_order (id, good_name,location) values (8,'叉烧饭','GZ');
INSERT 0 1
4.我们设置新的订阅者的 original LSN的位置是 0/113449D8
pg16_target_report_db=# select roident, subname, roname from pg_subscription sub, pg_replication_origin ro where 'pg_' || sub.oid = ro.roname;
roident | subname | roname
---------+--------------+----------
1 | pg16_sub | pg_24744
2 | pg16_sub_new | pg_24746
(2 rows)
pg16_target_report_db=# select pg_replication_origin_advance('pg_24750', '0/113449D8'::pg_lsn);
pg_replication_origin_advance
-------------------------------
(1 row)
5.激活新的订阅者
pg16_target_report_db=# alter SUBSCRIPTION pg16_sub_new enable;
ALTER SUBSCRIPTION
6.查看数据同步: 我们发现叉烧饭已经同步过来
pg16_target_report_db=# select * from t_order;
id | good_name | location
----+-----------+----------
1 | 豆腐脑 | TJ
2 | 煎饼果子 | TJ
3 | 卤煮 | BJ
4 | 小笼包 | SH
5 | 车仔面 | HK
6 | 蚵仔煎 | TW
8 | 叉烧饭 | GZ
(7 rows)
7.观察new standby 上的逻辑槽的运行状态正常
db_pg16_local_decode=# select slot_name,slot_type,database ,confirmed_flush_lsn from pg_replication_slots ;
slot_name | slot_type | database | confirmed_flush_lsn
--------------+-----------+----------------------+---------------------
pg16_sub_new | logical | db_pg16_local_decode | 0/113748A8
(1 row)
db_pg16_local_decode=# select slot_name,slot_type,database ,confirmed_flush_lsn from pg_replication_slots ;
slot_name | slot_type | database | confirmed_flush_lsn
--------------+-----------+----------------------+---------------------
pg16_sub_new | logical | db_pg16_local_decode | 0/11374A78
(1 row)
8.最后我们来删除指向主库的逻辑复制订阅者
pg16_target_report_db=# drop SUBSCRIPTION pg16_sub;
NOTICE: dropped replication slot "pg16_sub" on publisher
DROP SUBSCRIPTION
最后我们看看关于逻辑槽的监控:
在实际工作中,逻辑槽的消费订阅者有可能是不同版本的PG数据库,也有可能是flink CDC 这种程序来消费。
我们可以监控逻辑槽中的LSN的位置:
replication_lag 查询 from主库:pg_current_wal_lsn() - confirmed_flush_lsn
replication_lag 查询 from从库:pg_last_wal_replay_lsn() - confirmed_flush_lsn
db_pg16_local_decode=# select slot_name,slot_type,database,pg_last_wal_replay_lsn(),confirmed_flush_lsn, pg_last_wal_replay_lsn() - confirmed_flush_lsn AS replication_lag FROM pg_replication_slots;
slot_name | slot_type | database | pg_last_wal_replay_lsn | confirmed_flush_lsn | replication_lag
--------------+-----------+----------------------+------------------------+---------------------+-----------------
pg16_sub_new | logical | db_pg16_local_decode | 0/113F6890 | 0/113F6890 | 0
(1 row)
我们也可以监控一下复制集中wal 的 replay time:
db_pg16_local_decode=# select usename,application_name ,now() - reply_time as replay_lag from pg_stat_replication;
usename | application_name | replay_lag
--------------+---------------------+-----------------
logical_user | logical_replication | 00:00:01.475631
(1 row)
最后我们总结一下:
1)PG 16版本开始支持 standby logical decoding, 当 standby promote 成主库的时候,逻辑复制一般情况下不受影响(我们测试了switchover的情况)。
2)逻辑复制如果想要从primary上迁移到standby 上的步骤:
a.创建一个新的standby 的订阅者,初始状态是disable的,并且设置copy_data=false
b.停止原先指向主库的订阅者,并记录下之前同步截止的LSN (confirmed_flush_lsn from pg_replication_slots)
c.通过函数pg_replication_origin_advance 设置 standby 订阅者同步起点LSN的位置。(pg_replication_origin_advance 这个函数还可以解决逻辑同步冲突的问题)
d.激活standby 订阅者
e.查看standby订阅者的状态
f.删除老的指向primary的订阅者。
3)从LSN 和 replay_time 2个维度,减少了逻辑复制的延时lag的监控。
Have a fun 🙂 !
本文配图来自于 PG 业界著名厂商 Crunchy data。
参考文档 References:
https://www.postgresql.org/docs/current/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-NOTES
https://www.crunchydata.com/blog/data-to-go-postgres-logical-replication
https://www.crunchydata.com/blog/logical-replication-on-standbys-in-postgres-16
https://www.percona.com/blog/how-to-start-logical-replication-in-postgresql-for-specific-tables-based-on-a-pg_dump/
https://www.postgresql.fastware.com/blog/addressing-replication-conflicts-using-alter-subscription-skip
https://www.postgresql.fastware.com/blog/how-to-handle-logical-replication-conflicts-in-postgresql