暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 物理流复制从库 - 冲突判定, 谁堵塞了wal replay, 等了多久

digoal 2020-03-10
877

作者

digoal

日期

2020-03-10

标签

PostgreSQL , 回放 , redo , 流复制


背景

流复制相关的冲突、回放控制参数:

  • hot_standby_feedback: 从库反馈快照, 上游vacuum时不回收最老快照之后产生的垃圾
  • vacuum_defer_cleanup_age: 主动不回收比这个年轻的垃圾
  • recovery_min_apply_delay: 主动不replay比这个时间新的wal
  • max_standby_streaming_delay: 当startup replay stream wal record时, 如果遇到回复冲突, startup最多等多久
  • max_standby_archive_delay: 当startup replay archive(restore command) wal record时, 如果遇到回复冲突, startup最多等多久

几个疑问:
1、冲突了多少次
2、分别是什么冲突导致
3、当前数据库redo回放有没有被冲突
4、谁和redo回放发生了冲突

冲突次数

db1=# select * from pg_stat_database_conflicts ; datid | datname | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock -------+-----------+------------------+------------+----------------+-----------------+---------------- 14187 | postgres | 0 | 0 | 0 | 0 | 0 16385 | db_video | 0 | 0 | 0 | 0 | 0 1 | template1 | 0 | 0 | 0 | 0 | 0 14186 | template0 | 0 | 0 | 0 | 0 | 0 16387 | db1 | 0 | 0 | 4 | 0 | 0 17527 | db123 | 0 | 0 | 0 | 0 | 0 (6 rows)

当前是否被冲突

当startup进程的等待事件为空, 表示它被堵塞了.

postgres=# select * from pg_stat_activity where backend_type ='startup' where wait_event is null; -[ RECORD 1 ]----+--------------------------------- datid | datname | pid | 21060 usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2020-02-29 00:26:28.478013+08 xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type | startup

当前startup等待事件

当startup在回放wal时, 它在等待什么事件, 通常有io等操作. 这个不是conflict堵塞.

postgres=# select * from pg_stat_activity where backend_type ='startup'; -[ RECORD 1 ]----+------------------------------ datid | datname | pid | 21060 usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2020-02-29 00:26:28.478013+08 xact_start | query_start | state_change | wait_event_type | IO wait_event | DataFileExtend state | backend_xid | backend_xmin | query | backend_type | startup

RecoveryWalAll 通常表示startup进程正在等待wal, 通常此时standby处于未delay状态.

postgres=# select * from pg_stat_activity where backend_type ='startup'; -[ RECORD 1 ]----+------------------------------ datid | datname | pid | 21060 usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2020-02-29 00:26:28.478013+08 xact_start | query_start | state_change | wait_event_type | Activity wait_event | RecoveryWalAll state | backend_xid | backend_xmin | query | backend_type | startup

startup 可能被哪个query堵塞了

通常是时间越早, 越可能是堵塞startup的query.

或者xmin, xid越早, 越可能是堵塞startup的query.

db1=# select *,xact_start,query_start,state,user,query from pg_stat_activity where datname=current_database() and state<>'idle' order by xact_start limit 5; -[ RECORD 1 ]----+--------------------------------------------------------------------------------------------------------------------------- datid | 16387 datname | db1 pid | 29015 usesysid | 10 usename | postgres application_name | psql client_addr | client_hostname | client_port | -1 backend_start | 2020-03-10 19:01:22.577305+08 xact_start | 2020-03-10 19:01:42.257888+08 query_start | 2020-03-10 19:01:43.750416+08 state_change | 2020-03-10 19:01:43.750577+08 wait_event_type | Client wait_event | ClientRead state | idle in transaction backend_xid | backend_xmin | 4556 query | select * from abc limit 1; backend_type | client backend xact_start | 2020-03-10 19:01:42.257888+08 query_start | 2020-03-10 19:01:43.750416+08 state | idle in transaction user | postgres query | select * from abc limit 1;

通常可以使用以下query进行查询

1、时间最老

select a.* from ( select *,row_number() over (partition by state order by xact_start) as rn from pg_stat_activity where datname=current_database() and pid<>pg_backend_pid() and state<>'idle' ) a, ( select * from pg_stat_activity where backend_type ='startup' and wait_event is null )b where a.rn <= 1 order by a.xact_start;

2、或 (事务号最老)

select a.* from ( select *,row_number() over (partition by state order by least(backend_xid::text::int8,backend_xmin::text::int8)) as rn from pg_stat_activity where datname=current_database() and pid<>pg_backend_pid() and state<>'idle' ) a, ( select * from pg_stat_activity where backend_type ='startup' and wait_event is null )b where a.rn <= 1 order by least(a.backend_xid::text::int8,a.backend_xmin::text::int8);

可能堵塞了wal apply 的 query 如下

-[ RECORD 1 ]----+------------------------------ datid | 16387 datname | db1 pid | 30448 usesysid | 10 usename | postgres application_name | psql client_addr | client_hostname | client_port | -1 backend_start | 2020-03-10 19:13:36.670184+08 xact_start | 2020-03-10 19:13:38.696822+08 query_start | 2020-03-10 19:13:40.856399+08 state_change | 2020-03-10 19:13:40.85716+08 wait_event_type | Client wait_event | ClientRead state | idle in transaction backend_xid | backend_xmin | 4561 query | select * from abc limit 1; backend_type | client backend rn | 1

注意, 目前只能找到疑似堵塞了startup replay的query, 不精确.

也许未来内核会支持这样的接口, 可以找到精确的堵塞wal replay的query.

堵塞了多少wal没有被replay

db1=# select pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(),pg_last_wal_replay_lsn(), pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(),pg_last_wal_replay_lsn())); pg_is_wal_replay_paused | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_size_pretty -------------------------+-------------------------+------------------------+---------------- f | 4/BCF4A338 | 4/BCF49EE8 | 1104 bytes (1 row)

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论