作者
digoal
日期
2020-03-10
标签
PostgreSQL , 回放 , redo , 流复制
背景
流复制相关的冲突、回放控制参数:
- hot_standby_feedback: 从库反馈快照, 上游vacuum时不回收最老快照之后产生的垃圾
- vacuum_defer_cleanup_age: 主动不回收比这个年轻的垃圾
- recovery_min_apply_delay: 主动不replay比这个时间新的wal
- max_standby_streaming_delay: 当startup replay stream wal record时, 如果遇到回复冲突, startup最多等多久
- max_standby_archive_delay: 当startup replay archive(restore command) wal record时, 如果遇到回复冲突, startup最多等多久
几个疑问:
1、冲突了多少次
2、分别是什么冲突导致
3、当前数据库redo回放有没有被冲突
4、谁和redo回放发生了冲突
冲突次数
db1=# select * from pg_stat_database_conflicts ;
datid | datname | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
-------+-----------+------------------+------------+----------------+-----------------+----------------
14187 | postgres | 0 | 0 | 0 | 0 | 0
16385 | db_video | 0 | 0 | 0 | 0 | 0
1 | template1 | 0 | 0 | 0 | 0 | 0
14186 | template0 | 0 | 0 | 0 | 0 | 0
16387 | db1 | 0 | 0 | 4 | 0 | 0
17527 | db123 | 0 | 0 | 0 | 0 | 0
(6 rows)
当前是否被冲突
当startup进程的等待事件为空, 表示它被堵塞了.
postgres=# select * from pg_stat_activity where backend_type ='startup' where wait_event is null;
-[ RECORD 1 ]----+---------------------------------
datid |
datname |
pid | 21060
usesysid |
usename |
application_name |
client_addr |
client_hostname |
client_port |
backend_start | 2020-02-29 00:26:28.478013+08
xact_start |
query_start |
state_change |
wait_event_type |
wait_event |
state |
backend_xid |
backend_xmin |
query |
backend_type | startup
当前startup等待事件
当startup在回放wal时, 它在等待什么事件, 通常有io等操作. 这个不是conflict堵塞.
postgres=# select * from pg_stat_activity where backend_type ='startup';
-[ RECORD 1 ]----+------------------------------
datid |
datname |
pid | 21060
usesysid |
usename |
application_name |
client_addr |
client_hostname |
client_port |
backend_start | 2020-02-29 00:26:28.478013+08
xact_start |
query_start |
state_change |
wait_event_type | IO
wait_event | DataFileExtend
state |
backend_xid |
backend_xmin |
query |
backend_type | startup
RecoveryWalAll 通常表示startup进程正在等待wal, 通常此时standby处于未delay状态.
postgres=# select * from pg_stat_activity where backend_type ='startup';
-[ RECORD 1 ]----+------------------------------
datid |
datname |
pid | 21060
usesysid |
usename |
application_name |
client_addr |
client_hostname |
client_port |
backend_start | 2020-02-29 00:26:28.478013+08
xact_start |
query_start |
state_change |
wait_event_type | Activity
wait_event | RecoveryWalAll
state |
backend_xid |
backend_xmin |
query |
backend_type | startup
startup 可能被哪个query堵塞了
通常是时间越早, 越可能是堵塞startup的query.
或者xmin, xid越早, 越可能是堵塞startup的query.
db1=# select *,xact_start,query_start,state,user,query from pg_stat_activity where datname=current_database() and state<>'idle' order by xact_start limit 5;
-[ RECORD 1 ]----+---------------------------------------------------------------------------------------------------------------------------
datid | 16387
datname | db1
pid | 29015
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2020-03-10 19:01:22.577305+08
xact_start | 2020-03-10 19:01:42.257888+08
query_start | 2020-03-10 19:01:43.750416+08
state_change | 2020-03-10 19:01:43.750577+08
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid |
backend_xmin | 4556
query | select * from abc limit 1;
backend_type | client backend
xact_start | 2020-03-10 19:01:42.257888+08
query_start | 2020-03-10 19:01:43.750416+08
state | idle in transaction
user | postgres
query | select * from abc limit 1;
通常可以使用以下query进行查询
1、时间最老
select a.* from
(
select *,row_number() over (partition by state order by xact_start) as rn
from pg_stat_activity
where datname=current_database()
and pid<>pg_backend_pid()
and state<>'idle'
) a,
(
select * from pg_stat_activity where backend_type ='startup' and wait_event is null
)b
where a.rn <= 1
order by a.xact_start;
2、或 (事务号最老)
select a.* from
(
select *,row_number() over (partition by state order by least(backend_xid::text::int8,backend_xmin::text::int8)) as rn
from pg_stat_activity
where datname=current_database()
and pid<>pg_backend_pid()
and state<>'idle'
) a,
(
select * from pg_stat_activity where backend_type ='startup' and wait_event is null
)b
where a.rn <= 1
order by least(a.backend_xid::text::int8,a.backend_xmin::text::int8);
可能堵塞了wal apply 的 query 如下
-[ RECORD 1 ]----+------------------------------
datid | 16387
datname | db1
pid | 30448
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2020-03-10 19:13:36.670184+08
xact_start | 2020-03-10 19:13:38.696822+08
query_start | 2020-03-10 19:13:40.856399+08
state_change | 2020-03-10 19:13:40.85716+08
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid |
backend_xmin | 4561
query | select * from abc limit 1;
backend_type | client backend
rn | 1
注意, 目前只能找到疑似堵塞了startup replay的query, 不精确.
也许未来内核会支持这样的接口, 可以找到精确的堵塞wal replay的query.
堵塞了多少wal没有被replay
db1=# select pg_is_wal_replay_paused(),
pg_last_wal_receive_lsn(),pg_last_wal_replay_lsn(),
pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(),pg_last_wal_replay_lsn()));
pg_is_wal_replay_paused | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_size_pretty
-------------------------+-------------------------+------------------------+----------------
f | 4/BCF4A338 | 4/BCF49EE8 | 1104 bytes
(1 row)
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.