暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

为什么pg_basebackup或pg_start_backup好像hang住确没有开始拷贝文件 - checkpoint 的几种调度(checkpoint_completion_target)

digoal 2019-01-07
724

作者

digoal

日期

2019-01-07

标签

PostgreSQL , checkpoint , 调度 , lazy , immediate , pg_start_backup , pg_basebackup


背景

PostgreSQL支持在线全量备份与增量归档备份。在线全量备份实际上就是拷贝文件,增量备份则分为两种,一种是基于BLOCK lsn变化的BLOCK即增量备份,另一种是基于WAL的持续归档文件备份。

全量备份通常使用pg_basebackup客户端实现,或者使用SQL函数pg_start_backup()+COPY文件、打快照的方式实现。

全量备份开启前,需要对数据库做一次checkpoint,并强制开启full page write,确保partial block在后续可以通过wal进行恢复。备份结束时通过pg_stop_backup告知,关闭full page write(如果参数开启了FPW则不受影响)。

有时你可能会发现使用pg_basebackup或pg_start_backup接口时,好像hang住确没有开始拷贝文件。实际上是在做checkpoint,但是为什么这个checkpoint比较慢,但是直接SQL执行checkopint命令确很快呢?

原因是checkpoint分为调度和非调度模式。

调度模式的checkpoint和checkpoint_completion_target以及配置的max_wal_size区间大小有关,checkpoint_completion_target和max_wal_size越大,表示这个checkpoint将在这么大的区间内调度完成,所以总耗时会非常长,好处是减少CHECKPOINT带来的大量刷脏和FSYNC,从而减少抖动。

坏处就是你会发现checkpoint很漫长。

非调度模式的checkpoint,就是尽快完成检查点,会全速刷脏,不进行调度。好处是快,坏处是,如果脏页特别多,可能会有大量IO影响其他会话性能。

```
postgres=# show max_wal_size ;
max_wal_size


128GB
(1 row)

postgres=# show min_wal_size;
min_wal_size


32GB
(1 row)

postgres=# show checkpoint_completion_target ;
checkpoint_completion_target


0.1
(1 row)
```

代码中可以看到,checkpoint有如下flag来控制检查点行为。

* RequestCheckpoint * Called in backend processes to request a checkpoint * * flags is a bitwise OR of the following: * CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown. * CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery. * CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP, * ignoring checkpoint_completion_target parameter. * CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred * since the last one (implied by CHECKPOINT_IS_SHUTDOWN or * CHECKPOINT_END_OF_RECOVERY). * CHECKPOINT_WAIT: wait for completion before returning (otherwise, * just signal checkpointer to do it, and return). * CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling. * (This affects logging, and in particular enables CheckPointWarning.) */ void RequestCheckpoint(int flags)

start backup如何控制是使用快速checkpoint(非调度模式)、或者调度模式的checkpoint呢?

```
XLogRecPtr
do_pg_start_backup(const char backupidstr, bool fast, TimeLineID starttli_p,
StringInfo labelfile, DIR tblspcdir, List *tablespaces,
StringInfo tblspcmapfile, bool infotbssize,
bool needtblspcmapfile)
{

                     * Since the fact that we are executing do_pg_start_backup()  
                     * during recovery means that checkpointer is running, we can use  
                     * RequestCheckpoint() to establish a restartpoint.  
                     *  
                     * We use CHECKPOINT_IMMEDIATE only if requested by user (via  
                     * passing fast = true).  Otherwise this can take awhile.  
                     */  
                    RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |  
                                                      (fast ? CHECKPOINT_IMMEDIATE : 0));
复制

```

1、pg_basebackup客户端命令,通过-c参数控制(fast表示使用非调度模式checkpoint)

-c, --checkpoint=fast|spread set fast or spread checkpointing

2、pg_start_backup SQL函数,通过参数fast控制

postgres=# \df pg_start_backup List of functions Schema | Name | Result data type | Argument data types | Type ------------+-----------------+------------------+------------------------------------------------------------------------+------ pg_catalog | pg_start_backup | pg_lsn | label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true | func (1 row)

小结

如果你需要快速的开始备份,可以使用fast(非调度模式)参数。

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论