PostgreSQL primary/standby switchover procedure

Original article by szrsu, 2023-01-07

I. Switchover overview

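Starting with PostgreSQL 12, setting up a primary/standby pair with streaming replication no longer involves a separate recovery.conf file; the former recovery.conf settings are now ordinary server parameters. Instead, a standby.signal file is placed under $PGDATA on the standby. It is an ordinary file with no content.
In other words, the file is only a marker. When the standby is promoted to primary with pg_ctl promote, the file is removed automatically.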
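The marker-file behaviour described above can be checked with a few lines of shell. This is a minimal sketch, assuming PostgreSQL 12 or later; the function name pg_role_from_datadir is hypothetical, and it only looks at standby.signal (it does not consider recovery.signal or whether the server is actually running).

```shell
# Report the role implied by the marker file in a data directory.
# Assumption: PostgreSQL 12+, where standby.signal marks a standby.
# pg_role_from_datadir is a hypothetical helper name, not a PostgreSQL tool.
pg_role_from_datadir() {
    datadir=$1
    if [ ! -d "$datadir" ]; then
        echo "unknown"
        return 1
    fi
    if [ -f "$datadir/standby.signal" ]; then
        echo "standby"
    else
        echo "primary"
    fi
}
```

A call such as pg_role_from_datadir "$PGDATA" prints standby before promotion and primary afterwards. On a running server, SELECT pg_is_in_recovery(); is the more reliable check.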

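Switchover and caveats: if the primary becomes unavailable because of an accident or failure, the standby can be promoted directly to primary and take over service. Depending on the situation, the old primary then either has to be rebuilt or, once the fault is repaired, can rejoin as the new standby and replicate from the new primary (the old standby).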

II. Simulated switchover steps

1. Stop the primary to simulate a failure
pg_ctl stop -m fast

### After the old primary is stopped with pg_ctl stop -m fast, none of its database background processes remain.
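Besides looking at the process list, the shutdown state recorded in the control file can be read with pg_controldata. The sketch below only parses that tool's output; the helper name cluster_state is hypothetical, and it assumes English (untranslated) messages.

```shell
# Extract the "Database cluster state" value from pg_controldata output on stdin.
# cluster_state is a hypothetical helper; it assumes English-language output.
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}
```

Running pg_controldata "$PGDATA" | cluster_state on the stopped primary should print "shut down" after a clean stop; a running primary reports "in production".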

2. Promote the standby to be the new primary and take over service
[postgres@pg12s data]$ pg_ctl promote
waiting for server to promote… done
server promoted
[postgres@pg12s data]$ ps -ef|grep postgres
postgres 199 1 0 11:24 ? 00:00:00 /usr/local/pg144/bin/postgres -D /usr/local/pg144/data
postgres 200 199 0 11:24 ? 00:00:00 postgres: logger
postgres 202 199 0 11:24 ? 00:00:00 postgres: checkpointer
postgres 203 199 0 11:24 ? 00:00:00 postgres: background writer
postgres 204 199 0 11:24 ? 00:00:00 postgres: stats collector
root 232 214 0 11:26 pts/0 00:00:00 su - postgres
postgres 233 232 0 11:26 pts/0 00:00:00 -bash
postgres 279 199 0 12:05 ? 00:00:00 postgres: walwriter
postgres 280 199 0 12:05 ? 00:00:00 postgres: autovacuum launcher
postgres 281 199 0 12:05 ? 00:00:00 postgres: archiver last was 000000010000000000000004.partial
postgres 282 199 0 12:05 ? 00:00:00 postgres: logical replication launcher
postgres 287 233 0 12:06 pts/0 00:00:00 ps -ef
postgres 288 233 0 12:06 pts/0 00:00:00 grep --color=auto postgres

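### Important 1: the command that turns the standby into the new primary is pg_ctl promote. After promotion, the process list no longer shows the startup recovering and walreceiver streaming processes, and a postgres: walwriter process has appeared.
### Important 2: the $PGDATA/standby.signal file has been removed automatically. That is how PostgreSQL records that this instance is no longer a standby but a primary.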
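The process-list check described in the notes above can be scripted. This is a rough sketch, assuming the process titles look like the ones shown in this article (a configured cluster_name would change them); has_standby_processes is a hypothetical helper name.

```shell
# Succeed (exit status 0) if the process list on stdin still shows
# standby-only processes: "startup recovering" or "walreceiver".
# has_standby_processes is a hypothetical helper name.
has_standby_processes() {
    grep -Eq 'postgres: (startup recovering|walreceiver)'
}
```

For example, ps -ef | has_standby_processes && echo "still a standby". As a cross-check, psql -Atc "SELECT pg_is_in_recovery();" prints f on a promoted server.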

3. Edit pg_hba.conf on the new primary

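Edit $PGDATA/pg_hba.conf on the new primary (the old standby, 192.168.40.147) and add an entry that lets the new standby (the old primary, 192.168.40.133) connect for replication as user replica.
host replication replica 192.168.40.133/24 md5
Note that the /24 mask admits every host in the 192.168.40.x subnet; a /32 mask would limit the entry to 192.168.40.133 alone. After editing, reload the configuration (for example with pg_ctl reload) so that the change takes effect.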

Without this entry, starting the old primary as the new standby may later fail with the following errors.

2021-10-21 17:13:20.464 CST [11394] FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "192.168.40.133", user "replica", SSL off
2021-10-21 17:13:20.466 CST [11395] FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "192.168.40.133", user "replica", SSL off
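To avoid adding the same entry twice when the step is repeated, the edit can be made idempotent. A minimal sketch follows; add_replication_hba is a hypothetical helper, and the /32 mask in the usage note is an assumption chosen to admit a single host.

```shell
# Append a replication entry to a pg_hba.conf file unless the identical
# line is already present.
# Arguments: file, user, address in CIDR form, authentication method.
# add_replication_hba is a hypothetical helper name.
add_replication_hba() {
    hba_file=$1
    entry="host replication $2 $3 $4"
    if grep -Fxq "$entry" "$hba_file"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$hba_file"
}
```

Typical use would be add_replication_hba "$PGDATA/pg_hba.conf" replica 192.168.40.133/32 md5 followed by pg_ctl reload.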

4. Create $PGDATA/standby.signal on the old primary
-bash-4.2$ pwd
/var/lib/pgsql/14/data
-bash-4.2$ touch standby.signal

-bash-4.2$ ll standby.signal
-rw-rw-r-- 1 postgres postgres 0 Oct 21 16:54 standby.signal

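5. Edit $PGDATA/postgresql.auto.conf on the old primary

##### Note: the value must be enclosed in single quotes, not double quotes. Otherwise startup fails with: FATAL: configuration file "postgresql.auto.conf" contains errors

Edit $PGDATA/postgresql.auto.conf so that it has the following correct form: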

-bash-4.2$ cat postgresql.auto.conf

# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo='user=replica password=replica host=192.168.40.147 port=5432'

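With a malformed postgresql.auto.conf (for example, double quotes around the value), the server does not start, and the reason is written to the log:

-bash-4.2$ pg_ctl start -l ~/pg.log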
waiting for server to start… stopped waiting
pg_ctl: could not start server
Examine the log output.

-bash-4.2$ tailf ~/pg.log
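Because the quoting mistake is easy to make by hand, the line can be written by a small helper that always uses single quotes. This is a sketch under stated assumptions: set_primary_conninfo is a hypothetical name, the server is stopped while the file is edited, and the connection string itself contains no single quotes.

```shell
# Replace (or add) the primary_conninfo line in a configuration file,
# always quoting the value with single quotes.
# set_primary_conninfo is a hypothetical helper name.
# Limitation: the value must not itself contain single quotes.
set_primary_conninfo() {
    conf_file=$1
    conninfo=$2
    [ -f "$conf_file" ] || return 1
    tmp_file=$(mktemp) || return 1
    # Keep every line except an existing primary_conninfo setting.
    grep -v '^[[:space:]]*primary_conninfo[[:space:]]*=' "$conf_file" > "$tmp_file" || true
    printf "primary_conninfo = '%s'\n" "$conninfo" >> "$tmp_file"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}
```

For the setup in this article the call would be set_primary_conninfo "$PGDATA/postgresql.auto.conf" 'user=replica password=replica host=192.168.40.147 port=5432'.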

6. Start the old primary as the new standby
-bash-4.2$ pg_ctl start -l ~/pg.log
waiting for server to start… done
server started
-bash-4.2$ ps -ef|grep postgres
root 8116 8115 0 16:58 pts/0 00:00:00 su - postgres
postgres 8118 8116 0 16:58 pts/0 00:00:00 -bash
root 8598 8597 0 17:00 pts/2 00:00:00 su - postgres
postgres 8600 8598 0 17:00 pts/2 00:00:00 -bash
postgres 11368 8118 0 17:13 pts/0 00:00:00 tailf pg.log
postgres 11389 1 0 17:13 ? 00:00:00 /postgres/pg12.8/bin/postgres
postgres 11390 11389 0 17:13 ? 00:00:00 postgres: startup recovering 000000020000000000000003
postgres 11391 11389 0 17:13 ? 00:00:00 postgres: checkpointer
postgres 11392 11389 0 17:13 ? 00:00:00 postgres: background writer
postgres 11393 11389 0 17:13 ? 00:00:00 postgres: stats collector
postgres 11440 11389 0 17:13 ? 00:00:00 postgres: walreceiver streaming 0/3013AC8
postgres 12545 30411 0 17:18 pts/1 00:00:00 ps -ef
postgres 12546 30411 0 17:18 pts/1 00:00:00 grep --color=auto postgres
root 30410 30409 0 16:11 pts/1 00:00:00 su - postgres
postgres 30411 30410 0 16:11 pts/1 00:00:00 -bash
-bash-4.2$ tailf pg.log
2021-10-21 17:13:45.488 CST [11440] LOG: fetching timeline history file for timeline 2 from primary server
2021-10-21 17:13:45.493 CST [11440] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2021-10-21 17:13:45.493 CST [11440] LOG: replication terminated by primary server
2021-10-21 17:13:45.493 CST [11440] DETAIL: End of WAL reached on timeline 1 at 0/30001C0.
2021-10-21 17:13:45.494 CST [11390] LOG: new target timeline is 2
2021-10-21 17:13:45.494 CST [11440] LOG: restarted WAL streaming at 0/3000000 on timeline 2
2021-10-21 17:13:45.539 CST [11390] LOG: redo starts at 0/30001C0

This completes one primary/standby switchover.
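The log messages shown above can also be checked from a script. The sketch below simply searches a log file for the two streaming messages that appear in this article; streaming_started is a hypothetical helper name, and the message wording is assumed to match the server version used here.

```shell
# Succeed if the given log file contains a message showing that the
# standby began (or resumed) streaming WAL from the primary.
# streaming_started is a hypothetical helper name.
streaming_started() {
    grep -Eq 'started streaming WAL|restarted WAL streaming' "$1"
}
```

For example, streaming_started ~/pg.log && echo "standby is streaming". On the new primary, SELECT client_addr, state FROM pg_stat_replication; should list the standby with state streaming.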

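III. Summary
Newer releases have made configuring and operating the database simpler.
The command that promotes a standby to primary is pg_ctl promote;
pg_hba.conf on the new primary (the old standby) must allow replication connections from the old primary's IP address;
When the old primary is set up as the new standby, $PGDATA/standby.signal must be created, and $PGDATA/postgresql.auto.conf must be edited to add the primary_conninfo that points at the new primary.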

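IV. Further reading
The pg_rewind command has been available since PostgreSQL 9.5. After a switchover it rolls back the changes on the old primary that the new primary never received, so that the old primary can join the new primary as its standby.
With pg_rewind, the old primary can be brought back quickly after a failover without rebuilding the whole cluster, which is slow for large databases.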
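As an illustration of what a pg_rewind invocation looks like, the sketch below only prints a command line for review and does not run it. The helper name pg_rewind_command, the data directory, and the superuser postgres in the usage note are assumptions. pg_rewind also expects the old primary to be cleanly shut down and the cluster to have wal_log_hints = on or data checksums enabled.

```shell
# Print (not run) a pg_rewind command line so it can be reviewed first.
# Arguments: target data directory, source connection string,
# and optionally the word "dry-run".
# pg_rewind_command is a hypothetical helper name.
pg_rewind_command() {
    target=$1
    source_conninfo=$2
    extra=""
    if [ "${3:-}" = "dry-run" ]; then
        extra=" --dry-run"
    fi
    printf "pg_rewind --target-pgdata='%s' --source-server='%s' --progress%s\n" \
        "$target" "$source_conninfo" "$extra"
}
```

For example, pg_rewind_command /var/lib/pgsql/14/data 'host=192.168.40.147 port=5432 user=postgres dbname=postgres' dry-run prints a command that, when run on the old primary, reports what would be done without changing anything. After a real rewind, standby.signal and primary_conninfo still have to be set up as in steps 4 and 5.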

Last modified: 2024-11-08 15:48:42
