1、环境说明
操作系统 | IP | PG版本 | hostname | ||
---|---|---|---|---|---|
主库 | CentOS Linux release 7.7.1908 | 192.168.91.133 | 12.9 | pg1 | |
从库 | CentOS Linux release 7.7.1908 | 192.168.91.134 | 12.9 | pg2 | |
从库 | CentOS Linux release 7.7.1908 | 192.168.91.135 | 12.9 | pg3 |
2、前提
主库安装数据库软件并初始化db
从库安装数据库软件
3、主库操作
3.1 主库放开防火墙(注意添加replication)
cat << EOF >> /postgresql/pg12/data/pg_hba.conf # TYPE DATABASE USER ADDRESS METHOD local all all trust host all all 127.0.0.1/32 trust host all all 0.0.0.0/0 md5 host replication all 0.0.0.0/0 md5 EOF
复制
3.2、主库配置归档
-- 修改参数 cat >> /postgresql/pg12/data/postgresql.conf <<"EOF" wal_level='replica' archive_mode='on' archive_command='test ! -f /postgresql/pg12/archive_log/%f && cp %p /postgresql/pg12/archive_log/%f' max_wal_senders=10 wal_keep_segments=256 wal_sender_timeout=60s EOF
复制
--重启主库 pg_ctl restart -D $PGDATA
复制
-- 查询参数 psql select * from pg_settings where name in ('wal_level','archive_mode','archive_command'); -- 切换归档 select pg_switch_wal();
复制
3.3、主库创建复制用户(创建用户需要加上replication选项。)
CREATE ROLE repl LOGIN ENCRYPTED PASSWORD 'repl' REPLICATION;
复制
4、 从库操作
4.1、在从库对主库进行备份
su - postgres postgres@pg2-> pg_basebackup -h192.168.91.133 -p5432 -Urepl -F p -P -R -D $PGDATA Password: 33703/33703 kB (100%), 1/1 tablespace
复制
4.2、启动从库
postgres@pg2-> pg_ctl start waiting for server to start....2021-11-24 09:14:21.329 CST [8912] LOG: starting PostgreSQL 12.9 on x86_64-pc-linux-gn u, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2021-11-24 09:14:21.329 CST [8912] LOG: listening on IPv4 address "0.0.0.0", port 5432 2021-11-24 09:14:21.329 CST [8912] LOG: listening on IPv6 address "::", port 5432 2021-11-24 09:14:21.330 CST [8912] LOG: listening on Unix socket "/postgresql/pg12/data/.s.PGSQL.5432" 2021-11-24 09:14:21.435 CST [8913] LOG: database system was interrupted; last known up at 2021-11-24 09:14:09 CST 2021-11-24 09:14:21.439 CST [8913] LOG: entering standby mode 2021-11-24 09:14:21.441 CST [8913] LOG: redo starts at 0/7000028 2021-11-24 09:14:21.442 CST [8913] LOG: consistent recovery state reached at 0/7000100 2021-11-24 09:14:21.462 CST [8912] LOG: database system is ready to accept read only connections 2021-11-24 09:14:21.516 CST [8917] LOG: started streaming WAL from primary at 0/8000000 on timeline 1 done server started
复制
5、查询复制状态
-- 主库查看wal日志发送状态 select * from pg_stat_replication; postgres=# select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 2451 usesysid | 16384 usename | repl application_name | walreceiver client_addr | 192.168.91.134 client_hostname | client_port | 59098 backend_start | 2021-11-24 09:14:21.499172+08 backend_xmin | state | streaming sent_lsn | 0/8000148 write_lsn | 0/8000148 flush_lsn | 0/8000148 replay_lsn | 0/8000148 write_lag | flush_lag | replay_lag | sync_priority | 0 sync_state | async reply_time | 2021-11-24 09:57:28.166605+08 -- 从库查看wal日志接收状态 select * from pg_stat_wal_receiver; postgres=# select * from pg_stat_wal_receiver; -[ RECORD 1 ]---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- pid | 8917 status | streaming receive_start_lsn | 0/8000000 receive_start_tli | 1 received_lsn | 0/8000148 received_tli | 1 last_msg_send_time | 2021-11-24 09:58:18.267709+08 last_msg_receipt_time | 2021-11-24 09:58:18.265728+08 latest_end_lsn | 0/8000148 latest_end_time | 2021-11-24 09:19:14.039943+08 slot_name | sender_host | 192.168.91.133 sender_port | 5432 conninfo | user=repl password=******** dbname=replication host=192.168.91.133 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any -- 也可以通过该命令查看 pg_controldata | grep state postgres@pg1-> pg_controldata | grep state Database cluster state: in production postgres@pg2-> pg_controldata | grep state Database cluster state: in archive recovery -- 也可以查看这个,主库是f代表false ;备库是t,代表true select pg_is_in_recovery(); postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row) postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)
复制
pg_stat_replication是一个视图,主要用于监控一个基于流的设置,在这个系统视图中每个记录只代表一个slave。因此,可以看到谁处于连接状态,在做什么任务。pg_stat_replication也是检查slave是否处于连接状态的一个好方法。 每个字段代码的含义: • pid: 这代表负责流连接的wal_sender进程的进程ID。例如“postgres: walsender replhr 172.72.6.3(40056) streaming 0/7000148”。 • usesysid: 每个内部用户都有一个独一无二的编号。该系统的工作原理很像UNIX。 usesysid 是 (PostgreSQL) 用户连接到系统的唯一标识符。 • usename: (不是用户名, 注意少了 r),它存储与用户相关的 usesysid 的名字。这是客户端放入到连接字符串中的东西。 • application_name:这是同步复制的通常设置。它可以通过连接字符串传递到master。 • client_addr: 它会告诉您流连接从何而来。它拥有客户端的IP地址。 • client_hostname: 除了客户端的IP,您还可以这样做,通过它的主机名来标识客户端。您可以通过master上的postgresql.conf中的log_hostname启用DNS反向查找。 • client_port: 这是客户端用来和WALsender进行通信使用的TPC端口号。 如果不本地UNIX套接字被使用了将显示-1。 • backend_start: 它告诉我们slave什么时间创建了流连接。 • state: 此列告诉我们数据的连接状态。如果事情按计划进行,它应该包含流信息。 • sent_lsn:这代表发送到连接的最后的事务日志的位置。已经通过网络发送了多少WAL? • write_lsn: 这是写到standby系统磁盘上最后的事务日志位置。已向操作系统发送了多少WAL?( 尚未 flushing) • flush_lsn: 这是被刷新到standby系统的最后位置。(这里注意写和刷新之间的区别。写并不意味着刷新 。)已经有多少WAL已 flush 到磁盘? • replay_lsn: 这是slave上重放的最后的事务日志位置。已重放了多少WAL,因此对查询可见? • sync_priority: 这个字段是唯一和同步复制相关的。每次同步复制将会选择一个优先权 —sync_priority—会告诉您选择了那个优先权。 • sync_state: 最后您会看到slave在哪个状态。这个状态可以是async, sync, or potential。当有一个带有较高优先权的同步slave时,PostgreSQL会把slave 标记为 potential。 人们经常说 pg_stat_replication 视图是primary 端的,这是不对的。该视图的作用是揭示有关wal sender 进程的信息。换句话说:如果你正在运行级联复制,该视图意味着在 secondary 复制到其他slaves 的时候, secondary 端的 pg_stat_replication 上的也会显示entries ( 条目 )
复制
6、主从切换
6.1、主从Switchover切换
在保证主从数据一致的情况下:
1、在当前主库操作:
1、停止服务 pg_ctl stop 2、以“postgres”用户创建创建standby.signal文件 touch /postgresql/pg12/data/standby.signal cat >> /postgresql/pg12/data/postgresql.auto.conf <<"EOF" primary_conninfo = 'user=repl password=repl host=192.168.91.134 port=5432 sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any' EOF 3、启动服务 pg_ctl start
复制
2、在当前从库操作:
1、停止服务 pg_ctl stop 2、删除“standby.signal”文件 rm -rf /postgresql/pg12/data/standby.signal 3、修改postgresql.conf sed -i 's/primary_conninfo/#primary_conninfo/g' /postgresql/pg12/data/postgresql.auto.conf 4、启动服务 pg_ctl start
复制
6.2、主从Failover切换
主数据库是读写的,备数据库是只读的。当主库出现故障时,我们需要将备库提升为主库进行读写操作。
1)切换后,原主库以从库的身份启动:修复过程类似于重建
2)切换后,原主库以主库的身份启动:
切换方法跟版本有关系:
P12之前,有2种办法:
① pg_ctl promote:执行之后发现recovery.conf变为recovery.done文件,代表 切换成功
② 触发器方式,:备库配置 recovery.conf 文件的 trigger_file 参数,之后在备库主机上创建触发器文件
从P12开始 : 直接执行 select pg_promote(true,60)函数即可完成切换。
pg12开始新增了一个pg_promote()函数,让我们可以通过SQL命令激活备库。
pg_promote()语法:
pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60)
两个参数:
wait: 表示是否等待备库的 promotion 完成或者
wait_seconds 秒之后返回成功,默认值为 true。 wait_seconds: 等待时间,单位秒,默认 60
6.2.1、Failover切换过程
1)主库操作:关闭主库,模拟主库故障:
postgres@pg2-> pg_ctl stop waiting for server to shut down...2021-11-24 10:38:39.444 CST [9036] LOG: received fast shutdown request .2021-11-24 10:38:39.445 CST [9036] LOG: aborting any active transactions 2021-11-24 10:38:39.445 CST [9036] LOG: background worker "logical replication launcher" (PID 9044) exited with exit code 1 2021-11-24 10:38:39.445 CST [9038] LOG: shutting down .2021-11-24 10:38:40.579 CST [9036] LOG: database system is shut down done server stopped
复制
2)备库操作:激活备库为主库:
-- 不关闭主库,也可以执行该命令强制切换为主库角色,切换完成后,主从关系断开 postgres=# select pg_promote(true,60); 2021-11-24 10:38:58.558 CST [2582] LOG: received promote request 2021-11-24 10:38:58.558 CST [2582] LOG: redo done at 0/A000028 2021-11-24 10:38:58.558 CST [2582] LOG: last completed transaction was at log time 2021-11-24 10:18:13.734815+08 2021-11-24 10:38:58.559 CST [2582] LOG: selected new timeline ID: 2 2021-11-24 10:38:58.862 CST [2582] LOG: archive recovery complete pg_promote ------------ t (1 row) -- 查询是否切换成功 select pg_is_in_recovery(); postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row)
复制
6.2.2、原主库修复后降为备库
1、停止服务 pg_ctl stop 2、以“postgres”用户创建创建standby.signal文件 touch /postgresql/pg12/data/standby.signal cat >> /postgresql/pg12/data/postgresql.auto.conf <<"EOF" primary_conninfo = 'user=repl password=repl host=192.168.91.133 port=5432 sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any' EOF 3、启动服务 pg_ctl start
复制
7、新增从节点(扩容节点)
7.1、新增一台从库192.168.91.135,安装PG12.9
7.2、在从库对主库进行备份
[root@pg3 ~]# su - postgres Last login: Mon Nov 22 20:38:32 CST 2021 on pts/0 postgres@pg3-> pg_basebackup -h192.168.91.133 -p5432 -Urepl -F p -P -R -D $PGDATA Password: 66081/66081 kB (100%), 1/1 tablespace
复制
7.3、启动从库
postgres@pg3-> pg_ctl start waiting for server to start....2021-11-24 11:06:43.238 CST [8998] LOG: starting PostgreSQL 12.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2021-11-24 11:06:43.238 CST [8998] LOG: listening on IPv4 address "0.0.0.0", port 5432 2021-11-24 11:06:43.238 CST [8998] LOG: listening on IPv6 address "::", port 5432 2021-11-24 11:06:43.239 CST [8998] LOG: listening on Unix socket "/postgresql/pg12/data/.s.PGSQL.5432" 2021-11-24 11:06:43.380 CST [8999] LOG: database system was interrupted; last known up at 2021-11-24 11:05:22 CST 2021-11-24 11:06:43.388 CST [8999] LOG: entering standby mode 2021-11-24 11:06:43.390 CST [8999] LOG: redo starts at 0/B000028 2021-11-24 11:06:43.390 CST [8999] LOG: consistent recovery state reached at 0/B000100 2021-11-24 11:06:43.390 CST [8998] LOG: database system is ready to accept read only connections 2021-11-24 11:06:43.446 CST [9003] LOG: started streaming WAL from primary at 0/C000000 on timeline 2 done server started
复制
7.4、查询复制状态
--主库查看wal日志发送状态 select * from pg_stat_replication; --从库查看wal日志接收状态 select * from pg_stat_wal_receiver; -- 也可以通过该名称查看 pg_controldata | grep state -- 也可以查看这个,主库是f代表false ;备库是t,代表true select pg_is_in_recovery();
复制
8、修改为同步模式
在3个节点都修改如下参数:
postgres@pg1-> cat >> /postgresql/pg12/data/postgresql.conf <<"EOF" > synchronous_commit='on' > synchronous_standby_names='*' > EOF postgres@pg2-> cat >> /postgresql/pg12/data/postgresql.conf <<"EOF" > synchronous_commit='on' > synchronous_standby_names='*' > EOF postgres@pg3-> cat >> /postgresql/pg12/data/postgresql.conf <<"EOF" > synchronous_commit='on' > synchronous_standby_names='*' > EOF
复制
在3个节点重新加载配置:
postgres@pg1-> pg_ctl reload server signaled 2021-11-24 11:35:06.413 CST [2581] LOG: received SIGHUP, reloading configuration files postgres@pg1-> 2021-11-24 11:35:06.413 CST [2581] LOG: parameter "synchronous_standby_names" changed to "*" 2021-11-24 11:35:06.823 CST [2631] LOG: standby "walreceiver" is now a synchronous standby with priority 1 2021-11-24 11:35:06.823 CST [2631] STATEMENT: START_REPLICATION 0/A000000 TIMELINE 2 postgres@pg2-> pg_ctl reload server signaled 2021-11-24 11:35:03.595 CST [9102] LOG: received SIGHUP, reloading configuration files postgres@pg2-> 2021-11-24 11:35:03.596 CST [9102] LOG: parameter "synchronous_standby_names" changed to "*" postgres@pg3-> pg_ctl reload server signaled 2021-11-24 11:34:53.355 CST [8998] LOG: received SIGHUP, reloading configuration files postgres@pg3-> 2021-11-24 11:34:53.355 CST [8998] LOG: parameter "synchronous_standby_names" changed to "*"
复制
查询同步模式:
postgres=# select client_addr,state,sync_state from pg_stat_replication; client_addr | state | sync_state ----------------+-----------+------------ 192.168.91.134 | streaming | sync 192.168.91.135 | streaming | potential (2 rows)
复制
若要修改为异步模式,则注销以上参数即可:
sed -i 's|synchronous_commit|#synchronous_commit|g' /postgresql/pg12/data/postgresql.conf sed -i 's|synchronous_standby_names|#synchronous_standby_names|g' /postgresql/pg12/data/postgresql.conf pg_ctl reload
复制
9、级联流复制
1、关闭pg3上的pg postgres@pg3-> pg_ctl stop waiting for server to shut down....2021-11-24 12:55:52.291 CST [8998] LOG: received fast shutdown request 2021-11-24 12:55:52.291 CST [8998] LOG: aborting any active transactions 2021-11-24 12:55:52.291 CST [9003] FATAL: terminating walreceiver process due to administrator command 2021-11-24 12:55:52.292 CST [9000] LOG: shutting down 2021-11-24 12:55:52.294 CST [8998] LOG: database system is shut down done server stopped 2、删除实例目录 postgres@pg3-> rm -rf $PGDATA 3、在pg3上对pg2上的从库进行备份 postgres@pg3-> pg_basebackup -h192.168.91.134 -p5432 -Urepl -F p -P -R -D $PGDATA Password: WARNING: skipping special file "./.s.PGSQL.5432" WARNING: skipping special file "./.s.PGSQL.5432" 76875/76875 kB (100%), 1/1 tablespace 4、修改postgresql.auto.conf文件 postgres@pg3-> vi postgresql.auto.conf postgres@pg3-> more postgresql.auto.conf # Do not edit this file manually! # It will be overwritten by the ALTER SYSTEM command. primary_conninfo = 'user=repl password=repl host=192.168.91.134 port=5432 sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any' 5、启动pg3上的从库 postgres@pg3-> pg_ctl start waiting for server to start....2021-11-24 12:59:42.445 CST [9122] LOG: starting PostgreSQL 12.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2021-11-24 12:59:42.445 CST [9122] LOG: listening on IPv4 address "0.0.0.0", port 5432 2021-11-24 12:59:42.445 CST [9122] LOG: listening on IPv6 address "::", port 5432 2021-11-24 12:59:42.446 CST [9122] LOG: listening on Unix socket "/postgresql/pg12/data/.s.PGSQL.5432" 2021-11-24 12:59:42.554 CST [9123] LOG: database system was interrupted while in recovery at log time 2021-11-24 12:56:22 CST 2021-11-24 12:59:42.554 CST [9123] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2021-11-24 12:59:42.563 CST [9123] LOG: entering standby mode 2021-11-24 12:59:42.564 CST [9123] LOG: redo starts at 0/C000C20 2021-11-24 12:59:42.565 CST [9123] LOG: consistent recovery state reached at 0/C001548 2021-11-24 12:59:42.565 CST [9123] LOG: invalid record length at 0/C001548: wanted 24, got 0 2021-11-24 12:59:42.565 CST [9122] LOG: database system is ready to accept read only connections 2021-11-24 12:59:42.570 CST [9127] LOG: started streaming WAL from primary at 0/C000000 on timeline 2 done server started
复制
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。