暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

在postgresql中使用pg_rewind解决脑裂问题

原创 X丶 2022-10-11
1762

总结

在这个博客中,我将解释pg_rewind用例来解决数据库故障转移发生后脑裂的问题。

脑裂综合症

脑裂通常用于描述集群中的两个或多个节点彼此失去连接,但继续独立运行的场景。

流复制

流复制是PostgreSQL的一个标准特性,它可以将主服务器上更新的信息实时传输到备用服务器上,从而实现主服务器和备用服务器的数据库同步。

Repmgr扩展

repmgr是一个来自2ndQuadrant的开源工具集,2ndQuadrant是postgresql相关技术和服务的领先专家。该产品用于自动化、增强和管理PostgreSQL流复制。

repmgrd(守护进程)

repmgrd是一个运行在每个PostgreSQL节点上的守护进程,监视本地节点,以及(除非它是主节点)连接到repmgrd的上游服务器(主服务器或具有级联复制的另一个备用服务器),它可以配置为在上游主节点不可达的情况下提供故障转移功能,和/或为repmgr元数据库提供监视数据。

数据库体系结构

我用流复制技术配置了两个节点的PostgreSQL数据库,并安装了repmgr工具,实现了PostgreSQL中的自动故障转移。之后启动repmgrd守护进程。

[postgres@fahad archive]$ repmgr daemon status ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+----------------------+---------+-----------+----------------------+---------+-------+---------+-------------------- 1 | athar.postgres-1.com | primary | * running | | running | 60943 | no | n/a 2 | fahad.postgres-2.com | standby | running | athar.postgres-1.com | running | 70493 | no | 2 second(s) ago
复制

场景:

为了进行故障转移,我强制关闭了主数据库上的postmaster进程。故障转移之后,旧的主节点被停止,另一个节点开始作为主节点工作。然后我手动重新启动旧的主节点,发现两个节点的角色都是主节点。

这是脑裂在PostgreSQL中发生的一个例子。

[postgres@fahad ~]$ repmgr cluster show --verbose NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf" INFO: connecting to database ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+----------------------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------- 1 | athar.postgres-1.com | primary | ! running | | default | 100 | 1 | host=192.168.202.228 user=postgres dbname=postgres connect_timeout=2 2 | fahad.postgres-2.com | primary | * running | | default | 100 | 2 | host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2 WARNING: following issues were detected - node "athar.postgres-1.com" (ID: 1) is running but the repmgr node record is inactive [postgres@athar ~]$ repmgr daemon status ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+----------------------+---------+----------------------+----------+---------+------+---------+-------------------- 1 | athar.postgres-1.com | primary | * running | | running | 5084 | no | n/a 2 | fahad.postgres-2.com | standby | ! running as primary | | running | 2267 | no | n/a WARNING: following issues were detected - node "fahad.postgres-2.com" (ID: 2) is registered as standby but running as primary
复制

解决方案:

如上所述,我们已经看到了集群之间的脑裂。两个节点都是独立的主节点。

这里,我们需要旧的主服务器重新加入副本配置并倒带该数据库,以便从当前主服务器同步日志序列。这将丢弃在自动切换发生并启动后可能到达旧主服务器的任何事务。在倒带之前,只需停止node1上的所有数据库服务。
倒带试运行,以验证是否满足所有先决条件

[postgres@athar ~]$ repmgr node rejoin -d 'host=192.168.202.229 dbname=postgres user=postgres' --config-files=/var/lib/pgsql/12/data/postgresql.conf --verbose --force-rewind --dry-run NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf" INFO: replication slots in use, 2 free slots on node 20 INFO: replication connection to the rejoin target node was successful INFO: local and rejoin target system identifiers match DETAIL: system identifier is 6967965196868380905 NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2 DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/A000028 INFO: prerequisites for using pg_rewind are met INFO: temporary archive directory "/tmp/repmgr-config-archive-athar.postgres-1.com" created WARNING: specified file "/var/lib/pgsql/12/data//var/lib/pgsql/12/data/postgresql.conf" not found, skipping INFO: 0 files would have been copied to "/tmp/repmgr-config-archive-athar.postgres-1.com" INFO: temporary archive directory "/tmp/repmgr-config-archive-athar.postgres-1.com" deleted INFO: pg_rewind would now be executed DETAIL: pg_rewind command is: pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2' INFO: prerequisites for executing NODE REJOIN are met
复制

运行倒带

[postgres@athar ~]$ repmgr node rejoin -d 'host=192.168.202.229 dbname=postgres user=postgres' --config-files=/var/lib/pgsql/12/data/postgresql.conf --verbose --force-rewind NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf" NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2 DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/A000028 INFO: prerequisites for using pg_rewind are met WARNING: specified file "/var/lib/pgsql/12/data//var/lib/pgsql/12/data/postgresql.conf" not found, skipping INFO: 0 files copied to "/tmp/repmgr-config-archive-athar.postgres-1.com" NOTICE: executing pg_rewind DETAIL: pg_rewind command is "pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2'" NOTICE: 0 files copied to /var/lib/pgsql/12/data INFO: directory "/tmp/repmgr-config-archive-athar.postgres-1.com" deleted INFO: creating replication slot as user "postgres" NOTICE: setting node 1's upstream to node 2 WARNING: unable to ping "host=192.168.202.228 user=postgres dbname=postgres connect_timeout=2" DETAIL: PQping() returned "PQPING_NO_RESPONSE" NOTICE: starting server using "pg_ctl -w -D '/var/lib/pgsql/12/data' start" INFO: node "athar.postgres-1.com" (ID: 1) is pingable INFO: node "athar.postgres-1.com" (ID: 1) has attached to its upstream node NOTICE: NODE REJOIN successful DETAIL: node 1 is now attached to node 2
复制

回放后的群集状态

[postgres@athar ~]$ repmgr daemon status ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+----------------------+---------+-----------+----------------------+---------+------+---------+-------------------- 1 | athar.postgres-1.com | standby | running | fahad.postgres-2.com | running | 6345 | no | 0 second(s) ago 2 | fahad.postgres-2.com | primary | * running | | running | 5184 | no | n/a
复制

原文标题:SPLIT BRAIN SOLUTION USING PG_REWIND IN POSTGRESQL
原文作者:Athar Ishteyaque
原文地址:https://blog.pythian.com/split-brain-solution-using-pg_rewind-in-postgresql/

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

目录
  • 总结
  • 脑裂综合症
  • 流复制
  • Repmgr扩展
  • repmgrd(守护进程)
  • 数据库体系结构
  • 场景:
  • 解决方案:
  • 运行倒带
  • 回放后的群集状态