[postgresql] repmgr集群添加新节点后发现异常,信息也不同步。
[pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
复制
- 尝试standby follow
按照文章
“[postgresql] repmgr集群 状态显示 - node “node*” (ID: %n) is not attached to its upstream node”
尝试repmgr v失败
[pg12@node3 ~]$ repmgr standby follow -f /home/pg12/conf/repmgr.conf ERROR: unable to retrieve record for local node 3 [pg12@node3 ~]$
复制
[pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2 [pg12@node3 ~]$
复制
显然无法使用standby follow达成同步。
- 尝试关闭节点,重新rejoin
如下即是尝试 关闭问题node,然后尝试使用repmgr node rejoin解决问题。
[pg12@node3 ~]$repmgr -f /home/pg12/conf/repmgr.conf node service --action=stop DETAIL: executing server command "/home/pg12/soft/bin/pg_ctl -D '/home/pg12/data' -W -m fast stop" [pg12@node3 ~]$ [pg12@node3 ~]$ repmgr node rejoin -f /home/pg12/conf/repmgr.conf -d 'host=192.168.5.101 dbname=repmgr user=repmgr' --config-fil es=postgresql.local.conf,postgresql.conf --verbose --force-rewind NOTICE: using provided configuration file "/home/pg12/conf/repmgr.conf" ERROR: unable to retrieve node record for the local node HINT: check the local node is registered with the current primary "node1" (ID: 1) Segmentation fault (core dumped) [pg12@node3 ~]$ [pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
复制
显然无法实现恢复。
- 最后尝试注销,重新备考建设
注销node3节点的standby注册,然后重新克隆clone并注册standby:
repmgr -f /home/pg12/conf/repmgr.conf standby unregister
repmgr -f /home/pg12/conf/repmgr.conf cluster show
[pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf standby unregister INFO: connecting to local standby INFO: connecting to primary database NOTICE: unregistering node 3 INFO: standby unregistration complete [pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
复制
确认取消注册后清除目前数据库,进行克隆(repmgr standby clone):
rm -rf /home/pg12/data
repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data --dry-run
repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data
touch $PGDATA/standby.signal
此处$PGDATA 为设置环境变量, 此处就是/home/pg12/data。
[pg12@node3 ~]$ rm -rf /home/pg12/data
#--dry-run 选项为预检查,可以发现潜在一些问题
[pg12@node3 ~]$ repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data --dry-run
NOTICE: destination directory "/home/pg12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.5.101 user=repmgr dbname=repmgr
DETAIL: current installation size is 33 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 9 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met
#--dry-run 运行后没有什么报错,正式进行克隆
[pg12@node3 ~]$ repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data
NOTICE: destination directory "/home/pg12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.5.101 user=repmgr dbname=repmgr
DETAIL: current installation size is 33 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/home/pg12/data"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
pg_basebackup -l "repmgr base backup" -D /home/pg12/data -h 192.168.5.101 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/pg12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
# 根据主库primary克隆备库后先不启动备库,在数据库目录创建空文件$PGDATA/standby.signal(限定于postgresql 12版本,这个识别文件作用是启动时使数据库判定自己为standby库)
[pg12@node3 ~]$ touch $PGDATA/standby.signal
[pg12@node3 ~]$ pg_ctl -D $PGDATA start
# 查看pg_is_in_recovery(),判断是否以备库身份启动。
[pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr repmgr -c 'select * from pg_is_in_recovery();'
pg_is_in_recovery
-------------------
t
(1 row)
复制
检查备库正常启动后,注册备库(repmgr standby register):
[pg12@node3 ~]$ repmgr standby register -f /home/pg12/conf/repmgr.conf
INFO: connecting to local node "node3" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
ERROR: local node not attached to primary node 1
HINT: specify the actual upstream node id with --upstream-node-id, or use -F/--force to continue anyway
[pg12@node3 ~]$ repmgr standby register -f /home/pg12/conf/repmgr.conf --force
INFO: connecting to local node "node3" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
WARNING: local node not attached to primary node 1
NOTICE: -F/--force supplied, continuing anyway
INFO: standby registration complete
NOTICE: standby node "node3" (ID: 3) successfully registered
[pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | ! node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- node "node3" (ID: 3) is not attached to its upstream node "node1" (ID: 1)
#确定成功组成,然后处理问题not attached to its upstream node
[pg12@node3 ~]$ repmgr standby follow -f /home/pg12/conf/repmgr.conf
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/F00A488, follow target lsn is 0/F00A488
NOTICE: setting node 3's upstream to node 1
NOTICE: stopping server using "/home/pg12/soft/bin/pg_ctl -D '/home/pg12/data' -w -m fast stop"
2023-05-26 21:56:37.093 CST [33220] LOG: received fast shutdown request
2023-05-26 21:56:37.093 CST [33220] LOG: aborting any active transactions
2023-05-26 21:56:37.093 CST [33225] FATAL: terminating walreceiver process due to administrator command
2023-05-26 21:56:37.094 CST [33222] LOG: shutting down
2023-05-26 21:56:37.096 CST [33220] LOG: database system is shut down
NOTICE: starting server using "/home/pg12/soft/bin/pg_ctl -w -D '/home/pg12/data' start"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node1" (ID: 1)
[pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
[pg12@node3 ~]$
复制
最后发现集群信息显示完美解决。
同时建表t1进行测试:
[pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr postgres -c 'select * from t1;' id | comm ----+-------- 1 | zhang3 (1 row) [pg12@node3 ~]$ psql -h 192.168.5.101 -U repmgr postgres -c 'insert into t1 values(2,'li4');' INSERT 0 1 [pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr postgres -c 'select * from t1;' id | comm ----+-------- 1 | zhang3 2 | li4 (2 rows)
复制
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
外国CTO也感兴趣的开源数据库项目——openHalo
小满未满、
1749次阅读
2025-04-21 16:58:09
3月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
377次阅读
2025-04-15 14:48:05
转发有奖 | PostgreSQL 16 PGCM高级认证课程直播班招生中!
墨天轮小教习
192次阅读
2025-04-14 15:58:34
中国PostgreSQL培训认证体系新增PGAI应用工程师方向
开源软件联盟PostgreSQL分会
186次阅读
2025-05-06 10:21:13
PG生态赢得资本市场青睐:Databricks收购Neon,Supabase融资两亿美元,微软财报点名PG
老冯云数
147次阅读
2025-05-07 10:06:22
华象新闻 | PostgreSQL 18 Beta 1、17.5、16.9、15.13、14.18、13.21 发布
严少安
142次阅读
2025-05-09 11:34:10
SQL 优化之 OR 子句改写
xiongcc
138次阅读
2025-04-21 00:08:06
告别老旧mysql_fdw,升级正当时
NickYoung
124次阅读
2025-04-29 11:15:18
PostgreSQL中文社区亮相于第八届数字中国峰会
PostgreSQL中文社区
109次阅读
2025-05-07 10:06:20
PostgreSQL的dblink扩展模块使用方法
szrsu
105次阅读
2025-04-24 17:39:30
热门文章
[postgresql] repmgr集群维护“服务暂停”(repmgr service pause)
2023-06-05 980浏览
[postgresql] repmgr集群 状态显示 - node "node*" (ID: %n) is not attached to its upstream node
2023-05-28 730浏览
[postgresql] 实现psql自动登录
2023-06-05 604浏览
[postgresql] 访问pgbouncer管理界面,并查看如连接池类型等信息
2023-06-05 572浏览
openGauss每日一练第15天 openGauss逻辑结构:表管理3
2022-12-13 411浏览
最新文章
[postgresql] 实现psql自动登录
2023-06-05 604浏览
[postgresql] 访问pgbouncer管理界面,并查看如连接池类型等信息
2023-06-05 572浏览
[postgresql] repmgr集群维护“服务暂停”(repmgr service pause)
2023-06-05 980浏览
[postgresql] repmgr集群 状态显示 - node "node*" (ID: %n) is not attached to its upstream node
2023-05-28 730浏览
[postgresql] repmgr集群手动切换报错 "data_directory" parameter in repmgr.conf on "XXX" is incorrectly
2023-05-26 299浏览