暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

[postgresql] repmgr集群 ERROR: unable to retrieve record for local node %n

原创 手机用户0512 2023-05-29
1213

[postgresql] repmgr集群添加新节点后发现异常,信息也不同步。

[pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
复制
  1. 尝试standby follow
    按照文章
    “[postgresql] repmgr集群 状态显示 - node “node*” (ID: %n) is not attached to its upstream node”
    尝试repmgr v失败
[pg12@node3 ~]$ repmgr standby follow -f /home/pg12/conf/repmgr.conf ERROR: unable to retrieve record for local node 3 [pg12@node3 ~]$
复制
[pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2 [pg12@node3 ~]$
复制

显然无法使用standby follow达成同步。

  1. 尝试关闭节点,重新rejoin
    如下即是尝试 关闭问题node,然后尝试使用repmgr node rejoin解决问题。
[pg12@node3 ~]$repmgr -f /home/pg12/conf/repmgr.conf node service --action=stop DETAIL: executing server command "/home/pg12/soft/bin/pg_ctl -D '/home/pg12/data' -W -m fast stop" [pg12@node3 ~]$ [pg12@node3 ~]$ repmgr node rejoin -f /home/pg12/conf/repmgr.conf -d 'host=192.168.5.101 dbname=repmgr user=repmgr' --config-fil es=postgresql.local.conf,postgresql.conf --verbose --force-rewind NOTICE: using provided configuration file "/home/pg12/conf/repmgr.conf" ERROR: unable to retrieve node record for the local node HINT: check the local node is registered with the current primary "node1" (ID: 1) Segmentation fault (core dumped) [pg12@node3 ~]$ [pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
复制

显然无法实现恢复。

  1. 最后尝试注销,重新备考建设

注销node3节点的standby注册,然后重新克隆clone并注册standby:
repmgr -f /home/pg12/conf/repmgr.conf standby unregister
repmgr -f /home/pg12/conf/repmgr.conf cluster show

[pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf standby unregister INFO: connecting to local standby INFO: connecting to primary database NOTICE: unregistering node 3 INFO: standby unregistration complete [pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
复制

确认取消注册后清除目前数据库,进行克隆(repmgr standby clone):

rm -rf /home/pg12/data
repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data --dry-run
repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data
touch $PGDATA/standby.signal

此处$PGDATA 为设置环境变量, 此处就是/home/pg12/data。

[pg12@node3 ~]$ rm -rf /home/pg12/data #--dry-run 选项为预检查,可以发现潜在一些问题 [pg12@node3 ~]$ repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data --dry-run NOTICE: destination directory "/home/pg12/data" provided INFO: connecting to source node DETAIL: connection string is: host=192.168.5.101 user=repmgr dbname=repmgr DETAIL: current installation size is 33 MB INFO: "repmgr" extension is installed in database "repmgr" INFO: parameter "max_wal_senders" set to 10 NOTICE: checking for available walsenders on the source node (2 required) INFO: sufficient walsenders available on the source node DETAIL: 2 required, 9 available NOTICE: checking replication connections can be made to the source server (2 required) INFO: required number of replication connections could be made to the source server DETAIL: 2 replication connections required NOTICE: standby will attach to upstream node 1 HINT: consider using the -c/--fast-checkpoint option INFO: all prerequisites for "standby clone" are met #--dry-run 运行后没有什么报错,正式进行克隆 [pg12@node3 ~]$ repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data NOTICE: destination directory "/home/pg12/data" provided INFO: connecting to source node DETAIL: connection string is: host=192.168.5.101 user=repmgr dbname=repmgr DETAIL: current installation size is 33 MB NOTICE: checking for available walsenders on the source node (2 required) NOTICE: checking replication connections can be made to the source server (2 required) INFO: creating directory "/home/pg12/data"... NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: pg_basebackup -l "repmgr base backup" -D /home/pg12/data -h 192.168.5.101 -p 5432 -U repmgr -X stream NOTICE: standby clone (using pg_basebackup) complete NOTICE: you can now start your PostgreSQL server HINT: for example: pg_ctl -D /home/pg12/data start HINT: after starting the server, you need to register this standby with "repmgr standby register" # 根据主库primary克隆备库后先不启动备库,在数据库目录创建空文件$PGDATA/standby.signal(限定于postgresql 12版本,这个识别文件作用是启动时使数据库判定自己为standby库) [pg12@node3 ~]$ touch $PGDATA/standby.signal [pg12@node3 ~]$ pg_ctl -D $PGDATA start # 查看pg_is_in_recovery(),判断是否以备库身份启动。 [pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr repmgr -c 'select * from pg_is_in_recovery();' pg_is_in_recovery ------------------- t (1 row)
复制

检查备库正常启动后,注册备库(repmgr standby register):

[pg12@node3 ~]$ repmgr standby register -f /home/pg12/conf/repmgr.conf INFO: connecting to local node "node3" (ID: 3) INFO: connecting to primary database WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1) ERROR: local node not attached to primary node 1 HINT: specify the actual upstream node id with --upstream-node-id, or use -F/--force to continue anyway [pg12@node3 ~]$ repmgr standby register -f /home/pg12/conf/repmgr.conf --force INFO: connecting to local node "node3" (ID: 3) INFO: connecting to primary database WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1) WARNING: local node not attached to primary node 1 NOTICE: -F/--force supplied, continuing anyway INFO: standby registration complete NOTICE: standby node "node3" (ID: 3) successfully registered [pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ! node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2 WARNING: following issues were detected - node "node3" (ID: 3) is not attached to its upstream node "node1" (ID: 1) #确定成功组成,然后处理问题not attached to its upstream node [pg12@node3 ~]$ repmgr standby follow -f /home/pg12/conf/repmgr.conf NOTICE: attempting to find and follow current primary INFO: timelines are same, this server is not ahead DETAIL: local node lsn is 0/F00A488, follow target lsn is 0/F00A488 NOTICE: setting node 3's upstream to node 1 NOTICE: stopping server using "/home/pg12/soft/bin/pg_ctl -D '/home/pg12/data' -w -m fast stop" 2023-05-26 21:56:37.093 CST [33220] LOG: received fast shutdown request 2023-05-26 21:56:37.093 CST [33220] LOG: aborting any active transactions 2023-05-26 21:56:37.093 CST [33225] FATAL: terminating walreceiver process due to administrator command 2023-05-26 21:56:37.094 CST [33222] LOG: shutting down 2023-05-26 21:56:37.096 CST [33220] LOG: database system is shut down NOTICE: starting server using "/home/pg12/soft/bin/pg_ctl -w -D '/home/pg12/data' start" NOTICE: STANDBY FOLLOW successful DETAIL: standby attached to upstream node "node1" (ID: 1) [pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2 [pg12@node3 ~]$
复制

最后发现集群信息显示完美解决。
同时建表t1进行测试:

[pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr postgres -c 'select * from t1;' id | comm ----+-------- 1 | zhang3 (1 row) [pg12@node3 ~]$ psql -h 192.168.5.101 -U repmgr postgres -c 'insert into t1 values(2,'li4');' INSERT 0 1 [pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr postgres -c 'select * from t1;' id | comm ----+-------- 1 | zhang3 2 | li4 (2 rows)
复制
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论