repmgr 5.1 reports errors during a planned switchover and I can't tell what is wrong. Could someone please take a look?

Anonymous user

postgresql

A dry run on the standby completes without errors:

repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind

[pgsql@itpuxpg25:/postgresql/pgdata]$repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
NOTICE: checking switchover on node "itpuxpg25" (ID: 2) in --dry-run mode
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "192.168.101.24" succeeded
INFO: able to execute "repmgr" on remote host "192.168.101.24"
INFO: all sibling nodes are reachable via SSH
INFO: 3 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "itpuxpg25" (ID: 2) would be promoted to primary; current primary "itpuxpg24" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "itpuxpg24":
"/postgresql/pg12/bin/pg_ctl -D '/postgresql/pgdata' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met

The real run on the standby fails:

[pgsql@itpuxpg25:/postgresql/pgdata]$repmgr -f /postgresql/pg12/repmgr.conf standby switchover --siblings-follow --force-rewind

NOTICE: executing switchover on node "itpuxpg25" (ID: 2)
NOTICE: local node "itpuxpg25" (ID: 2) will be promoted to primary; current primary "itpuxpg24" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "itpuxpg24" (ID: 1)
NOTICE: issuing CHECKPOINT on node "itpuxpg24" (ID: 1)
DETAIL: executing server command "/postgresql/pg12/bin/pg_ctl -D '/postgresql/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/1C000028
NOTICE: promoting standby to primary
DETAIL: promoting server "itpuxpg25" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "itpuxpg25" (ID: 2) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "itpuxpg25" (ID: 2)
ERROR: unable to execute CHECKPOINT
ERROR: connection to database failed
DETAIL:
fe_sendauth: no password supplied
ERROR: unable to establish a replication connection to the rejoin target node
INFO: waiting for node "itpuxpg24" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: checking for record in node "itpuxpg25"'s "pg_stat_replication" table where "application_name" is "itpuxpg24"
INFO: waiting for node "itpuxpg24" (ID: 1) to connect to new primary; 6 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: checking for record in node "itpuxpg25"'s "pg_stat_replication" table where "application_name" is "itpuxpg24"

......
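
The `fe_sendauth: no password supplied` message is what libpq prints when the server asks for a password and the client has none to send. One way to narrow this down is to try the same kind of connection by hand from itpuxpg24 to itpuxpg25, using `-w` so that psql fails instead of prompting. This is only a sketch: the helper names `repl_conninfo` and `check_repl_conn` are made up for illustration, and the port, user, database and timeout are assumed from the connection strings and postgresql.conf in this post.

```shell
# Hypothetical helper: build a libpq conninfo string for a physical
# replication connection to the given host. User, database, port and
# timeout are assumptions taken from the cluster's connection strings.
repl_conninfo() {
    printf 'host=%s port=5432 user=repmgr dbname=repmgr replication=1 connect_timeout=2\n' "$1"
}

# Hypothetical helper: attempt the connection without ever prompting for a
# password (-w). If it fails with "fe_sendauth: no password supplied", no
# password was available to libpq for this user/host combination.
check_repl_conn() {
    psql "$(repl_conninfo "$1")" -w -c 'IDENTIFY_SYSTEM;'
}

# Usage, run as the pgsql OS user on the node being demoted:
#   check_repl_conn 192.168.101.25
```

If the manual attempt fails with the same message, the password is probably not reaching the replication connection, even though the registered connection strings contain one.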

Standby status:

[pgsql@itpuxpg26:/postgresql/pgdata]$repmgr -f /postgresql/pg12/repmgr.conf cluster show
 ID | Name      | Role    | Status    | Upstream    | Location | Priority | Timeline | Connection string
----+-----------+---------+-----------+-------------+----------+----------+----------+---------------------------------------------------------------------------------
 1  | itpuxpg24 | primary | - failed  | ?           | default  | 100      |          | host=192.168.101.24 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 2  | itpuxpg25 | standby |   running | ? itpuxpg24 | default  | 100      | 2        | host=192.168.101.25 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | itpuxpg26 | primary | * running |             | default  | 100      | 3        | host=192.168.101.26 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 4  | itpuxpg27 | witness | * running | itpuxpg26   | default  | 0        | n/a      | host=192.168.101.27 user=repmgr password=repmgr dbname=repmgr connect_timeout=2

Below are my configuration files:

[pgsql@itpuxpg24:/postgresql/pgdata]$more pg_hba.conf

local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
host all all 0.0.0.0/0 md5
host replication repuser 0.0.0.0/0 md5
host all nobody 0.0.0.0/0 md5

local repmgr repmgr md5
host repmgr repmgr 127.0.0.1/32 md5
host repmgr repmgr 192.168.101.0/24 md5
local replication repmgr md5
host replication repmgr 127.0.0.1/32 md5
host replication repmgr 192.168.101.0/24 md5
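
The host rules that match the repmgr user from 192.168.101.0/24 use md5, so any connection made as repmgr between nodes, including a replication connection, has to be able to find a password. libpq can read one from a password file (`~/.pgpass` of the OS user that runs repmgr and PostgreSQL, with mode 0600), where the database field `replication` matches replication connections. Below is a sketch of such a file. It assumes the password `repmgr` shown in the registered connection strings and that the same file is placed on every node. Whether this resolves the error in this cluster is not confirmed.

```
# ~/.pgpass  (hostname:port:database:username:password), chmod 0600
192.168.101.24:5432:repmgr:repmgr:repmgr
192.168.101.24:5432:replication:repmgr:repmgr
192.168.101.25:5432:repmgr:repmgr:repmgr
192.168.101.25:5432:replication:repmgr:repmgr
192.168.101.26:5432:repmgr:repmgr:repmgr
192.168.101.26:5432:replication:repmgr:repmgr
192.168.101.27:5432:repmgr:repmgr:repmgr
```

The witness node (192.168.101.27) only needs the `repmgr` database entry, since it does not stream WAL.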

[pgsql@itpuxpg24:/postgresql/pgdata]$more postgresql.conf

listen_addresses = '*'
port = 5432
max_connections = 500
shared_buffers = 4096MB
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f'
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_timezone = 'PRC'
datestyle = 'iso, mdy'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'
max_replication_slots=10
wal_log_hints=on
max_wal_senders = 10
wal_keep_segments = 256
wal_sender_timeout = 60s
