主备需求:
通过repmgr5.2.1工具实现postgresql13.4的主备切换及激活管理(已实战于生产环境)
操作系统版本统一:centos7.7x64
主:192.168.52.7
备:192.168.52.8
root/postgres等密码统一设置好。
主要步骤:
安装postgresql
安装repmgr
主备库注册及切换测试
提示:各软件版本可根据实际情况选择即可,若希望尝鲜,那就选择最新即可。
1.安装postgresql
增加用户:
adduser postgres
wget
https://ftp.postgresql.org/pub/source/v13.4/postgresql-13.4.tar.gz
mkdir -p
/postgresql/{pgdata,archive,scripts,backup,pg13,soft}
chown -R postgres:postgres /postgresql
yum install -y cmake make gcc zlib gcc-c++
perl readline readline-devel zlib zlib-devel
3perl python36 tcl openssl ncurses-devel openldap pam
mv /root/postgresql-13.4.tar.gz ./
chown -R postgres:postgres /postgresql
su - postgres
cd /postgresql/soft/
tar -zxvf postgresql-13.4.tar.gz
cd postgresql-13.4
./configure --prefix=/postgresql/pg13
make -j 8 && make install
systemctl daemon-reload
systemctl enable PG13
systemctl start PG13
systemctl status PG13
[root@pg07 ~]# cat /etc/systemd/system/PG13.service
[Unit]
Description=PostgreSQL database server
Documentation=man:postgres(1)
After=network.target
[Service]
Type=forking
User=postgres
Group=postgres
Environment=PGPORT=5432
Environment=PGDATA=/postgresql/pgdata
OOMScoreAdjust=-1000
ExecStart=/postgresql/pg13/bin/pg_ctl start
-D ${PGDATA} -s -o "-p ${PGPORT}" -w -t 300
ExecStop=/postgresql/pg13/bin/pg_ctl stop
-D ${PGDATA} -s -m fast
ExecReload=/postgresql/pg13/bin/pg_ctl
reload -D ${PGDATA} -s
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0
[Install]
WantedBy=multi-user.target
cat .bash_profile内容:环境变量参考,也可用于/etc/profile中
PATH=$PATH:$HOME/.local/bin:$HOME/bin:/postgresql/pg13/bin
export PATH
export LANG=en_US.UTF-8
export PS1="[\u@\h \W]\$ "
export PGPORT=5432
export PGDATA=/postgresql/pgdata
export PGHOME=/postgresql/pg13
export
LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export PATH=$PGHOME/bin:$PATH:.
export DATE=`date +"%Y%m%d%H%M"`
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
pg库测试连接:
psql -U postgres -h 192.168.52.7
-d postgres -p5432
\l
/etc/profile内容:编译repmgr会用到
unset -f pathmunge
PATH=$PATH:/postgresql/pg13/bin
postgres密码:abcd1234
[root@pg07 repmgr-5.2.1]# su - postgres
Last login: Wed Sep 22 10:16:02 CST 2021 on
pts/0
[postgres@pg07 ~]$ pwd
/home/postgres
[postgres@pg07 ~]$ more .pgpass
192.168.52.7:5432:postgres:replicator:rep
192.168.52.8:5432:postgres:replicator:rep
pg07上:
su - postgres
mkdir –p $PGDATA/arch
/postgresql/pgdata/arch.sh内容:
test
! -f $PGDATA/arch/$1 && cp --preserve=timestamps $2 $PGDATA/arch/$1 ;
find $PGDATA/arch/ -type f -mtime +7 -exec rm -f {} \;
postgresql.conf内容:
listen_addresses = '*' #改成*便于将来增加浮动IP的监听
port=5432
data_directory = '/app/pgsql/data'
unix_socket_directories='/app/pgsql/data'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
wal_level = replica
archive_mode = on
archive_command = 'arch.sh %f %p'
wal_log_hints = on
hot_standby = on
[postgres@pg07 data]$ more
postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER
SYSTEM command.
max_connections = '500' #根据业务连接数分配
shared_buffers = '20GB' #根据操作系统的物理内存适当分配即可。
wal_buffers = '64MB'
primary_conninfo =
'user=postgres connect_timeout=2 host=192.168.52.8 application_name=pg07'
pg08上:(注意以下参数文件无需配置(此处只是列出参考内容),在配置standb clone的时候会从主库自动同步生成($PGDATA整个目录都会自动同步))
[postgres@pg08 data]$ more
postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER
SYSTEM command.
max_connections = '500'
shared_buffers = '20GB'
wal_buffers = '64MB'
primary_conninfo = 'user=postgres
connect_timeout=2 host=192.168.52.7 application_name=pg08'
[postgres@pg08 data]$ more postgresql.conf
listen_addresses = '*'
port=5432
unix_socket_directories='/app/pgsql/data'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
wal_level = replica
archive_mode = on
archive_command = 'arch.sh %f %p'
wal_log_hints = on
hot_standby = on
配置复制用户:(如使用管理员postgres,该步骤可忽略)
create role replicator with replication login;
\password replicator
rep
[postgres@pg07 ~]$ psql
psql (13.4)
Type "help" for help.
postgres=# \du
List of
roles
Role
name | Attributes | Member of
------------+------------------------------------------------------------+-----------
postgres
| Superuser, Create role, Create DB, Replication, Bypass RLS | {}
replicator | Replication
| {}
postgres=# alter
user replicator with superuser;
ALTER ROLE
postgres=# \du
List of
roles
Role
name | Attributes | Member of
------------+------------------------------------------------------------+-----------
postgres
| Superuser, Create role, Create DB, Replication, Bypass RLS | {}
replicator | Superuser, Replication
pg_hba.conf内容:
# "local" is for Unix domain
socket connections only
local
all all md5
# IPv4 local connections:
host
all all 127.0.0.1/32 md5
host
all all 10.62.0.0/16 md5
host
all all 172.16.18.0/24 md5
# IPv6 local connections:
host
all all ::1/128 md5
# Allow replication connections from
localhost, by a user with the
# replication privilege.
local
replication all md5
host
replication all 172.16.18.0/24 md5
host
replication all 192.168.52.0/24 trust
host
replication all 127.0.0.1/32 md5
host
replication all ::1/128 md5
启停pg:
pg_ctl stop
pg_ctl start
在线加载配置变更:pg_ctl reload
主备配置ssh互信:
[postgres@pg07 ~]$ cat /etc/hosts
127.0.0.1
localhost localhost.localdomain localhost4 localhost4.localdomain4
::1
localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.52.7 pg07
192.168.52.8 pg08
su - postgres进入postgres用户操作
pg07上操作:
# 生成秘钥到用户主目录下的.ssh文件夹下
ssh-keygen -t rsa
# 将秘钥拷贝到远程机器
ssh-copy-id -i .ssh/id_rsa.pub postgres@pg08
cat .ssh/id_*.pub |ssh postgres@192.168.52.8 'cat >>
.ssh/authorized_keys'
# # 验证是否授权完成:不提示密码,直接返回日期说明配置正确
ssh pg08 date
pg08上操作:
# 生成秘钥到用户主目录下的.ssh文件夹下
ssh-keygen -t rsa
# 将秘钥拷贝到远程机器
cat .ssh/id_*.pub |ssh postgres@192.168.52.7 'cat >>
.ssh/authorized_keys'
Benz_58858588
ssh-copy-id -i .ssh/id_rsa.pub postgres@pg07
# # 验证是否授权完成:不提示密码,直接返回日期说明配置正确
ssh pg07 date
主备测试链接:备库连主库
[postgres@pg08 ~]$ psql 'host=192.168.52.7
user=replicator dbname=postgres connect_timeout=2'
Password for user replicator:rep
2.编译repmgr-5.2.1
https://repmgr.org/download/repmgr-5.2.1.tar.gz
tar -zxvf repmgr-5.2.1.tar.gz
pg07上:
[postgres@pg07 ~]$ cat /etc/repmgr.conf
node_id=1
node_name=pg07
conninfo='host=192.168.52.7 user=replicator
dbname=postgres connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg13/bin'
pg08上:
[postgres@pg08 ~]$ cat /etc/repmgr.conf
node_id=2
node_name=pg08
conninfo='host=192.168.52.8 user=replicator
dbname=postgres connect_timeout=2'
data_directory='/postgresql/pgdata'
pg_bindir='/postgresql/pg13/bin'
yum install flex -y
[root@pg07 repmgr-5.2.1]# ./configure
--prefix=/postgresql/pg13
[root@pg07 repmgr-5.2.1]# make &&
make install
主库注册:
[postgres@pg07 ~]$ pg_ctl start 如果在线修改配置pg_ctl reload即可
[postgres@pg07 ~]$ repmgr
primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension
"repmgr"
NOTICE: "repmgr" extension
successfully installed
NOTICE: primary node record (ID: 1)
registered
[postgres@pg07 ~]$ repmgr cluster show
ID |
Name | Role | Status | Upstream | Location | Priority | Timeline
| Connection string
----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------
1 | pg07
| primary | * running | |
default | 100 | 1
| host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2
备库注册:
[postgres@pg08 ~]$ repmgr
-h 192.168.52.7 -U replicator -d postgres -f /etc/repmgr.conf standby clone
--dry-run
NOTICE: destination directory
"/postgresql/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.52.7
user=replicator dbname=postgres
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is
installed in database "postgres"
WARNING: target data directory appears to
be a PostgreSQL data directory
DETAIL: target data directory is
"/postgresql/pgdata"
HINT: use -F/--force to overwrite the
existing data directory
INFO: replication slot usage not
requested; no replication slot will be
set up for this standby
INFO: parameter "max_wal_senders"
set to 10
NOTICE: checking for available walsenders
on the source node (2 required)
INFO: sufficient walsenders available on
the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections
can be made to the source server (2 required)
INFO: required number of replication
connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and
"wal_log_hints" is "off"
DETAIL: pg_rewind requires
"wal_log_hints" to be enabled
NOTICE: standby will attach to upstream
node 1
HINT: consider using the
-c/--fast-checkpoint option
INFO: all prerequisites for "standby
clone" are met
[postgres@pg08 ~]$ repmgr
-h 192.168.52.7 -U replicator -d postgres -f /etc/repmgr.conf standby clone
NOTICE: destination directory
"/postgresql/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.52.7
user=replicator dbname=postgres
DETAIL: current installation size is 31 MB
ERROR: target data directory appears to be
a PostgreSQL data directory
DETAIL: target data directory is
"/postgresql/pgdata"
HINT: use -F/--force to overwrite the
existing data directory
[postgres@pg08 ~]$ rm
-rf /postgresql/pgdata/*
[postgres@pg08 ~]$repmgr
-h 192.168.52.7 -U replicator -d postgres -f /etc/repmgr.conf standby clone
NOTICE: destination directory
"/postgresql/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.52.7
user=replicator dbname=postgres
DETAIL: current installation size is 31 MB
INFO: replication slot usage not
requested; no replication slot will be
set up for this standby
NOTICE: checking for available walsenders
on the source node (2 required)
NOTICE: checking replication connections
can be made to the source server (2 required)
WARNING: data checksums are not enabled and
"wal_log_hints" is "off"
DETAIL: pg_rewind requires
"wal_log_hints" to be enabled
INFO: checking and correcting permissions
on existing directory "/postgresql/pgdata"
NOTICE: starting backup (using
pg_basebackup)...
HINT: this may take some time; consider
using the -c/--fast-checkpoint option
INFO: executing:
pg_basebackup -l "repmgr base backup" -D /postgresql/pgdata -h 192.168.52.7 -p 5432
-U replicator -X stream
WARNING:
skipping special file "./.s.PGSQL.5432"
WARNING:
skipping special file "./.s.PGSQL.5432"
NOTICE: standby clone (using pg_basebackup)
complete
NOTICE: you can now start your PostgreSQL
server
HINT: for example: pg_ctl -D
/postgresql/pgdata start
HINT: after starting the server, you need
to register this standby with "repmgr standby register"
[postgres@pg08 ~]$
pg_ctl -D /postgresql/pgdata start
waiting for server to start....2021-09-22
10:07:56.869 CST [2362] LOG: redirecting
log output to logging collector process
2021-09-22 10:07:56.869 CST [2362]
HINT: Future log output will appear in
directory "pg_log".
done
server started
[postgres@pg08 ~]$ repmgr
standby register
INFO: connecting to local node "pg08"
(ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied,
assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "pg08" (ID:
2) successfully registered
[postgres@pg08 ~]$ repmgr cluster show
ID |
Name | Role | Status | Upstream | Location | Priority | Timeline
| Connection string
----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------
1 | pg07
| primary | * running | |
default | 100 | 1
| host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2
2 | pg08
| standby | running | pg07 | default
| 100 | 1 | host=192.168.52.8 user=replicator
dbname=postgres connect_timeout=2
[postgres@pg08 ~]$ psql
psql (13.4)
Type "help" for help.
postgres=# select * from
pg_stat_replication;
pid
| usesysid | usename | application_name | client_addr | client_hostname |
client_port | backend_start | backend_xmin | state | sent_lsn
| write_lsn | flush_lsn | replay_lsn |
write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+----------
+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+------------
(0 rows)
postgres=# \x
Expanded display is on.
+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+---
(0 rows)
postgres=# select * from
pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 2368
status | streaming
receive_start_lsn | 0/8000000
receive_start_tli | 1
written_lsn | 0/8000570
flushed_lsn | 0/8000570
received_tli | 1
last_msg_send_time | 2021-09-22 10:08:35.703193+08
last_msg_receipt_time | 2021-09-22
10:08:35.714032+08
latest_end_lsn | 0/8000570
latest_end_time | 2021-09-22 10:08:05.694295+08
slot_name |
sender_host | 192.168.52.7
sender_port | 5432
conninfo | user=replicator
passfile=/home/postgres/.pgpass channel_binding=disable connect_timeout=2
dbname=replication host=192.168.52.7 port=5432 application_name=pg08
fallback_application_name=walreceiver sslmode=disable sslcompression=0
ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres
target_session_attrs=any
3.主备库切换测试:
在备库做switchover
[postgres@pg08 ~]$ repmgr
standby switchover
在备库做promote激活
主库无法恢复的情况下,在备库操作激活动作。
[postgres@pg08 ~]$ repmgr
standby promote
NOTICE: promoting standby to primary
DETAIL: promoting server "pg08"
(ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter
"promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pg08" (ID: 2) was
successfully promoted to primary
[postgres@pg08 ~]$ repmgr cluster show
ID |
Name | Role | Status | Upstream | Location | Priority | Timeline
| Connection string
----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------
1 | pg07
| primary | - failed | ? | default | 100
| | host=192.168.52.7 user=replicator
dbname=postgres connect_timeout=2
2 | pg08
| primary | * running | |
default | 100 | 4
| host=192.168.52.8 user=replicator dbname=postgres connect_timeout=2
WARNING: following issues were detected
-
unable to connect to node "pg07" (ID: 1)
HINT: execute with --verbose option to see
connection error messages
等主库恢复后,重新作为备库加入:
[postgres@pg07
~]$ repmgr
node rejoin -h192.168.52.8 -Upostgres -dpostgres --force-rewind --config-files=postgresql.conf,postgresql.auto.conf
--verbose
INFO: looking for configuration file in
/etc
INFO: configuration file found at:
"/etc/repmgr.conf"
NOTICE: pg_rewind execution required for
this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 4
forked off current database system timeline 3 before current recovery point
0/C000028
INFO: prerequisites for using pg_rewind are
met
INFO: 2 files copied to
"/tmp/repmgr-config-archive-pg07"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is
"/postgresql/pg13/bin/pg_rewind -D '/postgresql/pgdata'
--source-server='host=192.168.52.8 user=replicator dbname=postgres
connect_timeout=2'"
NOTICE: 2 files copied to
/postgresql/pgdata
INFO: directory
"/tmp/repmgr-config-archive-pg07" deleted
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.52.7
user=replicator dbname=postgres connect_timeout=2"
DETAIL: PQping() returned
"PQPING_NO_RESPONSE"
NOTICE: starting server using
"/postgresql/pg13/bin/pg_ctl -w -D
'/postgresql/pgdata' start"
INFO: node "pg07" (ID: 1) is
pingable
INFO: node "pg07" (ID: 1) has
attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[postgres@pg07 ~]$ repmgr cluster show
ID |
Name | Role | Status | Upstream | Location | Priority | Timeline
| Connection string
----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------
1 | pg07
| standby | running | pg08 | default
| 100 | 3 | host=192.168.52.7 user=replicator
dbname=postgres connect_timeout=2
或者clone恢复(全库恢复时间长,不推荐,rejion失败的情况下可采用该方法):
repmgr standby clone -h 192.168.52.8 -Upostgres
最后主备又可正常switchover了。
[postgres@pg07 ~]$ psql
psql (13.4)
Type "help" for help.
postgres=# \l
List of
databases
Name | Owner
| Encoding | Collate |
Ctype | Access privileges
-----------+----------+----------+------------+------------+-----------------------
jyc
| postgres | UTF8 | en_US.utf8
| en_US.utf8 |
postgres
| postgres | UTF8 | en_US.utf8
| en_US.utf8 |
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
=c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
=c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
postgres=# \dt
Did not find any relations.
postgres=# pwd
postgres-# \c postgres
You are now connected to database
"postgres" as user "postgres".
postgres-# \dt
Did not find any relations.
postgres-# \dx
List of installed
extensions
Name | Version | Schema
| Description
---------+---------+------------+------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural
language
repmgr
| 5.2 | repmgr
| Replication manager for PostgreSQL
(2 rows)
postgres-# \du
List of
roles
Role
name | Attributes | Member of
------------+------------------------------------------------------------+-----------
postgres
| Superuser, Create role, Create DB, Replication, Bypass RLS | {}
replicator | Superuser, Replication | {}
postgres-# \dt+
Did not find any relations.
postgres-# \q
[postgres@pg07 ~]$ repmgr cluster show
ID |
Name | Role | Status | Upstream | Location | Priority | Timeline
| Connection string
----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------
1 | pg07
| primary | * running | |
default | 100 | 5
| host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2
2 | pg08
| standby | running | pg07 | default
| 100 | 5 | host=192.168.52.8 user=replicator
dbname=postgres connect_timeout=2
[postgres@pg07 ~]$
评论
