repmgr5.2.1+PostgreSQL13.4高可用集群安装部署
1.安装环境介绍
虚拟机版本: VMware Workstation Pro 12.1
虚拟机硬盘: 30G,动态分配
操作系统: centos7.6 x86_64
内存: 2G
CPU: 1
PostgreSQL: 13.4
repmgr: 5.2.1
安装介质:/opt/soft/pg13/postgresql-13.4.tar.gz /opt/soft/pg13/repmgr-5.2.1.tar.gz
2.repmgr架构
主机名 | IP | 角色 | 开发端口 | 服务 |
pgdb1 | 192.168.10.81 | primary | 1304 | pg13 and repmgr |
pgdb2 | 192.168.10.82 | standby | 1304 | pg13 and repmgr |
pgdb3 | 192.168.10.83 | standby | 1304 | pg13 and repmgr |
pgdb4 | 192.168.10.84 | witness | 1304 | pg13 and repmgr |
3.安装部署
3.1.关闭防火墙
在4个节点都执行,su - root
systemctl stop firewalld
systemctl disable firewalld
3.2.安装rpm包
在4个节点都执行,su - root
配置yum源后,安装常用的rpm包
yum -y install gcc gcc-c++ zlib-devel readline-devel perl-ExtUtils-Embed pam-devel libxml2-devel libxslt-devel openldap-devel python-devel openssl-devel cmake tcl tcl-devel flex bison bzip2 unzip zip net-tools dstat telnet
3.3.内核参数配置
在4个节点都执行,su - root
编辑/etc/sysctl.conf ,添加如下内容:
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.sem = 50100 64128000 50100 1280
fs.file-max = 7672460
net.ipv4.ip_local_port_range = 9000 65000
net.core.rmem_default = 1048576
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
执行sysctl -p生效
3.4.创建组和用户
在4个节点都执行,su - root
groupadd postgres -g 1000
useradd postgres -g 1000 -u 1000
passwd postgres 密码为 XXXXX
3.5.修改环境变量
在4个节点都执行,su - postgres
vi ~/.bash_profile
export PGPORT=1304
export PG_HOME=/home/postgres/pg1304
export PGDATA=/home/postgres/pgdata1304
export LD_LIBRARY_PATH=$PG_HOME/lib
export PATH=$PG_HOME/bin:$PATH
3.6.源码编译安装软件
3.6.1.编译安装PostgreSQL软件
在4个节点都执行,su - root
chmod 777 /opt/soft/pg13
cd /opt/soft/pg13
chown postgres:postgres postgresql-13.4.tar.gz
在4个节点都执行,su - postgres
mkdir -p /home/postgres/pg1304
mkdir -p /home/postgres/pgdata1304
cd /opt/soft/pg13
tar -xzf postgresql-13.4.tar.gz
cd /opt/soft/pg13/postgresql-13.4
./configure --prefix=/home/postgres/pg1304
gmake world && gmake install-world
3.6.2.编译安装repmgr软件
在4个节点都执行,su - root
chmod 777 /opt/soft/pg13
cd /opt/soft/pg13
chown postgres:postgres repmgr-5.2.1.tar.gz
在4个节点都执行,su - postgres
cd /opt/soft/pg13
tar -zxvf repmgr-5.2.1.tar.gz
cd repmgr-5.2.1
./configure PG_CONFIG=/home/postgres/pg1304/bin/pg_config
make && make install
3.7.配置/etc/hosts
在4个节点都执行,su - root
vi /etc/hosts
192.168.10.81 pgdb1
192.168.10.82 pgdb2
192.168.10.83 pgdb3
192.168.10.84 pgdb4
3.8.配置SSH互信
在4个节点都执行,su - postgres
ssh-keygen -t rsa #每个节点都执行,不输入任何信息,一直回车
ssh-copy-id -i .ssh/id_rsa.pub postgres@pgdb1 #每个节点都执行,先敲yes,然后输入密码,密码为XXXXX
ssh-copy-id -i .ssh/id_rsa.pub postgres@pgdb2 #每个节点都执行,先敲yes,然后输入密码,密码为XXXXX
ssh-copy-id -i .ssh/id_rsa.pub postgres@pgdb3 #每个节点都执行,先敲yes,然后输入密码,密码为XXXXX
ssh-copy-id -i .ssh/id_rsa.pub postgres@pgdb4 #每个节点都执行,先敲yes,然后输入密码,密码为XXXXX
验证:
ssh pgdb1 date;ssh pgdb2 date;ssh pgdb3 date;ssh pgdb4 date
不输人密码是正常的,要输入密码配置异常,要重新配置。
3.9.流复制主数据库初始化配置
说明:部署repmgr,要进行2次数据库初始化,一次是初始化witness仲裁库,具体步骤详见本文最后的附录章节:附录“初始化witness仲裁库”;另外一次是搭建流复制时,在主节点进行主库初始化,具体步骤详见本章节。
3.9.1.流复制主数据库初始化
只在pgdb1节点,进行流复制主数据库初始化,su - postgres
数据库初始化
/home/postgres/pg1304/bin/initdb --pgdata=/home/postgres/pgdata1304 --pwprompt
输入密码:XXXXX
说明:
-W, --pwprompt prompt for a password for the new superuser
启动pg
/home/postgres/pg1304/bin/pg_ctl -D /home/postgres/pgdata1304 -l /home/postgres/pg.log start
3.9.2.修改postgresql.conf文件
在pgdb1节点执行,su - postgres
mkdir -p /home/postgres/arch
cd $PGDATA
cp postgresql.conf postgresql.conf.bak
vi postgresql.conf,添加如下内容:
listen_addresses ='0.0.0.0'
port = 1304
shared_preload_libraries = 'repmgr'
wal_log_hints = on
hba_file ='/home/postgres/repmgr/pg_hba.conf'
archive_mode = on
archive_command = 'cp %p /home/postgres/arch/%f'
wal_level = replica
max_wal_senders=20
wal_keep_size = 2048
restore_command = 'cp /home/postgres/arch/%f %p'
recovery_target_timeline='latest'
full_page_writes = on
说明:
wal_keep_segments:PostgreSQL13版本后,wal_keep_segments参数以及废弃了,由wal_keep_size替代此参数。wal_keep_size的单位为MB。
3.9.3.配置pg_hba.conf文件
在pgdb1、pgdb2、pgdb3 三个节点都执行,su - postgres
mkdir -p /home/postgres/ repmgr
vi /home/postgres/repmgr/pg_hba.conf,添加如下内容:
local all all trust
host repmgr repmgr 192.168.10.81/32 trust
host repmgr repmgr 192.168.10.82/32 trust
host repmgr repmgr 192.168.10.83/32 trust
host repmgr repmgr 192.168.10.84/32 trust
host all all 0/0 md5
host all all ::/0 md5
# forbid self-replication its own IP
local replication all reject
host replication all 127.0.0.0/8 reject
host replication all ::1/128 reject
# allow any standby connection
host replication repmgr 0.0.0.0/0 trust
3.10.配置PostgreSQL服务开机启动
在4个节点都执行
su - root
ls -l /opt/soft/pg13/postgresql-13.4/contrib/start-scripts
ls -l /etc/init.d/
cp /opt/soft/pg13/postgresql-13.4/contrib/start-scripts/linux /etc/init.d/postgresql-13
chmod +x /etc/init.d/postgresql-13
chkconfig postgresql-13 on
chkconfig --list
cp /etc/init.d/postgresql-13 /etc/init.d/postgresql-13.bak
vi /etc/init.d/postgresql-13
如下postgresql-13文件中部分内容:
修改前:
## EDIT FROM HERE
# Installation prefix
prefix=/usr/local/pgsql
# Data directory
PGDATA="/usr/local/pgsql/data"
# Who to run the postmaster as, usually "postgres". (NOT "root")
PGUSER=postgres
# Where to keep a log file
PGLOG="$PGDATA/serverlog"
# It's often a good idea to protect the postmaster from being killed by the
# OOM killer (which will tend to preferentially kill the postmaster because
# of the way it accounts for shared memory). To do that, uncomment these
# three lines:
#PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
#PG_MASTER_OOM_SCORE_ADJ=-1000
#PG_CHILD_OOM_SCORE_ADJ=0
# Older Linux kernels may not have /proc/self/oom_score_adj, but instead
# /proc/self/oom_adj, which works similarly except for having a different
# range of scores. For such a system, uncomment these three lines instead:
#PG_OOM_ADJUST_FILE=/proc/self/oom_adj
#PG_MASTER_OOM_SCORE_ADJ=-17
#PG_CHILD_OOM_SCORE_ADJ=0
## STOP EDITING HERE
修改后:
## EDIT FROM HERE
# Installation prefix
prefix=/home/postgres/pg1304
# Data directory
PGDATA="/home/postgres/pgdata1304"
# Who to run the postmaster as, usually "postgres". (NOT "root")
PGUSER=postgres
# Where to keep a log file
PGLOG="$PGDATA/serverlog"
# It's often a good idea to protect the postmaster from being killed by the
# OOM killer (which will tend to preferentially kill the postmaster because
# of the way it accounts for shared memory). To do that, uncomment these
# three lines:
#PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
#PG_MASTER_OOM_SCORE_ADJ=-1000
#PG_CHILD_OOM_SCORE_ADJ=0
# Older Linux kernels may not have /proc/self/oom_score_adj, but instead
# /proc/self/oom_adj, which works similarly except for having a different
# range of scores. For such a system, uncomment these three lines instead:
#PG_OOM_ADJUST_FILE=/proc/self/oom_adj
#PG_MASTER_OOM_SCORE_ADJ=-17
#PG_CHILD_OOM_SCORE_ADJ=0
## STOP EDITING HERE
说明:
1、只修改了2处:prefix和PGDATA
2、PGLOG:pg的启停日志文件
可以使用如下命令测试服务是否可用:
# service postgresql-13 start | stop | restart | reload |status
#systemctl disable postgresql-13
#systemctl enable postgresql-13
#systemctl status postgresql-13
#systemctl stop postgresql-13
#systemctl start postgresql-13
3.11.配置repmgr服务开机启动
在4个节点都执行
su -root
cd /etc/systemd/system
vi /etc/systemd/system/repmgrd.service ,手工编辑一个repmgrd.service文件,内容如下:
其中,标红部分要根据情况进行修改。
[root@pgdb1 system]# cat /etc/systemd/system/repmgrd.service
[Unit]
Description=A replication manager, and failover management tool for PostgreSQL
After=syslog.target
After=network.target
After=postgresql-13.service
[Service]
Type=forking
User=postgres
Group=postgres
# PID file
PIDFile=/tmp/repmgrd.pid
# Location of repmgr conf file:
Environment=REPMGRDCONF=/home/postgres/repmgr/repmgr.conf
Environment=PIDFILE=/tmp/repmgrd.pid
# Where to send early-startup messages from the server
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
ExecStartPre=/bin/sleep 10
ExecStart=/home/postgres/pg1304/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose
ExecStop=/usr/bin/kill -TERM $MAINPID
ExecReload=/usr/bin/kill -HUP $MAINPID
# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300
[Install]
WantedBy=multi-user.target
[root@pgdb1 system]#
说明:ExecStartPre=/bin/sleep 10,休眠10秒可能较短,要根据情况调大。
可以使用如下命令测试服务是否可用:
su -root
/usr/bin/systemctl disable repmgrd.service
/usr/bin/systemctl enable repmgrd.service
/usr/bin/systemctl start repmgrd.service
/usr/bin/systemctl stop repmgrd.service
/usr/bin/systemctl status repmgrd.service
/usr/bin/systemctl daemon-reload
3.12.sudo权限设置
在4个节点都执行,su - root
ls -l /etc/sudoers.d/
echo "postgres ALL = (root) NOPASSWD:/usr/bin/systemctl start postgresql-13,/usr/bin/systemctl restart postgresql-13,/usr/bin/systemctl stop postgresql-13,/usr/bin/systemctl reload postgresql-13,/usr/bin/systemctl status postgresql-13,/usr/bin/systemctl disable repmgrd.service,/usr/bin/systemctl enable repmgrd.service,/usr/bin/systemctl start repmgrd.service,/usr/bin/systemctl stop repmgrd.service,/usr/bin/systemctl status repmgrd.service,/usr/bin/systemctl daemon-reload" | sudo tee /etc/sudoers.d/postgres
说明:操作之前,/etc/sudoers.d/目录是一个空目录
sudo权限配置好后,可以su - postgres,验证postgres用户是否有sudo权限
sudo /usr/bin/systemctl start postgresql-13
sudo /usr/bin/systemctl restart postgresql-13
sudo /usr/bin/systemctl stop postgresql-13
sudo /usr/bin/systemctl reload postgresql-13
sudo /usr/bin/systemctl status postgresql-13
sudo /usr/bin/systemctl disable repmgrd.service
sudo /usr/bin/systemctl enable repmgrd.service
sudo /usr/bin/systemctl start repmgrd.service
sudo /usr/bin/systemctl stop repmgrd.service
sudo /usr/bin/systemctl status repmgrd.service
sudo /usr/bin/systemctl daemon-reload
3.13.repmgr权限设置
在pgdb1节点,su - postgres
登录到主库psql命令行执行
3.13.1.repmgr扩展
create user repmgr superuser;
\c postgres repmgr
create database repmgr owner repmgr;
\c repmgr repmgr
drop schema public;
create schema repmgr authorization repmgr;
create extension repmgr with schema repmgr;
3.13.2.pg_rewind权限
\c repmgr postgres
grant execute on function pg_read_binary_file(text) to repmgr;
grant execute on function pg_read_binary_file(text, bigint, bigint) to repmgr;
grant execute on function pg_read_binary_file(text, bigint, bigint, boolean) to repmgr;
grant execute on function pg_ls_dir(text) to repmgr;
grant execute on function pg_ls_dir(text, boolean, boolean) to repmgr;
grant execute on function pg_stat_file(text) to repmgr;
grant execute on function pg_stat_file(text,boolean) to repmgr;
3.14.配置repmgr.conf文件
基础配置参数:
node_id: 集群中唯一的正整数
node_name: 集群中唯一的字符串,通常配置为主机名
conninfo: 数据库连接字符串,集群中的所有服务器都必须能够使用此字符串免密连接到本地节点。
data directory: 节点的数据目录,repmgr需要使用该数据目录对PostgreSQL服务进行操作管理。
在4个节点都执行,su - postgres
mkdir -p /home/postgres/ repmgr
在pgdb1节点,vi /home/postgres/repmgr/repmgr.conf,添加:
node_id=1
node_name='pgdb1'
conninfo='host=192.168.10.81 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/pgdata1304'
replication_user=' repmgr'
replication_type='physical'
repmgr_bindir='/home/postgres/pg1304/bin'
pg_bindir='/home/postgres/pg1304/bin'
在pgdb2节点,vi /home/postgres/repmgr/repmgr.conf,添加:
node_id=2
node_name='pgdb2'
conninfo='host=192.168.10.82 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/pgdata1304'
replication_user=' repmgr'
replication_type='physical'
repmgr_bindir='/home/postgres/pg1304/bin'
pg_bindir='/home/postgres/pg1304/bin'
在pgdb3节点,vi /home/postgres/repmgr/repmgr.conf,添加:
node_id=3
node_name='pgdb3'
conninfo='host=192.168.10.83 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/pgdata1304'
replication_user=' repmgr'
replication_type='physical'
repmgr_bindir='/home/postgres/pg1304/bin'
pg_bindir='/home/postgres/pg1304/bin'
在pgdb4节点,vi /home/postgres/repmgr/repmgr.conf,添加:
node_id=4
node_name='pgdb4'
conninfo='host=192.168.10.84 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/pgdata1304'
replication_user='repmgr'
replication_type='physical'
repmgr_bindir='/home/postgres/pg1304/bin'
pg_bindir='/home/postgres/pg1304/bin
3.15.standby clone
3.15.1.备库克隆standby clone
在pgdb2、pgdb3 两个节点执行,su - postgres
repmgr standby clone --dry-run #说明--dry-run 测试检查环境是否具备,不实际执行
repmgr standby clone
repmgr --host=192.168.10.81 --port=1304 --dbname=repmgr --username=repmgr --config-file=/home/postgres/repmgr/repmgr.conf standby clone --dry-run
repmgr --host=192.168.10.81 --port=1304 --dbname=repmgr --username=repmgr --config-file=/home/postgres/repmgr/repmgr.conf standby clone
3.15.2.创建standby.signal文件
在pgdb2、pgdb3 两个节点执行,su - postgres
cd $PGDATA
ls -l standby.signal
正常情况下,执行完standby clone后,会产生一个standby.signal空文件。如果没有需要touch创建一个空文件。
touch /home/postgres/pgdata1304/standby.signal
3.15.3.启动pg
在pgdb2、pgdb3 两个节点执行,su - postgres
启动pg
/home/postgres/pg1304/bin/pg_ctl -D /home/postgres/pgdata1304 -l /home/postgres/pg.log start
3.15.4.测试连接
在pgdb1、pgdb2、pgdb3 三个节点都执行,su - postgres
在pgdb1节点执行
psql 'host=192.168.10.82 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
psql 'host=192.168.10.83 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
在pgdb2节点执行
psql 'host=192.168.10.81 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
psql 'host=192.168.10.83 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
在pgdb3节点执行
psql 'host=192.168.10.81 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
psql 'host=192.168.10.82 port=1304 user=repmgr dbname=repmgr connect_timeout=2'
3.16.注册节点
--dry-run 测试检查环境是否具备,不实际执行
--force 强制注册
3.16.1.注册primary节点
在pgdb1节点执行,su - postgres
注册primary主节点
repmgr --config-file=/home/postgres/repmgr/repmgr.conf primary register --force
3.16.2.注册standby节点
在pgdb2、pgdb3节点执行,su - postgres
注册standby备节点
repmgr --config-file=/home/postgres/repmgr/repmgr.conf standby register --force --dry-run
repmgr --config-file=/home/postgres/repmgr/repmgr.conf standby register --force
3.16.3.注册witness节点
说明:需要初始化witness仲裁库后,才能注册witness节点。
初始化witness仲裁库具体步骤详见本文最后的附录章节:附录“初始化witness仲裁库”。
完成初始化witness仲裁库后,再继续注册witness节点。
在pgdb4节点执行,su – postgres
注册witness节点
repmgr --config-file=/home/postgres/repmgr/repmgr.conf witness register --force --host=192.168.10.81 -U repmgr -d repmgr --verbose --dry-run
repmgr --config-file=/home/postgres/repmgr/repmgr.conf witness register --force --host=192.168.10.81 -U repmgr -d repmgr --verbose
说明:
--host:填写主节点的IP地址。
3.17.检查集群状态
在4个节点中任意节点都可以执行,su – postgres
repmgr --config-file=/home/postgres/repmgr/repmgr.conf cluster show
repmgr --config-file=/home/postgres/repmgr/repmgr.conf cluster show --compact
3.18.repmgr failover
3.18.1.配置automatic failover
在pgdb1、pgdb2、pgdb3三个节点执行,su - postgres
vi /home/postgres/repmgr/repmgr.conf,添加如下内容:
failover='automatic'
promote_command='/home/postgres/pg1304/bin/repmgr standby promote --config-file=/home/postgres/repmgr/repmgr.conf --log-to-file'
follow_command='/home/postgres/pg1304/bin/repmgr standby follow --config-file=/home/postgres/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
standby_disconnect_on_failover=true
在pgdb1、pgdb2、pgdb3、pgdb4四个节点执行,su - postgres
vi /home/postgres/repmgr/repmgr.conf,添加如下内容:
service_start_command = 'sudo systemctl start postgresql-13'
service_stop_command = 'sudo systemctl stop postgresql-13'
service_restart_command = 'sudo systemctl restart postgresql-13'
service_reload_command = 'sudo systemctl reload postgresql-13'
repmgrd_service_start_command = '/home/postgres/pg1304/bin/repmgrd -f /home/postgres/repmgr/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize'
repmgrd_service_stop_command = 'kill `cat /tmp/repmgrd.pid`'
3.18.2.启动repmgr守护进程
在4个节点中都可以执行,su – postgres
ps -ef|grep -i repmgrd
repmgr --config-file=/home/postgres/repmgr/repmgr.conf daemon start --dry-run
启动repmgrd进程
repmgr --config-file=/home/postgres/repmgr/repmgr.conf daemon start
或者
/home/postgres/pg1304/bin/repmgrd -d -f /home/postgres/repmgr/repmgr.conf
停止repmgrd进程
repmgr --config-file=/home/postgres/repmgr/repmgr.conf daemon stop
3.18.3.failover测试
测试说明:主节点在pgdb1节点,停掉主节点pg实例,观察primary节点是否failover到其他standby节点。
su - postgres,每个节点都执行,看进程是否存在,看repmgr集群状态是否正常。
ps f -u postgres
repmgr --config-file=/home/postgres/repmgr/repmgr.conf cluster show --compact
停止1节点
pg_ctl stop -D /home/postgres/pgdata1304 --mode=immediate
每个节点都执行,看repmgr集群状态
repmgr --config-file=/home/postgres/repmgr/repmgr.conf cluster show --compact
3.19.repmgr switchover
在pgdb2或pgdb3节点执行, su - postgres,切换角色
观察集群状态
repmgr --config-file=/home/postgres/repmgr/repmgr.conf cluster show –compact
角色切换,在备库执行
repmgr --config-file=/home/postgres/repmgr/repmgr.conf standby switchover --siblings-follow --dry-run
repmgr --config-file=/home/postgres/repmgr/repmgr.conf standby switchover --siblings-follow
观察集群状态
repmgr --config-file=/home/postgres/repmgr/repmgr.conf cluster show –compact
4.附录
4.1.初始化witness仲裁库
只在pgdb4节点进行仲裁库初始化,su - postgres
4.1.1.初始化仲裁库
初始化仲裁库
/home/postgres/pg1304/bin/initdb --pgdata=/home/postgres/pgdata1304 --pwprompt
输入密码: XXXXX
4.1.2.配置仲裁库postgresql.conf文件
vi /home/postgres/pgdata1304/postgresql.conf,添加如下内容:
listen_addresses ='0.0.0.0'
port = 1304
shared_preload_libraries = 'repmgr'
wal_log_hints = on
hba_file ='/home/postgres/repmgr/pg_hba.conf'
4.1.3.配置仲裁库的pg_hba.conf文件
mkdir -p /home/postgres/ repmgr
vi /home/postgres/repmgr/pg_hba.conf,添加如下内容:
local all all trust
host repmgr repmgr 192.168.10.81/32 trust
host repmgr repmgr 192.168.10.82/32 trust
host repmgr repmgr 192.168.10.83/32 trust
host repmgr repmgr 192.168.10.84/32 trust
host all all 0/0 md5
host all all ::/0 md5
# forbid self-replication its own IP
local replication all reject
host replication all 127.0.0.0/8 reject
host replication all ::1/128 reject
# allow any standby connection
host replication repmgr 0.0.0.0/0 trust
4.1.4.启动pg仲裁库
/home/postgres/pg1304/bin/pg_ctl -D /home/postgres/pgdata1304 -l /home/postgres/pg.log start
4.1.5.仲裁库中创建用户
psql
create user repmgr superuser;
create database repmgr owner repmgr;




