暂无图片
暂无图片
1
暂无图片
暂无图片
1
暂无图片

通过repmgr5.2.1工具实现postgresql13.4的主备切换及激活管理

原创 jieguo 2021-11-10
695

主备需求:

通过repmgr5.2.1工具实现postgresql13.4的主备切换及激活管理(已实战于生产环境)

 

操作系统版本统一:centos7.7x64

主:192.168.52.7

备:192.168.52.8

root/postgres等密码统一设置好。

 

主要步骤:

安装postgresql

安装repmgr

主备库注册及切换测试

 

提示:各软件版本可根据实际情况选择即可,若希望尝鲜,那就选择最新即可。

1.安装postgresql

增加用户:

adduser postgres

wget https://ftp.postgresql.org/pub/source/v13.4/postgresql-13.4.tar.gz

mkdir -p /postgresql/{pgdata,archive,scripts,backup,pg13,soft}

chown -R postgres:postgres /postgresql

yum install -y cmake make gcc zlib gcc-c++ perl readline readline-devel zlib zlib-devel  3perl python36 tcl openssl ncurses-devel openldap pam

mv /root/postgresql-13.4.tar.gz ./

chown -R postgres:postgres /postgresql

 

su - postgres

cd /postgresql/soft/

tar -zxvf postgresql-13.4.tar.gz

cd postgresql-13.4

./configure --prefix=/postgresql/pg13

make -j 8 && make install

 

 

systemctl daemon-reload

systemctl enable PG13

systemctl start PG13

systemctl status PG13

 

[root@pg07 ~]# cat /etc/systemd/system/PG13.service

[Unit]

Description=PostgreSQL database server

Documentation=man:postgres(1)

After=network.target

 

[Service]

Type=forking

User=postgres

Group=postgres

Environment=PGPORT=5432

Environment=PGDATA=/postgresql/pgdata

OOMScoreAdjust=-1000

ExecStart=/postgresql/pg13/bin/pg_ctl start -D ${PGDATA} -s -o "-p ${PGPORT}" -w -t 300

ExecStop=/postgresql/pg13/bin/pg_ctl stop -D ${PGDATA} -s -m fast

ExecReload=/postgresql/pg13/bin/pg_ctl reload -D ${PGDATA} -s

KillMode=mixed

KillSignal=SIGINT

TimeoutSec=0

 

[Install]

WantedBy=multi-user.target

 

cat .bash_profile内容:环境变量参考,也可用于/etc/profile中

PATH=$PATH:$HOME/.local/bin:$HOME/bin:/postgresql/pg13/bin

 

export PATH

export LANG=en_US.UTF-8

export PS1="[\u@\h \W]\$ "

export PGPORT=5432

export PGDATA=/postgresql/pgdata

export PGHOME=/postgresql/pg13

export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH

export PATH=$PGHOME/bin:$PATH:.

export DATE=`date +"%Y%m%d%H%M"`

export MANPATH=$PGHOME/share/man:$MANPATH

export PGHOST=$PGDATA

export PGUSER=postgres

export PGDATABASE=postgres

 

 

pg库测试连接:

psql -U postgres -h 192.168.52.7 -d postgres -p5432

\l

 

 

/etc/profile内容:编译repmgr会用到

unset -f pathmunge

PATH=$PATH:/postgresql/pg13/bin

 

 

postgres密码:abcd1234

[root@pg07 repmgr-5.2.1]# su - postgres

Last login: Wed Sep 22 10:16:02 CST 2021 on pts/0

[postgres@pg07 ~]$ pwd

/home/postgres

[postgres@pg07 ~]$ more .pgpass

192.168.52.7:5432:postgres:replicator:rep

192.168.52.8:5432:postgres:replicator:rep

 

 

pg07上:

su - postgres

mkdir –p $PGDATA/arch

/postgresql/pgdata/arch.sh内容:

test ! -f $PGDATA/arch/$1 && cp --preserve=timestamps $2 $PGDATA/arch/$1 ; find $PGDATA/arch/ -type f -mtime +7 -exec rm -f {} \;

postgresql.conf内容:

listen_addresses = '*' #改成*便于将来增加浮动IP的监听

port=5432

data_directory = '/app/pgsql/data'

unix_socket_directories='/app/pgsql/data'

logging_collector = on

log_directory = 'pg_log'

log_filename = 'postgresql-%a.log'

log_truncate_on_rotation = on

wal_level = replica

archive_mode = on

archive_command = 'arch.sh %f %p'

wal_log_hints = on

hot_standby = on

 

 

[postgres@pg07 data]$ more postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

max_connections = '500' #根据业务连接数分配

shared_buffers = '20GB' #根据操作系统的物理内存适当分配即可。

wal_buffers = '64MB'

primary_conninfo = 'user=postgres connect_timeout=2 host=192.168.52.8 application_name=pg07'

pg08上:(注意以下参数文件无需配置(此处只是列出参考内容),在配置standb clone的时候会从主库自动同步生成($PGDATA整个目录都会自动同步))

[postgres@pg08 data]$ more postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

max_connections = '500'

shared_buffers = '20GB'

wal_buffers = '64MB'

primary_conninfo = 'user=postgres connect_timeout=2 host=192.168.52.7 application_name=pg08'

 

[postgres@pg08 data]$ more postgresql.conf

listen_addresses = '*'

port=5432

unix_socket_directories='/app/pgsql/data'

logging_collector = on

log_directory = 'pg_log'

log_filename = 'postgresql-%a.log'

log_truncate_on_rotation = on

wal_level = replica

archive_mode = on

archive_command = 'arch.sh %f %p'

wal_log_hints = on

hot_standby = on

 

配置复制用户:(如使用管理员postgres,该步骤可忽略)
create role replicator with replication login;

\password replicator

rep

 

[postgres@pg07 ~]$ psql

psql (13.4)

Type "help" for help.

 

postgres=# \du

                                    List of roles

 Role name  |                         Attributes                         | Member of

------------+------------------------------------------------------------+-----------

 postgres   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

 replicator | Replication                                                | {}

 

postgres=# alter user replicator with superuser;

ALTER ROLE

postgres=# \du

                                    List of roles

 Role name  |                         Attributes                         | Member of

------------+------------------------------------------------------------+-----------

 postgres   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

 replicator | Superuser, Replication  

pg_hba.conf内容:

# "local" is for Unix domain socket connections only

local   all             all                                     md5

# IPv4 local connections:

host    all             all             127.0.0.1/32            md5

host    all             all             10.62.0.0/16            md5

host    all             all             172.16.18.0/24          md5

# IPv6 local connections:

host    all             all             ::1/128                 md5

# Allow replication connections from localhost, by a user with the

# replication privilege.

local   replication     all                                     md5

host    replication     all             172.16.18.0/24          md5

host    replication     all             192.168.52.0/24         trust

host    replication     all             127.0.0.1/32            md5

host    replication     all             ::1/128                 md5

启停pg:

pg_ctl stop

pg_ctl start

在线加载配置变更:pg_ctl reload

 

主备配置ssh互信:

[postgres@pg07 ~]$ cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.52.7  pg07

192.168.52.8  pg08

 

su - postgres进入postgres用户操作

pg07上操作:

# 生成秘钥到用户主目录下的.ssh文件夹下
ssh-keygen -t rsa
# 将秘钥拷贝到远程机器
ssh-copy-id -i .ssh/id_rsa.pub postgres@pg08

cat .ssh/id_*.pub |ssh postgres@192.168.52.8 'cat >> .ssh/authorized_keys'
# # 验证是否授权完成:不提示密码,直接返回日期说明配置正确
ssh pg08 date

pg08上操作:

# 生成秘钥到用户主目录下的.ssh文件夹下
ssh-keygen -t rsa
# 将秘钥拷贝到远程机器

cat .ssh/id_*.pub |ssh postgres@192.168.52.7 'cat >> .ssh/authorized_keys'

Benz_58858588
ssh-copy-id -i .ssh/id_rsa.pub postgres@pg07
# # 验证是否授权完成:不提示密码,直接返回日期说明配置正确
ssh pg07 date

 

主备测试链接:备库连主库

[postgres@pg08 ~]$ psql 'host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2'

Password for user replicator:rep

2.编译repmgr-5.2.1

https://repmgr.org/download/repmgr-5.2.1.tar.gz

 

tar -zxvf repmgr-5.2.1.tar.gz

pg07上:

[postgres@pg07 ~]$ cat /etc/repmgr.conf

node_id=1

node_name=pg07

conninfo='host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2'

data_directory='/postgresql/pgdata'

pg_bindir='/postgresql/pg13/bin'

 

pg08上:

[postgres@pg08 ~]$ cat /etc/repmgr.conf

node_id=2

node_name=pg08

conninfo='host=192.168.52.8 user=replicator dbname=postgres connect_timeout=2'

data_directory='/postgresql/pgdata'

pg_bindir='/postgresql/pg13/bin'

 

yum install flex -y

[root@pg07 repmgr-5.2.1]# ./configure --prefix=/postgresql/pg13

[root@pg07 repmgr-5.2.1]# make && make install

 

主库注册:

[postgres@pg07 ~]$ pg_ctl start  如果在线修改配置pg_ctl reload即可

[postgres@pg07 ~]$ repmgr primary register

INFO: connecting to primary database...

NOTICE: attempting to install extension "repmgr"

NOTICE: "repmgr" extension successfully installed

NOTICE: primary node record (ID: 1) registered

[postgres@pg07 ~]$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                   

----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------

 1  | pg07 | primary | * running |          | default  | 100      | 1        | host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2

 

备库注册:

[postgres@pg08 ~]$ repmgr -h 192.168.52.7 -U replicator -d postgres -f /etc/repmgr.conf standby clone --dry-run

NOTICE: destination directory "/postgresql/pgdata" provided

INFO: connecting to source node

DETAIL: connection string is: host=192.168.52.7 user=replicator dbname=postgres

DETAIL: current installation size is 31 MB

INFO: "repmgr" extension is installed in database "postgres"

WARNING: target data directory appears to be a PostgreSQL data directory

DETAIL: target data directory is "/postgresql/pgdata"

HINT: use -F/--force to overwrite the existing data directory

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

INFO: parameter "max_wal_senders" set to 10

NOTICE: checking for available walsenders on the source node (2 required)

INFO: sufficient walsenders available on the source node

DETAIL: 2 required, 10 available

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: required number of replication connections could be made to the source server

DETAIL: 2 replication connections required

WARNING: data checksums are not enabled and "wal_log_hints" is "off"

DETAIL: pg_rewind requires "wal_log_hints" to be enabled

NOTICE: standby will attach to upstream node 1

HINT: consider using the -c/--fast-checkpoint option

INFO: all prerequisites for "standby clone" are met

[postgres@pg08 ~]$ repmgr -h 192.168.52.7 -U replicator -d postgres -f /etc/repmgr.conf standby clone

NOTICE: destination directory "/postgresql/pgdata" provided

INFO: connecting to source node

DETAIL: connection string is: host=192.168.52.7 user=replicator dbname=postgres

DETAIL: current installation size is 31 MB

ERROR: target data directory appears to be a PostgreSQL data directory

DETAIL: target data directory is "/postgresql/pgdata"

HINT: use -F/--force to overwrite the existing data directory

[postgres@pg08 ~]$ rm -rf /postgresql/pgdata/*

[postgres@pg08 ~]$repmgr -h 192.168.52.7 -U replicator -d postgres -f /etc/repmgr.conf standby clone

NOTICE: destination directory "/postgresql/pgdata" provided

INFO: connecting to source node

DETAIL: connection string is: host=192.168.52.7 user=replicator dbname=postgres

DETAIL: current installation size is 31 MB

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

NOTICE: checking for available walsenders on the source node (2 required)

NOTICE: checking replication connections can be made to the source server (2 required)

WARNING: data checksums are not enabled and "wal_log_hints" is "off"

DETAIL: pg_rewind requires "wal_log_hints" to be enabled

INFO: checking and correcting permissions on existing directory "/postgresql/pgdata"

NOTICE: starting backup (using pg_basebackup)...

HINT: this may take some time; consider using the -c/--fast-checkpoint option

INFO: executing:

  pg_basebackup -l "repmgr base backup"  -D /postgresql/pgdata -h 192.168.52.7 -p 5432 -U replicator -X stream

WARNING:  skipping special file "./.s.PGSQL.5432"

WARNING:  skipping special file "./.s.PGSQL.5432"

NOTICE: standby clone (using pg_basebackup) complete

NOTICE: you can now start your PostgreSQL server

HINT: for example: pg_ctl -D /postgresql/pgdata start

HINT: after starting the server, you need to register this standby with "repmgr standby register"

[postgres@pg08 ~]$ pg_ctl -D /postgresql/pgdata start

waiting for server to start....2021-09-22 10:07:56.869 CST [2362] LOG:  redirecting log output to logging collector process

2021-09-22 10:07:56.869 CST [2362] HINT:  Future log output will appear in directory "pg_log".

 done

server started

[postgres@pg08 ~]$ repmgr standby register

INFO: connecting to local node "pg08" (ID: 2)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)

INFO: standby registration complete

NOTICE: standby node "pg08" (ID: 2) successfully registered

[postgres@pg08 ~]$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                   

----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------

 1  | pg07 | primary | * running |          | default  | 100      | 1        | host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2

 2  | pg08 | standby |   running | pg07     | default  | 100      | 1        | host=192.168.52.8 user=replicator dbname=postgres connect_timeout=2

[postgres@pg08 ~]$ psql

psql (13.4)

Type "help" for help.

 

postgres=# select * from pg_stat_replication;

 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn

| write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time

-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+----------

+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+------------

(0 rows)

 

postgres=# \x

Expanded display is on.

+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+---

(0 rows)

 

postgres=# select * from pg_stat_wal_receiver;

-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

pid                   | 2368

status                | streaming

receive_start_lsn     | 0/8000000

receive_start_tli     | 1

written_lsn           | 0/8000570

flushed_lsn           | 0/8000570

received_tli          | 1

last_msg_send_time    | 2021-09-22 10:08:35.703193+08

last_msg_receipt_time | 2021-09-22 10:08:35.714032+08

latest_end_lsn        | 0/8000570

latest_end_time       | 2021-09-22 10:08:05.694295+08

slot_name             |

sender_host           | 192.168.52.7

sender_port           | 5432

conninfo              | user=replicator passfile=/home/postgres/.pgpass channel_binding=disable connect_timeout=2 dbname=replication host=192.168.52.7 port=5432 application_name=pg08 fallback_application_name=walreceiver sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any

3.主备库切换测试:

在备库做switchover

[postgres@pg08 ~]$ repmgr standby switchover

在备库做promote激活

主库无法恢复的情况下,在备库操作激活动作。

[postgres@pg08 ~]$ repmgr standby promote

NOTICE: promoting standby to primary

DETAIL: promoting server "pg08" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "pg08" (ID: 2) was successfully promoted to primary

[postgres@pg08 ~]$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                    

----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------

 1  | pg07 | primary | - failed  | ?        | default  | 100      |          | host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2

 2  | pg08 | primary | * running |          | default  | 100      | 4        | host=192.168.52.8 user=replicator dbname=postgres connect_timeout=2

 

WARNING: following issues were detected

  - unable to connect to node "pg07" (ID: 1)

 

HINT: execute with --verbose option to see connection error messages

 

等主库恢复后,重新作为备库加入:

[postgres@pg07 ~]$ repmgr node rejoin -h192.168.52.8 -Upostgres -dpostgres --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose

 

INFO: looking for configuration file in /etc

INFO: configuration file found at: "/etc/repmgr.conf"

NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2

DETAIL: rejoin target server's timeline 4 forked off current database system timeline 3 before current recovery point 0/C000028

INFO: prerequisites for using pg_rewind are met

INFO: 2 files copied to "/tmp/repmgr-config-archive-pg07"

NOTICE: executing pg_rewind

DETAIL: pg_rewind command is "/postgresql/pg13/bin/pg_rewind -D '/postgresql/pgdata' --source-server='host=192.168.52.8 user=replicator dbname=postgres connect_timeout=2'"

NOTICE: 2 files copied to /postgresql/pgdata

INFO: directory "/tmp/repmgr-config-archive-pg07" deleted

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/postgresql/pg13/bin/pg_ctl  -w -D '/postgresql/pgdata' start"

INFO: node "pg07" (ID: 1) is pingable

INFO: node "pg07" (ID: 1) has attached to its upstream node

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

 

[postgres@pg07 ~]$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                   

----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------

 1  | pg07 | standby |   running | pg08     | default  | 100      | 3        | host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2

 

或者clone恢复(全库恢复时间长,不推荐,rejion失败的情况下可采用该方法):

repmgr standby clone -h 192.168.52.8 -Upostgres

 

最后主备又可正常switchover了。

 

[postgres@pg07 ~]$ psql

psql (13.4)

Type "help" for help.

 

postgres=# \l

                                 List of databases

   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges  

-----------+----------+----------+------------+------------+-----------------------

 jyc       | postgres | UTF8     | en_US.utf8 | en_US.utf8 |

 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |

 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +

           |          |          |            |            | postgres=CTc/postgres

 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +

           |          |          |            |            | postgres=CTc/postgres

(4 rows)

 

postgres=# \dt

Did not find any relations.

postgres=# pwd

postgres-# \c postgres

You are now connected to database "postgres" as user "postgres".

postgres-# \dt

Did not find any relations.

postgres-# \dx

                    List of installed extensions

  Name   | Version |   Schema   |            Description            

---------+---------+------------+------------------------------------

 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language

 repmgr  | 5.2     | repmgr     | Replication manager for PostgreSQL

(2 rows)

 

postgres-# \du

                                    List of roles

 Role name  |                         Attributes                         | Member of

------------+------------------------------------------------------------+-----------

 postgres   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

 replicator | Superuser, Replication                                     | {}

 

postgres-# \dt+

Did not find any relations.

postgres-# \q

[postgres@pg07 ~]$ repmgr cluster show

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                   

----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------------

 1  | pg07 | primary | * running |          | default  | 100      | 5        | host=192.168.52.7 user=replicator dbname=postgres connect_timeout=2

 2  | pg08 | standby |   running | pg07     | default  | 100      | 5        | host=192.168.52.8 user=replicator dbname=postgres connect_timeout=2

[postgres@pg07 ~]$

最后修改时间:2021-11-10 10:17:36
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

墨天轮福利君
暂无图片
3年前
评论
暂无图片 0
您好,您的文章已入选合格奖,10墨值奖励已经到账请查收! ❤️我们还会实时派发您的流量收益。
3年前
暂无图片 点赞
评论