暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

基于repmgr的postgresql主备高可用方案

数据库架构 2019-09-20
509

本文比较基础,主要介绍postgresql开源高可用工具repmgr的部署和使用,初学者可以根据本文步骤一步一步做下去,废话不多说,直接进入主题,本文以两台机器为例。

 

1.两台机器分别编译安装postgresql,步骤略。

 

2.主库配置

    vi postgresql.conf
    wal_log_hints=on
    archive_mode=on
    archive_command=’test ! -f pgarch/%f && cp %p pgarch/%f’

    创建管理用户和库

      createuser -s repmgr
      createdb repmgr -O repmgr

      vi pg_hba.conf
      local replication repmgr trust
      host replication repmgr 127.0.0.1/32 trust
      host replication repmgr 192.168.1.1/32 trust
      host replication repmgr 192.168.1.2/32 trust
      local repmgr repmgr trust
      host repmgr repmgr 127.0.0.1/32 trust
      host repmgr repmgr 192.168.1.1/32 trust
      host repmgr repmgr 192.168.1.2/32 trust

      备库连接测试

        psql 'host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2'

        主库创建/etc/repmgr.conf

          node_id=1
          node_name=node1
          conninfo='host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2'
          data_directory='/pgdata'

          注册主库

            repmgr -f etc/repmgr.conf primary register

            查看:

              repmgr -f /etc/repmgr.conf cluster show
              SELECT * FROM repmgr.nodes;

               

              3.备库配置

              创建/etc/repmgr.conf

                node_id=2
                node_name=node2
                conninfo='host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2'
                data_directory='/pgdata'

                克隆备库,内部使用的是pg_basebackup来进行克隆,并且会自动创建recovery.conf文件

                  repmgr -h 192.168.1.1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run
                  repmgr -h 192.168.1.1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone

                  启动并注册备库

                    repmgr -f /etc/repmgr.conf standby register

                    查看集群状态

                      repmgr -f /etc/repmgr.conf cluster show
                      [postgres@node2 pgdata]$ repmgr cluster show
                      ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
                      ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
                      1 | node1 | primary | * running | | default | 100 | 3 | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
                      2 | node2 | standby | running | node1 | default | 100 | 3 | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2

                      查看复制状态

                        select * from pg_stat_replication;
                        select * from pg_stat_wal_receiver;

                        切换测试,在备库操作,注意,切换操作需要配置主机间互信。

                          [postgres@DB2 .ssh]$ repmgr standby switchover
                          NOTICE: executing switchover on node "node2" (ID: 2)
                          NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
                          NOTICE: stopping current primary node "node1" (ID: 1)
                          NOTICE: issuing CHECKPOINT
                          DETAIL: executing server command "pg_ctl -D '/pgdata' -W -m fast stop"
                          INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
                          INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
                          INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
                          NOTICE: current primary has been cleanly shut down at location 0/14000028
                          NOTICE: promoting standby to primary
                          DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -w -D '/pgdata' promote"
                          waiting for server to promote.... done
                          server promoted
                          NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
                          NOTICE: STANDBY PROMOTE successful
                          DETAIL: server "node2" (ID: 2) was successfully promoted to primary
                          INFO: local node 1 can attach to rejoin target node 2
                          DETAIL: local node's recovery point: 0/14000028; rejoin target node's fork point: 0/14000098
                          NOTICE: setting node 1's upstream to node 2
                          WARNING: unable to ping "host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2"
                          DETAIL: PQping() returned "PQPING_NO_RESPONSE"
                          NOTICE: starting server using "pg_ctl -w -D '/pgdata' start"
                          NOTICE: NODE REJOIN successful
                          DETAIL: node 1 is now attached to node 2
                          NOTICE: switchover was successful
                          DETAIL: node "node2" is now primary and node "node1" is attached as standby
                          NOTICE: STANDBY SWITCHOVER has completed successfully

                          此时查看状态,已经切换完成:

                            [postgres@GCCX4TMP .ssh]$ repmgr cluster show
                            ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
                            ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
                            1 | node1 | standby | running | node2 | default | 100 | 3 | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
                            2 | node2 | primary | * running | | default | 100 | 4 | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2


                            看上面的切换日志其实可以看到切换的一个流程:

                            ①停止主库

                            ②备库promte为主库

                            ③原主库执行rejoin操作:repmgr node rejoin -d 'host=192.168.1.1 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose

                             

                            这里说下repmgr node rejoin操作,执行该命令之前先删除recovery.conf文件,并且要求数据库之前是干净的关闭,达到一个一致性状态。然后repmgr会检查数据库能否加入,如果不能的话就会使用pg_rewind进行恢复操作,至于pg_rewind的原理和用法见我上一篇文章。

                             

                            后面的文章我们再介绍一下如何通过repmgrd实现auto failover

                             

                            好吧,加油吧,扫码关注公众号更多干货等着你。



                            文章转载自数据库架构,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                            评论