暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Mysql8官方分布式数据库MGR最佳实践,竟遇大坑

MySQL数据库运维技术栈 2021-05-06
2875

MGR简介

MGR是mysql Group Replication简称,中文名称是Mysql组复制,MGR是MySQL官方于2016年12月推出的一个全新的高可用与高扩展的解决方案,提供了高可用、高扩展、高可靠的MySQL集群服务,目前只支持MYSQL5.7和mysql8.0版本。

MGR优点

  1. 高一致性:基于原生复制和paxos协议的组复制技术。

  2. 高容错性:有自动检测机制,当出现宕机后,会自动剔除问题节点,在2N+1个节点集群中,集群 只要N +1个节点还存活着,数据库就能稳定的对外提供服务。

  3. 高扩展性:可以在线增加和移除节点。

  4. 高灵活性:可以在单主模式和多主模式自由切换。



MGR缺点

MGR技术比较新,稳定性方面还需要业界验证。



Mysql8分布式数据库MGR最佳实践

环境准备

3台虚拟机搭建3节点的MGR集群,Mysql数据库用的8.0.21版本。

IP客户端访问端口MGR成员通信端口server_idOS版本
192.168.112.1313309100611Centos 7.6
192.168.112.1323309100612Centos 7.6
192.168.112.1353309100613Centos 7.6

创建目录

    mkdir -p data/mysql/mysql8/{data,log,conf,run}/3309
    mkdir -p data/mysql/mysql8/log/3309/{redo,undo,relay}
    chown -R mysql:mysql data/mysql/mysql8/mysql8
    chmod 750 data/mysql/mysql8/{data,log,conf,run}
    复制

    复制

    配置参数

      [mysqld]
      ##basic settings###
      server-id=1
      port = 3309
      character-set-server = utf8mb4
      collation-server = utf8mb4_unicode_ci
      skip-name-resolve
      max_connections=1000
      max_user_connections=800
      max_allowed_packet=512M
      max_connect_errors=100000
      datadir = data/mysql/mysql8/data/3309
      socket = data/mysql/mysql8/run/3309/mysql.sock
      pid_file = data/mysql/mysql8/run/3309/mysql.pid
      transaction_isolation = READ-COMMITTED
      lower_case_table_names=1
      default_time_zone =+8:00
      open_files_limit=65535
      log_timestamps=system
      wait_timeout=900
      interactive_timeout=900


      ##innodb setting##
      innodb_buffer_pool_size = 256M
      innodb_buffer_pool_instances = 1
      innodb_io_capacity=2000
      innodb_flush_method=O_DIRECT


      innodb_log_group_home_dir = data/mysql/mysql8/log/3309/redo
      innodb_log_file_size = 128M
      innodb_log_files_in_group=4
      innodb_log_buffer_size = 32M


      innodb_undo_directory = data/mysql/mysql8/log/3309/undo
      innodb_undo_tablespaces = 4
      innodb_undo_log_truncate=1
      innodb_max_undo_log_size=1G


      innodb_flush_neighbors=0
      innodb_flush_log_at_trx_commit = 1
      innodb_print_all_deadlocks = 1
      innodb_online_alter_log_max_size=128M


      innodb_lock_wait_timeout=10
      innodb_file_per_table=ON
      innodb_doublewrite=ON


      ##log settings##
      log-error = data/mysql/mysql8/log/3309/error.log
      log-bin = data/mysql/mysql8/log/3309/mysql_bin.log
      slow_query_log = 1
      slow_query_log_file = data/mysql/mysql8/log/3309/mysql_slow_query.log
      long_query_time = 10


      ##replication settings##
      gtid-mode=on
      enforce-gtid-consistency=true
      master-info-repository=TABLE
      relay-log-info-repository=TABLE
      sync-master-info=1000
      relay_log_recovery = 1
      relay-log=/data/mysql/mysql8/log/3309/relay/mysql-relay-bin


      #binlog
      log_bin=/data/mysql/mysql8/log/3309/binlog
      expire_logs_days=10
      max_binlog_cache_size=1024M
      sync_binlog=1


      ##MGR settings
      binlog_checksum = NONE
      log_slave_updates = ON
      binlog_format=row
      #transaction_write_set_extraction ='XXHASH64'
      #loose-group_replication_group_name = '38f34157-cbe8-4623-a7bd-054cc5c2de0b'
      #loose-group_replication_start_on_boot = off
      #loose-group_replication_local_address = '192.168.112.131:10061'
      #loose-group_replication_group_seeds ='192.168.112.131:10061,192.168.112.132:10061,192.168.112.135:10061'
      #loose-group_replication_bootstrap_group = off
      #loose-group_replication_ip_whitelist = '192.168.112.131/24,192.168.112.132/24,192.168.112.135/24'


      [client]
      port = 3309
      socket = data/mysql/mysql8/run/3309/mysql.sock
      复制

      复制

      在这里需要注意的是地方,由于MGR的插件,mysql默认是没有安装的,所以在这里关于MGR的配置参数,都注释掉了,等数据库实例启动之后,再开启。

      每个节点的server-id需要设置成不一样。


      初始化数据库

        /data/mysql-8.0.21/bin/mysqld --defaults-file=/data/mysql/mysql8/conf/3309/my.cnf --initialize --basedir=/data/mysql-8.0.21/ --datadir=/data/mysql/mysql8/data/3309 --user=mysql --initialize-insecure --ssl --explicit_defaults_for_timestamp


        如果要启用SSL安全连接,执行如下命令
        /data/mysql-8.0.21/bin/mysql_ssl_rsa_setup --basedir=/data/mysql-8.0.21 --datadir=/data/mysql/mysql8/data/3309
        复制

        复制

        启停mysql实例

        复制
          /data/mysql-8.0.21/bin/mysqld_safe --defaults-file=/data/mysql/mysql8/conf/3309/my.cnf &


          /data/mysql-8.0.21/bin/mysqladmin -uroot --socket=/data/mysql/mysql8/run/3309/mysql.sock shutdown &
          复制


          登录实例

          复制
            /data/mysql-8.0.21/bin/mysql -uroot --socket=/data/mysql/mysql8/run/3309/mysql.sock
            复制

            MGR插件安装

              [root@localhost] 14:08:44 [(none)]>install plugin group_replication soname 'group_replication.so';
              Query OK, 0 rows affected (0.20 sec)


              [root@localhost] 14:09:06 [(none)]>show plugins;
              +---------------------------------+----------+--------------------+----------------------+---------+
              | Name | Status | Type | Library | License |
              +---------------------------------+----------+--------------------+----------------------+---------+
              | binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | mysql_native_password | ACTIVE | AUTHENTICATION | NULL | GPL |
              | sha256_password | ACTIVE | AUTHENTICATION | NULL | GPL |
              | caching_sha2_password | ACTIVE | AUTHENTICATION | NULL | GPL |
              | sha2_cache_cleaner | ACTIVE | AUDIT | NULL | GPL |
              | CSV | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | MEMORY | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | InnoDB | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | INNODB_TRX | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_CMP | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_CMP_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_CMPMEM | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_CMP_PER_INDEX | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_CMP_PER_INDEX_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_BUFFER_PAGE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_BUFFER_PAGE_LRU | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_BUFFER_POOL_STATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_TEMP_TABLE_INFO | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_METRICS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_FT_DEFAULT_STOPWORD | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_FT_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_FT_BEING_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_FT_CONFIG | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_FT_INDEX_CACHE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_FT_INDEX_TABLE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_TABLES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_TABLESTATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_COLUMNS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_VIRTUAL | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_CACHED_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | INNODB_SESSION_TEMP_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
              | MyISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | MRG_MYISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | PERFORMANCE_SCHEMA | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | TempTable | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | ARCHIVE | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | BLACKHOLE | ACTIVE | STORAGE ENGINE | NULL | GPL |
              | FEDERATED | DISABLED | STORAGE ENGINE | NULL | GPL |
              | ngram | ACTIVE | FTPARSER | NULL | GPL |
              | mysqlx_cache_cleaner | ACTIVE | AUDIT | NULL | GPL |
              | mysqlx | ACTIVE | DAEMON | NULL | GPL |
              | group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
              +---------------------------------+----------+--------------------+----------------------+---------+
              45 rows in set (0.00 sec)
              复制



              复制

              安装好插件之后,将MGR的参数注释去掉,重启mysql实例。

              设置MGR单主模式

              在所有节点上执行以下命令,创建复制用户

                set sql_log_bin=0;
                create user 'repl'@'%' identified with mysql_native_password by 'repl';
                grant replication slave,replication client on *.* to 'repl'@'%';
                flush privileges;
                set sql_log_bin=1;
                复制

                复制

                在192.168.112.131服务上设置主节点

                  set global group_replication_single_primary_mode=on;
                  set global group_replication_bootstrap_group=ON;
                  CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
                  start group_replication;
                  set global group_replication_bootstrap_group=OFF;
                  复制

                  复制

                  设置好之后,可以查看MGR复制成员状态

                    [root@localhost] 14:44:39 [(none)]>select * from performance_schema.replication_group_members;
                    +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
                    | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
                    +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
                    | group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | mysql | 3309 | ONLINE | PRIMARY | 8.0.21 |
                    +---------------------------+--------------------------------------+----
                    复制
                    ---------+-------------+--------------+-------------+----------------+


                    复制

                    设置好主之后,就开始设置剩下2个从节点了,执行命令都是一样的。

                      CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
                      start group_replication;
                      复制

                      复制

                      到这里,遇到了大坑,来看看

                        [root@localhost] 14:44:40 [(none)]>select * from performance_schema.replication_group_members;
                        +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
                        | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
                        +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
                        | group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | mysql | 3309 | RECOVERING | SECONDARY | 8.0.21 |
                        | group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | mysql | 3309 | RECOVERING | SECONDARY | 8.0.21 |
                        | group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | mysql | 3309 | ONLINE | PRIMARY | 8.0.21 |
                        +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
                        3 rows in set (0.00 sec)
                        复制

                        复制

                        2个从节点的状态一直是RECOVERING状态,正常时ONLINE状态的。这个状态持续了有10几分钟,看mysql的错误日志看到以下错误信息

                        复制
                          2020-09-15T14:24:57.555493+08:00 19 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'repl@mysql:3309' - retry-time: 60 retries: 1 message: Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection. Error_code: MY-002061
                          2020-09-15T14:24:57.557460+08:00 18 [ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
                          2020-09-15T14:24:57.557541+08:00 18 [ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
                          2020-09-15T14:25:57.284156+08:00 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
                          复制


                          重点内容在error connecting to master 'repl@mysql:3309',无法连接到master,即网络不通,在这里MGR竟然用的主机名来和集群成员进行通信,关键是我的主机名都是一样的,因为虚拟级时克隆出来的。


                          在这里有2个解决方案
                          方案一
                          修改主机名称,并且在MGR每个集群节点上的/etc/hosts中,添加主机和IP的对应关系

                          方案二
                          让MGR集群节点用IP地址进行通信,可以通过配置以下2个参数

                            report_host=192.168.112.135
                            report_port=3309
                            复制

                            复制

                            我用的是方案二,在my.cnf参数文件中分别添加以下内容

                              在192.168.112.131服务器上
                              report_host=192.168.112.131
                              report_port=3309


                              在192.168.112.132服务器上
                              report_host=192.168.112.132
                              report_port=3309


                              在192.168.112.135服务器上
                              report_host=192.168.112.135
                              report_port=3309
                              复制

                              复制

                              添加完成之后,就重启Mysql实例,重新启动集群。

                              设置主节点

                                set global group_replication_bootstrap_group=ON;
                                start group_replication;
                                set global group_replication_bootstrap_group=OFF;
                                复制

                                复制

                                设置从节点

                                复制
                                  start group_replication;
                                  复制

                                  查看集群状态

                                    [root@localhost] 15:02:23 [(none)]>select * from performance_schema.replication_group_members;
                                    +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                    | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
                                    +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                    | group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | SECONDARY | 8.0.21 |
                                    | group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | SECONDARY | 8.0.21 |
                                    | group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |
                                    +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                    3 rows in set (0.00 sec)
                                    复制



                                    复制


                                    这次集群节点状态都是online了。

                                    单主切换到多主模式

                                    在所有节点执行

                                      stop group_replication;
                                      set global group_replication_single_primary_mode=OFF;
                                      set global group_replication_enforce_update_everywhere_checks=ON;
                                      复制

                                      复制

                                      选择任意节点执行

                                        SET GLOBAL group_replication_bootstrap_group=ON;
                                        START GROUP_REPLICATION;
                                        SET GLOBAL group_replication_bootstrap_group=OFF;
                                        复制

                                        复制

                                        剩余节点执行

                                          START GROUP_REPLICATION;
                                          复制

                                          复制

                                          查看集群状态

                                            [root@localhost] 17:45:57 [db1]>select * from performance_schema.replication_group_members;
                                            +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                            | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
                                            +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                            | group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | PRIMARY | 8.0.21 |
                                            | group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | PRIMARY | 8.0.21 |
                                            | group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |
                                            +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                            3 rows in set (0.01 sec)
                                            复制

                                            复制

                                            多主切换到多单模式

                                            在所有节点执行

                                              stop group_replication;
                                              set global group_replication_enforce_update_everywhere_checks=off;
                                              复制

                                              复制

                                              选择主节点数据库执行

                                                set global group_replication_single_primary_mode=on;
                                                set global group_replication_bootstrap_group=ON;
                                                start group_replication;
                                                set global group_replication_bootstrap_group=OFF;
                                                复制

                                                复制

                                                剩余节点执行

                                                  set global group_replication_single_primary_mode=on;
                                                  START GROUP_REPLICATION;
                                                  复制

                                                  复制

                                                  查看集群状态

                                                    [root@localhost] 17:51:12 [db1]>select * from performance_schema.replication_group_members;
                                                    +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                                    | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
                                                    +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                                    | group_replication_applier | 188f118c-f71b-11ea-899b-000c29387845 | 192.168.112.135 | 3309 | ONLINE | SECONDARY | 8.0.21 |
                                                    | group_replication_applier | 577b5cdb-f71a-11ea-9f03-000c29231183 | 192.168.112.132 | 3309 | ONLINE | SECONDARY | 8.0.21 |
                                                    | group_replication_applier | c1a60f79-f719-11ea-97db-000c29cc2388 | 192.168.112.131 | 3309 | ONLINE | PRIMARY | 8.0.21 |
                                                    +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
                                                    3 rows in set (0.00 sec)
                                                    复制


                                                    文章转载自MySQL数据库运维技术栈,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                                    评论