暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Orchestrator:03 高可用方案ProxySQL篇

万能修实验室 2021-07-14
4638



Orchestrator是一个对MySQL复制提供高可用、拓扑、管理和可视化的工具,它能够主动发现当前拓扑结构和主从复制状态,也可以重构当前的拓扑结构,识别各种故障并自动恢复。
但它最难的还是发音:OK丝翠特儿


上周测试一把Orchestrator结合vip的组合,基本可以做到简单、透明、迅速的主从切换。但由于需要用SSH远程执行绑定IP操作,对操作系统具有一定的入侵性。
今天主要测试Orchestrator结合ProxySQL,两者都是最近开始流行的开源产品,而且设计的还都挺奇葩。

切换时ProxySQL的问题
切换一般分两类:主库故障导致的Failover或者提升一个Slave为新Master。 

第一种情况由于主库已无法提供服务,ProxySQL会检测到read_only变化切换流量到新主库上,

第二种情况Orchestrator会执行graceful-master-takeover,修改新的拓扑与read_only参数。但是由于原来的主节点切换为slave后并不会自动start slave,会导致ProxySQL查询该节点的数据有延迟。


ProxySQL如何识别主库
ProxySQL本身是无法知道MySQL集群拓扑结构的,但是它可以通过监控MySQL上的变量read_only,找到 read_only = off的服务器当做主库写入。


Orchestrator的作用

如果发生故障导致主从切换,Orch会自动改变拓扑结构,在切换时修改新master的read_only参数为off。因此可以在Orch修改拓扑时,通过提供的Hooks第一时间主动远程修改ProxySQL中mysql_servers表中记录当前数据库的状态,而不必等待ProxySQL检测。下文中会在以下2个Hooks中调用自定义脚本。

  PreGracefulTakeoverProcesses

    executed on planned, graceful master takeover, immediately before the master goes read-only.

  PostMasterFailoverProcesses

    executed at the end of a successful master recovery.

根据作者命名描述,第一个叫优雅切换,相比来说第二个Failover算是不雅?


实验内容分3部分:

  • 环境搭建

  • 手动切换测试

  • 故障切换测试


闪亮的灯球儿转起来,下面就测试下Orchestrator+ProxySQL,能否像郭德纲+于谦一样法力无边。(今天没有车)



没下班的小伙伴们还在等什么,举起双手马上按步骤自己做一套,周末会是如此美好。






1 环境搭建

# 安装ProxySQL:
1 下载最新版本:

    wget https://github.com/sysown/proxysql/releases/download/v2.0.6/proxysql-2.0.6-1-centos67.x86_64.rpm
    yum localinstall proxysql-2.0.6-1-centos67.x86_64.rpm -y


    2 启动:

      /etc/init.d/proxysql start


      或者

        service proxysql start


        3 管理员用户连接:

        (默认管理端口是6032,客户端服务端口是6033。默认的用户名密码都是 admin。)

          mysql -uadmin -padmin -h127.0.0.1 -P6032


          # 配置ProxySQL:
          1 后端DB 分2组:600写入组,601只读组

            INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.170',601,3306,1000,10);
            INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.171',601,3306,1000,10);
            INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.169',601,3306,1000,0);
            INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.169',600,3306,1000,0);


              查看状态,都是online即可

              (admin@127.0.0.1:6032)[(none)]select * from mysql_servers;
              +--------------+-------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
              | hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
              +--------------+-------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
              | 601 | 10.7.90.170 | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 10 | 0 | 0 | |
              | 601          | 10.7.90.171 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
              | 601 | 10.7.90.169 | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | |
              | 600          | 10.7.90.169 | 3306 | 0         | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              |         |
              +--------------+-------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
              4 rows in set (0.00 sec)


              2 配置主从分组信息

              (600写入组writer_hostgroup,601只读组reader_hostgroup):

                INSERT INTO mysql_replication_hostgroups VALUES (600,601,'read_only','proxy01');


                  (admin@127.0.0.1:6032)[(none)]select * from mysql_replication_hostgroups;
                  +------------------+------------------+------------+---------+
                  | writer_hostgroup | reader_hostgroup | check_type | comment |
                  +------------------+------------------+------------+---------+
                  | 600 | 601 | read_only | proxy01 |
                  +------------------+------------------+------------+---------+


                  3 在MySQL主库上,配置账号并授权

                    GRANT ALL ON *.* TO 'proxysql'@'%' IDENTIFIED BY 'proxysql';


                    4 在proxysql主机端配置管理用户,默认组600

                    (对应上面的用户proxysql)

                      INSERT INTO mysql_users(username,password,default_hostgroup) VALUES ('proxysql','proxysql',600);


                      (admin@127.0.0.1:6032)[(none)]select * from mysql_users;
                      +----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+---------+
                      | username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | comment |
                      +----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+---------+
                      | proxysql | proxysql | 1 | 0 | 600 | NULL | 0 | 1 | 0 | 1 | 1 | 10000 | |
                      +----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+---------+
                      1 row in set (0.00 sec)


                      5 在各个mysql上,添加监控的用户:

                        GRANT SELECT ON *.* TO 'monitor'@'%' IDENTIFIED BY 'monitor';
                        flush privileges;


                        6 在proxysql主机端配置监控用户(对应上面的monitor用户)

                          set mysql-monitor_username='monitor';
                          set mysql-monitor_password='monitor';


                          7 配置proxysql的转发规则

                            insert into mysql_query_rules(rule_id,active,match_digest,destination_hostgroup,apply)values(1,1,'^SELECT.*FOR UPDATE$',600,1);
                            insert into mysql_query_rules(rule_id,active,match_digest,destination_hostgroup,apply)values(2,1,'^SELECT',601,1);


                            (admin@127.0.0.1:6032)[(none)]select rule_id,active,match_digest,destination_hostgroup,apply from mysql_query_rules;
                            +---------+--------+----------------------+-----------------------+-------+
                            | rule_id | active | match_digest | destination_hostgroup | apply |
                            +---------+--------+----------------------+-----------------------+-------+
                            | 1 | 1 | ^SELECT.*FOR UPDATE$ | 600 | 1 |
                            | 2 | 1 | ^SELECT | 601 | 1 |
                            +---------+--------+----------------------+-----------------------+-------+
                            2 rows in set (0.00 sec)


                            8 检查当前连接:

                              (admin@127.0.0.1:6032)[(none)]select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
                              +-----------+-------------+----------+--------------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
                              | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
                              +-----------+-------------+----------+--------------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
                              | 600 | 10.7.90.170 | 3306 | OFFLINE_HARD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 404 |
                              | 600       | 10.7.90.169 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 1           | 12      | 0                 | 168             | 44              | 844        |
                              | 601 | 10.7.90.171 | 3306 | ONLINE | 0 | 17 | 17 | 0 | 17 | 9593248 | 0 | 270863177 | 3029685816 | 832 |
                              | 601       | 10.7.90.170 | 3306     | ONLINE       | 0        | 18       | 18     | 0       | 18          | 9626628 | 0                 | 271768373       | 3027802009      | 404        |
                              | 601 | 10.7.90.169 | 3306 | ONLINE | 0 | 1 | 18 | 0 | 17 | 9419650 | 0 | 265805226 | 2928707968 | 844 |
                              +-----------+-------------+----------+--------------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
                              5 rows in set (0.00 sec)


                              9 更新配置

                                load mysql users to runtime;
                                load mysql servers to runtime;
                                load mysql query rules to runtime;
                                load mysql variables to runtime;
                                load admin variables to runtime;


                                save mysql users to disk;
                                save mysql servers to disk;
                                save mysql query rules to disk;
                                save mysql variables to disk;
                                save admin variables to disk;


                                10 测试读写分离是否生效

                                  [root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "START TRANSACTION; SELECT @@hostname; ROLLBACK;"
                                  Warning: Using a password on the command line interface can be insecure.
                                  +------------+
                                  | @@hostname |
                                  +------------+
                                  | mysql1 |
                                  +------------+
                                  [root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"
                                  Warning: Using a password on the command line interface can be insecure.
                                  +------------+
                                  | @@hostname |
                                  +------------+
                                  | mysql1 |
                                  +------------+
                                  [root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"
                                  Warning: Using a password on the command line interface can be insecure.
                                  +------------+
                                  | @@hostname |
                                  +------------+
                                  | slave2 |
                                  +------------+
                                  [root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"
                                  Warning: Using a password on the command line interface can be insecure.
                                  +------------+
                                  | @@hostname |
                                  +------------+
                                  | slave1 |
                                  +------------+


                                  # 建立远程管理用户:

                                    ---------------------------- 添加新的管理用户 ----------------------------
                                    admin> select @@admin-admin_credentials; # 当前用户名和密码
                                    admin> set admin-admin_credentials='admin:admin;orchuser:Orchuser_2019';
                                    admin> load admin variables to runtime; # 使修改立即生效
                                    admin> save admin variables to disk; # 使修改永久保存到磁盘
                                    admin> select @@admin-admin_credentials;
                                    +------------------------------------+
                                    | @@admin-admin_credentials |
                                    +------------------------------------+
                                    | admin:admin;orchuser:Orchuser_2019 |
                                    +------------------------------------+


                                    # Orch配置:

                                    在配置文件中以下2个Hooks中加入调用ProxySQL脚本

                                      vi etc/orchestrator.conf.json


                                      "PreGracefulTakeoverProcesses": [
                                      "/usr/local/orchestrator/pre_proxy_failover.sh >> usr/local/orchestrator/proxysql_failover.log",
                                      "sudo -u orchestrator usr/local/orchestrator/pre_del_vip.sh {failedHost} >> usr/local/orchestrator/orc_vip_failover.log",
                                      "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> tmp/recovery.log"
                                      ],


                                      "PostMasterFailoverProcesses": [
                                      "/usr/local/orchestrator/post_proxy_failover.sh >> usr/local/orchestrator/proxysql_failover.log",
                                      "sudo -u orchestrator usr/local/orchestrator/post_vip_failover.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> usr/local/orchestrator/orc_vip_failover.log"
                                        ],

                                      两个脚本功能,由于目前还在测试阶段就不放具体内容了: 

                                      • pre_proxy_failover.sh   

                                        --手动切换前调用,修改当前主库的状态为OFFLINE_SOFT

                                      • post_proxy_failover.sh  

                                        --切换完成后调用,删除离线的原主库,添加新Master

                                      2 手动切换测试


                                      # 当前的拓扑结构:169主库


                                      # ProxySQL上的读写分组:主库169是读写组,其他都是只读组



                                      # 开始压测,类型读写混合


                                      # 可以看到3个节点的QPS和执行命令情况


                                      # 手动切换,提升170 为主库


                                      # 切换成功,主从无延迟和异常


                                      # 切换期间迅速观察ProxySQL,数据有3次变化


                                      1.   Orch切换前,先把当前主库169状态修改为 OFFLINE_SOFT



                                          2.  Orch切换后,删除原主库的读写主机169记录:

                                        DELETE FROM mysql_servers WHERE hostgroup_id=600 AND hostname="10.7.90.169"

                                            

                                           3.  插入新读写主机170的数据。由于ProxySQL检测主从有延迟,还要再将原主库的只读组数据设置为ONLINE

                                          INSERT INTO mysql_servers(hostgroup_id,hostname,port,status) values (600, "10.7.90.170", 3306, "ONLINE");
                                          UPDATE mysql_servers SET status="ONLINE" WHERE hostgroup_id=601 AND hostname="10.7.90.169"



                                          # 以上操作在日志里都有记录:

                                          先调用PreGracefulTakeoverProcesses中的脚本

                                          切换后调用PostMasterFailoverProcesses中的脚本


                                          3 故障切换测试


                                          # 当前的拓扑结构:170主库


                                          # ProxySQL上的读写分组:主库170是读写组,其他都是只读组


                                          # 开始压测,类型读写混合


                                          # 可以看到3个节点的QPS和执行命令情况


                                          # 直接在170上杀掉MySQL进程


                                          # 可以看到,171已经自动提升为新主库,而170被踢出复制


                                          # 启动170的MySQL,重新CHANGE MASTER加入新拓扑中


                                            [root@localhost][(none)]> CHANGE MASTER TO MASTER_HOST='10.7.90.171', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='*****';
                                            Query OK, 0 rows affected, 2 warnings (0.08 sec)


                                            [root@localhost][(none)]> start slave;
                                            Query OK, 0 rows affected (0.07 sec)


                                            [root@localhost][(none)]> show slave status\G
                                            *************************** 1. row ***************************
                                            Slave_IO_State: Waiting for master to send event
                                            Master_Host: 10.7.90.171
                                            Master_User: repl
                                            Master_Port: 3306
                                            Connect_Retry: 60
                                            Master_Log_File: binlog.000011
                                            Read_Master_Log_Pos: 770543986
                                            Relay_Log_File: relay.000002
                                            Relay_Log_Pos: 405
                                            Relay_Master_Log_File: binlog.000011
                                            Slave_IO_Running: Yes
                                            Slave_SQL_Running: Yes
                                            .
                                                         略
                                            .
                                            Retrieved_Gtid_Set:
                                            Executed_Gtid_Set: 08999140-b588-11e7-b7ed-005056b45d5a:1-659715,
                                            0b82f00b-74fc-11e8-985c-005056827b12:1-440569,
                                            52650eac-dac7-11e7-aacf-005056827f22:1-90849
                                            Auto_Position: 1
                                            Replicate_Rewrite_DB:
                                            Channel_Name:
                                            Master_TLS_Version:
                                            1 row in set (0.00 sec)


                                            # 当前拓扑已恢复为1主2从:


                                            # ProxySQL的分组:171 读写,其他只读


                                            # 操作日志记录,只调用了PostMasterFailoverProcesses:


                                            # 再次压测,读写都集中在当前主库171上




                                            以上的测试内容都是比较理想、简单的场景,而在实际业务环境中会复杂很多,各种意外状况都可能发生。感谢Orch提供各种Hooks,让我们可以通过定制化脚本不断完善它,最终进化成符合自己业务的高可用方案。


                                            到站 收工



                                            文章转载自万能修实验室,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                            评论