
Orchestrator是一个对MySQL复制提供高可用、拓扑、管理和可视化的工具,它能够主动发现当前拓扑结构和主从复制状态,也可以重构当前的拓扑结构,识别各种故障并自动恢复。
但它最难的还是发音:OK丝翠特儿
今天主要测试Orchestrator结合ProxySQL,两者都是最近开始流行的开源产品,而且设计的还都挺奇葩。
切换时ProxySQL的问题
切换一般分两类:主库故障导致的Failover或者提升一个Slave为新Master。
第一种情况由于主库已无法提供服务,ProxySQL会检测到read_only变化切换流量到新主库上,
第二种情况Orchestrator会执行graceful-master-takeover,修改新的拓扑与read_only参数。但是由于原来的主节点切换为slave后并不会自动start slave,会导致ProxySQL查询该节点的数据有延迟。
ProxySQL如何识别主库
ProxySQL本身是无法知道MySQL集群拓扑结构的,但是它可以通过监控MySQL上的变量read_only,找到 read_only = off的服务器当做主库写入。
Orchestrator的作用
如果发生故障导致主从切换,Orch会自动改变拓扑结构,在切换时修改新master的read_only参数为off。因此可以在Orch修改拓扑时,通过提供的Hooks第一时间主动远程修改ProxySQL中mysql_servers表中记录当前数据库的状态,而不必等待ProxySQL检测。下文中会在以下2个Hooks中调用自定义脚本。
PreGracefulTakeoverProcesses
executed on planned, graceful master takeover, immediately before the master goes read-only.
PostMasterFailoverProcesses
executed at the end of a successful master recovery.
根据作者命名描述,第一个叫优雅切换,相比来说第二个Failover算是不雅?
实验内容分3部分:
环境搭建
手动切换测试
故障切换测试
闪亮的灯球儿转起来,下面就测试下Orchestrator+ProxySQL,能否像郭德纲+于谦一样法力无边。(今天没有车)
没下班的小伙伴们还在等什么,举起双手马上按步骤自己做一套,周末会是如此美好。


1 环境搭建
# 安装ProxySQL:
1 下载最新版本:
wget https://github.com/sysown/proxysql/releases/download/v2.0.6/proxysql-2.0.6-1-centos67.x86_64.rpmyum localinstall proxysql-2.0.6-1-centos67.x86_64.rpm -y
2 启动:
/etc/init.d/proxysql start
或者
service proxysql start
3 管理员用户连接:
(默认管理端口是6032,客户端服务端口是6033。默认的用户名密码都是 admin。)
mysql -uadmin -padmin -h127.0.0.1 -P6032
# 配置ProxySQL:
1 后端DB 分2组:600写入组,601只读组
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.170',601,3306,1000,10);INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.171',601,3306,1000,10);INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.169',601,3306,1000,0);INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('10.7.90.169',600,3306,1000,0);
查看状态,都是online即可
(admin@127.0.0.1:6032)[(none)]select * from mysql_servers;+--------------+-------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |+--------------+-------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+| 601 | 10.7.90.170 | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 10 | 0 | 0 | || 601 | 10.7.90.171 | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 10 | 0 | 0 | || 601 | 10.7.90.169 | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | || 600 | 10.7.90.169 | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | |+--------------+-------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+4 rows in set (0.00 sec)
2 配置主从分组信息
(600写入组writer_hostgroup,601只读组reader_hostgroup):
INSERT INTO mysql_replication_hostgroups VALUES (600,601,'read_only','proxy01');
(admin@127.0.0.1:6032)[(none)]select * from mysql_replication_hostgroups;+------------------+------------------+------------+---------+| writer_hostgroup | reader_hostgroup | check_type | comment |+------------------+------------------+------------+---------+| 600 | 601 | read_only | proxy01 |+------------------+------------------+------------+---------+
3 在MySQL主库上,配置账号并授权
GRANT ALL ON *.* TO 'proxysql'@'%' IDENTIFIED BY 'proxysql';
4 在proxysql主机端配置管理用户,默认组600
(对应上面的用户proxysql)
INSERT INTO mysql_users(username,password,default_hostgroup) VALUES ('proxysql','proxysql',600);(admin@127.0.0.1:6032)[(none)]select * from mysql_users;+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+---------+| username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | comment |+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+---------+| proxysql | proxysql | 1 | 0 | 600 | NULL | 0 | 1 | 0 | 1 | 1 | 10000 | |+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+---------+1 row in set (0.00 sec)
5 在各个mysql上,添加监控的用户:
GRANT SELECT ON *.* TO 'monitor'@'%' IDENTIFIED BY 'monitor';flush privileges;
6 在proxysql主机端配置监控用户(对应上面的monitor用户)
set mysql-monitor_username='monitor';set mysql-monitor_password='monitor';
7 配置proxysql的转发规则
insert into mysql_query_rules(rule_id,active,match_digest,destination_hostgroup,apply)values(1,1,'^SELECT.*FOR UPDATE$',600,1);insert into mysql_query_rules(rule_id,active,match_digest,destination_hostgroup,apply)values(2,1,'^SELECT',601,1);(admin@127.0.0.1:6032)[(none)]select rule_id,active,match_digest,destination_hostgroup,apply from mysql_query_rules;+---------+--------+----------------------+-----------------------+-------+| rule_id | active | match_digest | destination_hostgroup | apply |+---------+--------+----------------------+-----------------------+-------+| 1 | 1 | ^SELECT.*FOR UPDATE$ | 600 | 1 || 2 | 1 | ^SELECT | 601 | 1 |+---------+--------+----------------------+-----------------------+-------+2 rows in set (0.00 sec)
8 检查当前连接:
(admin@127.0.0.1:6032)[(none)]select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;+-----------+-------------+----------+--------------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |+-----------+-------------+----------+--------------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+| 600 | 10.7.90.170 | 3306 | OFFLINE_HARD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 404 || 600 | 10.7.90.169 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1 | 12 | 0 | 168 | 44 | 844 || 601 | 10.7.90.171 | 3306 | ONLINE | 0 | 17 | 17 | 0 | 17 | 9593248 | 0 | 270863177 | 3029685816 | 832 || 601 | 10.7.90.170 | 3306 | ONLINE | 0 | 18 | 18 | 0 | 18 | 9626628 | 0 | 271768373 | 3027802009 | 404 || 601 | 10.7.90.169 | 3306 | ONLINE | 0 | 1 | 18 | 0 | 17 | 9419650 | 0 | 265805226 | 2928707968 | 844 |+-----------+-------------+----------+--------------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+5 rows in set (0.00 sec)
9 更新配置
load mysql users to runtime;load mysql servers to runtime;load mysql query rules to runtime;load mysql variables to runtime;load admin variables to runtime;save mysql users to disk;save mysql servers to disk;save mysql query rules to disk;save mysql variables to disk;save admin variables to disk;
10 测试读写分离是否生效
[root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "START TRANSACTION; SELECT @@hostname; ROLLBACK;"Warning: Using a password on the command line interface can be insecure.+------------+| @@hostname |+------------+| mysql1 |+------------+[root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"Warning: Using a password on the command line interface can be insecure.+------------+| @@hostname |+------------+| mysql1 |+------------+[root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"Warning: Using a password on the command line interface can be insecure.+------------+| @@hostname |+------------+| slave2 |+------------+[root@zabbix ~]# mysql -uproxysql -pproxysql -h 127.0.0.1 -P6033 -e "SELECT @@hostname;"Warning: Using a password on the command line interface can be insecure.+------------+| @@hostname |+------------+| slave1 |+------------+
# 建立远程管理用户:
---------------------------- 添加新的管理用户 ----------------------------admin> select @@admin-admin_credentials; # 当前用户名和密码admin> set admin-admin_credentials='admin:admin;orchuser:Orchuser_2019';admin> load admin variables to runtime; # 使修改立即生效admin> save admin variables to disk; # 使修改永久保存到磁盘admin> select @@admin-admin_credentials;+------------------------------------+| @@admin-admin_credentials |+------------------------------------+| admin:admin;orchuser:Orchuser_2019 |+------------------------------------+
# Orch配置:
在配置文件中以下2个Hooks中加入调用ProxySQL脚本
vi etc/orchestrator.conf.json"PreGracefulTakeoverProcesses": ["/usr/local/orchestrator/pre_proxy_failover.sh >> usr/local/orchestrator/proxysql_failover.log","sudo -u orchestrator usr/local/orchestrator/pre_del_vip.sh {failedHost} >> usr/local/orchestrator/orc_vip_failover.log","echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> tmp/recovery.log"],"PostMasterFailoverProcesses": ["/usr/local/orchestrator/post_proxy_failover.sh >> usr/local/orchestrator/proxysql_failover.log","sudo -u orchestrator usr/local/orchestrator/post_vip_failover.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> usr/local/orchestrator/orc_vip_failover.log"],
两个脚本功能,由于目前还在测试阶段就不放具体内容了:
pre_proxy_failover.sh
--手动切换前调用,修改当前主库的状态为OFFLINE_SOFT
post_proxy_failover.sh
--切换完成后调用,删除离线的原主库,添加新Master
2 手动切换测试
# 当前的拓扑结构:169主库

# ProxySQL上的读写分组:主库169是读写组,其他都是只读组

# 开始压测,类型读写混合

# 可以看到3个节点的QPS和执行命令情况

# 手动切换,提升170 为主库

# 切换成功,主从无延迟和异常

# 切换期间迅速观察ProxySQL,数据有3次变化
Orch切换前,先把当前主库169状态修改为 OFFLINE_SOFT

2. Orch切换后,删除原主库的读写主机169记录:
DELETE FROM mysql_servers WHERE hostgroup_id=600 AND hostname="10.7.90.169"

3. 插入新读写主机170的数据。由于ProxySQL检测主从有延迟,还要再将原主库的只读组数据设置为ONLINE
INSERT INTO mysql_servers(hostgroup_id,hostname,port,status) values (600, "10.7.90.170", 3306, "ONLINE");UPDATE mysql_servers SET status="ONLINE" WHERE hostgroup_id=601 AND hostname="10.7.90.169"

# 以上操作在日志里都有记录:
先调用PreGracefulTakeoverProcesses中的脚本
切换后调用PostMasterFailoverProcesses中的脚本

3 故障切换测试
# 当前的拓扑结构:170主库

# ProxySQL上的读写分组:主库170是读写组,其他都是只读组

# 开始压测,类型读写混合

# 可以看到3个节点的QPS和执行命令情况

# 直接在170上杀掉MySQL进程

# 可以看到,171已经自动提升为新主库,而170被踢出复制

# 启动170的MySQL,重新CHANGE MASTER加入新拓扑中

[root@localhost][(none)]> CHANGE MASTER TO MASTER_HOST='10.7.90.171', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='*****';Query OK, 0 rows affected, 2 warnings (0.08 sec)[root@localhost][(none)]> start slave;Query OK, 0 rows affected (0.07 sec)[root@localhost][(none)]> show slave status\G*************************** 1. row ***************************Slave_IO_State: Waiting for master to send eventMaster_Host: 10.7.90.171Master_User: replMaster_Port: 3306Connect_Retry: 60Master_Log_File: binlog.000011Read_Master_Log_Pos: 770543986Relay_Log_File: relay.000002Relay_Log_Pos: 405Relay_Master_Log_File: binlog.000011Slave_IO_Running: YesSlave_SQL_Running: Yes.略.Retrieved_Gtid_Set:Executed_Gtid_Set: 08999140-b588-11e7-b7ed-005056b45d5a:1-659715,0b82f00b-74fc-11e8-985c-005056827b12:1-440569,52650eac-dac7-11e7-aacf-005056827f22:1-90849Auto_Position: 1Replicate_Rewrite_DB:Channel_Name:Master_TLS_Version:1 row in set (0.00 sec)
# 当前拓扑已恢复为1主2从:

# ProxySQL的分组:171 读写,其他只读

# 操作日志记录,只调用了PostMasterFailoverProcesses:

# 再次压测,读写都集中在当前主库171上


以上的测试内容都是比较理想、简单的场景,而在实际业务环境中会复杂很多,各种意外状况都可能发生。感谢Orch提供各种Hooks,让我们可以通过定制化脚本不断完善它,最终进化成符合自己业务的高可用方案。
到站 收工





