MHA部署
半同步复制
要求
MySQL 5.5及以上版本
变量have_dynamic_loading为YES
异步复制已经存在
概念简介
异步复制:
MySQL默认的复制即是异步的,主库在执行完客户端提交的事务后会立即将结果返给给客户端,并不关心从库是否已经接收并处理,这样就会有一个问题,主如果crash掉了,此时主上已经提交的事务可能并没有传到从上,如果此时,强行将从提升为主,可能导致新主上的数据不完整
半同步复制:
主库在执行完客户端提交的事务后不是立刻返回给客户端,而是等待至少一个从库接收到并写到relay log中才返回给客户端。相对于异步复制,半同步复制提高了数据的安全性,同时它也造成了一定程度的延迟,这个延迟最少是一个TCP/IP往返的时间。所以,半同步复制最好在低延时的网络中使
开启半同步复制
1. 加载插件
用户需要有SUPER权限
主: mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
从: mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
2. 查看插件是否已经加载成功
mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE '%semi%';
3. 启动半同步复制
主: mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
从: mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
或者在配置文件中指定:
主:
plugin-load=rpl_semi_sync_master=semisync_master.so
rpl_semi_sync_master_enabled=1复制
从:
plugin-load=rpl_semi_sync_slave=semisync_slave.so
rpl_semi_sync_slave_enabled=1复制
在有的高可用架构下,master和slave需同时启动,以便在切换后能继续使用半同步复制:
plugin-load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl-semi-sync-master-enabled = 1
rpl-semi-sync-slave-enabled = 1复制
4. 重启从上的IO线程
mysql> STOP SLAVE IO_THREAD;
mysql> START SLAVE IO_THREAD;
5. 验证半同步是否在运行
主: mysql> show status like 'Rpl_semi_sync_master_status';
从: mysql> show status like 'Rpl_semi_sync_slave_status';
6.gtid开启
在my.cnf配置中加入
gtid-mode=on
enforce-gtid-consistency=true
master-info-repository=TABLE
relay-log-info-repository=TABLE复制
搭建MHA
一. 配置ssh无密钥证书
1. Master上生成秘钥
cd root/.ssh
ssh-keygen -t dsa -P '' -f id_dsa #生成秘钥
cat id_dsa.pub >> authorized_keys #复制秘钥复制
2. Master生成完秘钥后,在Candicate master及Slave中按相同方式生成秘钥
3. 构建通用的authorized_keys
在Master上(172.16.150.179)
cd root/.ssh
scp id_dsa.pub root@172.16.150.180:/root/.ssh/id_dsa.pub.179
scp id_dsa.pub root@172.16.150.181:/root/.ssh/id_dsa.pub.179复制
以相同方式将180和181上的id_dsa.pub文件传至179
cat id_dsa.pub.180 >> authorized_keys
cat id_dsa.pub.181 >> authorized_keys复制
4. 验证三台机器间是否实现了ssh免密码互通
二. 安装MHA
1. 安装MHA Node
安装Node前需要安装依赖
yum install perl-DBD-MySQL -y
yum install cpan复制在所有节点安装mha Node
tar zxvf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make && make install复制
2. 安装MHA Manager(179上)
MHA Manager依赖模块
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
复制安装Manager
tar zxvf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
perl Makefile.PL
make && make install复制复制相关脚本
cd tmp/mha4mysql-manager-0.56/samples/scripts/
cp * usr/local/bin/复制
3. 配置MHA
创建目录
mkdir -p etc/masterha
cp tmp/mha4mysql-manager-0.56/samples/conf/app1.cnf etc/masterha/
vim etc/masterha/app1.cnf复制配置文件如下:
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/mysql/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=123456
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=root
report_script=/usr/local/bin/send_report
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 172.16.150.179 -s 172.16.150.180
ssh_user=root
[server1]
hostname=172.16.150.179
candidate_master=1
[server2]
hostname=172.16.150.180
candidate_master=1
[server3]
hostname=172.16.150.181复制在每个Slave节点上设置relay log的清除方式
mysql> set global relay_log_purge=0;
复制修改脚本文件,配置脚本中的vip切换
vim usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '172.16.159.101/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig bond0:$key $vip";
my $arp = "/usr/sbin/arping -I bond0 -s 172.16.159.101 -b -c 1 172.16.159.254 ";
my $ssh_stop_vip = "/sbin/ifconfig bond0:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
`ssh $ssh_user\@$new_master_host \" $arp \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";复制
4. 验证MHA基础功能
检查ssh配置
masterha_check_ssh --conf=/etc/masterha/app1.cnf
复制正常的返回结果
[root@localhost conf]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Oct 19 10:15:11 2017 - [warning] Global configuration file etc/masterha_default.cnf not found. Skipping.
Thu Oct 19 10:15:11 2017 - [info] Reading application default configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:15:11 2017 - [info] Reading server configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:15:11 2017 - [info] Starting SSH connection tests..
Thu Oct 19 10:15:12 2017 - [debug]
Thu Oct 19 10:15:11 2017 - [debug] Connecting via SSH from root@172.16.150.179(172.16.150.179:22) to root@172.16.150.180(172.16.150.180:22)..
Thu Oct 19 10:15:12 2017 - [debug] ok.
Thu Oct 19 10:15:12 2017 - [debug] Connecting via SSH from root@172.16.150.179(172.16.150.179:22) to root@172.16.150.181(172.16.150.181:22)..
Thu Oct 19 10:15:12 2017 - [debug] ok.
Thu Oct 19 10:15:12 2017 - [debug]
Thu Oct 19 10:15:12 2017 - [debug] Connecting via SSH from root@172.16.150.180(172.16.150.180:22) to root@172.16.150.179(172.16.150.179:22)..
Thu Oct 19 10:15:12 2017 - [debug] ok.
Thu Oct 19 10:15:12 2017 - [debug] Connecting via SSH from root@172.16.150.180(172.16.150.180:22) to root@172.16.150.181(172.16.150.181:22)..
Thu Oct 19 10:15:12 2017 - [debug] ok.
Thu Oct 19 10:15:13 2017 - [debug]
Thu Oct 19 10:15:12 2017 - [debug] Connecting via SSH from root@172.16.150.181(172.16.150.181:22) to root@172.16.150.179(172.16.150.179:22)..
Thu Oct 19 10:15:13 2017 - [debug] ok.
Thu Oct 19 10:15:13 2017 - [debug] Connecting via SSH from root@172.16.150.181(172.16.150.181:22) to root@172.16.150.180(172.16.150.180:22)..
Thu Oct 19 10:15:13 2017 - [debug] ok.
Thu Oct 19 10:15:13 2017 - [info] All SSH connection tests passed successfully.复制检查整个复制环境状况
masterha_check_repl --conf=/etc/masterha/app1.cnf
复制正常的返回结果
[root@localhost conf]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Oct 19 10:18:17 2017 - [warning] Global configuration file etc/masterha_default.cnf not found. Skipping.
Thu Oct 19 10:18:17 2017 - [info] Reading application default configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:18:17 2017 - [info] Reading server configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:18:17 2017 - [info] MHA::MasterMonitor version 0.56.
Thu Oct 19 10:18:18 2017 - [info] GTID failover mode = 1
Thu Oct 19 10:18:18 2017 - [info] Dead Servers:
Thu Oct 19 10:18:18 2017 - [info] Alive Servers:
Thu Oct 19 10:18:18 2017 - [info] 172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info] 172.16.150.180(172.16.150.180:3306)
Thu Oct 19 10:18:18 2017 - [info] 172.16.150.181(172.16.150.181:3306)
Thu Oct 19 10:18:18 2017 - [info] Alive Slaves:
Thu Oct 19 10:18:18 2017 - [info] 172.16.150.180(172.16.150.180:3306) Version=5.6.24-log (oldest major version between slaves) log-bin:enabled
Thu Oct 19 10:18:18 2017 - [info] GTID ON
Thu Oct 19 10:18:18 2017 - [info] Replicating from 172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Oct 19 10:18:18 2017 - [info] 172.16.150.181(172.16.150.181:3306) Version=5.6.24-log (oldest major version between slaves) log-bin:enabled
Thu Oct 19 10:18:18 2017 - [info] GTID ON
Thu Oct 19 10:18:18 2017 - [info] Replicating from 172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info] Current Alive Master: 172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info] Checking slave configurations..
Thu Oct 19 10:18:18 2017 - [info] read_only=1 is not set on slave 172.16.150.180(172.16.150.180:3306).
Thu Oct 19 10:18:18 2017 - [info] read_only=1 is not set on slave 172.16.150.181(172.16.150.181:3306).
Thu Oct 19 10:18:18 2017 - [info] Checking replication filtering settings..
Thu Oct 19 10:18:18 2017 - [info] binlog_do_db= , binlog_ignore_db=
Thu Oct 19 10:18:18 2017 - [info] Replication filtering check ok.
Thu Oct 19 10:18:18 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Oct 19 10:18:18 2017 - [info] Checking SSH publickey authentication settings on the current master..
Thu Oct 19 10:18:18 2017 - [info] HealthCheck: SSH to 172.16.150.179 is reachable.
Thu Oct 19 10:18:18 2017 - [info]
172.16.150.179(172.16.150.179:3306) (current master)
+--172.16.150.180(172.16.150.180:3306)
+--172.16.150.181(172.16.150.181:3306)
Thu Oct 19 10:18:18 2017 - [info] Checking replication health on 172.16.150.180..
Thu Oct 19 10:18:18 2017 - [info] ok.
Thu Oct 19 10:18:18 2017 - [info] Checking replication health on 172.16.150.181..
Thu Oct 19 10:18:18 2017 - [info] ok.
Thu Oct 19 10:18:18 2017 - [info] Checking master_ip_failover_script status:
Thu Oct 19 10:18:18 2017 - [info] usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.150.179 --orig_master_ip=172.16.150.179 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.150.220/24===
Checking the Status of the script.. OK
Thu Oct 19 10:18:18 2017 - [info] OK.
Thu Oct 19 10:18:18 2017 - [warning] shutdown_script is not defined.
Thu Oct 19 10:18:18 2017 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.复制
5. 启动MHA
开启MHA Manager:
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < dev/null > var/log/masterha/app1/manager.log 2>&1 &
复制检查MHA Manager状态:
masterha_check_status --conf=/etc/masterha/app1.cnf
返回:
app1 (pid:2816) is running(0:PING_OK), master:172.16.150.179复制查看日志:
tailf -n 100 var/log/masterha/app1/manager.log
复制关闭MHA:
masterha_stop --conf=/etc/masterha/app1.cnf
复制
6. 在线切换
关闭mha manager
masterha_stop --conf=/etc/masterha/app1.cnf
复制手动切换
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.16.150.179 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1000
复制在线切换必须满足的条件:
所有slave的IO线程都在运行;
所有slave的SQL线程都在运行;
所有的show slave status的输出中Seconds_Behind_Master参数小于或者等于running_updates_limit秒,如果在切换过程中不指定running_updates_limit,那么默认情况下running_updates_limit为1秒;
在master端,通过show processlist输出,没有一个更新花费的时间大于running_updates_limit秒;
恢复步骤
1. 重新启动宕机的数据库
2. 将宕机的机器作为新的从加入集群
找出故障时所处的位置:
grep 'All other slaves should start replication from here' /var/log/masterha/app1/manager.log
复制将找出的CHANGE MASTER... 语句作为主从复制语句,在新从库上执行:
CHANGE MASTER TO MASTER_HOST='172.16.150.180', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='root', MASTER_PASSWORD='xxx';
复制开启主从:
[root@localhost][two]> CHANGE MASTER TO MASTER_HOST='172.16.150.180', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='root', MASTER_PASSWORD='123456';
Query OK, 0 rows affected, 2 warnings (0.59 sec)
[root@localhost][two]> start slave;
Query OK, 0 rows affected (0.18 sec)复制验证主从:
[root@localhost][two]> start slave;
复制在manager节点修改配置文件:
vim /etc/masterha/app1.cnf
#加入以下配置 (mha会在切换时切除无效的server配置信息,重启前需要把这些配置添加回去)
[server1]
hostname=172.16.150.179
candidate_master=1复制
3. 还原主从关系(可选)
验证MHA复制状态:
masterha_check_status --conf=/etc/masterha/app1.cnf
复制手动切换主从关系
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.16.150.179 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1000
复制