mysql5.6部署MHA

真菌的博客 2021-04-20

281

MHA部署

半同步复制

要求

MySQL 5.5及以上版本
变量have_dynamic_loading为YES
异步复制已经存在

概念简介

异步复制：
MySQL默认的复制即是异步的，主库在执行完客户端提交的事务后会立即将结果返给给客户端，并不关心从库是否已经接收并处理，这样就会有一个问题，主如果crash掉了，此时主上已经提交的事务可能并没有传到从上，如果此时，强行将从提升为主，可能导致新主上的数据不完整
半同步复制：
主库在执行完客户端提交的事务后不是立刻返回给客户端，而是等待至少一个从库接收到并写到relay log中才返回给客户端。相对于异步复制，半同步复制提高了数据的安全性，同时它也造成了一定程度的延迟，这个延迟最少是一个TCP/IP往返的时间。所以，半同步复制最好在低延时的网络中使

开启半同步复制

1. 加载插件

用户需要有SUPER权限

主：
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';

从：
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';

2. 查看插件是否已经加载成功

mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE '%semi%';

3. 启动半同步复制

主：
mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;

从：
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;

或者在配置文件中指定：

主：

plugin-load=rpl_semi_sync_master=semisync_master.so   
rpl_semi_sync_master_enabled=1

复制

从：

plugin-load=rpl_semi_sync_slave=semisync_slave.so
rpl_semi_sync_slave_enabled=1

复制

在有的高可用架构下，master和slave需同时启动，以便在切换后能继续使用半同步复制：

plugin-load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl-semi-sync-master-enabled = 1
rpl-semi-sync-slave-enabled = 1

复制

4. 重启从上的IO线程

mysql> STOP SLAVE IO_THREAD;

mysql> START SLAVE IO_THREAD;

5. 验证半同步是否在运行

主： mysql> show status like 'Rpl_semi_sync_master_status';

从： mysql> show status like 'Rpl_semi_sync_slave_status';

6.gtid开启

在my.cnf配置中加入

    gtid-mode=on
    enforce-gtid-consistency=true
    master-info-repository=TABLE
    relay-log-info-repository=TABLE

复制

搭建MHA

一. 配置ssh无密钥证书

1. Master上生成秘钥

    cd root/.ssh   
    ssh-keygen -t dsa -P '' -f id_dsa        #生成秘钥
    cat id_dsa.pub >> authorized_keys        #复制秘钥

复制

2. Master生成完秘钥后，在Candicate master及Slave中按相同方式生成秘钥

3. 构建通用的authorized_keys

在Master上（172.16.150.179）

cd root/.ssh
scp id_dsa.pub root@172.16.150.180:/root/.ssh/id_dsa.pub.179
scp id_dsa.pub root@172.16.150.181:/root/.ssh/id_dsa.pub.179

复制

以相同方式将180和181上的id_dsa.pub文件传至179

cat id_dsa.pub.180 >> authorized_keys
cat id_dsa.pub.181 >> authorized_keys

复制

4. 验证三台机器间是否实现了ssh免密码互通

二. 安装MHA

1. 安装MHA Node

安装Node前需要安装依赖

yum install perl-DBD-MySQL -y
yum install cpan

复制

在所有节点安装mha Node

tar zxvf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make && make install

复制

2. 安装MHA Manager（179上）

MHA Manager依赖模块

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

复制

安装Manager

tar zxvf mha4mysql-manager-0.56.tar.gz　　　
cd mha4mysql-manager-0.56
perl Makefile.PL 　　　
make && make install

复制

复制相关脚本

cd tmp/mha4mysql-manager-0.56/samples/scripts/
cp * usr/local/bin/

复制

3. 配置MHA

创建目录

mkdir -p etc/masterha
cp tmp/mha4mysql-manager-0.56/samples/conf/app1.cnf etc/masterha/
vim etc/masterha/app1.cnf

复制

配置文件如下：

[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/mysql/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=123456
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=root
report_script=/usr/local/bin/send_report
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 172.16.150.179 -s 172.16.150.180
ssh_user=root

[server1]
hostname=172.16.150.179
candidate_master=1

[server2]
hostname=172.16.150.180
candidate_master=1

[server3]
hostname=172.16.150.181

复制

在每个Slave节点上设置relay log的清除方式
```
mysql> set global relay_log_purge=0;

复制
```

修改脚本文件，配置脚本中的vip切换

vim usr/local/bin/master_ip_failover

        #!/usr/bin/env perl

            use strict;
        use warnings FATAL => 'all';

        use Getopt::Long;

        my (
            $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
            $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
        );

        my $vip = '172.16.159.101/24';
        my $key = '1';
        my $ssh_start_vip = "/sbin/ifconfig bond0:$key $vip";
        my $arp = "/usr/sbin/arping -I bond0 -s 172.16.159.101 -b -c 1 172.16.159.254 ";
        my $ssh_stop_vip = "/sbin/ifconfig bond0:$key down";

        GetOptions(
            'command=s'          => \$command,
            'ssh_user=s'         => \$ssh_user,
            'orig_master_host=s' => \$orig_master_host,
            'orig_master_ip=s'   => \$orig_master_ip,
            'orig_master_port=i' => \$orig_master_port,
            'new_master_host=s'  => \$new_master_host,
            'new_master_ip=s'    => \$new_master_ip,
            'new_master_port=i'  => \$new_master_port,
        );

        exit &main();

        sub main {

            print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

            if ( $command eq "stop" || $command eq "stopssh" ) {

                my $exit_code = 1;
                eval {
                    $exit_code = 0;
                };
                if ($@) {
                    warn "Got Error: $@\n";
                    exit $exit_code;
                }
                exit $exit_code;
            }
            elsif ( $command eq "start" ) {

                my $exit_code = 10;
                eval {
                    print "Enabling the VIP - $vip on the new master - $new_master_host \n";
                    &start_vip();
                    $exit_code = 0;
                };
                if ($@) {
                    warn $@;
                    exit $exit_code;
                }
                exit $exit_code;
            }
            elsif ( $command eq "status" ) {
                print "Checking the Status of the script.. OK \n";
                exit 0;
            }
            else {
                &usage();
                exit 1;
            }
        }

        sub start_vip() {
            `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
            `ssh $ssh_user\@$new_master_host \" $arp \"`;
        }
        sub stop_vip() {
             return 0  unless  ($ssh_user);
            `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
        }

        sub usage {
            print
            "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";

复制

4. 验证MHA基础功能

检查ssh配置

masterha_check_ssh --conf=/etc/masterha/app1.cnf

复制

正常的返回结果

[root@localhost conf]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Oct 19 10:15:11 2017 - [warning] Global configuration file etc/masterha_default.cnf not found. Skipping.
Thu Oct 19 10:15:11 2017 - [info] Reading application default configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:15:11 2017 - [info] Reading server configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:15:11 2017 - [info] Starting SSH connection tests..
Thu Oct 19 10:15:12 2017 - [debug] 
Thu Oct 19 10:15:11 2017 - [debug]  Connecting via SSH from root@172.16.150.179(172.16.150.179:22) to root@172.16.150.180(172.16.150.180:22)..
Thu Oct 19 10:15:12 2017 - [debug]   ok.
Thu Oct 19 10:15:12 2017 - [debug]  Connecting via SSH from root@172.16.150.179(172.16.150.179:22) to root@172.16.150.181(172.16.150.181:22)..
Thu Oct 19 10:15:12 2017 - [debug]   ok.
Thu Oct 19 10:15:12 2017 - [debug] 
Thu Oct 19 10:15:12 2017 - [debug]  Connecting via SSH from root@172.16.150.180(172.16.150.180:22) to root@172.16.150.179(172.16.150.179:22)..
Thu Oct 19 10:15:12 2017 - [debug]   ok.
Thu Oct 19 10:15:12 2017 - [debug]  Connecting via SSH from root@172.16.150.180(172.16.150.180:22) to root@172.16.150.181(172.16.150.181:22)..
Thu Oct 19 10:15:12 2017 - [debug]   ok.
Thu Oct 19 10:15:13 2017 - [debug] 
Thu Oct 19 10:15:12 2017 - [debug]  Connecting via SSH from root@172.16.150.181(172.16.150.181:22) to root@172.16.150.179(172.16.150.179:22)..
Thu Oct 19 10:15:13 2017 - [debug]   ok.
Thu Oct 19 10:15:13 2017 - [debug]  Connecting via SSH from root@172.16.150.181(172.16.150.181:22) to root@172.16.150.180(172.16.150.180:22)..
Thu Oct 19 10:15:13 2017 - [debug]   ok.
Thu Oct 19 10:15:13 2017 - [info] All SSH connection tests passed successfully.

复制

检查整个复制环境状况

masterha_check_repl --conf=/etc/masterha/app1.cnf

复制

正常的返回结果

[root@localhost conf]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Oct 19 10:18:17 2017 - [warning] Global configuration file etc/masterha_default.cnf not found. Skipping.
Thu Oct 19 10:18:17 2017 - [info] Reading application default configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:18:17 2017 - [info] Reading server configuration from etc/masterha/app1.cnf..
Thu Oct 19 10:18:17 2017 - [info] MHA::MasterMonitor version 0.56.
Thu Oct 19 10:18:18 2017 - [info] GTID failover mode = 1
Thu Oct 19 10:18:18 2017 - [info] Dead Servers:
Thu Oct 19 10:18:18 2017 - [info] Alive Servers:
Thu Oct 19 10:18:18 2017 - [info]   172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info]   172.16.150.180(172.16.150.180:3306)
Thu Oct 19 10:18:18 2017 - [info]   172.16.150.181(172.16.150.181:3306)
Thu Oct 19 10:18:18 2017 - [info] Alive Slaves:
Thu Oct 19 10:18:18 2017 - [info]   172.16.150.180(172.16.150.180:3306)  Version=5.6.24-log (oldest major version between slaves) log-bin:enabled
Thu Oct 19 10:18:18 2017 - [info]     GTID ON
Thu Oct 19 10:18:18 2017 - [info]     Replicating from 172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Oct 19 10:18:18 2017 - [info]   172.16.150.181(172.16.150.181:3306)  Version=5.6.24-log (oldest major version between slaves) log-bin:enabled
Thu Oct 19 10:18:18 2017 - [info]     GTID ON
Thu Oct 19 10:18:18 2017 - [info]     Replicating from 172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info] Current Alive Master: 172.16.150.179(172.16.150.179:3306)
Thu Oct 19 10:18:18 2017 - [info] Checking slave configurations..
Thu Oct 19 10:18:18 2017 - [info]  read_only=1 is not set on slave 172.16.150.180(172.16.150.180:3306).
Thu Oct 19 10:18:18 2017 - [info]  read_only=1 is not set on slave 172.16.150.181(172.16.150.181:3306).
Thu Oct 19 10:18:18 2017 - [info] Checking replication filtering settings..
Thu Oct 19 10:18:18 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Oct 19 10:18:18 2017 - [info]  Replication filtering check ok.
Thu Oct 19 10:18:18 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Oct 19 10:18:18 2017 - [info] Checking SSH publickey authentication settings on the current master..
Thu Oct 19 10:18:18 2017 - [info] HealthCheck: SSH to 172.16.150.179 is reachable.
Thu Oct 19 10:18:18 2017 - [info] 
172.16.150.179(172.16.150.179:3306) (current master)
 +--172.16.150.180(172.16.150.180:3306)
 +--172.16.150.181(172.16.150.181:3306)

Thu Oct 19 10:18:18 2017 - [info] Checking replication health on 172.16.150.180..
Thu Oct 19 10:18:18 2017 - [info]  ok.
Thu Oct 19 10:18:18 2017 - [info] Checking replication health on 172.16.150.181..
Thu Oct 19 10:18:18 2017 - [info]  ok.
Thu Oct 19 10:18:18 2017 - [info] Checking master_ip_failover_script status:
Thu Oct 19 10:18:18 2017 - [info]   usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.150.179 --orig_master_ip=172.16.150.179 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.150.220/24===

Checking the Status of the script.. OK 
Thu Oct 19 10:18:18 2017 - [info]  OK.
Thu Oct 19 10:18:18 2017 - [warning] shutdown_script is not defined.
Thu Oct 19 10:18:18 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

复制

5. 启动MHA

开启MHA Manager：

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < dev/null > var/log/masterha/app1/manager.log 2>&1 &

复制

检查MHA Manager状态：

masterha_check_status --conf=/etc/masterha/app1.cnf
返回：
app1 (pid:2816) is running(0:PING_OK), master:172.16.150.179

复制

查看日志：

tailf -n 100 var/log/masterha/app1/manager.log

复制

关闭MHA：

masterha_stop --conf=/etc/masterha/app1.cnf

复制

6. 在线切换

关闭mha manager

masterha_stop --conf=/etc/masterha/app1.cnf

复制

手动切换

masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.16.150.179 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1000

复制

在线切换必须满足的条件：

所有slave的IO线程都在运行；
所有slave的SQL线程都在运行；
所有的show slave status的输出中Seconds_Behind_Master参数小于或者等于running_updates_limit秒，如果在切换过程中不指定running_updates_limit,那么默认情况下running_updates_limit为1秒；
在master端，通过show processlist输出，没有一个更新花费的时间大于running_updates_limit秒；

恢复步骤

1. 重新启动宕机的数据库

2. 将宕机的机器作为新的从加入集群

找出故障时所处的位置：

grep 'All other slaves should start replication from here' /var/log/masterha/app1/manager.log

复制

将找出的CHANGE MASTER... 语句作为主从复制语句，在新从库上执行：

CHANGE MASTER TO MASTER_HOST='172.16.150.180', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='root', MASTER_PASSWORD='xxx';

复制

开启主从：

[root@localhost][two]> CHANGE MASTER TO MASTER_HOST='172.16.150.180', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='root', MASTER_PASSWORD='123456';
Query OK, 0 rows affected, 2 warnings (0.59 sec)

[root@localhost][two]> start slave;
Query OK, 0 rows affected (0.18 sec)

复制

验证主从：

[root@localhost][two]> start slave;

复制

在manager节点修改配置文件：

vim /etc/masterha/app1.cnf

#加入以下配置 （mha会在切换时切除无效的server配置信息，重启前需要把这些配置添加回去）
[server1]
hostname=172.16.150.179
candidate_master=1

复制

3. 还原主从关系（可选）

验证MHA复制状态：

masterha_check_status --conf=/etc/masterha/app1.cnf

复制

手动切换主从关系

masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.16.150.179 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1000
复制

mysql

文章转载自真菌的博客，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。