Mysql-MHA集群

Posted 礁之

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Mysql-MHA集群相关的知识,希望对你有一定的参考价值。

文章目录

一、MHA概述

  • MHA(Master High Availability)目前在mysql高可用方面是一个相对成熟的解决方案,它由日本DeNA公司的youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

  • 该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。MHA Node运行在每台MySQL服务器上,MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

  • 在MHA自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。使用MySQL 5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。

  • 目前MHA主要支持一主多从的架构。要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库。因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。(出自:《深入浅出MySQL(第二版)》)从代码层面看,MHA就是一套Perl脚本,那么相信以阿里系的技术实力,将MHA改成支持一主一从也并非难事。

MHA架构:

MHA工作原理总结为以下几条:

  1. MHA Manager 会定时探测集群中的 Master 节点;
  2. 当 Master 出现故障时,从宕机崩溃的 Master 保存二进制日志事件(BinLog Events);
  3. 识别含有最新更新的 Slave;
  4. 应用差异的中继日志(Relay Log)到其它的 Slave;
  5. 应用从 Master 保存的二进制日志事件(BinLog Events);
  6. 提升一个 Slave 为新的 Master,使其它 Slave 连接新的 Master 进行复制。

二、部署MHA

(1)实验环境

系统主机名ip安装软件扮演角色
Centos7.4master192.168.100.11mysqlmysql主节点
Centos7.4slave1192.168.100.12mysqlmysql从节点
Centos7.4slave2192.168.100.13mysqlmysql从节点
Centos7.4manager192.168.100.14MHA manager节点

(2)实验步骤

1、配置免密登录

#由于 MHA Manager 通过 SSH 访问所有的 Node 节点,各个 Node 节点也同样通过 SSH 来相互发送不同的 Relay Log 文件,所以要在每一个 Node 和 Manager 上配置 SSH 无密码登陆。每个节点都需要向其他主机包括自己发送私钥,下面只做master的免密
[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id root@192.168.100.211  #自己也需要发
[root@master ~]# ssh-copy-id root@192.168.100.212
[root@master ~]# ssh-copy-id root@192.168.100.213
[root@master ~]# ssh-copy-id root@192.168.100.214

2、安装MHA软件包

#准备工作,四台主机都需要操作
[root@master ~]# mkdir /root/mha
[root@master ~]# cd /root/mha
[root@master mha]# ll   #上传三个软件包
总用量 5520
-rw-r--r-- 1 root root   81080 8月   2 04:18 mha4mysql-manager-0.57-0.el7.noarch.rpm
-rw-r--r-- 1 root root   35360 8月   2 04:18 mha4mysql-node-0.57-0.el7.noarch.rpm
-rw-r--r-- 1 root root 5532094 8月   2 04:18 mhapath.tar.gz
[root@master mha]# cat <<END > /etc/yum.repos.d/centos.repo
[centos]
name=centos7
baseurl=file:///mnt
enabled=1
gpgcheck=0

[mha]
name=mha
baseurl=file:///root/mha/mhapath
enabled=1
gpgcheck=0
END
[root@master mha]# mount /dev/cdrom /mnt

#在各节点上安装 mha4mysql-node,包括一个master主节点和两个slave从节点
[root@master mha]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
[root@master mha]# rpm -ivh /root/mha/mha4mysql-node-0.57-0.el7.noarch.rpm
#在manager节点安装 Manager 服务
[root@manager mha]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN
[root@manager mha]# rpm -ivh /root/mha/mha4mysql-manager-0.57-0.el7.noarch.rpm 

-Manager 管理工具

在 Manager 节点安装完成后会生成一些管理工具,Manager 的主要管理工具有:

masterha_check_ssh:检查 MHA 的 SSH 配置状况。
masterha_check_repl:检查 MySQL 复制状况。
masterha_manager:启动 MHA。
masterha_check_status:检查当前 MHA 运行状态。
masterha_master_monitor:检查 Master 是否宕机。
masterha_master_switch:控制故障转移(自动或者手动)。
masterha_conf_host:添加或删除配置的 Server 信息

3、配置主从复制

#主服务器上配置
[root@master mha]# cat <<END >> /etc/my.cnf
log-bin=mysql-bin-master
server-id=1
END
[root@master mha]# cat /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
character_set_server=utf8
init_connect='SET NAMES utf8'
log-error=/usr/local/mysql/logs/mysqld.log
pid-file=/usr/local/mysql/logs/mysqld.pid
skip-name-resolve

log-bin=mysql-bin-master
server-id=1
[root@master mha]# systemctl restart mysqld
[root@master mha]# mysql -u root -p123123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 2
Server version: 5.7.12-log Source distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement.

mysql> grant replication slave on *.* to repl@'192.168.100.%' identified by '123123';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all privileges on *.* to root@'192.168.100.%' identified by '123123';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
#两台从服务器上配置
——————————————————————————————————————————————————————————————————slave1:
[root@slave1 mha]# cat <<END >> /etc/my.cnf
log-bin=mysql-slave1
server-id=2
log_slave_updates=1
END
[root@slave1 mha]# cat /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
character_set_server=utf8
init_connect='SET NAMES utf8'
log-error=/usr/local/mysql/logs/mysqld.log
pid-file=/usr/local/mysql/logs/mysqld.pid
skip-name-resolve

log-bin=mysql-slave1
server-id=2
log_slave_updates=1
[root@slave1 mha]# systemctl restart mysqld
[root@slave1 mha]# mysql -uroot -p123123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 2
Server version: 5.7.12-log Source distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement.

mysql> grant replication slave on *.* to repl@'192.168.100.%' identified by '123123';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all privileges on *.* to root@'192.168.100.%' identified by '123123';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye

————————————————————————————————————————————————slave2:
[root@slave2 mha]# cat <<END >> /etc/my.cnf
log-bin=mysql-slave2
server-id=3
log_slave_updates=1
END
[root@slave2 mha]# cat /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
character_set_server=utf8
init_connect='SET NAMES utf8'
log-error=/usr/local/mysql/logs/mysqld.log
pid-file=/usr/local/mysql/logs/mysqld.pid
skip-name-resolve

log-bin=mysql-slave2
server-id=3
log_slave_updates=1
[root@slave2 mha]# systemctl restart mysqld
[root@slave2 mha]# mysql -uroot -p123123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 2
Server version: 5.7.12-log Source distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement.

mysql> grant replication slave on *.* to repl@'192.168.100.%' identified by '123123';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> grant all privileges on *.* to root@'192.168.100.%' identified by '123123';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
#建立主从复制(两台从节点操作一致)
————————————————————————————————————————————————————————————————slave1
[root@slave1 mha]# mysql -uroot -p123123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 4
Server version: 5.7.12-log Source distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement.

mysql> change master to 
       master_host='192.168.100.211',
       master_user='repl',
       master_password='123123'; 
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\\G;
。。。。。。
             Slave_IO_Running: Yes     #看这两项yes就行
             Slave_SQL_Running: Yes
。。。。。。
mysql> exit
Bye

————————————————————————————————————————————————————————————————slave2
[root@slave2 mha]# mysql -uroot -p123123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 3
Server version: 5.7.12-log Source distribution

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement.

mysql> change master to 
    ->        master_host='192.168.100.211',
    ->        master_user='repl',
    ->        master_password='123123'; 
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\\G;
。。。。。。
             Slave_IO_Running: Yes     #看这两项yes就行
             Slave_SQL_Running: Yes
。。。。。。
mysql> exit
Bye

4、设置 MySQL 程序及 BinLog 程序的软连接

#三台节点上都操作
[root@master mha]# ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
[root@master mha]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog

5、在两台 Slave 设置临时只读和不清除中继日志

[root@slave1 mha]#  mysql -uroot -p123123 -e 'set global read_only=1'
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@slave1 mha]# mysql -uroot -p123123 -e 'set global relay_log_purge=0'
mysql: [Warning] Using a password on the command line interface can be insecure.
——————————————————————————————————————————————————————————————————————————————
[root@slave2 mha]# mysql -uroot -p123123 -e 'set global relay_log_purge=0'
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@slave2 mha]# mysql -uroot -p123123 -e 'set global read_only=1'
mysql: [Warning] Using a password on the command line interface can be insecure.

6、配置 MHA 工作目录及配置文件

#在manager主机上操作
[root@manager mha]# mkdir -p /etc/masterha
[root@manager mha]# mkdir -p /var/log/masterha/app1
[root@manager mha]# vim /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1				# 设置 manager 的工作目录
manager_log=/var/log/masterha/app1/manager.log		# 设置 manager 的日志文件
master_binlog_dir=/usr/local/mysql/data/			# 设置 master 保存 binlog 的位置. 以便 MHA 可以找到 master 的日志
user=root						# 设置监控 mysql 的用户
password=123123					# 设置监控 mysql 用户的密码
ping_interval=1					# 设置监控主库. 发送 ping 包的时间间隔
remote_workdir=/tmp				# 设置远端 mysql 在发生切换时 binlog 的保存位置
repl_user=repl					# 设置 mysql 中用于复制的用户
repl_password=123123			# 设置 mysql 中用于复制的用户密码
ssh_user=root					# 设置 ssh 的登录用户名

[server1]
hostname=192.168.100.211
port=3306

[server2]
hostname=192.168.100.212
port=3306
candidate_master=1				# 设置当前节点为候选的 master
check_repl_delay=0				# 当落后 master 100M 的 relay logs 时. MHA 将不会选择该 slave 作为一个新的 master

[server3]
hostname=192.168.100.213
port=3306
#保存退出
[root@manager mha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf  #检测 SSH 连接是否配置正常
Mon Aug  2 04:55:52 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug  2 04:55:52 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Aug  2 04:55:52 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Aug  2 04:55:52 2021 - [info] Starting SSH connection tests..
Mon Aug  2 04:55:53 2021 - [debug] 
Mon Aug  2 04:55:52 2021 - [debug]  Connecting via SSH from root@192.168.100.211(192.168.100.211:22) to root@192.168.100.212(192.168.100.212:22)..
Mon Aug  2 04:55:53 2021 - [debug]   ok.
Mon Aug  2 04:55:53 2021 - [debug]  Connecting via SSH from root@192.168.100.211(192.168.100.211:22) to root@192.168.100.213(192.168.100.213:22)..
Mon Aug  2 04:55:53 2021 - [debug]   ok.
Mon Aug  2 04:55:54 2021 - [debug] 
Mon Aug  2 04:55:53 2021 - [debug]  Connecting via SSH from root@192.168.100.212(192.168.100.212:22) to root@192.168.100.211(192.168.100.211:22)..
Mon Aug  2 04:55:53 2021 - [debug]   ok.
Mon Aug  2 04:55:53 2021 - [debug]  Connecting via SSH from root@192.168.100.212(192.168.100.212:22) to root@192.168.100.213(192.168.100.213:22)..
Mon Aug  2 04:55:53 2021 - [debug]   ok.
Mon Aug  2 04:55:54 2021 - [debug] 
Mon Aug  2 04:55:53 2021 - [debug]  Connecting via SSH from root@192.168.100.213(192.168.100.213:22) to root@192.168.100.211(192.168.100.211:22)..
Mon Aug  2 04:55:53 2021 - [debug]   ok.
Mon Aug  2 04:55:53 2021 - [debug]  Connecting via SSH from root@192.168.100.213(192.168.100.213:22) to root@192.168.100.212(192.168.100.212:22)..
Mon Aug  2 04:55:54 2021 - [debug]   ok.
Mon Aug  2 04:55:54 2021 - [info] All SSH connection tests passed successfully.   #上面的都是ok,并且出现这行就是正常

7、在Manager节点检查复制配置

[root@manager mha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Mon Aug  2 05:26:40 2021 - [warning] master_ip_failover_script is not defined.
Mon Aug  2 05:26:40 2021 - [warning] shutdown_script is not defined.
Mon Aug  2 05:26:40 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.   #ok表示正常

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-FLud5wXn-1647244513093)(F:\\个人文档\\markdown图片\\image-20210801212653515.png)]

8、启动Manager

  • MySQL 中主从的工作状态检测及切换是由 Manager 节点来完成的,MHA 安装完成以及检测通过后就可以根据自己的需求开启以及停止 Manager。
#Manager 是通过 masterha_manager 命令开启,启动后需要将它放在后台运行。
[root@manager mha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf \\
 --remove_dead_master_conf --ignore_last_failover < /dev/null > \\
 /var/log/masterha/app1/manager.log 2>&1 &
[1] 1743
  • 检测 Manager 的工作状态
#当 MHA Manager 启动监控以后,如果没有异常则不会打印任何信息。我们可通过 masterha_check_status 命令检查 Manager 的状态。
[root@manager app1]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:14175) is running(0:PING_OK), master:192.168.100.211    #running即可
  • 在 MHA 环境中配置 VIP
#通过 MHA 进行故障转移后,连接 MySQL 数据库的服务并不知道 MySQL 复制环境中进行了故障的转移,同时连 MySQL 的服务也无法知晓主节点是哪一个,此时,可以通过配置 VIP 的方式让所有的应用程序连接 VIP,当 MySQL 故障切换时,VIP 会自动漂移到新的主节点。
————————————————————————————————————————————————————在master主节点上配置vip
[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:65:e8:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.211/24 brd 192.168.100.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::e029:444:1ec0:7c78/64 scope link 
       valid_lft forever preferred_lft forever
    inet6 fe80::7ab:dbe:2aec:32fa/64 scope link tentative dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::34f4:cad:16ae:5b4d/64 scope link tentative dadfailed 
       valid_lft forever preferred_lft forever
[root@master ~]# ifconfig ens32:1 192.168.100.188 netmask 255.255.255.0 up
[root@master ~]# ifconfig ens32:1
ens32:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.188  netmask 255.255.255.0  broadcast 192.168.100.255
        ether 00:0c:29:65:e8:85  txqueuelen 1000  (Ethernet)
        
————————————————————————————————————————————在manager上修改 MHA 配置文件,使其支持 VIP
[root@manager ~]# vim /etc/masterha/app1.cnf 
[server default]
master_ip_failover_script=/usr/bin/master_ip_failover   #在第一行下添加
#保存退出
  • 在manager节点上编写 VIP 自动切换脚本
[root@manager ~]# vim /usr/bin/master_ip_failover   #就是在app1上指定的文件
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
my $vip = '192.168.100.188/24';       #记得修改vip
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";
GetOptions(
    'command=s'          => \\$command,
    'ssh_user=s'         => \\$ssh_user,
    'orig_master_host=s' => \\$orig_master_host,
    'orig_master_ip=s'   => \\$orig_master_ip,
    'orig_master_port=i' => \\$orig_master_port,
    'new_master_host=s'  => \\$new_master_host,
    'new_master_ip=s'    => \\$new_master_ip,
    'new_master_port=i'  => \\$new_master_port,
);
exit &main();
sub main 
    print "\\n\\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\\n\\n";
    if ( $command eq "stop" || $command eq "stopssh" ) 
        my $exit_code = 1;
        eval 
            print "Disabling the VIP on old master: $orig_master_host \\n";
            &stop_vip();
            $exit_code = 0;
        ;
        if ($@) 
            warn "Got Error: $@\\n";
            exit $exit_code;
        
        exit $exit_code;
    
    elsif ( $command eq "start" ) 
        my $exit_code = 10;
        eval 
            print "Enabling the VIP - $vip on the new master - $new_master_host \\n";
            &start_vip();
            $exit_code = 0;
        ;
        if ($@) 
            warn $@;
            exit $exit_code;
        
        exit $exit_code;
    
    elsif ( $command eq "status" ) 
        print "Checking the Status of the script.. OK \\n";
        exit 0;
    
    else 
        &usage();
        exit 1;
    

sub start_vip() 
    `ssh $ssh_user\\@$new_master_host \\" $ssh_start_vip \\"`;

sub stop_vip() 
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\\@$orig_master_host \\" $ssh_stop_vip \\"`;

sub usage 
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\\n";

#保存退出
[root@manager ~]# chmod +x /usr/bin/master_ip_failover     #添加可执行权限
  • 再次在manager上进行检测
[root@manager ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
。。。。。。
MySQL Replication Health is OK.    #这里也是ok
  • 开启监控
[root@manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf \\
--remove_dead_master_conf --ignore_last_failover < /dev/null > \\
/var/log/masterha/app1/manager.log 2>&1 &
[1] 14175
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:14175) is running(0:PING_OK), master:192.168.100.211

9、验证故障转移

******(1)关闭master上的mysql
[root@master ~]# systemctl stop mysqld

******(2)去manager上查看日志
[root@manager ~]# tail -n10 /var/log/masterha/app1/manager.log
Mon Aug  2 06:10:00 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.211' (111))
Mon Aug  2 06:10:00 2021 - [warning] Connection failed 4 time(s)..
Mon Aug  2 06:10:00 2021 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Mon Aug  2 06:10:01 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.211' (111))
Mon Aug  2 06:10:01 2021 - [warning] Connection failed 1 time(s)..
Mon Aug  2 06:10:01 2021 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.212 -s 192.168.100.213  --user=root  --master_host=192.168.100.211  --master_ip=192.168.100.211  --master_port=3306 --master_user=root --master_password=123123 --ping_type=SELECT
Mon Aug  2 06:10:01 2021 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin-master
sh: /usr/local/bin/masterha_secondary_check: 没有那个文件或目录
Mon Aug  2 06:10:01 2021 - [error][/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm, ln412] Got unknown error from /usr/local/bin/masterha_secondary_check -s 192.168.100.212 -s 192.168.100.213  --user=root  --master_host=192.168.100.211  --master_ip=192.168.100.211  --master_port=3306 --master_user=root --master_password=123123 --ping_type=SELECT. exit.
Mon Aug  2 06:10:01 2021 - [info] HealthCheck: SSH to 192.168.100.211 is reachable.

******(3)在slave1上查看,这样做只会飘逸一次,并且当master重新连回来不会回到主从复制的集群中
[root@slave1 ~]# ifconfig ens32:1   #发现vip已经漂移到了slave1上
ens32:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.188  netmask 255.255.255.0  broadcast 192.168.100.255
        ether 00:0c:29:7d:7d:68  txqueuelen 1000  (Ethernet)

三、MHA报错解决

  • 确认安装依赖
yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
  • 报错1
运行masterha_check_repl --conf=/etc/masterha/app1.cnf报错

Testing [mysql](https://www.yisu.com/mysql/) connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
at /usr/bin/apply_diff_relay_logs line 375

解决方案:ln -s /usr/local/mysql/bin/mysql /usr/bin
  • 报错2
运行masterha_check_repl --conf=/etc/masterha/app1.cnf报错

Can't exec "mysqlbinlog": No such file or directory at /usr/local/perl5/MHA/BinlogManager.pm line 99.

解决方案:在node节点上执行 which mysqlbinlog,比如我的结果就是
[localhost~]$ which mysqlbinlog
/usr/local/mysql/bin/mysqlbinlog

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
  • 报错3
运行master_check_ssh --conf=/etc/masterha/aap1.cnf报错

connection via SSH fromroot@192.168.17.199toroot@192.168.17.200 ...
permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)
[error] [/usr/local/share/perl5/MHA/SSHcheck.pm,ln163]

解决方案:一般是公钥有问题,需要删除 /root/.ssh/known_hosts里面的相关ip内容 重新生成一下就ok了
  • 报错4
运行master_check_ssh --conf=/etc/masterha/aap1.cnf报错

Sun Nov 20 20:10:59 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000001 
Sun Nov 20 20:10:59 2016 - [info] Connecting to root@172.18.3.180(172.18.3.180).. 
Failed to save binary log: Binlog not found from /data/mysqllog/3306! If 

以上是关于Mysql-MHA集群的主要内容,如果未能解决你的问题,请参考以下文章

Linux云计算-MySQL-高可用集群架构-MHA 架构

面试:MySql-MHA

Elasticsearch学习之集群常见状况处理(干货)

快速检测集群的健康状况

在 C# 应用程序中获取集群 azure kubernetes 服务 (AKS) 运行状况/可用性

kubernetes 二进制安装(v1.20.16)验证 master 部署