MHA: Manual Failover Procedure (Traditional Replication & GTID Replication)

Posted by 醒嘞


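This article covers only the manual failover procedure. For an introduction to MHA, see the article "MySQL High-Availability Architecture: MHA" (MySQL高可用架构之MHA).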

1. Basic Environment

1.1 Replication Topology

VMware 10.0 + CentOS 6.9 + MySQL 5.7.21

ROLE   HOSTNAME  BASEDIR           DATADIR                     IP              PORT
Node1  ZST1      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.132  3307
Node2  ZST2      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.133  3307
Node3  ZST3      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.134  3307

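Both setups are one-master, two-slave topologies, Node1->{Node2, Node3}. The traditional replication setup uses row-based binary logging with file/position coordinates; the GTID setup uses row-based binary logging with GTIDs.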

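1.2 MHA Configuration Files

This article uses MHA 0.56, with both the manager and node packages installed on Node1, Node2, and Node3.
The MHA configuration files are as follows: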

# Global configuration file: /etc/masterha/masterha_default.conf
[root@ZST1 masterha]# cat masterha_default.conf
[server default]
# MySQL user and password
user=mydba
password=mysql5721

# SSH user
ssh_user=root

# Replication user
repl_user=repl
repl_password=repl

# Monitoring
ping_interval=5
#shutdown_script=/etc/masterha/send_report.sh

# Scripts called during failover and online switchover
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change

log_level=debug
[root@ZST1 masterha]# 


# Cluster 1 configuration file: /etc/masterha/app1.conf
[root@ZST1 masterha]# cat app1.conf
[server default]
# MHA manager working directory
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.85.133
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.85.134
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
[root@ZST1 masterha]# 
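The two files above are layered: settings in the global file apply to every cluster, the `[server default]` block in `app1.conf` applies to this cluster, and each `[serverN]` block applies to one host. The following is a minimal sketch of that layering, assuming the more specific scope wins when a key appears in more than one place. The function `effective_settings` and the shortened sample strings are hypothetical illustrations, not part of MHA.

```python
import configparser

# Shortened copies of the two files shown above (passwords omitted).
GLOBAL_CONF = """
[server default]
user=mydba
ssh_user=root
repl_user=repl
ping_interval=5
log_level=debug
"""

APP_CONF = """
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log

[server1]
hostname=192.168.85.132
port=3307
candidate_master=1

[server2]
hostname=192.168.85.133
port=3307
candidate_master=1
"""


def load(text):
    """Parse one MHA-style ini file without value interpolation."""
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(text)
    return parser


def effective_settings(global_text, app_text):
    """Merge global -> app default -> per-server, later scopes overriding."""
    global_cfg = load(global_text)
    app_cfg = load(app_text)
    base = {}
    for cfg in (global_cfg, app_cfg):
        if cfg.has_section("server default"):
            base.update(cfg["server default"])
    merged = {}
    for name in app_cfg.sections():
        if name == "server default":
            continue
        settings = dict(base)
        settings.update(app_cfg[name])
        merged[name] = settings
    return merged


effective = effective_settings(GLOBAL_CONF, APP_CONF)
for server, settings in effective.items():
    print(server, settings["hostname"], settings["user"], settings["manager_workdir"])
```

Printing the merged view this way is a quick check that every host ends up with the MySQL user, SSH user, and working directory you expect before running any switch command.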

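1.3 Test Data

Stop the io_thread on the slaves and then write to the master, so that the data differs both between master and slaves and between the two slaves.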

# First, empty the table
mydba@192.168.85.132,3307 [replcrash]> truncate table py_user;

# Node1 writes the first row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
# Node3 stops its io_thread
mydba@192.168.85.134,3307 [replcrash]> stop slave io_thread;

# Node1 writes the second row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
# Node2 stops its io_thread
mydba@192.168.85.133,3307 [replcrash]> stop slave io_thread;

# Node1 writes the third row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;

# Final rows on each node
# Node1 has three rows
mydba@192.168.85.132,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
|   3 | 2d8900cc-325d-11e8-88e6-000c29c1 | 2018-03-28 15:54:01 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
3 rows in set (0.00 sec)
mydba@192.168.85.132,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000004 |     1303 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
# Node2 has two rows
mydba@192.168.85.133,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
2 rows in set (0.00 sec)
mydba@192.168.85.133,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000007 |     8859 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
# Node3 has one row
mydba@192.168.85.134,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
1 row in set (0.00 sec)
mydba@192.168.85.134,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |    10322 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

Clearly, slave Node3 lags behind slave Node2, and slave Node2 lags behind master Node1.
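The way the test data diverges can be replayed with a small model. This is only an illustrative sketch: it assumes each insert reaches every slave whose io_thread is still running before the next step happens, and the `simulate` function and event names are hypothetical.

```python
def simulate(events, slaves):
    """Replay inserts and io_thread stops; return the rows on each node."""
    master_rows = []
    receiving = {name: True for name in slaves}
    slave_rows = {name: [] for name in slaves}
    for kind, arg in events:
        if kind == "insert":
            master_rows.append(arg)
            # Only slaves with a running io_thread receive the new row.
            for name in slaves:
                if receiving[name]:
                    slave_rows[name].append(arg)
        elif kind == "stop_io":
            receiving[arg] = False
        else:
            raise ValueError(f"unknown event: {kind}")
    return master_rows, slave_rows


# Same order of operations as the test above; uid values 1..3.
events = [
    ("insert", 1),
    ("stop_io", "Node3"),
    ("insert", 2),
    ("stop_io", "Node2"),
    ("insert", 3),
]
master_rows, slave_rows = simulate(events, ["Node2", "Node3"])
print("Node1:", master_rows)
print("Node2:", slave_rows["Node2"])
print("Node3:", slave_rows["Node3"])
```

The result matches the query output: three rows on Node1, two on Node2, one on Node3.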

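2. Manual Failover with Traditional Replication

Manual failover scenario: the master has gone down while mha_manager was not running, so the failover has to be performed by hand.

2.1 Manual Failover

• Shut down the database service on Node1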

# Shut down the database service on Node1
mydba@192.168.85.132,3307 [replcrash]> shutdown;

# Replication status on Node2 and Node3
mydba@192.168.85.133,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.133,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 973
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 973
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.133,3307 [replcrash]> 

mydba@192.168.85.134,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.134,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 643
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 643
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.134,3307 [replcrash]> 

At this point it makes no difference whether the slaves' io_thread is started: the master is already down, so the io_thread cannot connect to it anyway.
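The `show slave status` values above are enough to tell which slave has received the most from the dead master. The sketch below ranks the slaves by `Master_Log_File` and then `Read_Master_Log_Pos`. It illustrates the comparison only and is not MHA's own code; `binlog_key` and `rank_slaves` are hypothetical helper names.

```python
import re


def binlog_key(file_name, pos):
    """Sort key: numeric binlog suffix first, then byte position."""
    match = re.search(r"\.(\d+)$", file_name)
    if match is None:
        raise ValueError(f"unexpected binlog file name: {file_name}")
    return (int(match.group(1)), int(pos))


def rank_slaves(status_by_host):
    """Return hosts ordered from most to least up to date."""
    return sorted(
        status_by_host,
        key=lambda host: binlog_key(
            status_by_host[host]["Master_Log_File"],
            status_by_host[host]["Read_Master_Log_Pos"],
        ),
        reverse=True,
    )


# Values copied from the slave status output above.
status = {
    "192.168.85.133": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 973},
    "192.168.85.134": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 643},
}
ranking = rank_slaves(status)
print("most up to date:", ranking[0])
print("most behind:", ranking[-1])
```

Here 192.168.85.133 (Node2) has read up to position 973 and 192.168.85.134 (Node3) only up to 643, so Node2 holds the more complete relay log.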
• Run the manual failover command, specifying Node3 as the new master

# Manual failover for the failed Node1
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover

The replication topology before the switch is Node1->{Node2, Node3}; after the manual failover it becomes Node3->{Node2}.
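When the same failover command has to be prepared for different hosts, it can help to assemble it from parameters instead of editing a long line by hand. The sketch below builds the argument list using only the options that appear in the command above. `build_switch_command` is a hypothetical helper, and using one port for both hosts is an assumption that happens to hold in this environment.

```python
import shlex


def build_switch_command(
    dead_host,
    new_host,
    port=3307,
    global_conf="/etc/masterha/masterha_default.conf",
    conf="/etc/masterha/app1.conf",
):
    """Return the argv list for a manual failover away from a dead master."""
    if dead_host == new_host:
        raise ValueError("the new master must differ from the dead master")
    return [
        "masterha_master_switch",
        f"--global_conf={global_conf}",
        f"--conf={conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={port}",
        "--master_state=dead",
        f"--new_master_host={new_host}",
        f"--new_master_port={port}",
        "--ignore_last_failover",
    ]


cmd = build_switch_command("192.168.85.132", "192.168.85.134")
# Print a copy-pasteable command line; nothing is executed here.
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` directly, but because the tool asks for interactive confirmation before proceeding, printing the command and running it by hand is the more cautious choice.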

2.2 Failover Steps

Log output of the manual failover

# Manual failover
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
Wed Mar 28 16:01:07 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Wed Mar 28 16:01:07 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Mar 28 16:01:07 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Mar 28 16:01:07 2018 - [info] MHA::MasterFailover version 0.56.
Wed Mar 28 16:01:07 2018 - [info] Starting master failover.
Wed Mar 28 16:01:07 2018 - [info] 
==================== 1. Configuration check phase, Start ====================
Wed Mar 28 16:01:07 2018 - [info] * Phase 1: Configuration Check Phase..
Wed Mar 28 16:01:07 2018 - [info] 
Wed Mar 28 16:01:08 2018 - [debug] Connecting to servers..
Wed Mar 28 16:01:09 2018 - [debug]  Connected to: 192.168.85.133(192.168.85.133:3307), user=mydba
Wed Mar 28 16:01:09 2018 - [debug]  Number of slave worker threads on host 192.168.85.133(192.168.85.133:3307): 0
Wed Mar 28 16:01:09 2018 - [debug]  Connected to: 192.168.85.134(192.168.85.134:3307), user=mydba
Wed Mar 28 16:01:09 2018 - [debug]  Number of slave worker threads on host 192.168.85.134(192.168.85.134:3307): 0
Wed Mar 28 16:01:09 2018 - [debug]  Comparing MySQL versions..
Wed Mar 28 16:01:09 2018 - [debug]   Comparing MySQL versions done.
Wed Mar 28 16:01:09 2018 - [debug] Connecting to servers done.
Wed Mar 28 16:01:09 2018 - [info] GTID failover mode = 0
Wed Mar 28 16:01:09 2018 - [info] Dead Servers:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info] Checking master reachability via MySQL(double check)...
Wed Mar 28 16:01:09 2018 - [info]  ok.
Wed Mar 28 16:01:09 2018 - [info] Alive Servers:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307)
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307)
Wed Mar 28 16:01:09 2018 - [info] Alive Slaves:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
******************** Choose whether to continue ********************
Master 192.168.85.132(192.168.85.132:3307) is dead. Proceed? (yes/NO): yes
Wed Mar 28 16:01:30 2018 - [info] Starting Non-GTID based failover.
Wed Mar 28 16:01:30 2018 - [info] 
Wed Mar 28 16:01:30 2018 - [info] ** Phase 1: Configuration Check Phase completed.
==================== 1. Configuration check phase, End ====================
Wed Mar 28 16:01:30 2018 - [info] 
==================== 2. Dead master shutdown phase, Start ====================
Wed Mar 28 16:01:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Mar 28 16:01:30 2018 - [info] 
Wed Mar 28 16:01:30 2018 - [debug]  Stopping IO thread on 192.168.85.133(192.168.85.133:3307)..
Wed Mar 28 16:01:30 2018 - [debug]  Stopping IO thread on 192.168.85.134(192.168.85.134:3307)..
Wed Mar 
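With log_level=debug the phase boundaries are easy to lose among the other lines. A short script can pull out just the phase markers. This is a sketch based only on the line formats visible in the log above (`* Phase N: <name>..` and `** Phase N: <name> completed.`); other MHA versions may format these lines differently, and `phase_events` is a hypothetical helper.

```python
import re

# Matches the phase start and completion lines seen in the log above.
PHASE_RE = re.compile(r"\[info\] \*+ Phase (\d+): (.+?)(\.\.| completed\.)$")


def phase_events(log_text):
    """Return (phase_number, phase_name, status) for each phase marker line."""
    events = []
    for line in log_text.splitlines():
        match = PHASE_RE.search(line.rstrip())
        if match:
            number, name, tail = match.groups()
            status = "completed" if tail == " completed." else "started"
            events.append((int(number), name, status))
    return events


# Lines copied from the log output above.
SAMPLE_LOG = """\
Wed Mar 28 16:01:07 2018 - [info] * Phase 1: Configuration Check Phase..
Wed Mar 28 16:01:09 2018 - [info] GTID failover mode = 0
Wed Mar 28 16:01:30 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Mar 28 16:01:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
"""

events = phase_events(SAMPLE_LOG)
for number, name, status in events:
    print(f"Phase {number}: {name} ({status})")
```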
