MHA: Manual Failover Procedure (Traditional Replication & GTID Replication)
Posted by 醒嘞
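This article only walks through the manual failover procedure. For an introduction to MHA, see: MySQL High-Availability Architecture: MHA (MySQL高可用架构之MHA)
I. Basic Environment
1.1 Replication Topology
VMware 10.0 + CentOS 6.9 + MySQL 5.7.21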
| ROLE  | HOSTNAME | BASEDIR          | DATADIR                    | IP             | PORT |
| Node1 | ZST1     | /usr/local/mysql | /data/mysql/mysql3307/data | 192.168.85.132 | 3307 |
| Node2 | ZST2     | /usr/local/mysql | /data/mysql/mysql3307/data | 192.168.85.133 | 3307 |
| Node3 | ZST3     | /usr/local/mysql | /data/mysql/mysql3307/data | 192.168.85.134 | 3307 |
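Traditional replication is built on Row + Position and GTID replication on Row + GTID. In both cases the topology is one master with two slaves: Node1->{Node2, Node3}
1.2 MHA Configuration Files
The MHA version used in this article is 0.56, and both the manager and node packages are installed on Node1, Node2, and Node3.
The MHA configuration files are as follows: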
# Global configuration file: /etc/masterha/masterha_default.conf
[root@ZST1 masterha]# cat masterha_default.conf
[server default]
# MySQL user and password
user=mydba
password=mysql5721
# System ssh user
ssh_user=root
# Replication user
repl_user=repl
repl_password=repl
# Monitoring
ping_interval=5
#shutdown_script=/etc/masterha/send_report.sh
# Scripts invoked during a switchover
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change
log_level=debug
[root@ZST1 masterha]#

# Cluster 1 configuration file: /etc/masterha/app1.conf
[root@ZST1 masterha]# cat app1.conf
[server default]
# MHA manager working directory
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1
[server1]
hostname=192.168.85.132
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
[server2]
hostname=192.168.85.133
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
[server3]
hostname=192.168.85.134
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
[root@ZST1 masterha]#
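As a quick sanity check of the per-cluster file, the [serverN] blocks can be summarized with awk. This is only a minimal sketch: the function name list_mha_servers is hypothetical and is not part of MHA, and it assumes the key=value layout shown above with one setting per line.

```shell
# list_mha_servers FILE
# Print "section host:port candidate_master=N" for every [serverN] block in an
# MHA application config. Hypothetical helper for illustration, not an MHA tool.
list_mha_servers() {
    awk -F= '
        # Print the block collected so far, if there is one.
        function emit() {
            if (section != "")
                print section, host ":" port, "candidate_master=" cand
        }
        # A new [serverN] block starts: flush the previous one and reset.
        /^\[server[0-9]+\]/ {
            emit()
            section = substr($0, 2, index($0, "]") - 2)
            host = ""; port = ""; cand = 0
            next
        }
        # Any other section header (for example [server default]) ends the block.
        /^\[/ { emit(); section = ""; next }
        section != "" && $1 == "hostname"         { host = $2 }
        section != "" && $1 == "port"             { port = $2 }
        section != "" && $1 == "candidate_master" { cand = $2 }
        END { emit() }
    ' "$1"
}
```

Called as list_mha_servers /etc/masterha/app1.conf, it prints one line per server block, which makes it easy to confirm that all three nodes are listed and marked as candidate masters before attempting a failover.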
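1.3 Test Data
By stopping the io_thread on each slave and then writing to the master, we simulate data that is inconsistent both between the master and the slaves and between the two slaves.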
# First, empty the table
mydba@192.168.85.132,3307 [replcrash]> truncate table py_user;

# Node1 writes the first record
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;

# Node3 stops its io_thread
mydba@192.168.85.134,3307 [replcrash]> stop slave io_thread;

# Node1 writes the second record
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;

# Node2 stops its io_thread
mydba@192.168.85.133,3307 [replcrash]> stop slave io_thread;

# Node1 writes the third record
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;

# Final state of each node
# Node1 has three records
mydba@192.168.85.132,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 |   1323307 |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 |   1323307 |
|   3 | 2d8900cc-325d-11e8-88e6-000c29c1 | 2018-03-28 15:54:01 |   1323307 |
+-----+----------------------------------+---------------------+-----------+
3 rows in set (0.00 sec)

mydba@192.168.85.132,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000004 |     1303 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

# Node2 has two records
mydba@192.168.85.133,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 |   1323307 |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 |   1323307 |
+-----+----------------------------------+---------------------+-----------+
2 rows in set (0.00 sec)

mydba@192.168.85.133,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000007 |     8859 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

# Node3 has one record
mydba@192.168.85.134,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 |   1323307 |
+-----+----------------------------------+---------------------+-----------+
1 row in set (0.00 sec)

mydba@192.168.85.134,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |    10322 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
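Clearly, slave Node3 lags behind slave Node2, and slave Node2 lags behind the master Node1.
II. Manual Failover under Traditional Replication
Manual failover scenario: the master has gone down but mha_manager is not running, so the failover has to be performed by hand.
2.1 Manual Failover
• Shut down the database service on Node1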
# Shut down the database service on Node1
mydba@192.168.85.132,3307 [replcrash]> shutdown;

# Replication status on Node2 and Node3
mydba@192.168.85.133,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.133,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 973
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 973
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.133,3307 [replcrash]>

mydba@192.168.85.134,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.134,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 643
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 643
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.134,3307 [replcrash]>
At this point it makes no difference whether the slaves' io_thread is started: the master is already down, so the io_thread cannot connect to it anyway.
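The two show slave status outputs above can also be compared mechanically to see which slave received the most binlog from the dead master. The following is a minimal sketch, not MHA's own logic: the helper name latest_slave and its "host file pos" input format are assumptions made for this illustration.

```shell
# latest_slave: read lines of "host master_log_file read_master_log_pos" on
# stdin and print the host (and position) that is furthest along.
# Hypothetical helper for illustration only.
latest_slave() {
    awk '
        {
            # Binlog files are ordered by numeric suffix: mysql-bin.000004 -> 4
            n = $2; sub(/^.*\./, "", n); n += 0
            pos = $3 + 0
            if (NR == 1 || n > best_n || (n == best_n && pos > best_pos)) {
                best = $1; best_n = n; best_pos = pos
            }
        }
        END { if (NR > 0) print best, best_pos }
    '
}

# Values taken from the two "show slave status" outputs above.
printf '%s\n' \
    '192.168.85.133 mysql-bin.000004 973' \
    '192.168.85.134 mysql-bin.000004 643' | latest_slave
# prints: 192.168.85.133 973
```

Node2 is ahead of Node3 by 973 - 643 = 330 bytes, and the master's final position of 1303 is another 330 bytes beyond Node2. That is consistent with each of the three test inserts occupying 330 bytes of binlog, so Node3 is missing two transactions and Node2 is missing one.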
• Run the manual failover script, specifying Node3 as the new master
# Manual failover away from the failed Node1
[root@ZST3 app1]# masterha_master_switch \
    --global_conf=/etc/masterha/masterha_default.conf \
    --conf=/etc/masterha/app1.conf \
    --dead_master_host=192.168.85.132 \
    --dead_master_port=3307 \
    --master_state=dead \
    --new_master_host=192.168.85.134 \
    --new_master_port=3307 \
    --ignore_last_failover
Before the switch the replication topology is Node1->{Node2, Node3}; after the manual failover it becomes Node3->{Node2}.
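Because the command line is long and a mistyped host is costly, it can help to generate it from three arguments and review it before running. A minimal sketch, assuming the configuration paths used in this article; the function name build_failover_cmd is hypothetical, and it only prints the command with the same options shown above rather than executing anything.

```shell
# build_failover_cmd DEAD_HOST NEW_HOST PORT
# Print (do not run) the masterha_master_switch command used in this article.
# Hypothetical helper; the config paths are the ones from this article's setup.
build_failover_cmd() {
    dead_host=$1; new_host=$2; port=$3
    if [ -z "$dead_host" ] || [ -z "$new_host" ]; then
        echo "usage: build_failover_cmd DEAD_HOST NEW_HOST PORT" >&2
        return 1
    fi
    # The port must be a non-empty string of digits.
    case $port in
        ''|*[!0-9]*) echo "port must be numeric" >&2; return 1 ;;
    esac
    if [ "$dead_host" = "$new_host" ]; then
        echo "new master must differ from the dead master" >&2
        return 1
    fi
    printf '%s' "masterha_master_switch"
    # printf reuses the format string for each remaining argument.
    printf ' %s' \
        "--global_conf=/etc/masterha/masterha_default.conf" \
        "--conf=/etc/masterha/app1.conf" \
        "--dead_master_host=$dead_host" \
        "--dead_master_port=$port" \
        "--master_state=dead" \
        "--new_master_host=$new_host" \
        "--new_master_port=$port" \
        "--ignore_last_failover"
    printf '\n'
}
```

For example, build_failover_cmd 192.168.85.132 192.168.85.134 3307 prints the same command line that was run on ZST3 above, as a single line.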
2.2 Switchover Process
Log output of the manual failover:
# Manual failover
[root@ZST3 app1]# masterha_master_switch \
    --global_conf=/etc/masterha/masterha_default.conf \
    --conf=/etc/masterha/app1.conf \
    --dead_master_host=192.168.85.132 \
    --dead_master_port=3307 \
    --master_state=dead \
    --new_master_host=192.168.85.134 \
    --new_master_port=3307 \
    --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
Wed Mar 28 16:01:07 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Wed Mar 28 16:01:07 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Mar 28 16:01:07 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Mar 28 16:01:07 2018 - [info] MHA::MasterFailover version 0.56.
Wed Mar 28 16:01:07 2018 - [info] Starting master failover.
Wed Mar 28 16:01:07 2018 - [info]
==================== 1. Configuration check phase, Start ====================
Wed Mar 28 16:01:07 2018 - [info] * Phase 1: Configuration Check Phase..
Wed Mar 28 16:01:07 2018 - [info]
Wed Mar 28 16:01:08 2018 - [debug] Connecting to servers..
Wed Mar 28 16:01:09 2018 - [debug] Connected to: 192.168.85.133(192.168.85.133:3307), user=mydba
Wed Mar 28 16:01:09 2018 - [debug] Number of slave worker threads on host 192.168.85.133(192.168.85.133:3307): 0
Wed Mar 28 16:01:09 2018 - [debug] Connected to: 192.168.85.134(192.168.85.134:3307), user=mydba
Wed Mar 28 16:01:09 2018 - [debug] Number of slave worker threads on host 192.168.85.134(192.168.85.134:3307): 0
Wed Mar 28 16:01:09 2018 - [debug] Comparing MySQL versions..
Wed Mar 28 16:01:09 2018 - [debug] Comparing MySQL versions done.
Wed Mar 28 16:01:09 2018 - [debug] Connecting to servers done.
Wed Mar 28 16:01:09 2018 - [info] GTID failover mode = 0
Wed Mar 28 16:01:09 2018 - [info] Dead Servers:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info] Checking master reachability via MySQL(double check)...
Wed Mar 28 16:01:09 2018 - [info]  ok.
Wed Mar 28 16:01:09 2018 - [info] Alive Servers:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307)
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307)
Wed Mar 28 16:01:09 2018 - [info] Alive Slaves:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
******************** Choose whether to proceed ********************
Master 192.168.85.132(192.168.85.132:3307) is dead. Proceed? (yes/NO): yes
Wed Mar 28 16:01:30 2018 - [info] Starting Non-GTID based failover.
Wed Mar 28 16:01:30 2018 - [info]
Wed Mar 28 16:01:30 2018 - [info] ** Phase 1: Configuration Check Phase completed.
==================== 1. Configuration check phase, End ====================
Wed Mar 28 16:01:30 2018 - [info]
==================== 2. Dead master shutdown phase, Start ====================
Wed Mar 28 16:01:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Mar 28 16:01:30 2018 - [info]
Wed Mar 28 16:01:30 2018 - [debug] Stopping IO thread on 192.168.85.133(192.168.85.133:3307)..
Wed Mar 28 16:01:30 2018 - [debug] Stopping IO thread on 192.168.85.134(192.168.85.134:3307)..
Wed Mar
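With log_level=debug the failover log is verbose, so it can be useful to pull out only the phase markers to see how far a failover got and when. A minimal sketch, assuming the timestamped line format shown in the log above; the helper name mha_phases is hypothetical.

```shell
# mha_phases: read an MHA failover log on stdin and print only the
# "* Phase N: ..." marker lines, each prefixed with its timestamp.
# Hypothetical helper for illustration only.
mha_phases() {
    awk '
        match($0, /\*+ Phase [0-9.]+: .*/) {
            # Everything before " - [level]" is the timestamp.
            stamp = substr($0, 1, index($0, " - [") - 1)
            print stamp " | " substr($0, RSTART, RLENGTH)
        }
    '
}
```

For instance, mha_phases < /var/log/masterha/app1/app1.log would list the phase markers from the manager log configured in app1.conf, assuming the failover output was written there.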