MHA

Posted 2020-10-13 John_2011

1. Check the SSH connection status from MHA Manager to all MHA Nodes

# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Nov  9 11:01:51 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Nov  9 11:01:51 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Nov  9 11:01:51 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Nov  9 11:01:51 2017 - [info] Starting SSH connection tests..
Thu Nov  9 11:01:52 2017 - [debug] 
Thu Nov  9 11:01:51 2017 - [debug]  Connecting via SSH from [email protected]192.168.1.120(192.168.1.120:22) to [email protected]192.168.1.119(192.168.1.119:22)..
Thu Nov  9 11:01:52 2017 - [debug]   ok.
Thu Nov  9 11:01:52 2017 - [debug]  Connecting via SSH from [email protected]192.168.1.120(192.168.1.120:22) to [email protected]192.168.1.121(192.168.1.121:22)..
Thu Nov  9 11:01:52 2017 - [debug]   ok.
Thu Nov  9 11:01:52 2017 - [debug] 
Thu Nov  9 11:01:51 2017 - [debug]  Connecting via SSH from [email protected]192.168.1.119(192.168.1.119:22) to [email protected]192.168.1.120(192.168.1.120:22)..
Thu Nov  9 11:01:52 2017 - [debug]   ok.
Thu Nov  9 11:01:52 2017 - [debug]  Connecting via SSH from [email protected]192.168.1.119(192.168.1.119:22) to [email protected]192.168.1.121(192.168.1.121:22)..
Thu Nov  9 11:01:52 2017 - [debug]   ok.
Thu Nov  9 11:01:53 2017 - [debug] 
Thu Nov  9 11:01:52 2017 - [debug]  Connecting via SSH from [email protected]192.168.1.121(192.168.1.121:22) to [email protected]192.168.1.119(192.168.1.119:22)..
Thu Nov  9 11:01:53 2017 - [debug]   ok.
Thu Nov  9 11:01:53 2017 - [debug]  Connecting via SSH from [email protected]192.168.1.121(192.168.1.121:22) to [email protected]192.168.1.120(192.168.1.120:22)..
Thu Nov  9 11:01:53 2017 - [debug]   ok.
Thu Nov  9 11:01:53 2017 - [info] All SSH connection tests passed successfully.
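
When this check runs from a script rather than by hand, the last line of the output above is the part that matters. The following is a minimal sketch that looks for that success line; the function name `ssh_check_passed` is hypothetical, and it assumes the message text matches the output shown above for MHA 0.57.

```shell
# Return 0 if the masterha_check_ssh output read from stdin contains the
# success line shown in the transcript above, non-zero otherwise.
# (ssh_check_passed is a hypothetical helper name, not part of MHA.)
ssh_check_passed() {
    grep -q 'All SSH connection tests passed successfully\.'
}

# Usage (assumes masterha_check_ssh is on PATH):
#   masterha_check_ssh --conf=/etc/masterha/app1.cnf 2>&1 | ssh_check_passed \
#       && echo "SSH OK" || echo "SSH check failed"
```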

2. Check the status of the entire replication environment

# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Nov  9 11:29:10 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Nov  9 11:29:10 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Nov  9 11:29:10 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Nov  9 11:29:10 2017 - [info] MHA::MasterMonitor version 0.57.
Thu Nov  9 11:29:11 2017 - [info] GTID failover mode = 0
Thu Nov  9 11:29:11 2017 - [info] Dead Servers:
Thu Nov  9 11:29:11 2017 - [info] Alive Servers:
Thu Nov  9 11:29:11 2017 - [info]   192.168.1.119(192.168.1.119:3306)
Thu Nov  9 11:29:11 2017 - [info]   192.168.1.120(192.168.1.120:3306)
Thu Nov  9 11:29:11 2017 - [info]   192.168.1.121(192.168.1.121:3306)
Thu Nov  9 11:29:11 2017 - [info] Alive Slaves:
Thu Nov  9 11:29:11 2017 - [info]   192.168.1.120(192.168.1.120:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 11:29:11 2017 - [info]     Replicating from 192.168.1.119(192.168.1.119:3306)
Thu Nov  9 11:29:11 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov  9 11:29:11 2017 - [info]   192.168.1.121(192.168.1.121:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 11:29:11 2017 - [info]     Replicating from 192.168.1.119(192.168.1.119:3306)
Thu Nov  9 11:29:11 2017 - [info] Current Alive Master: 192.168.1.119(192.168.1.119:3306)
Thu Nov  9 11:29:11 2017 - [info] Checking slave configurations..
Thu Nov  9 11:29:11 2017 - [info]  read_only=1 is not set on slave 192.168.1.120(192.168.1.120:3306).
Thu Nov  9 11:29:11 2017 - [info]  read_only=1 is not set on slave 192.168.1.121(192.168.1.121:3306).
Thu Nov  9 11:29:11 2017 - [info] Checking replication filtering settings..
Thu Nov  9 11:29:11 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Nov  9 11:29:11 2017 - [info]  Replication filtering check ok.
Thu Nov  9 11:29:11 2017 - [info] GTID (with auto-pos) is not supported
Thu Nov  9 11:29:11 2017 - [info] Starting SSH connection tests..
Thu Nov  9 11:29:14 2017 - [info] All SSH connection tests passed successfully.
Thu Nov  9 11:29:14 2017 - [info] Checking MHA Node version..
Thu Nov  9 11:29:14 2017 - [info]  Version check ok.
Thu Nov  9 11:29:14 2017 - [info] Checking SSH publickey authentication settings on the current master..
Thu Nov  9 11:29:14 2017 - [info] HealthCheck: SSH to 192.168.1.119 is reachable.
Thu Nov  9 11:29:15 2017 - [info] Master MHA Node version is 0.57.
Thu Nov  9 11:29:15 2017 - [info] Checking recovery script configurations on 192.168.1.119(192.168.1.119:3306)..
Thu Nov  9 11:29:15 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/data/mysql/tmp/save_binary_logs_test --manager_version=0.57 --start_file=mysql-bin.000004 
Thu Nov  9 11:29:15 2017 - [info]   Connecting to [email protected]192.168.1.119(192.168.1.119:22).. 
  Creating /data/mysql/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000004
Thu Nov  9 11:29:15 2017 - [info] Binlog setting check done.
Thu Nov  9 11:29:15 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Nov  9 11:29:15 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.1.120 --slave_ip=192.168.1.120 --slave_port=3306 --workdir=/data/mysql/tmp --target_version=5.7.20-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Thu Nov  9 11:29:15 2017 - [info]   Connecting to [email protected]192.168.1.120(192.168.1.120:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000006
    Temporary relay log file is /var/lib/mysql/relay-log.000006
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Nov  9 11:29:15 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.1.121 --slave_ip=192.168.1.121 --slave_port=3306 --workdir=/data/mysql/tmp --target_version=5.7.20-log --manager_version=0.57 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Thu Nov  9 11:29:15 2017 - [info]   Connecting to [email protected]192.168.1.121(192.168.1.121:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000008
    Temporary relay log file is /var/lib/mysql/relay-log.000008
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Nov  9 11:29:16 2017 - [info] Slaves settings check done.
Thu Nov  9 11:29:16 2017 - [info] 
192.168.1.119(192.168.1.119:3306) (current master)
 +--192.168.1.120(192.168.1.120:3306)
 +--192.168.1.121(192.168.1.121:3306)

Thu Nov  9 11:29:16 2017 - [info] Checking replication health on 192.168.1.120..
Thu Nov  9 11:29:16 2017 - [info]  ok.
Thu Nov  9 11:29:16 2017 - [info] Checking replication health on 192.168.1.121..
Thu Nov  9 11:29:16 2017 - [info]  ok.
Thu Nov  9 11:29:16 2017 - [warning] master_ip_failover_script is not defined.
Thu Nov  9 11:29:16 2017 - [warning] shutdown_script is not defined.
Thu Nov  9 11:29:16 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
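
The output above is long, but two things in it are worth extracting automatically: the final health line, and the slaves reported as not having read_only=1 set. The sketch below pulls both out of saved output. The function names are hypothetical, and the patterns assume the message wording shown above for MHA 0.57.

```shell
# Print the host(ip:port) of every slave that masterha_check_repl reports
# as not having read_only=1 set. Reads the command output from stdin.
slaves_without_read_only() {
    sed -n 's/.*read_only=1 is not set on slave \([^ ]*\)\.[[:space:]]*$/\1/p'
}

# Return 0 if the output read from stdin contains the final health line.
repl_health_ok() {
    grep -q '^MySQL Replication Health is OK\.'
}

# Usage (assumes masterha_check_repl is on PATH):
#   out="$(masterha_check_repl --conf=/etc/masterha/app1.cnf 2>&1)"
#   printf '%s\n' "$out" | repl_health_ok || echo "replication check failed"
#   printf '%s\n' "$out" | slaves_without_read_only
```

For the run shown above, the second function would list both 192.168.1.120 and 192.168.1.121, since neither slave has read_only=1 set.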

3. Check the status of MHA Manager (masterha_check_status)

# masterha_check_status --conf=/etc/masterha/app1.cnf
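
No output is shown for this command, so the following is only a sketch of how a wrapper script might interpret the result. It assumes that masterha_check_status exits with 0 when the manager is running and with a non-zero code otherwise; verify this against your MHA version. The helper name `status_label` is hypothetical.

```shell
# Map a masterha_check_status exit code to a short label.
# Assumption: 0 means the manager is running; any other code is treated
# as "not running" and the raw code is shown for further investigation.
status_label() {
    if [ "$1" -eq 0 ]; then
        echo "running"
    else
        echo "not running (exit code $1)"
    fi
}

# Usage (assumes masterha_check_status is on PATH):
#   masterha_check_status --conf=/etc/masterha/app1.cnf
#   status_label "$?"
```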

4. Start MHA Manager monitoring (masterha_manager)

# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover &

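Parameter notes:

--remove_dead_master_conf: after a failover completes, the old (dead) master's entry is removed from the configuration file.

--ignore_last_failover: by default, if MHA detects consecutive master failures less than 8 hours apart, it does not perform a failover; this limit exists to avoid a ping-pong effect. Passing this option makes MHA skip that check and fail over regardless of when the last failover happened.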
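
The start command above leaves the manager's output wherever nohup puts it (typically nohup.out in the current directory). A small wrapper that sends the output to a named log file can make later troubleshooting easier. This is a sketch only: the function name `start_manager`, the `MASTERHA_MANAGER_BIN` override and the log path in the usage line are assumptions, not part of MHA.

```shell
# Start masterha_manager in the background with the two options discussed
# above, appending stdout and stderr to a log file. Prints the PID.
# MASTERHA_MANAGER_BIN is a hypothetical override, useful for dry runs.
start_manager() {
    local conf="$1" log="$2"
    local bin="${MASTERHA_MANAGER_BIN:-masterha_manager}"
    nohup "$bin" --conf="$conf" --remove_dead_master_conf --ignore_last_failover \
        < /dev/null >> "$log" 2>&1 &
    echo "$!"
}

# Usage (the log path is an assumption; pick any writable location):
#   start_manager /etc/masterha/app1.cnf /var/log/masterha/app1/manager.log
```

Redirecting stdin from /dev/null keeps the background process from holding on to the terminal after you log out.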

5. Stop MHA Manager monitoring (masterha_stop)

# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover  (wd: /var/log/masterha/app1)
(wd now: ~)

6. Description of the sample scripts

# pwd
/root/mha4mysql-manager-0.57/samples/scripts
# ll
total 32
-rwxr-xr-x. 1 1001 1001  3648 May 31  2015 master_ip_failover
-rwxr-xr-x. 1 1001 1001  9870 May 31  2015 master_ip_online_change
-rwxr-xr-x. 1 1001 1001 11867 May 31  2015 power_manager
-rwxr-xr-x. 1 1001 1001  1360 May 31  2015 send_report

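# Script that manages the VIP during automatic failover; if you use keepalived, you can write your own script to manage the VIP
master_ip_failover

# Manages the VIP during an online (manual) switchover
master_ip_online_change

# Script that powers off the host after a failure occurs
power_manager

# Script that sends an alert after a failover
send_report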
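
If you write your own VIP script in place of the master_ip_failover sample, its core is a dispatch on the --command argument that MHA passes in. The sketch below only prints the action it would take, so it is safe to dry-run. It assumes the script is called with --command=stop|stopssh|start|status together with --orig_master_host and --new_master_host, as the sample script expects; the function name `plan_vip_action` is hypothetical, and the actual VIP handling is left out.

```shell
# Decide what a VIP-managing failover script should do for a given call.
# Prints the planned action instead of touching the network.
plan_vip_action() {
    local command="" orig="" new="" arg
    for arg in "$@"; do
        case "$arg" in
            --command=*)          command="${arg#*=}" ;;
            --orig_master_host=*) orig="${arg#*=}" ;;
            --new_master_host=*)  new="${arg#*=}" ;;
        esac
    done
    case "$command" in
        stop|stopssh) echo "release VIP on $orig" ;;   # old master is going away
        start)        echo "assign VIP on $new" ;;     # new master takes over
        status)       echo "check VIP on $orig" ;;     # health check of the script
        *)            echo "unknown command: $command" >&2; return 1 ;;
    esac
}

# Usage:
#   plan_vip_action --command=start --orig_master_host=192.168.1.119 \
#       --new_master_host=192.168.1.120
```

Replacing each echo with the real VIP commands (or a call into keepalived) turns the sketch into a working script; other arguments that MHA passes are simply ignored here.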
