MySQL High Availability: 5.7 GTID Replication + orchestrator Setup and Basic Testing

Posted by 翔之天空


 

Reference: https://github.com/openark/orchestrator/tree/master/docs

Download: https://github.com/openark/orchestrator/releases

 

Contents

--1. Environment

--2. Install orchestrator (on all three servers)

--3. Configure orchestrator and the managed databases (on all three servers)

--4. Start up (start orchestrator on all three servers; the nodes keep orchestrator itself highly available via the Raft consensus algorithm)

--4-1. Start orchestrator in debug mode

--4-2. Manage orchestrator from the web UI

--5. orchestrator-client: managing orchestrator from the command line

--6. High availability of orchestrator itself

--7. High availability of the managed MySQL service

--7-1. Master crash with automated recovery

--7-2. Graceful master promotion (planned switchover)

--7-3. Manual recovery

--7-4. Manual forced failover

--7-5. Replica 2 (192.168.56.60:3306) crash

Appendix: errors when starting orchestrator, and their fixes


 

--1. Environment

--1-1. Versions:
OS: CentOS 7
MySQL: MySQL 5.7
orchestrator: orchestrator-3.1.4    Download: https://github.com/openark/orchestrator/releases


--1-2. Environment already in place:
--MySQL, one master and two replicas: the three managed MySQL databases
Master:    hostname hostmysql80  IP 192.168.56.80  port 3306  server_id 1113306  read-write
Replica 1: hostname hostmysql61  IP 192.168.56.61  port 3306  server_id 2223306  read-only
Replica 2: hostname hostmysql60  IP 192.168.56.60  port 3306  server_id 3333306  read-only
--Required my.cnf settings: GTID replication, ROW binlog format, log_slave_updates enabled; set slave_net_timeout to suit your environment (the 60s default makes reconnecting to the master too slow):
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
log_slave_updates = 1
slave_net_timeout = 4



--orchestrator's backend MySQL database stores orchestrator's own state. The three nodes here form a Raft cluster (reusing the existing servers: a second MySQL instance on port 3307 on each MySQL server serves as orchestrator's backend).
--The backend store can be either MySQL or SQLite; MySQL is used here, and all parameters below assume the MySQL backend.
192.168.56.80  3307        server_id:1113307
192.168.56.61  3307        server_id:2223307
192.168.56.60  3307        server_id:3333307




--1-3. On all three servers, configure /etc/hosts as follows:
[root@hostmysql80 mysql_setup]# vi /etc/hosts
192.168.56.80  hostmysql80
192.168.56.61  hostmysql61 
192.168.56.60  hostmysql60 

 

--2. Install orchestrator (on all three servers)

--2-1. Install orchestrator's dependencies
[root@hostmysql80 mysql_setup]# rpm -ivh oniguruma-5.9.5-3.el7.x86_64.rpm

[root@hostmysql80 mysql_setup]# rpm -ivh jq-1.6-1.el7.x86_64.rpm         

--2-2. Install orchestrator
[root@hostmysql80 mysql_setup]# rpm -ivh orchestrator-3.1.4-1.x86_64.rpm

--Inspect the installed files
[root@hostmysql80 ~]# cd /usr/local/orchestrator
[root@hostmysql80 orchestrator]# ll
total 19480
-rwxr-xr-x. 1 root root 19927832 Jan 26 22:05 orchestrator                                // the binary
-rw-rw-r--. 1 root root     5436 Jan 26 22:04 orchestrator-sample.conf.json               // sample config template; the live config file is orchestrator.conf.json
-rw-rw-r--. 1 root root     5060 Jan 26 22:04 orchestrator-sample-sqlite.conf.json        // sample config for a SQLite backend
drwxr-xr-x. 7 root root       82 Apr  2 16:07 resources                                   // orchestrator support files

 

--3. Configure orchestrator and the managed databases (on all three servers)

--3-1. Edit the config file /usr/local/orchestrator/orchestrator.conf.json; see https://github.com/openark/orchestrator/blob/master/docs/configuration.md
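--A minimal sketch of the settings this walkthrough relies on, shown for node 192.168.56.80 (RaftBind must be each node's own IP; the passwords and RaftDataDir are placeholders to adapt, and the // notes are for illustration only and must not appear in the real JSON file):

{
  "ListenAddress": ":3000",                                         // web/API port used throughout this article
  "MySQLOrchestratorHost": "127.0.0.1",                             // backend MySQL instance, see section 3-2
  "MySQLOrchestratorPort": 3307,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator_srv",
  "MySQLOrchestratorPassword": "orc_server_password",
  "MySQLTopologyUser": "orch_topology_user",                        // account on the managed 3306 instances, see section 3-3
  "MySQLTopologyPassword": "orch_topology_password",
  "RaftEnabled": true,                                              // 3-node self-HA, see section 6
  "RaftDataDir": "/usr/local/orchestrator",
  "RaftBind": "192.168.56.80",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["192.168.56.80", "192.168.56.61", "192.168.56.60"]
}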


--3-2. On the orchestrator backend database (3307), create the orchestrator schema and a service account
--(create the orchestrator_srv account; host 127.0.0.1 is enough, since orchestrator connects to its backend locally)
--Related parameters: MySQLOrchestratorHost, MySQLOrchestratorPort, MySQLOrchestratorDatabase, MySQLOrchestratorUser, MySQLOrchestratorPassword, MySQLOrchestratorCredentialsConfigFile
[root@hostmysql80 orchestrator]# mysql -uroot -pRoot123$ -h127.0.0.1  -P3307
CREATE DATABASE IF NOT EXISTS orchestrator;
CREATE USER 'orchestrator_srv'@'127.0.0.1' IDENTIFIED BY 'orc_server_password';
GRANT ALL PRIVILEGES ON orchestrator.* TO 'orchestrator_srv'@'127.0.0.1';



--3-3. On the managed master and two replicas (3306), create the account orchestrator uses to connect
--(create the orch_topology_user account; limiting host to 192.168.56.%, the range orchestrator connects from, is enough. orchestrator connects to managed instances as 'orch_topology_user'@'hostname', which is why /etc/hosts was configured earlier)
--Related parameters: MySQLTopologyUser, MySQLTopologyPassword
[root@hostmysql80 orchestrator]# mysql -uroot -pRoot123$ -h127.0.0.1  -P3306
CREATE USER 'orch_topology_user'@'192.168.56.%' IDENTIFIED BY 'orch_topology_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orch_topology_user'@'192.168.56.%';
GRANT SELECT ON mysql.slave_master_info TO 'orch_topology_user'@'192.168.56.%';
GRANT SELECT ON perconadb.* TO 'orch_topology_user'@'192.168.56.%';                       // grant SELECT on any schema holding monitoring metadata (perconadb here)
--GRANT SELECT ON ndbinfo.processes TO 'orch_topology_user'@'192.168.56.%';               // only for NDB Cluster



--3-4. On the managed replicas (3306), adjust the replication parameters
--slave_net_timeout: how long (in seconds) a replica waits for data from the master before reconnecting; the default is 60 since 5.7.7.
--MASTER_CONNECT_RETRY: the interval (in seconds) between attempts to reconnect to the master; default 60.
--MASTER_RETRY_COUNT: the number of reconnection attempts; default 86400.
--master_heartbeat_period: the interval of the heartbeat the master sends to the replica; defaults to half of slave_net_timeout (30 seconds).
--With the defaults, if a replica receives nothing from the master for slave_net_timeout (60) seconds (including the heartbeat sent every 30 seconds), it reconnects, retrying every MASTER_CONNECT_RETRY (60) seconds until it succeeds or gives up after MASTER_RETRY_COUNT (86400) attempts.
--Waiting 60 seconds between retries is far too slow, so change the parameters as follows: if the replica receives nothing for 4 seconds (including the heartbeat sent every 2 seconds), it reconnects every 1 second until it succeeds or gives up after 86400 attempts.

set global slave_net_timeout = 4;                                                               // also update /etc/my.cnf
CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, master_heartbeat_period=2;   // requires stop slave first
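--Putting it together, the full sequence on each replica looks something like this (a sketch; replication must be stopped around CHANGE MASTER TO, and slave_net_timeout should also be persisted in /etc/my.cnf):

set global slave_net_timeout = 4;
stop slave;
CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
start slave;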

 

 

--4. Start up (start orchestrator on all three servers; the nodes keep orchestrator itself highly available via the Raft consensus algorithm)

--4-1. Start orchestrator in debug mode


[root@hostmysql80 ~]# cd /usr/local/orchestrator

[root@hostmysql80 orchestrator]# ./orchestrator --debug --config=/usr/local/orchestrator/orchestrator.conf.json http
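--A foreground --debug process is fine for testing; for anything longer-lived, one option (a sketch) is to run the same command in the background without --debug and capture its log:

[root@hostmysql80 orchestrator]# nohup ./orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http > /tmp/orchestrator.log 2>&1 &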

--4-2. Manage orchestrator from the web UI:

http://192.168.56.60:3000   any one of the three orchestrator nodes will do

--Add a database to be managed (any one instance of the cluster is enough; this can also be done over the HTTP API, as sketched after this list)

--View the replication status of the managed databases

--View the status of the three orchestrator nodes
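--Discovery over the HTTP API (a sketch, assuming the standard /api/discover path; orchestrator walks the replication topology from any one instance, so discovering a single node is enough):

[root@hostmysql80 ~]# curl -s "http://192.168.56.60:3000/api/discover/hostmysql80/3306" | jq .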

 

 

--5. orchestrator-client: managing orchestrator from the command line

--5-1. Install orchestrator-client (any one server will do; it need not be an orchestrator node or a managed MySQL server)
[root@hostmysql80 mysql_setup]# rpm -ivh orchestrator-client-3.1.4-1.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:orchestrator-client-1:3.1.4-1    ################################# [100%]


--5-2. Set the environment variable
[root@hostmysql80 ~]# vi /etc/profile.d/orchestrator-client.sh
export ORCHESTRATOR_API="hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api"
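--Apply the variable to the current shell and verify connectivity (when ORCHESTRATOR_API lists several endpoints, orchestrator-client locates the Raft leader among them):

[root@hostmysql80 ~]# source /etc/profile.d/orchestrator-client.sh
[root@hostmysql80 ~]# orchestrator-client -c clusters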

--5-3. View the help
[root@hostmysql80 ~]# orchestrator-client --help              
Usage: orchestrator-client -c <command> [flags...]
Example: orchestrator-client -c which-master -i some.replica
Options:

  -h, --help
    print this help
  -c <command>, --command <command>
    indicate the operation to perform (see listing below)
  -a <alias>, --alias <alias>
    cluster alias
......


--5-4. Basic commands
--List all clusters: clusters
[root@hostmysql80 ~]# orchestrator-client -c clusters
hostmysql80:3306
--List all instances of a given cluster: which-cluster-instances
[root@hostmysql80 orchestrator]# orchestrator-client -c which-cluster-instances -i  hostmysql80:3306
hostmysql60:3306
hostmysql61:3306
hostmysql80:3306

--Show the topology of a given replication cluster: topology-tabulated
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Search for instances: search
[root@hostmysql80 orchestrator]# orchestrator-client -c search -i  host
hostmysql61:3306
hostmysql80:3306
hostmysql60:3306


--Forget a given instance: forget   (forget-cluster forgets a whole cluster)
[root@hostmysql80 ~]# orchestrator-client -c forget -i hostmysql60:3306

--Discover a given instance: discover   (async-discover discovers asynchronously)
[root@hostmysql80 ~]# orchestrator-client -c discover -i hostmysql60:3306
hostmysql60:3306

 

--6. High availability of orchestrator itself

--orchestrator keeps itself highly available through the Raft distributed consensus protocol: a 3-node setup keeps orchestrator usable with up to 1 failed node, a 5-node setup with up to 2 failed nodes.
--Below, the 3-node cluster is tested with 1 failed node and then with 2 failed nodes.
--6-1. The Raft cluster is currently healthy; the leader is 192.168.56.60:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-leader
192.168.56.60:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-health
healthy

--6-2. Kill orchestrator on the current leader node (192.168.56.60:10008)
[root@hostmysql60 ~]# ps -ef|grep orch
root      5981  5776  2 15:42 pts/1    00:03:29 ./orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json http
root      7867  6138  0 18:27 pts/2    00:00:00 grep --color=auto orch
[root@hostmysql60 ~]# kill -9 5981

--6-3. The Raft cluster is still healthy (1 of 3 nodes down is tolerated); the leader is now 192.168.56.80:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-leader
192.168.56.80:10008
[root@hostmysql80 ~]#  orchestrator-client -c raft-health
healthy

--6-4. Kill orchestrator on the current leader node (192.168.56.80:10008)
[root@hostmysql80 ~]# ps -ef|grep orch
root      7402  7310  1 15:31 pts/2    00:03:17 ./orchestrator --debug --config=/usr/local/orchestrator/orchestrator.conf.json http
root      9722  7687  0 18:29 pts/1    00:00:00 grep --color=auto orch
[root@hostmysql80 ~]# kill -9 7402

--6-5. The Raft cluster is now unhealthy (2 of 3 nodes down, orchestrator is out of service); the web UI likewise greys out and orchestrator is unavailable
[root@hostmysql80 ~]# orchestrator-client -c raft-leader
orchestrator-client[9738]: Cannot determine leader from hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api
[root@hostmysql80 ~]# orchestrator-client -c raft-health
orchestrator-client[9770]: Cannot determine leader from hostmysql80:3000/api hostmysql61:3000/api hostmysql60:3000/api

 

--7. High availability of the managed MySQL service

--Reference: https://github.com/openark/orchestrator/blob/master/docs/topology-recovery.md#automated-recovery
--HA scenarios: 1. master crash with automated recovery; 2. graceful master promotion (planned switchover); 3. manual recovery; 4. manual forced failover

--7-1. Master crash with automated recovery

--When the master dies, orchestrator automatically promotes a replica to be the new master, following the replica promotion rules (the old master then stands alone; you can manually set read_only=1 on it and pull it back into the replication cluster). With "ApplyMySQLPromotionAfterMasterFailover": true, orchestrator automatically runs reset slave all and set read_only=0 on the new master (note that /etc/my.cnf is not updated).
--Set the following parameters in orchestrator.conf.json:
  "FailureDetectionPeriodBlockMinutes": 60,                       // suppress re-detection of the same failure within this window
  "RecoveryPeriodBlockSeconds": 3600,                             // suppress another recovery within this window, avoiding conflicting recoveries
  "RecoveryIgnoreHostnameFilters": [],                            // recovery ignores hosts matching these patterns
  "RecoverMasterClusterFilters": ["*"],                           // enable automated master failover; only matching clusters get master recovery ("*" matches everything)
  "RecoverIntermediateMasterClusterFilters":["*"],                // enable automated intermediate-master failover; only matching clusters are recovered ("*" matches everything)


--Set the promotion rule: prefer promoting hostmysql61:3306 to master
[root@hostmysql80 ~]# orchestrator-client -c register-candidate -i hostmysql61:3306 --promotion-rule prefer
hostmysql61:3306


--View the clusters
[root@hostmysql80 orchestrator]# orchestrator-client -c clusters
hostmysql80:3306
--View the cluster's replication topology; everything is healthy
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID


--The master (hostmysql80:3306) goes down
[root@hostmysql80 ~]# systemctl stop mysqld.service
--View the replication topology: the master (hostmysql80:3306) shows as unavailable, but the replicas are ok
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |unknown|invalid|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID
--The master has been marked downtimed (maintenance state)
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|unknown|invalid|5.7.27-log|rw|ROW|>>,GTID,downtimed
--List instances in downtimed (maintenance) state
[root@hostmysql80 orchestrator]# orchestrator-client -c downtimed
hostmysql80:3306
--Analyze the problems in the replication topology: the master (hostmysql80:3306) is dead, with no replicas still attached to it
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
hostmysql80:3306 (cluster hostmysql80:3306): DeadMasterWithoutSlaves


--Viewing the topology through any replica shows that hostmysql61:3306 has become the new master
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

--Check the new master's state (hostmysql61:3306): because "ApplyMySQLPromotionAfterMasterFailover": true is set, reset slave all and set read_only=0 were executed on it.
mysql> show slave status\G
Empty set (0.00 sec)

mysql> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

--The automated recovery flow above is logged to /tmp/recovery.log by the hook parameters below; see the parameters in the chapter 《Mysql主从复制08_高可用+orchestrator_2_参数配置文件》
OnFailureDetectionProcesses               # runs when a failure is detected (fired twice here: UnreachableMaster, then DeadMaster)
PreFailoverProcesses                      # runs immediately before recovery begins
PostMasterFailoverProcesses               # runs at the end of a successful master recovery
PostFailoverProcesses                     # runs at the end of any successful recovery
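--These hooks are lists of shell commands in orchestrator.conf.json; the shipped sample config populates them with echo commands along these lines, which is what produces /tmp/recovery.log:

  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],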

[root@hostmysql61 ~]# more /tmp/recovery.log
Detected UnreachableMaster on hostmysql80:3306. Affected replicas: 2
Detected DeadMaster on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMaster on hostmysql80:3306
Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql61:3306
(for all types) Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql61:3306



--Bring the old master (192.168.56.80:3306) back up
[root@hostmysql80 ~]# systemctl start mysqld.service

--replication-analysis now reports no problems, but hostmysql80:3306 has become a standalone instance
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|0s|ok|5.7.27-log|rw|ROW|>>,GTID

--Try to relocate hostmysql80:3306 into the cluster as a replica of hostmysql61:3306; orchestrator reports it is too complex and must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually rejoin hostmysql80:3306 to the cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf) and run reset slave all:
mysql> set global read_only = 1;

mysql> stop slave;

mysql> reset slave all;

mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61', 
              MASTER_PORT=3306,
              MASTER_USER='repl', 
              MASTER_PASSWORD='replPwd8!', 
              MASTER_AUTO_POSITION=1,
              MASTER_CONNECT_RETRY=1, 
              MASTER_RETRY_COUNT=86400, 
              MASTER_HEARTBEAT_PERIOD=2;

mysql> start slave;


--View the replication topology: hostmysql80 has rejoined the cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

--7-2. Graceful master promotion (planned switchover)

--Gracefully promote a replica to be the new master. Before the promotion, orchestrator automatically sets read_only=1 on the old master and then makes the old master a replica of the new one (left in maintenance state; it must be pulled up manually with start slave). With "ApplyMySQLPromotionAfterMasterFailover": true, reset slave all and set read_only=0 are run on the new master (again, /etc/my.cnf is not updated).

--View the replication topology; everything is healthy, hostmysql61:3306 is the master
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID


--Gracefully promote hostmysql80:3306 to master, with hostmysql61:3306 becoming its replica
[root@hostmysql80 ~]# orchestrator-client -c graceful-master-takeover -a hostmysql61:3306 -d hostmysql80:3306
hostmysql80:3306


--View the replication topology: hostmysql80:3306 has been promoted to master, and hostmysql61:3306 has become a replica (downtimed, maintenance state)
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql80:3306  |  0s|ok            |5.7.27-log|rw|ROW|>>,GTID          
+ hostmysql60:3306|  0s|ok            |5.7.27-log|ro|ROW|>>,GTID          
- hostmysql61:3306|null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID,downtimed


--The graceful promotion flow above is logged to /tmp/recovery.log by the hook parameters below; see the parameters in the chapter 《Mysql主从复制08_高可用+orchestrator_2_参数配置文件》
PreGracefulTakeoverProcesses
OnFailureDetectionProcesses
PreFailoverProcesses
PostMasterFailoverProcesses
PostFailoverProcesses
PostGracefulTakeoverProcesses
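--The two graceful-takeover hooks are configured the same way; in the shipped sample config they look along these lines:

  "PreGracefulTakeoverProcesses": [
    "echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
  ],
  "PostGracefulTakeoverProcesses": [
    "echo 'Planned takeover complete' >> /tmp/recovery.log"
  ],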

[root@hostmysql80 ~]# more /tmp/recovery.log
Planned takeover about to take place on hostmysql61:3306. Master will switch to read_only
Detected DeadMaster on hostmysql61:3306. Affected replicas: 1
Will recover from DeadMaster on hostmysql61:3306
Recovered from DeadMaster on hostmysql61:3306. Failed: hostmysql61:3306; Promoted: hostmysql80:3306
(for all types) Recovered from DeadMaster on hostmysql61:3306. Failed: hostmysql61:3306; Successor: hostmysql80:3306
Planned takeover complete


--Manually restart replication on hostmysql61
mysql> start slave;


--View the replication topology: replication on hostmysql61 is healthy again
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-3. Manual recovery

--When automated recovery is disabled or blocked, the replication cluster can be recovered manually. The flow is largely the same as automated recovery, with just one extra manual step.

--Disable automated recovery by setting the following in orchestrator.conf.json:
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": ["_auto_failure_"],                    //匹配不上即可
  "RecoverIntermediateMasterClusterFilters": ["_auto_failure_"],        //匹配不上即可


--The master goes down
[root@hostmysql80 orchestrator]# systemctl stop mysqld.service

--View the replication topology: the cluster is broken (with automated recovery off, it stays broken)
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql80:3306  |unknown|invalid       |5.7.27-log|rw|ROW|>>,GTID
- hostmysql60:3306|   null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID
- hostmysql61:3306|   null|nonreplicating|5.7.27-log|ro|ROW|>>,GTID


--Run a manual recovery against the failed instance
[root@hostmysql80 orchestrator]# orchestrator-client -c recover -i hostmysql80:3306
hostmysql61:3306

--View the replication topology: the failed instance (the old master) has been removed from the cluster, hostmysql61:3306 has been promoted to master, and replication is healthy again
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID 

--The manual recovery flow above is logged to /tmp/recovery.log by the hook parameters below; see the parameters in the chapter 《Mysql主从复制08_高可用+orchestrator_2_参数配置文件》
OnFailureDetectionProcesses               # runs when a failure is detected (fired twice here: UnreachableMaster, then DeadMasterAndSomeSlaves)
PreFailoverProcesses                      # runs immediately before recovery begins
PostMasterFailoverProcesses               # runs at the end of a successful master recovery
PostFailoverProcesses                     # runs at the end of any successful recovery

[root@hostmysql80 orchestrator]# more /tmp/recovery.log
Detected UnreachableMaster on hostmysql80:3306. Affected replicas: 2
Detected DeadMasterAndSomeSlaves on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMasterAndSomeSlaves on hostmysql80:3306
Recovered from DeadMasterAndSomeSlaves on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql60:3306
(for all types) Recovered from DeadMasterAndSomeSlaves on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql60:3306





--Bring the old master (192.168.56.80:3306) back up
[root@hostmysql80 ~]# systemctl start mysqld.service

--replication-analysis now reports no problems, but hostmysql80:3306 has become a standalone instance
[root@hostmysql80 orchestrator]# orchestrator-client -c replication-analysis
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql60:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306|0s|ok|5.7.27-log|rw|ROW|>>,GTID

--Try to relocate hostmysql80:3306 into the cluster as a replica of hostmysql61:3306; orchestrator reports it is too complex and must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually rejoin hostmysql80:3306 to the cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf) and run reset slave all:
mysql> set global read_only = 1;

mysql> stop slave;

mysql> reset slave all;

mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61',
              MASTER_PORT=3306,
              MASTER_USER='repl',
              MASTER_PASSWORD='replPwd8!',
              MASTER_AUTO_POSITION=1,
              MASTER_CONNECT_RETRY=1,
              MASTER_RETRY_COUNT=86400,
              MASTER_HEARTBEAT_PERIOD=2;

mysql> start slave;


--View the replication topology: hostmysql80 has rejoined the cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-4. Manual forced failover

--Force a failover, regardless of what orchestrator itself thinks

--View the replication topology: hostmysql80 is the cluster's master
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql80:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql61:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID


--Manually force hostmysql61:3306 to become the master, overriding orchestrator's own judgment
[root@hostmysql80 ~]# orchestrator-client -c force-master-failover -i hostmysql61:3306
hostmysql61:3306

--The old master hostmysql80:3306 has become a standalone instance
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306  
hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
--The new master hostmysql61:3306 now heads the replication cluster
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID



--The forced failover flow above is logged to /tmp/recovery.log by the hook parameters below; see the parameters in the chapter 《Mysql主从复制08_高可用+orchestrator_2_参数配置文件》

OnFailureDetectionProcesses
PreFailoverProcesses
PostMasterFailoverProcesses
PostFailoverProcesses

[root@hostmysql60 ~]# more /tmp/recovery.log    
Detected DeadMaster on hostmysql80:3306. Affected replicas: 2
Will recover from DeadMaster on hostmysql80:3306
Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Promoted: hostmysql61:3306
(for all types) Recovered from DeadMaster on hostmysql80:3306. Failed: hostmysql80:3306; Successor: hostmysql61:3306




--Try to relocate hostmysql80:3306 into the cluster as a replica of hostmysql61:3306; orchestrator reports it is too complex and must be done manually
[root@hostmysql80 orchestrator]# orchestrator-client -c relocate -i hostmysql80:3306 -d hostmysql61:3306
2020-04-08 18:11:33 ERROR Relocating hostmysql80:3306 below hostmysql61:3306 turns to be too complex; please do it manually

--Check, against the replication rules, whether hostmysql80:3306 can replicate from hostmysql61:3306
[root@hostmysql80 orchestrator]# orchestrator-client -c can-replicate-from -i hostmysql80:3306 -d hostmysql61:3306
hostmysql61:3306

--Manually rejoin hostmysql80:3306 to the cluster as a replica of hostmysql61:3306
--On hostmysql80:3306, set read_only = 1 (also in /etc/my.cnf) and run reset slave all:
mysql> set global read_only = 1;

mysql> stop slave;

mysql> reset slave all;

mysql> CHANGE MASTER TO MASTER_HOST='hostmysql61',
              MASTER_PORT=3306,
              MASTER_USER='repl',
              MASTER_PASSWORD='replPwd8!',
              MASTER_AUTO_POSITION=1,
              MASTER_CONNECT_RETRY=1,
              MASTER_RETRY_COUNT=86400,
              MASTER_HEARTBEAT_PERIOD=2;

mysql> start slave;


--View the replication topology: hostmysql80 has rejoined the cluster
[root@hostmysql80 orchestrator]# orchestrator-client -c topology-tabulated -i  hostmysql80:3306
hostmysql61:3306  |0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

 

--7-5. Replica 2 (192.168.56.60:3306) crash

--Replica 2 (192.168.56.60:3306) goes down
[root@hostmysql60 ~]# systemctl stop mysqld.service

--View the replication topology: replica 2 (192.168.56.60:3306) shows as unavailable
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |     0s|ok     |5.7.27-log|rw|ROW|>>,GTID
- hostmysql60:3306|unknown|invalid|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|     0s|ok     |5.7.27-log|ro|ROW|>>,GTID

--Bring replica 2 (192.168.56.60:3306) back up
[root@hostmysql60 ~]# systemctl start mysqld.service

--The replica automatically rejoins orchestrator's monitoring
[root@hostmysql80 ~]# orchestrator-client -c topology-tabulated -i  hostmysql61:3306
hostmysql61:3306  |0s|ok|5.7.27-log|rw|ROW|>>,GTID
+ hostmysql60:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID
+ hostmysql80:3306|0s|ok|5.7.27-log|ro|ROW|>>,GTID

 

Appendix: errors when starting orchestrator, and their fixes

Problem 1: 2020-04-07 10:01:11 ERROR ReadTopologyInstance(192.168.56.80:3306) show variables like 'maxscale%': QueryRowsMap unexpected error: runtime error: invalid memory address or nil pointer dereference
Fix 1: set "SkipMaxScaleCheck": true, in the config file


Problem 2: 2020-04-07 10:12:42 ERROR ReadTopologyInstance(192.168.56.80:3306) show global status like 'Uptime': TLS requested but server does not support TLS
Fix 2: set "MySQLTopologyUseMixedTLS": false, in the config file



Problem 3: 2020-04-08 09:45:34 ERROR ReadTopologyInstance(hostmysql80:3306) show slave hosts: ReadTopologyInstance(hostmysql80:3306) 'show slave hosts' returned row with <host,port>: <,3306>
Fix 3: set "DiscoverByShowSlaveHosts": true, in the config file, and add report_host on the replicas


--On the master, Host is empty
mysql> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|   2223306 |      | 3306 |   1113306 | c959bdb9-f94c-11e9-b3b8-0800277143f5 |
|   3333306 |      | 3306 |   1113306 | 78a79555-758e-11ea-a2cc-080027c2dbc3 |
+-----------+------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

--Add to my.cnf on each of the two replicas (each replica uses its own IP):
report_host=192.168.56.61     # 192.168.56.60 on the other replica
report_port=3306

--On the master, Host is now populated and the error is gone
mysql> show slave hosts;
+-----------+---------------+------+-----------+--------------------------------------+
| Server_id | Host          | Port | Master_id | Slave_UUID                           |
+-----------+---------------+------+-----------+--------------------------------------+
|   2223306 | 192.168.56.61 | 3306 |   1113306 | c959bdb9-f94c-11e9-b3b8-0800277143f5 |
|   3333306 | 192.168.56.60 | 3306 |   1113306 | 78a79555-758e-11ea-a2cc-080027c2dbc3 |
+-----------+---------------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)
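--Taken together, the three fixes above amount to the following lines in orchestrator.conf.json (to merge into the existing config):

  "SkipMaxScaleCheck": true,
  "MySQLTopologyUseMixedTLS": false,
  "DiscoverByShowSlaveHosts": true,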

 
