MySQL Study Notes 14: Deploying Group Replication in Single-Primary Mode, and Failure Recovery
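1. Deployment steps for single-primary mode
Goal: deploy a MySQL group of three hosts in single-primary mode.
Primary: 192.168.197.110.
Secondary: 192.168.197.111.
Secondary: 192.168.197.112.
MySQL port: 3306; MySQL Group Replication port: 33061.
(1) Operations on host 192.168.197.110.
Follow the steps below to configure MySQL, make this server a member of the group, and finally bootstrap the group.
(a) Modify the MySQL configuration.
Edit the configuration file of the MySQL service.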
[mysqld]
server_id=110
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address="192.168.197.110:33061"
loose-group_replication_group_seeds="192.168.197.110:33061,192.168.197.111:33061,192.168.197.112:33061"
loose-group_replication_bootstrap_group=off
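Here, transaction_write_set_extraction tells the server to collect the write set of every transaction and encode it with the XXHASH64 algorithm.
loose-group_replication_group_name is the name of the group that this MySQL server will join.
loose-group_replication_start_on_boot controls whether the Group Replication service starts automatically when the MySQL server starts.
loose-group_replication_local_address is the local address on which the Group Replication service listens.
loose-group_replication_group_seeds lists the seed members of the group; when this MySQL server needs to contact the group, it contacts these seed members.
loose-group_replication_bootstrap_group controls whether this server bootstraps the group. Within a group, only one member should ever be allowed to bootstrap it.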
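After restarting mysqld with this configuration, the prerequisites can be sanity-checked from the client. The query below is a minimal sketch, assuming a MySQL 5.7 server (which the variable names and the 2017 log timestamps in these notes suggest). The group_replication_* variables are deliberately left out: they only exist once the plugin has been installed in step (d), which is also why they carry the loose- prefix in the configuration file.

```sql
-- Confirm that the replication prerequisites from my.cnf took effect.
-- Variable names are the MySQL 5.7 ones used in the configuration above.
SELECT @@global.server_id                        AS server_id,
       @@global.gtid_mode                        AS gtid_mode,
       @@global.enforce_gtid_consistency         AS enforce_gtid_consistency,
       @@global.log_bin                          AS log_bin,
       @@global.binlog_format                    AS binlog_format,
       @@global.binlog_checksum                  AS binlog_checksum,
       @@global.log_slave_updates                AS log_slave_updates,
       @@global.master_info_repository           AS master_info_repository,
       @@global.relay_log_info_repository        AS relay_log_info_repository,
       @@global.transaction_write_set_extraction AS transaction_write_set_extraction;
```

Any value that differs from the configuration file usually means the file was not the one mysqld actually read, or the server was not restarted.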
(b) Add the group replication user.
mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.03 sec)
mysql> create user 'repl'@'%.coe2coe.me' identified by '123456';
Query OK, 0 rows affected (0.03 sec)
mysql> grant replication slave on *.* to 'repl'@'%.coe2coe.me';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.03 sec)
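SQL_LOG_BIN is switched off around these statements so that the account creation is not written to the binary log and therefore does not become a transaction that exists on this server only. To confirm which account actually exists and what it may do, something like the following can be run; the host value in the SHOW GRANTS line is a placeholder to be filled in from the first query.

```sql
-- List every 'repl' account and the host pattern it was created with.
SELECT user, host
FROM mysql.user
WHERE user = 'repl';

-- Then inspect the privileges of the account found above, for example:
-- SHOW GRANTS FOR 'repl'@'<host from the query above>';
```

The account named in CREATE USER and the one named in GRANT must be the same user and host pair, otherwise the replication privilege ends up on a different account than the one the recovery channel logs in with.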
(c) Set up the group replication channel.
Configure the credentials on the dedicated MySQL Group Replication recovery channel.
mysql> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='123456' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.04 sec)
(d) Install the group replication plugin.
mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
Query OK, 0 rows affected (0.17 sec)
After installation, run show plugins; to list the installed plugins. The output should include this plugin:
| group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
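As an alternative to scanning the full show plugins output, the plugin state can be queried directly. This is a small sketch using the standard information_schema.PLUGINS table; the second statement lists the group_replication_* variables, which become visible only after the plugin is loaded.

```sql
-- Check that the plugin is installed and active.
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_LIBRARY
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'group_replication';

-- With the plugin loaded, the loose- options from my.cnf show up as variables.
SHOW GLOBAL VARIABLES LIKE 'group_replication%';
```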
(e) Bootstrap the group and start the Group Replication service.
Bootstrap the group:
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)
Start the Group Replication service:
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (2.17 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)
At this point the MySQL group is running. The following command lists the group members; there should be one online member.
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c110 | 110.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
1 row in set (0.00 sec)
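(2) Operations on host 192.168.197.111.
Configure the MySQL service on this host in a similar way (with its own server_id and group_replication_local_address) and add it to the MySQL group.
The difference is that 192.168.197.110 bootstrapped the group and then started the Group Replication service, whereas every other host only starts the Group Replication service and must not bootstrap the group again.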
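The "similar configuration" on a joining node can be sketched as the statements below, run before the start command. This is an illustration under assumptions: the configuration file on 111 is assumed to already use server_id=111 and a local address of 192.168.197.111:33061, and the account host pattern shown is the one from step (b); use whatever was actually created on 110.

```sql
-- Run on 192.168.197.111 (and later on 112) before START GROUP_REPLICATION.
-- Create the same replication account locally without logging it to the binlog.
SET SQL_LOG_BIN=0;
CREATE USER 'repl'@'%.coe2coe.me' IDENTIFIED BY '123456';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%.coe2coe.me';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;

-- Credentials used by the recovery channel when fetching data from a donor.
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='123456'
  FOR CHANNEL 'group_replication_recovery';

INSTALL PLUGIN group_replication SONAME 'group_replication.so';

-- A joining node must NOT bootstrap: this should report 0 (OFF).
SELECT @@global.group_replication_bootstrap_group;
```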
mysql> start group_replication;
Query OK, 0 rows affected (4.94 sec)
Normally, MySQL on both 110 and 111 should now be ONLINE.
mysql> SELECT * FROM performance_schema.replication_group_members;
No connection. Trying to reconnect...
Connection id: 16
Current database: mysql
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c110 | 110.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c111 | 111.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
2 rows in set (0.00 sec)
start group_replication sometimes fails, or the member stays in the RECOVERING state for a long time and eventually ends up in the ERROR state. The MySQL log then shows errors like these:
2017-08-19T06:12:16.471125Z 15 [ERROR] Slave I/O for channel 'group_replication_recovery': error connecting to master '[email protected]:3306' - retry-time: 60 retries: 1, Error_code: 2003
2017-08-19T06:12:16.471163Z 15 [Note] Slave I/O thread for channel 'group_replication_recovery' killed while connecting to master
2017-08-19T06:12:16.471169Z 15 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2017-08-19T06:12:16.471315Z 10 [ERROR] Plugin group_replication reported: 'There was an error when connecting to the donor server. Check group replication recovery's connection credentials.'
In this case, check that DNS resolution works on the host and that it can connect to the MySQL services that are already online.
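Error code 2003 is the client-side "cannot connect to server" error, so the joining node could not reach the donor at the host name the donor advertises. The queries below are a sketch of how the problem might be narrowed down from SQL; they use standard performance_schema tables and make no assumption beyond a MySQL 5.7 server with the plugin loaded.

```sql
-- On the joining node: state and last error of the recovery channel.
SELECT CHANNEL_NAME, SERVICE_STATE,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
FROM performance_schema.replication_connection_status
WHERE CHANNEL_NAME = 'group_replication_recovery';

-- On any node: the host names that members advertise to each other.
-- Every joining node must be able to resolve and reach MEMBER_HOST:MEMBER_PORT.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;

-- On each member: where the advertised name comes from.
-- If report_host is set it is advertised instead of the OS host name.
SELECT @@global.hostname, @@global.report_host;
```

If MEMBER_HOST shows a name such as 110.coe2coe.me that the other hosts cannot resolve, fixing DNS (or the hosts file), or setting report_host to a reachable address, are the usual remedies.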
(3) Deploy node 192.168.197.112.
Set it up and deploy it in the same way as in (2).
(4) Verify the deployment.
After deployment, list the nodes in the MySQL group:
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c110 | 110.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c111 | 111.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c112 | 112.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
3 rows in set (0.00 sec)
All three nodes are now online.
Next, verify the data.
First, on 192.168.197.110 (Primary), create a database and a table, then insert some rows:
mysql> create database mydb;
Query OK, 1 row affected (0.01 sec)
mysql> use mydb;
Database changed
mysql> create table test (name varchar(100) primary key);
Query OK, 0 rows affected (0.03 sec)
mysql> insert into test (name) values ('001'),('002'),('003');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from test;
+------+
| name |
+------+
| 001 |
| 002 |
| 003 |
+------+
3 rows in set (0.00 sec)
Then check the data on 192.168.197.111 (Secondary) and 192.168.197.112 (Secondary):
mysql> use mydb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from test;
+------+
| name |
+------+
| 001 |
| 002 |
| 003 |
+------+
3 rows in set (0.00 sec)
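The result shows that the data was replicated correctly to the other two nodes.
At this point, the MySQL Group Replication setup has been deployed successfully.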
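Besides comparing rows by eye, the executed GTID sets can be compared across the three nodes. A minimal sketch: run the statement on 110, 111 and 112; once replication has caught up, the sets should be identical. Transactions committed through the group carry the group name (aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa here) as the UUID part of their GTIDs.

```sql
-- Run on every member and compare the results.
SELECT @@global.gtid_executed;

-- Optional row-count check for the test table created above.
SELECT COUNT(*) AS row_count FROM mydb.test;
```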
The two Secondary nodes have automatically been put into read_only mode.
mysql> show variables like '%read_only%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_read_only | OFF |
| read_only | ON |
| super_read_only | ON |
| tx_read_only | OFF |
+------------------+-------+
4 rows in set (0.01 sec)
The Primary node, by contrast, remains read-write.
mysql> show variables like '%read_only%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_read_only | OFF |
| read_only | OFF |
| super_read_only | OFF |
| tx_read_only | OFF |
+------------------+-------+
4 rows in set (0.01 sec)
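The same check can be made more compactly, and the effect of super_read_only can be demonstrated with a write attempt. This is only a sketch: the INSERT is meant to be run on a Secondary, where it is expected to be rejected because super_read_only is ON; on the Primary it would succeed and add a row. The value '004' is an arbitrary example.

```sql
-- 1 means ON, 0 means OFF.
SELECT @@global.read_only, @@global.super_read_only;

-- Run ONLY on a Secondary: expected to fail with a read-only error,
-- since super_read_only blocks writes even for privileged users.
INSERT INTO mydb.test (name) VALUES ('004');
```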
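1.2. Automatic Primary failover in single-primary mode
While a single-primary MySQL Group Replication setup is running normally, stop the Primary node; the group then automatically elects a new node as Primary.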
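Which member wins the election can be anticipated. As documented for MySQL 5.7, when the surviving members are otherwise equal (same version, and same member weight in releases that support it), the one whose server UUID sorts lowest becomes the new Primary. The query below is a sketch of that ordering, not the plugin's actual code path; it matches the outcome in these notes, where the member ending in c111 is elected ahead of the one ending in c112.

```sql
-- Online members in UUID order; after the current Primary leaves,
-- the first remaining row is the likely election winner in 5.7.
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_STATE
FROM performance_schema.replication_group_members
WHERE MEMBER_STATE = 'ONLINE'
ORDER BY MEMBER_ID;
```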
Member list before the failure:
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c110 | 110.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c111 | 111.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c112 | 112.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
3 rows in set (0.00 sec)
On 192.168.197.110 (Primary), run the following command to stop the Primary:
mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)
Now check the member list on 192.168.197.111 (Secondary):
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c111 | 111.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c112 | 112.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
2 rows in set (0.00 sec)
Check the system variables on 111:
mysql> show variables like '%read_only%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_read_only | OFF |
| read_only | OFF |
| super_read_only | OFF |
| tx_read_only | OFF |
+------------------+-------+
4 rows in set (0.00 sec)
Check which node is currently the Primary:
mysql> SELECT VARIABLE_NAME,VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME like 'group_replication%';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | a2392929-6dfb-11e7-b294-000c29b1c111 |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
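The status variable only returns a UUID, which then has to be matched against the member list by hand. The two can be combined in one query; this is a sketch for MySQL 5.7, where the member table has no role column (newer 8.0 releases expose a MEMBER_ROLE column directly). The ROLE alias is introduced here for illustration.

```sql
-- Label each member as PRIMARY or SECONDARY by matching its UUID
-- against the group_replication_primary_member status variable.
SELECT m.MEMBER_HOST,
       m.MEMBER_PORT,
       m.MEMBER_STATE,
       IF(m.MEMBER_ID = s.VARIABLE_VALUE, 'PRIMARY', 'SECONDARY') AS ROLE
FROM performance_schema.replication_group_members AS m
JOIN performance_schema.global_status AS s
  ON s.VARIABLE_NAME = 'group_replication_primary_member';
```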
Log records on 111:
2017-08-19T07:16:51.519461Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-08-19T07:16:52.634010Z 0 [Note] Plugin group_replication reported: 'Unsetting super_read_only.'
2017-08-19T07:16:52.634129Z 98 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'
Log records on 112:
2017-08-19T07:00:43.407749Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-08-19T07:00:45.570407Z 0 [Note] Plugin group_replication reported: 'Marking group replication view change with view_id 15031230388599955:19'
2017-08-19T07:00:45.849564Z 0 [Note] Plugin group_replication reported: 'The member with address 111.coe2coe.me:3306 was declared online within the replication group'
2017-08-19T07:16:50.830199Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-08-19T07:16:51.944908Z 0 [Note] Plugin group_replication reported: 'Setting super_read_only.'
2017-08-19T07:16:51.945033Z 33 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'
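Taken together, this shows that 192.168.197.111 has now become the Primary node:
(a) 111 is now read-write.
(b) 111's role in the group has changed, even though its member state still reads ONLINE.
If the former Primary (192.168.197.110) is started again at this point, it ends up in an awkward position: if it is once more started as the node that bootstraps the group, the group it forms turns out to be a different group from the one that already exists.
The group that 110 belongs to after the restart: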
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c110 | 110.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
1 row in set (0.00 sec)
The original group still has only two members online:
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c111 | 111.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c112 | 112.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
2 rows in set (0.00 sec)
If this happens and the goal is simply for 110 to join the original group as a Secondary, proceed as follows on 110:
110 was started with bootstrap_group enabled; that has to be turned off.
mysql> set @@global.group_replication_bootstrap_group=off;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c110 | 110.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
1 row in set (0.00 sec)
mysql> stop group_replication;
Query OK, 0 rows affected (9.51 sec)
mysql> start group_replication;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.
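This error occurs because 110 contains data (transactions) that the new Primary (111) does not have. The following command tells the server to ignore the discrepancy; without it, 110 cannot join the original group. Because the extra transactions then remain on 110 only, use this option only when they are known to be harmless.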
mysql> set @@global.group_replication_allow_local_disjoint_gtids_join=1;
Query OK, 0 rows affected (0.00 sec)
mysql> start group_replication;
Query OK, 0 rows affected (3.18 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c110 | 110.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c111 | 111.coe2coe.me | 3306 | ONLINE |
| group_replication_applier | a2392929-6dfb-11e7-b294-000c29b1c112 | 112.coe2coe.me | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
3 rows in set (0.00 sec)
At this point, after the simulated failure recovery, 110 has successfully rejoined the original group.
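Before forcing a join with group_replication_allow_local_disjoint_gtids_join, it may be worth seeing exactly which transactions exist only on the rejoining server. The sketch below uses the built-in GTID_SUBTRACT function; the two quoted arguments are placeholders to be replaced with the gtid_executed values copied from 110 and 111. Note also that this option was deprecated in later 5.7 releases and removed in MySQL 8.0, so the approach in these notes applies to 5.7 only.

```sql
-- Step 1: run on 110 and on 111, and copy each result.
SELECT @@global.gtid_executed;

-- Step 2: run on either server with the two values pasted in.
-- The result is the set of GTIDs present on 110 but missing from 111;
-- an empty string means 110 has no extra transactions.
SELECT GTID_SUBTRACT('<gtid_executed from 110>',
                     '<gtid_executed from 111>') AS only_on_110;
```

If only_on_110 is not empty, those transactions will not be replicated to the rest of the group after the forced join, so they should be reviewed first.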