专职DBA-MySQL主从延迟复制
Posted zhouwanchun
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了专职DBA-MySQL主从延迟复制相关的知识,希望对你有一定的参考价值。
专职DBA-mysql主从延迟复制 本次实验环境延用MySQL主从异步复制的搭建环境 mysql集群企业级架构方案 1.根据对数据库的访问请求实现读写分离(读从写主) 2.根据不同的业务拆分多个从库以提供访问 一主五从 3从提供外部用户读请求访问(读写分离、LVS负载均衡) 1从用于内部用户读访问(业务后台、数据分析、搜索业务、财务统计、定时任务、开发查询等) 1从用于数据库定时全备份,以及增量备份(开启binlog) 3.实现对主库的高可用 (1).heartbeat+dbrd+mysql方案 通过dbrd工具对主数据库服务器实现基于block的异机物理复制,类似于网络RAID1. 优点:速度很快。 缺点:不能被访问,除非主节点宕机,备节点才可以提供访问。 (2).mysql-MMM(Master-Master replication Manager)方案 通过mysql的replication实现主主之间的数据同步。 优点:可以实现slaves负载均衡。 缺点:MMM无法完全保持数据的一致性。 (3).mysql-MHA(Master High Availability)+keepalived方案 通过mysql的replication实现数据库服务器之间的数据同步。 优点:同时可以实现从库负载均衡,主库宕机后自动选择最优的从库,将其切换为主库。 并尽最大的努力对有所有的库做数据补全操作,一直到最新。 并对其他从库和新主库实现复制,再加上keepalived是为了实现vip漂移。 (4).PXC (5).共享存储方案 (6).数据库分布式部署方案 (7).MGR mysql企业级备份策略方案 1.利用mysql主从复制的从库进行数据备份策略 (1).选择一个不对外提供服务的从库,专门做数据备份用。 (2).开启从库的binlog功能。 (3).数据量小于30GB用mysqldump逻辑备份; 数据库大于30GB用Xtrabackup物理热备工具。 mysql主从复制生产环境的常见延迟原因 易导致复制延迟的原因: 1.一个主库的从库太多 2.从库硬件比主库查 3.慢sql语句过多 4.主从复制的设计问题 5.主从复制之间的网络延迟 6.主库读写压力太大 mysql主从复制数据一致性企业级方案 1.采用半同步复制方案 2.当复制发生延迟时让程序改读主库 mysql多线程复制解决复制延迟实践 [[email protected] ~]# mysqld --defaults-file=/data/mysql/3306/my.cnf & [[email protected] ~]# mysqld --defaults-file=/data/mysql/3306/my.cnf & (1).查看当前slave服务器的SQL线程状态 [[email protected] ~]# mysql -S /data/mysql/3306/mysql.sock -p Enter password: Slave [(none)]> show processlist; +----+-------------+-----------+------+---------+------+--------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+------+---------+------+--------------------------------------------------------+------------------+ | 1 | system user | | NULL | Connect | 49 | Waiting for master to send event | NULL | | 2 | system user | | NULL | Connect | 48 | Slave has read all relay log; waiting for more updates | NULL | | 4 | root | localhost | NULL | Query | 0 | starting | show processlist | +----+-------------+-----------+------+---------+------+--------------------------------------------------------+------------------+ 3 rows in set (0.00 sec) (2).检查多线程的参数配置 默认为0表示单线程复制 Slave [(none)]> show variables like "%parallel%"; +------------------------+----------+ | Variable_name | Value | +------------------------+----------+ | slave_parallel_type | DATABASE | | slave_parallel_workers | 0 | +------------------------+----------+ 2 rows in set (0.01 sec) (3).停止主从复制,在线修改线程数 Slave [(none)]> stop slave; Query OK, 0 rows affected (0.00 sec) Slave [(none)]> set global slave_parallel_workers = 4; Query OK, 0 rows affected (0.00 sec) Slave [(none)]> show variables like "%parallel%"; +------------------------+----------+ | Variable_name | Value | +------------------------+----------+ | slave_parallel_type | DATABASE | | slave_parallel_workers | 4 | +------------------------+----------+ 2 rows in set (0.00 sec) (4).启动主从复制,查看SQL线程数 Slave [(none)]> start slave; Query OK, 0 rows affected (0.04 sec) Slave [(none)]> show processlist; +----+-------------+-----------+------+---------+------+--------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+------+---------+------+--------------------------------------------------------+------------------+ | 4 | root | localhost | NULL | Query | 0 | starting | show processlist | | 5 | system user | | NULL | Connect | 28 | Waiting for master to send event | NULL | | 6 | system user | | NULL | Connect | 28 | Slave has read all relay log; waiting for more updates | NULL | | 7 | system user | | NULL | Connect | 28 | Waiting for an event from Coordinator | NULL | | 8 | system user | | NULL | Connect | 28 | Waiting for an event from Coordinator | NULL | | 9 | system user | | NULL | Connect | 28 | Waiting for an event from Coordinator | NULL | | 10 | system user | | NULL | Connect | 28 | Waiting for an event from Coordinator | NULL | +----+-------------+-----------+------+---------+------+--------------------------------------------------------+------------------+ 7 rows in set (0.00 sec) (5).想永久生效就写入my.cnf [[email protected] ~]# vim /data/mysql/3306/my.cnf [mysqld] slave_parallel_workers = 4 让mysql主从复制的从库只读访问 1.read-only参数允许数据库更新的条件 (1).具有super权限的用户可以更新,不受read-only参数影响。例如:root (2).来自从服务器具备主从复制权限的线程可以更新,不受read-only参数的影响。例如:rep 2.如何配置read-only参数 (1).启动数据库时直接带--read-only参数启动。 mysqld_safe --read-only --user=mysql & (2).在my.cnf文件中配置 [[email protected] ~]# vim /data/mysql/3306/my.cnf [mysqld] read-only 然后重启数据库 mysqladmin -S /data/mysql/3306/mysql.sock -p shutdown mysqld --defaults-file=/data/mysql/3306/my.cnf & mysql主从复制读写分离Web用户生产设置方案 在配置好mysql主从复制,并实现了读写分离以后,数据库授权程序访问的用户设置方法: 1.主库和从库使用不同的用户,授予不同的权限。 主库上对web_w用户的授权 grant select,insert,update,delete on `web`.* to ‘web_w‘@‘192.168.10.%‘ identified by ‘123‘; 从库上对web_r用户的授权 grant select on `web`.* to ‘web_r‘@‘192.168.10.%‘ identified by ‘123‘; 2.网站程序访问主库和从库时使用一套用户密码。 (1).主库和从库使用相同的用户,但授予不同的权限。 忽略主库的mysql授权库同步 [[email protected] ~]# vim /data/mysql/3306/my.cnf binlog-ignore-db = mysql #mysql库不记录binlog日志 replicate-ignore-db = mysql #忽略复制mysql库 在主库上创建完web用户和权限之后,在从库上revoke回收对应的更新权限 主库:grant select,insert,update,delete on `web`.* to ‘web‘@‘192.168.10.%‘ identified by ‘123‘; 从库:grant select on `web`.* to ‘web‘@‘192.168.10.%‘ identified by ‘123‘; 在从库上设置read-only参数,让从库只读 [[email protected] ~]# vim /data/mysql/3306/my.cnf [mysqld] read-only 然后重启数据库 mysqladmin -S /data/mysql/3306/mysql.sock -p shutdown mysqld --defaults-file=/data/mysql/3306/my.cnf & mysql主从延迟复制方案及恢复实践 [[email protected] ~]# mysql -S /data/mysql/3306/mysql.sock -p Enter password: Slave [(none)]> stop slave; Query OK, 0 rows affected (0.01 sec) Slave [(none)]> change master to master_delay = 60; Query OK, 0 rows affected (0.01 sec) Slave [(none)]> start slave; Query OK, 0 rows affected (0.02 sec) Slave [(none)]> show slave status\G *************************** 1. row *************************** SQL_Delay: 60 #延迟60秒进行复制 SQL_Remaining_Delay: NULL #还剩多少秒执行复制 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates #sql线程状态 [[email protected] ~]# mysql -S /data/mysql/3306/mysql.sock -p Enter password: Master [(none)]> create database app; Query OK, 1 row affected (0.00 sec) Master [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 6 rows in set (0.00 sec) Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 5 rows in set (0.01 sec) 但是中继日志里面已经有创建的语句了,说明IO线程还是实时在工作的。 [[email protected] ~]# cd /data/mysql/3306/data/ [[email protected] /data/mysql/3306/data]# ls -l total 122952 -rw-r----- 1 mysql mysql 56 Jul 15 05:52 auto.cnf -rw-r----- 1 mysql mysql 206 Jul 16 01:43 db02-relay-bin.000001 -rw-r----- 1 mysql mysql 476 Jul 16 01:47 db02-relay-bin.000002 -rw-r----- 1 mysql mysql 48 Jul 16 01:43 db02-relay-bin.index -rw-r----- 1 mysql mysql 599 Jul 15 06:54 ib_buffer_pool -rw-r----- 1 mysql mysql 12582912 Jul 16 01:23 ibdata1 -rw-r----- 1 mysql mysql 50331648 Jul 16 01:23 ib_logfile0 -rw-r----- 1 mysql mysql 50331648 Jul 15 05:52 ib_logfile1 -rw-r----- 1 mysql mysql 12582912 Jul 16 01:23 ibtmp1 -rw-r----- 1 mysql mysql 122 Jul 16 01:48 master.info drwxr-x--- 2 mysql mysql 4096 Jul 15 06:35 mysql drwxr-x--- 2 mysql mysql 8192 Jul 15 05:52 performance_schema -rw-r----- 1 mysql mysql 59 Jul 16 01:43 relay-log.info drwxr-x--- 2 mysql mysql 48 Jul 15 06:50 shenzhen drwxr-x--- 2 mysql mysql 8192 Jul 15 05:52 sys -rw-r----- 1 mysql mysql 84 Jul 16 01:43 worker-relay-log.info.1 -rw-r----- 1 mysql mysql 84 Jul 16 01:43 worker-relay-log.info.2 -rw-r----- 1 mysql mysql 84 Jul 16 01:43 worker-relay-log.info.3 -rw-r----- 1 mysql mysql 84 Jul 16 01:43 worker-relay-log.info.4 [[email protected] /data/mysql/3306/data]# mysqlbinlog db02-relay-bin.000002 SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; create database app /*!*/; 过了1分钟后 Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 6 rows in set (0.00 sec) mysql的延迟复制实际上影响的只是SQL线程将数据应用到从库。 而IO线程早已把主库更新的数据写到了从库的中继日志里面。 因此,在延迟复制期间,即使主库宕机了,从库到了延迟复制的时间,也依然会把数据更新到与主库宕机时一致。 使用mysql主从延迟复制进行数据恢复实践 1.模拟环境,将从库延迟调整为3600秒 [[email protected] ~]# mysql -S /data/mysql/3306/mysql.sock -p Enter password: Slave [(none)]> stop slave; Query OK, 0 rows affected (0.01 sec) [[email protected] ~]# mysql -u root -p -S /application/mysql/tmp/mysql.sock Enter password: mysql> stop slave ; Query OK, 0 rows affected (0.00 sec) Slave [(none)]> change master to master_delay = 3600; Query OK, 0 rows affected (0.02 sec) Slave [(none)]> start slave; Query OK, 0 rows affected (0.03 sec) Slave [(none)]> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.10.11 Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 350 Relay_Log_File: db02-relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 350 Relay_Log_Space: 526 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 113306 Master_UUID: 7c145945-a680-11e9-baea-000c29a14cf7 Master_Info_File: /data/mysql/3306/data/master.info SQL_Delay: 3600 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: 7c145945-a680-11e9-baea-000c29a14cf7:1-4 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) 2.模拟在主库写入数据,每隔5秒写入一个库,就当是模拟用户写入数据了 for n in 01..05 do mysql -S /data/mysql/3306/mysql.sock -p123 -e "create database app$n;" sleep 5 done [[email protected] ~]# for n in 01..05 > do > mysql -S /data/mysql/3306/mysql.sock -p123 -e "create database app$n;" > sleep 5 > done mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. [[email protected] ~]# 3.模拟人为破坏数据,也可以是不带where的update语句。 Master [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app | | app01 | | app02 | | app03 | | app04 | | app05 | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 11 rows in set (0.00 sec) 删除oldboy5数据库,后面要做的就是把这个数据库恢复回来,别的数据还得保留。 Master [(none)]> drop database app05; Query OK, 0 rows affected (0.01 sec) Master [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app | | app01 | | app02 | | app03 | | app04 | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 10 rows in set (0.00 sec) Master [(none)]> drop database app05; Query OK, 0 rows affected (0.01 sec) Master [(none)]> show databases like "%app%"; +------------------+ | Database (%app%) | +------------------+ | app | | app01 | | app02 | | app03 | | app04 | +------------------+ 5 rows in set (0.00 sec) 现在,所有的从库都已经是坏数据了,只有延迟从库是好的,但是是一个小时之前的数据。 4.当数据库出现误删数据的情况时,特别是update不加条件破坏数据,要想完整恢复数据,最好选择对外停止访问措施,此时需要牺牲用户体验了。 [[email protected] ~]# iptables -I INPUT -p tcp --dport 3306 ! -s 192.168.10.13 -j DROP 非192.168.10.13禁止访问数据库3306端口,11是主库IP,192.168.10.13为远程连接ssh客户端的IP。 5.登录主库从库查看binlog发送接收进行确认。 Master [(none)]> show processlist; +----+------+---------------------+------+-------------+------+---------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+---------------------+------+-------------+------+---------------------------------------------------------------+------------------+ | 5 | root | localhost | NULL | Query | 0 | starting | show processlist | | 6 | rep | 192.168.10.12:39828 | NULL | Binlog Dump | 754 | Master has sent all binlog to slave; waiting for more updates | NULL | +----+------+---------------------+------+-------------+------+---------------------------------------------------------------+------------------+ 2 rows in set (0.00 sec) 从库 Slave [(none)]> show processlist; +----+-------------+-----------+------+---------+------+----------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+------+---------+------+----------------------------------------------------------------+------------------+ | 20 | root | localhost | NULL | Query | 0 | starting | show processlist | | 21 | system user | | NULL | Connect | 789 | Waiting for master to send event | NULL | | 22 | system user | | NULL | Connect | 228 | Waiting until MASTER_DELAY seconds after master executed event | NULL | | 23 | system user | | NULL | Connect | 789 | Waiting for an event from Coordinator | NULL | | 24 | system user | | NULL | Connect | 789 | Waiting for an event from Coordinator | NULL | | 25 | system user | | NULL | Connect | 789 | Waiting for an event from Coordinator | NULL | | 26 | system user | | NULL | Connect | 789 | Waiting for an event from Coordinator | NULL | +----+-------------+-----------+------+---------+------+----------------------------------------------------------------+------------------+ 7 rows in set (0.00 sec) Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 6 rows in set (0.00 sec) 6.在从库上停止主从复制,并查看数据库是否已同步过来。 Slave [(none)]> stop slave; Query OK, 0 rows affected (0.01 sec) Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 6 rows in set (0.00 sec) 因为还未到延迟时间,所以数据不会同步到该延迟从库。 7.根据relay-log.info记录的sql线程读取relay-log的位置,解析未应用到从库的relay-bin日志。 进入中继日志所在的目录 [[email protected] ~]# cd /data/mysql/3306/data/ [[email protected] /data/mysql/3306/data]# ls -l total 122952 drwxr-x--- 2 mysql mysql 20 Jul 16 01:48 app -rw-r----- 1 mysql mysql 56 Jul 15 05:52 auto.cnf -rw-r----- 1 mysql mysql 206 Jul 16 01:55 db02-relay-bin.000001 -rw-r----- 1 mysql mysql 1282 Jul 16 02:05 db02-relay-bin.000002 -rw-r----- 1 mysql mysql 48 Jul 16 01:55 db02-relay-bin.index -rw-r----- 1 mysql mysql 599 Jul 15 06:54 ib_buffer_pool -rw-r----- 1 mysql mysql 12582912 Jul 16 01:49 ibdata1 -rw-r----- 1 mysql mysql 50331648 Jul 16 01:49 ib_logfile0 -rw-r----- 1 mysql mysql 50331648 Jul 15 05:52 ib_logfile1 -rw-r----- 1 mysql mysql 12582912 Jul 16 01:23 ibtmp1 -rw-r----- 1 mysql mysql 123 Jul 16 02:09 master.info drwxr-x--- 2 mysql mysql 4096 Jul 15 06:35 mysql drwxr-x--- 2 mysql mysql 8192 Jul 15 05:52 performance_schema -rw-r----- 1 mysql mysql 61 Jul 16 02:09 relay-log.info drwxr-x--- 2 mysql mysql 48 Jul 15 06:50 shenzhen drwxr-x--- 2 mysql mysql 8192 Jul 15 05:52 sys -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.1 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.2 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.3 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.4 [[email protected] /data/mysql/3306/data]# ls -l *relay* -rw-r----- 1 mysql mysql 206 Jul 16 01:55 db02-relay-bin.000001 中继日志 -rw-r----- 1 mysql mysql 1282 Jul 16 02:05 db02-relay-bin.000002 中继日志 -rw-r----- 1 mysql mysql 48 Jul 16 01:55 db02-relay-bin.index 中继日志索引 -rw-r----- 1 mysql mysql 61 Jul 16 02:09 relay-log.info 线程读取中继日志的位置信息 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.1 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.2 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.3 -rw-r----- 1 mysql mysql 84 Jul 16 01:55 worker-relay-log.info.4 [[email protected] /data/mysql/3306/data]# cat relay-log.info 7 ./db02-relay-bin.000002 #SQL线程读取中继日志的文件名信息 320 #SQL线程读取中继日志的位置点信息 mysql-bin.000005 350 3600 4 1 8.解析SQL线程未解析的全部剩余relay-bin中继日志数据。 [[email protected] /data/mysql/3306/data]# mysqlbinlog --start-position=320 db02-relay-bin.000002 > /backup/sql/relay.sql [[email protected] /data/mysql/3306/data]# mysqlbinlog --skip-gtids --start-position=320 db02-relay-bin.000002 > /backup/sql/relay.sql [[email protected] ~]# ls -l /backup/sql/relay.sql -rw-r--r-- 1 root root 3688 Jul 16 02:15 /backup/sql/relay.sql 9.找到破坏数据库的SQL语句,并从已解析的SQL语句中将其删除掉,这里使用的是"drop database app05" [[email protected] ~]# egrep "drop database app05" /backup/sql/relay.sql drop database app05 [[email protected] ~]# sed -i ‘/drop database app05/d‘ /backup/sql/relay.sql [[email protected] ~]# egrep "drop database app05" /backup/sql/relay.sql [[email protected] ~]# egrep "^drop database app05" /backup/sql/relay.sql 10.将解析后并处理好的relay.sql数据文件恢复到延迟从库。 [[email protected] ~]# mysql -S /data/mysql/3306/mysql.sock -p < /backup/sql/relay.sql Enter password: Slave [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | app | | app01 | | app02 | | app03 | | app04 | | app05 | | mysql | | performance_schema | | shenzhen | | sys | +--------------------+ 11 rows in set (0.00 sec) 之前的删除的app05数据库已经恢复找回来了!!! 利用延迟从库恢复数据库完毕,此时还需要将此从库切换为主库,作为新主库对外提供用户访问。再对其他遭到破坏的主从数据库进行修复。 -------------------------------------------------------->OK Slave [(none)]> start slave; Query OK, 0 rows affected (0.01 sec) Master [shenzhen]> use shenzhen; Database changed Master [shenzhen]> insert into t1(id) values(2); Query OK, 1 row affected (0.01 sec) Master [shenzhen]> select * from t1; +------+ | id | +------+ | 1 | | 2 | +------+ 2 rows in set (0.00 sec) Slave [(none)]> stop slave; Query OK, 0 rows affected (0.02 sec) Slave [(none)]> change master to master_delay = 20; Query OK, 0 rows affected (0.01 sec) Slave [(none)]> start slave; Query OK, 0 rows affected (0.03 sec) Slave [(none)]> select * from shenzhen.t1; +------+ | id | +------+ | 1 | | 2 | +------+ 2 rows in set (0.00 sec)
以上是关于专职DBA-MySQL主从延迟复制的主要内容,如果未能解决你的问题,请参考以下文章