MySQL GTID复制报错跳过方法

Posted 码城

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了MySQL GTID复制报错跳过方法相关的知识,希望对你有一定的参考价值。

gtid复制

更详细的验证步骤,详见

mysql主备复制集群跳过错误方法(GTID) - 码到城攻MySQL主备复制集群跳过错误方法(GTID),区分传统复制和GTID复制,复制方式不同,跳过错误的方式也不同。https://www.codecomeon.com/posts/192/


问题复现


重点验证下 `gtid` 复制报错,如何跳过,如下为一个正常 `gtid` 复制的表,在从实例上删除表,然后主实例中对表插入数据,以复现复制报错:

从:192.168.78.189 3306,上删除表 test:

    MySQL [gtid_test]> show tables;
    +---------------------+
    | Tables_in_gtid_test |
    +---------------------+
    | test                |
    +---------------------+
    1 row in set (0.00 sec)
    
    MySQL [gtid_test]> 
    MySQL [gtid_test]> drop table test;
    Query OK, 0 rows affected (0.01 sec)
    
    MySQL [gtid_test]> show tables;
    Empty set (0.00 sec)
    
    MySQL [gtid_test]>
主:192.168.78.188 3308 上对正在复制的表 test 插入 sql:

    MySQL [(none)]> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | gtid_test          |
    | information_schema |
    | mysql              |
    | ning               |
    | performance_schema |
    | sys                |
    +--------------------+
    6 rows in set (0.01 sec)
    
    MySQL [(none)]> use gtid_test;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    MySQL [gtid_test]> show tables;
    +---------------------+
    | Tables_in_gtid_test |
    +---------------------+
    | test                |
    +---------------------+
    1 row in set (0.00 sec)
    
    MySQL [gtid_test]> insert into test values (2);
    Query OK, 1 row affected (0.00 sec)
    
    MySQL [gtid_test]>


然后,在从上查看同步状态:

    MySQL [gtid_test]> show slave status\\G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for source to send event
                      Master_Host: 192.168.78.188
                      Master_User: gtid_copy
                      Master_Port: 3308
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000002
              Read_Master_Log_Pos: 667
                   Relay_Log_File: 67bf0b455536-relay-bin.000004
                    Relay_Log_Pos: 598
            Relay_Master_Log_File: mysql-bin.000002
                 Slave_IO_Running: Yes
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 1146
                       Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 382
                  Relay_Log_Space: 1140
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 1146
                   Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 188
                      Master_UUID: 1d9b633d-e1ae-11ec-8ece-0242ac110002
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: 
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 220602 07:47:50
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-15 
                Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14, 
    f19b117a-cf6d-11ec-9fd7-0242ac110002:1 // 此为从节点删除表执行的事务
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
           Master_public_key_path: 
            Get_master_public_key: 0
                Network_Namespace: 
    1 row in set, 1 warning (0.00 sec)
    
    ERROR: No query specified
    
    MySQL [gtid_test]>


可以看出,复制已经报错,而:

    Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-15 
    Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14, 


意思是,从主接收 事务 1 到 15,但是只成功执行了 1 到 14 ,15 并没有执行成功,如果 15 没有成功的前提下,我们发现,后续的复制,也没法在进行,如下,主节点在新增一张表:

    MySQL [gtid_test]> create table next_test (a int);
    Query OK, 0 rows affected (0.02 sec)
    
    MySQL [gtid_test]> 
    MySQL [gtid_test]> show tables;
    +---------------------+
    | Tables_in_gtid_test |
    +---------------------+
    | next_test           |
    | test                |
    +---------------------+
    2 rows in set (0.00 sec)
    
    MySQL [gtid_test]>


如下,可以看到,由于上一个 15 事务的失败,导致后续的 16 事务同样没有在从实例上执行:

    MySQL [gtid_test]> show slave status\\G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for source to send event
                      Master_Host: 192.168.78.188
                      Master_User: gtid_copy
                      Master_Port: 3308
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000002
              Read_Master_Log_Pos: 872
                   Relay_Log_File: 67bf0b455536-relay-bin.000004
                    Relay_Log_Pos: 598
            Relay_Master_Log_File: mysql-bin.000002
                 Slave_IO_Running: Yes
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 1146
                       Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 382
                  Relay_Log_Space: 1345
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 1146
                   Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 188
                      Master_UUID: 1d9b633d-e1ae-11ec-8ece-0242ac110002
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: 
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 220602 07:47:50
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16
                Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14,
    f19b117a-cf6d-11ec-9fd7-0242ac110002:1
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
           Master_public_key_path: 
            Get_master_public_key: 0
                Network_Namespace: 
    1 row in set, 1 warning (0.01 sec)
    
    ERROR: No query specified
    
    MySQL [gtid_test]> 
    MySQL [gtid_test]> 
    MySQL [gtid_test]> show tables;
    Empty set (0.00 sec)
    
    MySQL [gtid_test]>


如下,15 16 号事务并未执行:

    Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16
    Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14,


解决问题


方法一


我们验证一种常用的方法,在从节点上跳过错误事务:

1. 停止slave进程
        mysql> STOP SLAVE;
2. 设置事务号,事务号从 Retrieved_Gtid_Set 获取,在session里设置gtid_next,即跳过这个GTID
        mysql> SET @@SESSION.GTID_NEXT= '1d9b633d-e1ae-11ec-8ece-0242ac110002:15'
3. 设置空事物
        mysql> BEGIN; COMMIT;
4. 恢复自增事物号
        mysql> SET SESSION GTID_NEXT = AUTOMATIC;
5. 启动slave进程
        mysql> START SLAVE;


方法二
重置 master 方法跳过错误:

    mysql>  STOP SLAVE;
    mysql>  RESET MASTER;
    mysql>  SET @@GLOBAL.GTID_PURGED ='1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16'
    mysql>  START SLAVE;

上面这些命令的意思是,忽略 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16 这个 GTID 事务,下一次事务接着从 17 这个GTID开始,即可跳过上述错误。

以上是关于MySQL GTID复制报错跳过方法的主要内容,如果未能解决你的问题,请参考以下文章

MySQL跳过复制错误

在线建立或重做mysql主从复制架构方法(传统模式和GTID模式)

MySQL(二十三)GTID 复制

与MySQL传统复制相比,GTID都有哪些独特的复制姿势

MySQL5.7 传统复制到GTID复制

与MySQL传统复制相比,GTID有哪些独特的复制姿势