暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

MySQL主从报错-Error_code: 1205

原创 冯刚 2022-10-13
1480

前言

从库SQL线程报错:Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Lock wait timeout exceeded; try restarting transaction。

该案例可以定位到引起锁超时的事务以及语句,但是没定位到根源。kill会话和重启slave线程均会卡住,最后通过重启数据库实例解决。

1 环境信息

数据库信息:MySQL5.7.22

操作系统:CentOS Linux release 7.5

2 错误描述

早晨主从延迟告警,登录从库:

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: xxxxxxxxx
                  Master_User: xxxxxxx
                  Master_Port: 3306
                Connect_Retry: 1
              Master_Log_File: mysql_bin.007010
          Read_Master_Log_Pos: 6372081
               Relay_Log_File: relay.007809
                Relay_Log_Pos: 1026664439
        Relay_Master_Log_File: mysql_bin.007009
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1205
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1026664226
              Relay_Log_Space: 1080115539
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 11075
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1205
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 9218
                  Master_UUID: xxxxxxxx
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for workers to exit
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 221012 06:58:30
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: aeef80c3-1cba-11ea-b233-00163e0ab1ca:528429488-566695017
            Executed_Gtid_Set: 5cdb505c-1cbc-11ea-98f6-00163e0f2394:1-10,
aeef80c3-1cba-11ea-b233-00163e0ab1ca:1-566672256
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
mysql> select * from performance_schema.replication_applier_status_by_worker limit 5\G
*************************** 2. row ***************************
         CHANNEL_NAME: 
            WORKER_ID: 2
            THREAD_ID: NULL
        SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257
    LAST_ERROR_NUMBER: 1205
   LAST_ERROR_MESSAGE: Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Lock wait timeout exceeded; try restarting transaction
 LAST_ERROR_TIMESTAMP: 2022-10-12 06:58:30

2022-10-12 06:58:30开始报错,报错如下:

LAST_ERROR_NUMBER: 1205
LAST_ERROR_MESSAGE: Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Lock wait timeout exceeded; try restarting transactionLAST_ERROR_TIMESTAMP: 2022-10-12 06:58:30

3 排查

3.1 查看processlist

mysql> show processlist;
+---------+-----------------+-------------------+------+-------------+---------+---------------------------------------------------------------+------------------+
| Id      | User            | Host              | db   | Command     | Time    | State                                                         | Info             |
+---------+-----------------+-------------------+------+-------------+---------+---------------------------------------------------------------+------------------+
|      42 | system user     |                   | NULL | Connect     |    9260 | Waiting for workers to exit                                   | NULL             |
|      43 | system user     |                   | NULL | Connect     |   11063 | Waiting for preceding transaction to commit                   | NULL             |
| 1420811 | system user     |                   | NULL | Connect     | 1359763 | Waiting for master to send event                              | NULL             |
| 1512587 | xxxxxxxxx       | 172.16.4.14:32398 | NULL | Binlog Dump |   10926 | Master has sent all binlog to slave; waiting for more updates | NULL             |
| 1513330 | root            | localhost         | NULL | Query       |       0 | starting                                                      | show processlist |
+---------+-----------------+-------------------+------+-------------+---------+---------------------------------------------------------------+------------------+

3.2 查看正在运行事务

mysql> select * from information_schema.innodb_trx;
+------------+-----------+---------------------+-----------------------+------------------+------------+---------------------+-----------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+
| trx_id     | trx_state | trx_started         | trx_requested_lock_id | trx_wait_started | trx_weight | trx_mysql_thread_id | trx_query | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_lock_structs | trx_lock_memory_bytes | trx_rows_locked | trx_rows_modified | trx_concurrency_tickets | trx_isolation_level | trx_unique_checks | trx_foreign_key_checks | trx_last_foreign_key_error | trx_adaptive_hash_latched | trx_adaptive_hash_timeout | trx_is_read_only | trx_autocommit_non_locking |
+------------+-----------+---------------------+-----------------------+------------------+------------+---------------------+-----------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+
| 1703298045 | RUNNING   | 2022-10-12 06:55:48 | NULL                  | NULL             |       5004 |                  43 | NULL      | NULL                |                 0 |                 2 |                3 |                  1136 |               1 |              5001 |                       0 | READ COMMITTED      |                 1 |                      1 | NULL                       |                         0 |                         0 |                0 |                          0 |
+------------+-----------+---------------------+-----------------------+------------------+------------+---------------------+-----------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+
1 row in set (0.00 sec)

3.3 查看该线程最后执行语句

mysql> SELECT a.sql_text, c.id, d.trx_started  FROM PERFORMANCE_SCHEMA.events_statements_current a JOIN PERFORMANCE_SCHEMA.threads b ON a.thread_id = b.thread_id JOIN information_schema.PROCESSLIST c ON b.processlist_id = c.id JOIN information_schema.innodb_trx d ON c.id = d.trx_mysql_thread_id  WHERE c.id = 43  ORDER BY d.trx_started\G
*************************** 1. row ***************************
   sql_text: BEGIN
         id: 43
trx_started: 2022-10-12 06:55:48
1 row in set (0.00 sec)

3.4 错误日志

看报错,是有锁等待超时。查看错误日志:

2022-10-12T06:58:30.764087+08:00 48 [ERROR] Slave SQL for channel '': worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable. Error_code: 1205
2022-10-12T06:58:30.764099+08:00 48 [ERROR] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205
2022-10-12T06:58:30.764111+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:58:30.764145+08:00 44 [ERROR] Slave SQL for channel '': worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable. Error_code: 1205
2022-10-12T06:58:30.764156+08:00 44 [ERROR] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205
2022-10-12T06:58:30.764609+08:00 42 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql_bin.007009' position 1026664226

可以看到2022-10-12 06:58:30报错,Slave SQL线程挂起([ERROR] Slave SQL for channel '': worker thread retried transaction 10 time(s) in vain, giving up)。

下面是报错之前的信息。

2022-10-12T06:20:16.870943+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3299801 times hitting slave_pending_jobs_size_max; current event size = 28148.
2022-10-12T06:20:17.415877+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3299811 times hitting slave_pending_jobs_size_max; current event size = 17784.
2022-10-12T06:20:17.890645+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3299821 times hitting slave_pending_jobs_size_max; current event size = 11503.
2022-10-12T06:20:18.434491+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3299831 times hitting slave_pending_jobs_size_max; current event size = 28913.
2022-10-12T06:20:19.986829+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3299841 times hitting slave_pending_jobs_size_max; current event size = 28142.
2022-10-12T06:20:20.481470+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3299851 times hitting slave_pending_jobs_size_max; current event size = 26086.
2022-10-12T06:20:20.786977+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3299861 times hitting slave_pending_jobs_size_max; current event size = 28296.
2022-10-12T06:20:20.861478+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4015ms. The settings might not be optimal. (flushed=10000 and evicted=0, during the time.)
2022-10-12T06:55:20.090999+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3426631 times hitting slave_pending_jobs_size_max; current event size = 27986.
2022-10-12T06:55:20.176349+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3426641 times hitting slave_pending_jobs_size_max; current event size = 25257.
2022-10-12T06:55:20.258600+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3426651 times hitting slave_pending_jobs_size_max; current event size = 27961.
2022-10-12T06:55:20.381914+08:00 42 [Note] Multi-threaded slave: Coordinator has waited 3426661 times hitting slave_pending_jobs_size_max; current event size = 22941.
2022-10-12T06:55:59.743955+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:55:59.743958+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:56:11.883068+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:56:11.883127+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:56:24.965242+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:56:24.965287+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:56:39.049467+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:56:39.049541+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:56:54.132776+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:56:54.132832+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:57:10.227067+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:57:26.310615+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:57:26.310670+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:57:42.403741+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:57:42.403799+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:57:58.582153+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:57:58.582200+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:58:14.665334+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205
2022-10-12T06:58:14.665413+08:00 44 [Warning] Slave SQL for channel '': Worker 2 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257' at master log mysql_bin.007009, end_log_pos 1028024606; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007009, end_log_pos 1028024606, Error_code: 1205
2022-10-12T06:58:30.764049+08:00 48 [Warning] Slave SQL for channel '': Worker 6 failed executing transaction 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672261' at master log mysql_bin.007009, end_log_pos 1029272183; Could not execute Write_rows event on table test.t1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql_bin.007004, end_log_pos 1029272183, Error_code: 1205

3.5 解析主库binlog

根据报错的事务id,去binlog里面查。

主库:mysqlbinlog --base64-output=decode-rows -v mysql_bin.007009 |grep -B 20 -A 10000000 566672257 > 7009_1.sql

# at 1027686158
#221012  6:25:45 server id 9218  end_log_pos 1027686223 CRC32 0x1d830f9a    GTID    last_committed=444  sequence_number=446 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257'/*!*/;
# at 1027686223
#221012  6:25:45 server id 9218  end_log_pos 1027686304 CRC32 0x90dc9476    Query   thread_id=29311920  exec_time=0 error_code=0
SET TIMESTAMP=1665527145/*!*/;
BEGIN
/*!*/;
# at 1027686304
# at 1027860550
#221012  6:25:45 server id 9218  end_log_pos 1027860625 CRC32 0x95107af1    Table_map: `test`.`t1` mapped to number 7187
# at 1027860625
#221012  6:25:45 server id 9218  end_log_pos 1027868820 CRC32 0xad3c9abc    Write_rows: table id 7187
# at 1027868820
#221012  6:25:45 server id 9218  end_log_pos 1027877022 CRC32 0x00ede7d7    Write_rows: table id 7187
# at 1027877022
#221012  6:25:45 server id 9218  end_log_pos 1027885215 CRC32 0x08a47400    Write_rows: table id 7187
# at 1027885215
#221012  6:25:45 server id 9218  end_log_pos 1027893429 CRC32 0x77c4df5d    Write_rows: table id 7187
# at 1027893429
#221012  6:25:45 server id 9218  end_log_pos 1027901621 CRC32 0x0c8f9049    Write_rows: table id 7187
...............................................................................省略N条...............................................................................# at 1028008214
#221012 6:25:45 server id 9218 end_log_pos 1028016400 CRC32 0xb91812ad Write_rows: table id 7187
# at 1028016400
#221012  6:25:45 server id 9218  end_log_pos 1028024606 CRC32 0xad6cb550    Write_rows: table id 7187
# at 1028024606
#221012  6:25:45 server id 9218  end_log_pos 1028026822 CRC32 0x06dbbcbb    Write_rows: table id 7187 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
###   @1=88206936
###   @2=15455906
###   @3='145300'
###   @4=1
###   @5='2022-10-12 00:00:00'.......................................................
........................省略N条........................
.......................................................
### INSERT INTO `test`.`t1`
### SET
###   @1=88211935
###   @2=4312695
###   @3='141165'
###   @4=1
###   @5='2022-10-12 00:00:00'
# at 1028026822
#221012  6:25:45 server id 9218  end_log_pos 1028026853 CRC32 0x6143fd8c    Xid = 12669228146
COMMIT/*!*/;

可以看到报错对应的aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257事务,为插入操作,插入了多条。

上面加粗下划线的position:1028024606为从库sql线程报错的位点,也是插入的最后,主库COMMIT成功了。

4 原因

结合上面的thead_id=43查到的正在运行的事务信息,最后执行的语句为BEGIN。考虑是从库执行该事务时,因为INSERT锁等待超过,导致事务失败。

登录主从库,查询test.t1表数据,从库确实确实该事务插入的60多万行数据。

主库:

mysql> select count(*) from test.t1 where update_time='2022-10-12';
+----------+
| count(*) |
+----------+
|   728490 |
+----------+
1 row in set (0.22 sec)

从库:

mysql> select count(*) from test.t1 where update_time='2022-10-12';
+----------+
| count(*) |
+----------+
|    95000 |
+----------+
1 row in set (0.02 sec)

5 解决方案

5.1 kill

首先kill 43这个会话,但是好几个小时后一直显示状态为killed。

5.2 重启

最后重启mysql实例解决。下面为重启过程中的日志信息

2022-10-12T15:43:43.087642+08:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 4545891666024
2022-10-12T15:43:43.087679+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 4545891666033
2022-10-12T15:43:43.087686+08:00 0 [Note] InnoDB: Database was not shutdown normally!
2022-10-12T15:43:43.087691+08:00 0 [Note] InnoDB: Starting crash recovery.
2022-10-12T15:43:43.973948+08:00 0 [Note] InnoDB: Transaction 1703298045 was in the XA prepared state.
2022-10-12T15:43:44.027245+08:00 0 [Note] InnoDB: Transaction 1703298045 was in the XA prepared state.
2022-10-12T15:43:44.052875+08:00 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2022-10-12T15:43:44.052890+08:00 0 [Note] InnoDB: Trx id counter is 1703548160
2022-10-12T15:43:44.110239+08:00 0 [Note] InnoDB: Last MySQL binlog file position 0 59767338, file name mysql_bin.007004
2022-10-12T15:43:48.822730+08:00 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions
2022-10-12T15:43:48.822772+08:00 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2022-10-12T15:43:48.855681+08:00 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-10-12T15:43:48.855708+08:00 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-10-12T15:43:48.855751+08:00 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-10-12T15:43:48.927293+08:00 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-10-12T15:43:48.928003+08:00 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2022-10-12T15:43:48.928024+08:00 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2022-10-12T15:43:48.979258+08:00 0 [Note] InnoDB: 5.7.22 started; log sequence number 4545891666033
221012 15:43:49 server_audit: MariaDB Audit Plugin version 1.4.4 STARTED.
2022-10-12T15:43:49.031365+08:00 0 [Note] Semi-sync replication enabled on the master.
2022-10-12T15:43:49.124967+08:00 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2022-10-12T15:43:49.752538+08:00 0 [Note] Slave: MTS group recovery relay log info group_master_log_name mysql_bin.007009, event_master_log_pos 1027345463.
2022-10-12T15:43:49.753026+08:00 0 [Note] Slave: MTS group recovery relay log info group_master_log_name mysql_bin.007009, event_master_log_pos 1027686158.
2022-10-12T15:43:49.753052+08:00 0 [Warning] Recovery from master pos 1026664226 and file mysql_bin.007009 for channel ''. Previous relay log pos and relay log file had been setto 1026664439, /data/mysql/mysql57_3306/relaylog/relay.007809 respectively.
2022-10-12T15:43:49.800123+08:00 0 [Note] Event Scheduler: Loaded 0 events
2022-10-12T15:43:49.800315+08:00 0 [Note] /usr/local/mysql57/sbin/mysqld: ready for connections.
Version: '5.7.22-log'  socket: '/data/mysql/mysql57_3306/socketdir/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2022-10-12T15:44:12.006048+08:00 15 [Note] Slave I/O thread: Start semi-sync replication to master 'repl_user@172.16.9.218:3306' in log 'mysql_bin.007009' at position 1026664226
2022-10-12T15:44:12.007062+08:00 15 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.s

Transaction 1703298045 was in the XA prepared state 显示1703298045 这个XA事务进行了恢复。和上面3.2步骤查到的一致。


5.3 解析从库binlog

从库:mysqlbinlog --base64-output=decode-rows -v mysql_bin.007005 |grep -B 20 -A 50 566672257 |more

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#221012 15:43:49 server id 9219  end_log_pos 123 CRC32 0x238d52e8 	Start: binlog v 4, server v 5.7.22-log created 221012 15:43:49 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
# at 123
#221012 15:43:49 server id 9219  end_log_pos 234 CRC32 0xd11388da 	Previous-GTIDs
# 5cdb505c-1cbc-11ea-98f6-00163e0f2394:1-10,
# aeef80c3-1cba-11ea-b233-00163e0ab1ca:1-566672256
# at 234
#221012  6:25:45 server id 9218  end_log_pos 299 CRC32 0x3543e56f 	GTID	last_committed=0	sequence_number=1	rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'aeef80c3-1cba-11ea-b233-00163e0ab1ca:566672257'/*!*/;
# at 299
#221012  6:25:45 server id 9218  end_log_pos 362 CRC32 0xfaf82f74 	Query	thread_id=29311920	exec_time=33513	error_code=0
SET TIMESTAMP=1665527145/*!*/;
SET @@session.pseudo_thread_id=29311920/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=524288/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=45/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 362
# at 174608
#221012  6:25:45 server id 9218  end_log_pos 174683 CRC32 0x34735ed6 	Table_map: `test`.`t1` mapped to number 315
# at 174683
#221012  6:25:45 server id 9218  end_log_pos 182878 CRC32 0xf01e4c33 	Write_rows: table id 315
# at 182878
#221012  6:25:45 server id 9218  end_log_pos 191080 CRC32 0x36c5eb3e 	Write_rows: table id 315
# at 191080
#221012  6:25:45 server id 9218  end_log_pos 199273 CRC32 0xd6d9106c 	Write_rows: table id 315
# at 199273
#221012  6:25:45 server id 9218  end_log_pos 207487 CRC32 0x263e09a4 	Write_rows: table id 315
.......................................................
........................省略N条........................
.......................................................

6 补充

从库相关参数:

mysql> show variables like '%innodb_rollback_on_timeout%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_rollback_on_timeout | OFF   |
+----------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like '%slave_transaction_retries%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| slave_transaction_retries | 10    |
+---------------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like '%pending%';
+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| slave_pending_jobs_size_max | 16777216 |
+-----------------------------+----------+
1 row in set (0.00 sec)

主库参数:

mysql>  show variables like 'max_allowed_packet%';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_allowed_packet | 67108864 |
+--------------------+----------+
1 row in set (0.00 sec)

看官网说法,从库的参数slave_pending_jobs_size_max要比主库的max_allowed_packet参数大才对。

后面如果再出现,会试一下。


最后修改时间:2022-10-17 11:02:35
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论