适用范围
MySQL5.7,MySQL8.0版本。
问题概述
客户在执行truncate table时阻塞,show processlist结果显示 Waiting for table metadata lock 元数据锁,但经过排查发现,并没有看到有其他的活动会话在执行,具体问题如下所示:
如上活动会话并未发现阻塞源,同时查询 select * from sys.innodb_lock_waits; 结果也为空。
表面上没有发现问题根源,正常情况下如果有会话对问题表同时进行数据修改或调整应该在show processlist中显示才对。
问题原因
此时通过查询事务表发现有可疑数据:
继续查询 trx_mysql_thread_id为13的会话:
id 为13的会话,此时显示为sleep状态,并不是我们理解的活动会话。sleep状态,但是在innodb_trx表中的状态为RUNNING,怀疑该会话阻塞了truncate操作,才导致元数据锁的产生。
同时通过 show engine innodb status\G 可以发现一些信息
这里显示有事务锁,thread id为13
解决方案
尝试通过将id为13的会话杀掉:kill 13;
会话杀掉后,truncate 操作马上执行成功,元数据锁消失:
问题总结
在日常运维中,经常会出现未提交事务影响正常SQL操作的现象,这种未提交事务通常在会话中的表现为Sleep状态,并不显示具体的执行内容,进而迷惑运维人员找不到具体原因,故这种情况下通常需要结合事务表来排查,将问题会话杀掉即可。同时也可以通过 show engin innodb status\G 查看但不一定每次都可以看到有可能被频繁的数据刷掉。
-- 查看未提交的事务(3秒内未操作的事务) SELECT p.ID AS conn_id, P.USER AS login_user, P.HOST AS login_host, p.DB AS database_name, P.TIME AS trx_sleep_seconds, TIME_TO_SEC(TIMEDIFF(NOW(),T.trx_started)) AS trx_open_seconds, T.trx_started, T.trx_isolation_level, T.trx_tables_locked, T.trx_rows_locked, t.trx_state, p.COMMAND AS process_state, ( SELECT GROUP_CONCAT(REPLACE(REPLACE(REPLACE(T1.`SQL_TEXT`,'\n',' '),'\r',' '),'\t',' ') SEPARATOR '; ') FROM performance_schema.events_statements_history AS T1 INNER JOIN performance_schema.threads AS T2 ON T1.`THREAD_ID`=T2.`THREAD_ID` WHERE T2.`PROCESSLIST_ID`=P.id ) AS trx_sql_text FROM `information_schema`.`INNODB_TRX` t INNER JOIN `information_schema`.`PROCESSLIST` p ON t.trx_mysql_thread_id=p.id WHERE t.trx_state='RUNNING' AND p.COMMAND='Sleep' AND P.TIME>3 ORDER BY T.trx_started ASC \G -- 查看每个未提交事务执行过的所有SQL SELECT @dt_ts:=UNIX_TIMESTAMP(NOW()); SELECT @dt_timer:=MAX(SH.TIMER_START) FROM performance_schema.threads AS T1 INNER JOIN performance_schema.events_statements_history AS SH ON T1.`THREAD_ID`=SH.`THREAD_ID` WHERE T1.PROCESSLIST_ID=CONNECTION_ID(); SELECT SH.CURRENT_SCHEMA AS database_name, REPLACE(REPLACE(REPLACE(SH.`SQL_TEXT`,'\n',' '),'\r',' '),'\t',' ') AS executed_sql, FROM_UNIXTIME(@dt_ts-CAST((@dt_timer-SH.TIMER_START)/1000000000000 AS SIGNED)) AS start_time, FROM_UNIXTIME(@dt_ts-CAST((@dt_timer+SH.TIMER_END)/1000000000000 AS SIGNED)) AS end_time, (SH.TIMER_END-SH.TIMER_START)/1000000000000 AS used_seconds, SH.TIMER_WAIT/1000000000000 AS wait_seconds, SH.LOCK_TIME/1000000000000 AS lock_seconds, SH.ROWS_AFFECTED AS affected_rows, SH.ROWS_SENT AS send_rows FROM performance_schema.threads AS T1 INNER JOIN performance_schema.events_statements_history AS SH ON T1.`THREAD_ID`=SH.`THREAD_ID` WHERE T1.PROCESSLIST_ID IN ( SELECT p.ID AS conn_id FROM `information_schema`.`INNODB_TRX` t INNER JOIN `information_schema`.`PROCESSLIST` p ON t.trx_mysql_thread_id=p.id WHERE t.trx_state='RUNNING' AND p.COMMAND='Sleep' AND P.TIME>3 ) AND SH.TIMER_START<@dt_timer ORDER BY SH.TIMER_START ASC;
复制
导致事务长期未提交的因素很多,常见的有:
1、业务在执行过程中在处理其他无关操作导致事务长期未被处理。
2、业务逻辑错误或处理异常,导致事务未被正常处理。
3、网络异常导致应用端请求未成功发送给数据库连接,数据库等待应用后续操作。
4、应用服务器性能异常如系统卡死,CPU过载,导致应用无法及时切换到该进程进行处理。
文章被以下合辑收录
评论
