业务反馈mysql无法访问,随即show processlist查看,发现存在大量Waiting for table flush的select sql线程。找到阻塞源并kill后,业务恢复正常。阻塞源是一个running状态的select sql,已经运行了20个小时,还未执行完成。理论上select sql之间是不会阻塞的,可本例又是如何产生的呢?
下面还原故障过程:
- session1 运行一个耗时SQL,此时session2的相关表查询并无阻塞。
#session1
mysql> select sleep(10000), id from t1;
...等待返回结果
#session2
mysql> show processlist;
+----+------+-----------------------+------+---------+------+------------+---------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------+------+---------+------+------------+---------------------------------+
| 32 | root | 192.168.199.130:46370 | modb | Query | 173 | User sleep | select sleep(10000), id from t1 |
| 33 | root | 192.168.199.130:46371 | modb | Query | 0 | starting | show processlist |
+----+------+-----------------------+------+---------+------+------------+---------------------------------+
2 rows in set (0.00 sec)
mysql> select * from t1 limit 1;
+----+------+------+
| id | key1 | key2 |
+----+------+------+
| 1 | key1 | a |
+----+------+------+
1 row in set (0.00 sec)
- 读和读之间不阻塞。
- 扩展:读和写之间由于mvcc机制的存在,也不阻塞。
- 用mysqldump备份,由于要flush tables,故产生Waiting for table flush等待。
#备份(开始会有一个flush tables的操作)
mysqldump \
--protocol=TCP \
--host=127.0.0.1 \
--port=3306 \
--user=root \
--password="xxxxxx" \
--hex-blob \
--single-transaction \
--master-data=2 \
--set-gtid-purged=ON \
--all-databases \
--triggers --routines --events \
> /tmp/full_3306.sql
#session2查看阻塞
mysql> show processlist;
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| 32 | root | 192.168.199.130:46370 | modb | Query | 717 | User sleep | select sleep(10000), id from t1 |
| 34 | root | 127.0.0.1:34176 | NULL | Query | 12 | Waiting for table flush | FLUSH /*!40101 LOCAL */ TABLES |
| 35 | root | 192.168.199.130:46373 | NULL | Query | 0 | starting | show processlist |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
3 rows in set (0.00 sec)
- flush tables操作,关闭所有已打开的表对象,同时将查询缓存中的结果清空。
- flush tables操作,会被正在运行的SQL阻塞,直到正在运行的所有语句执行结束。
- 此后,只要是关于这张表的访问,均产生Waiting for table flush等待
#session3
mysql> select * from t1;
...等待返回结果
#session2
mysql> show processlist;
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| 32 | root | 192.168.199.130:46370 | modb | Query | 997 | User sleep | select sleep(10000), id from t1 |
| 34 | root | 127.0.0.1:34176 | NULL | Query | 292 | Waiting for table flush | FLUSH /*!40101 LOCAL */ TABLES |
| 35 | root | 192.168.199.130:46373 | NULL | Query | 0 | starting | show processlist |
| 36 | root | 192.168.199.130:46374 | modb | Query | 41 | Waiting for table flush | select * from t1 |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
4 rows in set (0.00 sec)
- 如果flush tables等待SQL请求操作的表集合为{tables},后续只要访问的表在{tables}中,就会被阻塞。
- 如果其他会话新请求操作{tables}以外的其他表,不会被阻塞。
- flush tables操作可认为是对{tables}所有表加表级排他锁,会阻塞其他会话关于{tables}表上的所有操作。
- 这时备份强行关闭后,并不会撤销flush tables命令,等待依然存在。
#session2
mysql> kill 34;
Query OK, 0 rows affected (0.01 sec)
mysql> show processlist;
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| 32 | root | 192.168.199.130:46370 | modb | Query | 1155 | User sleep | select sleep(10000), id from t1 |
| 35 | root | 192.168.199.130:46373 | NULL | Query | 0 | starting | show processlist |
| 36 | root | 192.168.199.130:46374 | modb | Query | 199 | Waiting for table flush | select * from t1 |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
3 rows in set (0.00 sec)
#session4
mysql> select * from t1 limit 2;
...等待返回结果
#session2
mysql> show processlist;
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
| 32 | root | 192.168.199.130:46370 | modb | Query | 1358 | User sleep | select sleep(10000), id from t1 |
| 35 | root | 192.168.199.130:46373 | NULL | Query | 0 | starting | show processlist |
| 36 | root | 192.168.199.130:46374 | modb | Query | 402 | Waiting for table flush | select * from t1 |
| 37 | root | 192.168.199.130:46375 | modb | Query | 141 | Waiting for table flush | select * from t1 limit 2 |
+----+------+-----------------------+------+---------+------+-------------------------+---------------------------------+
4 rows in set (0.01 sec)
- 当flush tables语句被kill后,后续sql依然要等待。
- 扩展:元数据锁与flush tables表现不同,当被kill后,后续sql不再产生Waiting for table metadata lock等待。
- kill源头阻塞sql,才可恢复
#session2
mysql> kill 32;
Query OK, 0 rows affected (0.00 sec)
mysql> show processlist;
+----+------+-----------------------+------+---------+------+----------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------------------+------+---------+------+----------+------------------+
| 35 | root | 192.168.199.130:46373 | NULL | Query | 0 | starting | show processlist |
| 36 | root | 192.168.199.130:46374 | modb | Sleep | 442 | | NULL |
| 37 | root | 192.168.199.130:46375 | modb | Sleep | 181 | | NULL |
+----+------+-----------------------+------+---------+------+----------+------------------+
3 rows in set (0.00 sec)
- 可试图从information_schema.innodb_trx视图中,查找未关闭的事务。
- 可试图从performance_schema.metadata_locks视图中,查找持有锁的sql线程。
总结:
- 对于flush tables操作,即使kill掉执行flush tables的sql线程,后续相关表的sql依旧会被阻塞,除非kill掉阻塞源。
- mysql需要监控长期未关闭的事务,有阻塞及时处理。
最后修改时间:2023-02-20 10:08:31
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




