MySQL在进行altertable等DDL操作时,有时会出现Waiting for table metadata lock的等待场景。而且,一旦alter table TableA的操作停滞在Waiting for table metadata lock的状态,后续对Table A的任何操作(包括读)都无法进行,因为他们也会在Opening tables的阶段进入到Waiting for table metadata lock的锁等待队列。如果是产品环境的核心表出现了这样的锁等待队列,就会造成灾难性的后果。
造成alter table产生Waiting fortable metadata lock的原因其实很简单,一般是以下几个简单的场景:
场景一:长事物运行,阻塞DDL,继而阻塞所有同表的后续操作.
这里的长事务也包括长的select操作
如图processlist_id:74 在执行altertable flight DDL语句时发生了Waiting for table metadata lock, 原因是因为上面的长SQL,processlist: 73, SQL 语句:
select b.flight_id, b.price from flight f, booking b,passengerdetails pd
where f.flight_id=b.flight_id
and b.passenger_id = pd.passenger_id
解决办法就是沟通一下看能否干掉长SQL或者事务如kill 74, 如果是DDL语句在先,就杀掉DDL语句所有的session
场景二:未提交事物,阻塞DDL,继而阻塞所有同表的后续操作
a. 通过show processlist看不到Table上有任何操作,但实际上存在有未提交的事务,可以在 information_schema.innodb_trx或者performance_schema.events_statements_current中查看到。
在事务没有完成之前,TableA上的锁不会释放,alter table同样获取不到metadata的独占锁。
处理方法:通过 select * frominformation_schema.innodb_trx\G, 找到未提交事物的sid, 然后 kill 掉,让其回滚。
对于未提交的事务只能通过下面语句:
select * fromperformance_schema.events_statements_current\G
selecttrx_id,trx_started,trx_requested_lock_id,trx_query,trx_mysql_thread_id fromINNODB_TRX;
比如:
会话A:
(product)root@localhost[airportdb]> update booking set price = 500 where booking_id=4;
更新一行记录, 不提交
会话B:
alter table flightadd column flightdesc char(20);
这样就会产生MDL锁:
后执行的语句为alter table add column
(product)root@localhost [information_schema]>select * from performance_schema.events_statements_current\G
Thread _ID为76
(product)root@localhost[information_schema]>
selectprocesslist_ID from performance_schema.threads where THREAD_ID=75
-> ;
+----------------+
| processlist_ID |
+----------------+
| 15 |
+----------------+
1 row in set (0.00sec)
(product)root@localhost[information_schema]> kill 15
找出没有提交的事务并干掉它。
还可以用下面方法
b. select * fromsys.schema_table_lock_waits\G ---5.7
(product)root@localhost [performance_schema]>select * from sys.schema_table_lock_waits\G
*************************** 1. row***************************
object_schema: airportdb
object_name: flight
waiting_thread_id: 83
waiting_pid: 23
waiting_account: root@localhost
waiting_lock_type: EXCLUSIVE
waiting_lock_duration: TRANSACTION
waiting_query: alter table flight drop column flightdesc
waiting_query_secs: 5
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 85
blocking_pid: 25
blocking_account: root@localhost
blocking_lock_type: SHARED_READ
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 25
sql_kill_blocking_connection: KILL 25
如果是MySQL8,则是sys.innodb_lock _waits
另外也可以通过
Select object_type,object_schema,object_name,lock_type,lock_duration,lock_status,owner_thread_idfromperformance_schema.metadata_locks;
这里,重点关注lock_status,"PENDING"代表线程在等待MDL,而"GRANTED"则代表线程持有MDL。
场景三
通过show processlist看不到Table A上有任何操作,在information_schema.innodb_trx中也没有任何进行中的事务。这很可能是因为在一个显式的事务中,对Table A进行了一个失败的操作(比如查询了一个不存在的字段),这时事务没有开始,但是失败语句获取到的锁依然有效,没有释放。从
performance_schema.events_statements_current
表中可以查到失败的语句。
官方手册上对此的说明如下:
If the serveracquires metadata locks for a statement that is syntactically valid but failsduring execution, it does not release the locks early. Lock release is stilldeferred to the end of the transaction because the failed statement is writtento the binary log and the locks protect log consistency.
也就是说除了语法错误,其他错误语句获取到的锁在这个事务提交或回滚之前,仍然不会释放掉。because the failed statement is written to the binary log and thelocks protect log consistency 但是解释这一行为的原因很难理解,因为错误的语句根本不会被记录到二进制日志。
处理方法:
通过performance_schema.events_statements_current找到其sid, kill 掉该session. 也可以 kill 掉DDL所在的session.
总之,alter table的语句是很危险的(其实他的危险其实是未提交事物或者长事务导致的),在操作之前最好确认对要操作的表没有任何进行中的操作、没有未提交事务、也没有显式事务中的报错语句,如果有alter table的维护任务,在无人监管的时候运行,最好通过lock_wait_timeout设置好超时时间,避免长时间的metedata锁等待。
注意上面用performance_schema来查看有个前提,是要ps中打开MDL锁信息的采集器开关。---8中默认开
select * fromperformance_schema.setup_instruments where name like '%/mdl';
+----------------------------+---------+-------+
| NAME | ENABLED |TIMED |
+----------------------------+---------+-------+
|wait/lock/metadata/sql/mdl | NO | NO |
+----------------------------+---------+-------+
1 row in set (0.00sec)
启用mdl的采集器
callsys.ps_setup_enable_instrument('sql/mdl');
+-----------------------+
| summary |
+-----------------------+
| Enabled 6 instruments|
+-----------------------+
1 row in set (0.00sec)
一般通过这条语句就够了:
SELECT
locked_schema,
locked_table,
locked_type,
waiting_processlist_id,
waiting_age,
waiting_query,
waiting_state,
blocking_processlist_id,
blocking_age,
substring_index(sql_text,"transaction_begin;",-1)ASblocking_query,
sql_kill_blocking_connection
FROM
(
SELECT
b.OWNER_THREAD_IDASgranted_thread_id,
a.OBJECT_SCHEMAASlocked_schema,
a.OBJECT_NAMEASlocked_table,
"Metadata Lock"ASlocked_type,
c.PROCESSLIST_IDASwaiting_processlist_id,
c.PROCESSLIST_TIMEASwaiting_age,
c.PROCESSLIST_INFOASwaiting_query,
c.PROCESSLIST_STATEASwaiting_state,
d.PROCESSLIST_IDASblocking_processlist_id,
d.PROCESSLIST_TIMEASblocking_age,
d.PROCESSLIST_INFOASblocking_query,
concat('KILL',d.PROCESSLIST_ID)ASsql_kill_blocking_connection
FROM
performance_schema.metadata_locks a
JOINperformance_schema.metadata_locks bONa.OBJECT_SCHEMA=b.OBJECT_SCHEMA
ANDa.OBJECT_NAME=b.OBJECT_NAME
ANDa.lock_status='PENDING'
ANDb.lock_status='GRANTED'
ANDa.OWNER_THREAD_ID<>b.OWNER_THREAD_ID
ANDa.lock_type='EXCLUSIVE'
JOINperformance_schema.threads cONa.OWNER_THREAD_ID=c.THREAD_ID
JOINperformance_schema.threads dONb.OWNER_THREAD_ID=d.THREAD_ID
) t1,
(
SELECT
thread_id,
group_concat(CASEWHENEVENT_NAME='statement/sql/begin'THEN"transaction_begin"ELSEsql_textENDORDERBYevent_idSEPARATOR ";" )ASsql_text
FROM
performance_schema.events_statements_history
GROUPBYthread_id
) t2
WHERE
t1.granted_thread_id=t2.thread_id\G