最近在mysql从库操作时候,执行了一条set global read_only=1的操作,结果卡住了...!!百思不得其解,觉得一探究竟。
因为是从库,不提供服务,所以有充足的时间查原纠根。我们看看show processlist的状态:
复制线程为8线程并发复制。上图出现死锁,结论:
97650 是用户发起的set global read_only=1操作;
109944 是复制线程thread/sql/slave_sql,sql线程,并行复制中的Coordinator;
109945/109946 是mts的Coordinator线程slave_worker,可以并行执行事务;
109944 Coordinator线程分发relaylog中事务时发现这个事务不能执行,要等待前面的事务完成提交,所以处于waiting for dependent transaction to commit的状态.
109945/109946线程与用户发起的97650线程形成死锁,109945线程等待97650线程释放GLobal Read Lock,97650线程占有MDL::global read lock 全局读锁,申请全局commit lock的时候,阻塞109946线程,109946线程占有MDL:: commit lock,因为从库设置slave_preserve_commit_order=on,保证从库binlog提交顺序,而109946线程执行事务对应的binlog靠后面,所以等待109945的事务提交。最终形成了109945->97650 ->109946->109945的死循环,形成死锁。为方便理解,我画了草图如下:
如上图三个线程形成死锁,又因为此次死锁是MDL锁,mysql server层面的锁,所以我们通过死锁检测很难发现,比如show engine innodb status、information_schema.INNODB_LOCKS、information_schema.INNODB_LOCK_WAITS都找不到死锁的数据。
[sys]>select a.thd_id, a.conn_id, b.thread_os_id, a.program_name,b.name,a.user,a.current_statement,b.PROCESSLIST_STATE from sys.processlist a ,performance_schema.threads b where a.thd_id =b.thread_id and conn_id >0;
复制
一、MDL锁细节:
http://mysql.taobao.org/monthly/2015/11/04/
以下很多锁分析基于该文章的基础。
二、分析锁
为方便查看,将死锁线程单独列出来:
root@localhost:mysql_xx.sock [performance_schema]>select * from threads where THREAD_ID in (109978,109979,109979,97684) order by PROCESSLIST_ID desc;
+-----------+---------------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+---------------------------------------------+------------------------+------------------+------+--------------+---------+-----------------+--------------+
| THREAD_ID | NAME | TYPE | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_TIME | PROCESSLIST_STATE | PROCESSLIST_INFO | PARENT_THREAD_ID | ROLE | INSTRUMENTED | HISTORY | CONNECTION_TYPE | THREAD_OS_ID |
+-----------+---------------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+---------------------------------------------+------------------------+------------------+------+--------------+---------+-----------------+--------------+
| 109979 | thread/sql/slave_worker | FOREGROUND | 109945 | root | localhost | NULL | Connect | 98091 | Waiting for commit lock | NULL | 109978 | NULL | YES | YES | NULL | 156643 |
| 109978 | thread/sql/slave_sql | FOREGROUND | 109944 | root | localhost | NULL | Connect | 1485 | Waiting for dependent transaction to commit | NULL | 97684 | NULL | YES | YES | NULL | 156642 |
| 97684 | thread/sql/one_connection | FOREGROUND | 97650 | root | localhost | sys | Query | 1485 | Waiting for commit lock | set global read_only=1 | NULL | NULL | YES | YES | Socket | 146534 |
+-----------+---------------------------+------------+----------------+------------------+------------------+----------------+---------------------+------------------+---------------------------------------------+------------------------+------------------+------+--------------+---------+-----------------+--------------+
3 rows in set (0.00 sec)
复制
我们针对线程97650,找到它的io线程(thread_os_id)为146534,然后用pstack将该进程的栈信息打印出来:
PSTACK 146534:
/data/logs# pstack 156643
Thread 1 (Thread 0x7f4970360700 (LWP 156643)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1 0x0000000000ca3180 in native_cond_timedwait (abstime=0x7f497035f9e0, mutex=0x5fb66098, cond=0x5fb660c8) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/include/thr_cond.h:136
#2 my_cond_timedwait (abstime=0x7f497035f9e0, mp=0x5fb66098, cond=0x5fb660c8) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/include/thr_cond.h:189
#3 inline_mysql_cond_timedwait (src_line=1868, src_file=0x1572e10 "/var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/mdl.cc", abstime=0x7f497035f9e0, mutex=0x5fb66098, that=0x5fb660c8) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/include/mysql/psi/mysql_thread.h:1236
#4 MDL_wait::timed_wait (this=0x5fb66098, owner=0x5fb66000, abs_timeout=0x7f497035f9e0, set_status_on_timeout=true, wait_state_name=<optimized out>) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/mdl.cc:1868
#5 0x0000000000ca5fe5 in MDL_context::acquire_lock (this=0x5fb66098, mdl_request=0x7f497035fa70, lock_wait_timeout=<optimized out>) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/mdl.cc:3700
#6 0x000000000085af4c in ha_commit_trans (thd=0x5fb66000, all=true, ignore_global_read_lock=false) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/handler.cc:1772
#7 0x0000000000e10d99 in trans_commit (thd=0x5fb66000) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/transaction.cc:246
#8 0x0000000000f082a2 in Xid_log_event::do_commit (this=0x6087de60, thd_arg=0x5fb66000) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/log_event.cc:7139
#9 0x0000000000f081a3 in Xid_apply_log_event::do_apply_event_worker (this=0x6087dea8, w=0x5dce2000) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/log_event.cc:7234
#10 0x0000000000f71195 in slave_worker_exec_job_group (worker=0x5dce2000, rli=0x2e7a800) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/rpl_rli_pdb.cc:2695
#11 0x0000000000f5909b in handle_slave_worker (arg=0x5dce2000) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/sql/rpl_slave.cc:6266
#12 0x00000000013ff8a4 in pfs_spawn_thread (arg=0x5f40d4c0) at /var/lib/pb2/sb_1-1352104-1607570233.72/mysql-5.7.33/storage/perfschema/pfs.cc:2197
#13 0x00007f4ea275a6ba in start_thread (arg=0x7f4970360700) at pthread_create.c:333
#14 0x00007f4ea119e51d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
复制
通过以上图可以看到 MDL_context::acquire_lock 加上1个锁,然后进行加锁Global_read_lock::make_global_read_lock_block_commit.再然后fix_read_only完成MDL_SHARED锁。以上的加锁过程和分析加了什么锁。但针对具体持有了哪些锁细节并不清楚,set global read_only是使得整个系统处于只读(RO)状态,该命令是通过fix_read_only函数来完成的,fix_read_only会调用lock_global_read_lock函数。
2.2 set global rea_only(fix_read_only)产生的锁
lock_global_read_lock实现逻辑代码:
fix_read_only(),sys_vars.cc:2219
--my_bool new_read_only= read_only;
--read_only= opt_readonly;
...
--lock_global_read_lock(),lock.cc:966 (enum_grl_state m_state = Global_read_lock::GRL_ACQUIRED_AND_BLOCKS_COMMIT -> m_state = GRL_ACQUIRED)
--mdl_request.init(MDL_key::GLOBAL, "", "", MDL_SHARED, MDL_EXPLICIT);# 初始化一个namespace=GLOBAL的显式的MDL_SHARED的metadata lock, dbname 和 name为“”
--thd->mdl_context.acquire_lock(&mdl_request,thd->variables.lock_wait_timeout) # 获取刚刚初始化过的metadata lock
--make_global_read_lock_block_commit(),lock.cc:1041 (m_state = GRL_ACQUIRED -> m_state = GRL_ACQUIRED_AND_BLOCKS_COMMIT)
--mdl_request.init(MDL_key::COMMIT, "", "", MDL_SHARED, MDL_EXPLICIT);
--thd->mdl_context.acquire_lock(&mdl_request,thd->variables.lock_wait_timeout))
...
--opt_readonly= new_read_only ;
--read_only= opt_readonly;
复制
主要就是分三步骤:
1. lock_global_read_lock --> mdl_request.init(MDL_key::GLOBAL, "", "", MDL_SHARED, MDL_EXPLICIT);
2. close_cached_tables(FLUSH TABLES只做这一步)
3. make_global_read_lock_block_commit --> mdl_request.init(MDL_key::COMMIT, "", "", MDL_SHARED, MDL_EXPLICIT);
fix_read_only做的事情和FLUSH TABLES WITH READ LOCK基本一样,lock_global_read_lock(MDL_SHARED类型的global read lock)–>close_cached_tables–>make_global_read_lock_block_commit。加的都是显式的MDL_SHARED锁,
以上我们可以知道,set global read_only会获取2把锁:MDL::global read lock 和MDL::global commit lock。
//Global_read_lock::lock_global_read_lock
MDL_REQUEST_INIT(&mdl_request,MDL_key::GLOBAL, "", "", MDL_SHARED, MDL_EXPLICIT);
//Global_read_lock::make_global_read_lock_block_commit
MDL_REQUEST_INIT(&mdl_request,MDL_key::COMMIT, "", "", MDL_SHARED, MDL_EXPLICIT);
复制
2.3 事务提交产生的锁(GRL和GCL锁)
该库为从库,同步发生的ddl/dml较为频繁,对于此时发生的任何DDL、DML,首先都需要申请MDL_INTENTION_EXCLUSIVE(IX)类型的global read lock:–> mdl_request.init(MDL_key::GLOBAL, “”, “”, MDL_INTENTION_EXCLUSIVE, MDL_EXPLICIT); 然后申请对应表上必须的相应元数据锁。
同样该DDL/DML在提交时候,需要再次申请MDL_INTENTION_EXCLUSIVE(IX)类型的global commit lock:–> mdl_request.init(MDL_key::COMMIT, “”, “”, MDL_INTENTION_EXCLUSIVE, MDL_EXPLICIT); 注意: select操作不会发生任何锁(grl/gcl锁)申请。为方便理解,我贴个图。
事务产生的是scope类型的IX锁,与set global read_only产生的S锁,是不兼容的,会发生等待。两个事务构不成死锁,当然还有第三个等待事务。
2.4 slave_preserve_commit_order
官方原文: For multithreaded slaves, enabling this variable ensures that transactions are externalized on the slave in the same order as they appear in the slave's relay log. Setting this variable has no effect on slaves for which multithreading is not enabled. All replication threads (for all replication channels if you are using multiple replication channels) must be stopped before changing this variable. --log-bin and --log-slave-updates must be enabled on the slave. In addition --slave-parallel-type must be set to LOGICAL_CLOCK. Once a multithreaded slave has been started, transactions can begin to execute in parallel. With slave_preserve_commit_order enabled, the executing thread waits until all previous transactions are committed before committing. While the slave thread is waiting for other workers to commit their transactions it reports its status as Waiting for preceding transaction to commit.
大致实现原理就是:excecution阶段可以并行执行,binlog flush的时候,按顺序进行。引擎层提交的时候,根据binlog_order_commit也是排队顺序完成 换句话说,如果设置了这个参数,master是怎么并行的,slave就怎么办并行。
所以,事务的执行和set global read_only(FTWRL)语句获得两个锁都不是原子的,并行复制时模式下按以下的顺序就会出现死锁:
事务A、B可以并行复制,relay-log中A在前,slave_preserve_commit_order=1;
从库回放时,B执行较快,先执行到commit状态,获取commit锁,并进入等待waiting for dependent transaction to commit的状态,持有MDL::commit锁;
用户C执行set global read_only(FTWRL)命令,进入waiting for commit状态,持有(global read lock 、global commit lock)MDL_SHARED锁;
事务A执行:
1):在C获取global read lock之后执行,那么事务A会出现waiting for global read lock的状态
2):在C获取global read lock之前执行,同时在获得global commit lock之后应用Xid_event的事务,则会出现waiting for commit的状态。
把以上ABC换成我们这次死锁中的线程,对应如下:A:109445、B:109446、C:97560。只是发生这种概率应该还是挺小的,刚好能遇见。
为了验证,同样跟踪下复制线程109445的栈信息,找到其对应的os thread:
以上可以看到首先申请MDL的IX锁,进行Xid_log_event::do_commit,很显然符合我们分析的A事务执行的b种情况,出现了waiting for commit状态。该死锁也很有可能出现在备份过程中。
3、故障总结:
MDL模块作为一个集中的资源,收到不同线程发来的锁请求,而有的锁是互斥的,不能同时满足,在这种情况下就会等待,如果线程在此之前已经拿到某些锁的话,就会形成持有-等待的状态;而又不可能要求所有线程按某一固定顺序请求锁,这样就会形成等待循环,也就是死锁。
4、解决与避免:
通过分析得知要解决死锁,必然要某一线程放弃某状态锁,选择影响最小的锁释放(kill),将set global read_only进程kill掉,死锁解开,影响最少。
避免:
第一种可以通过关闭slave_preserve_commit_order顺序并行,但关闭有可能导致主从数据顺序不一致,复制异常等等。第二种就是执行前先执行一下FLUSH TABLES。如果是备份引起的FTWRL,则需要加上kill-long-queries-timeout参数。