『现象』
一线人员反馈容灾库查询慢,发来mysql错误日志,查看发现出现下面记录:
2022-01-04T05:06:31.028830+01:00 1126 [Note] Multi-threaded slave statistics for channel 'xxxx3': seconds elapsed = 123; events assigned = 2031062017; worker queues filled over overrun level = 1677832; waited due a Worker queue full = 362997; waited due the total size = 0; waited at clock conflicts = 1950338980161300 waited (count) when Workers occupied = 28008759 waited when Workers occupied = 228852321457300
2022-01-04T05:08:31.460758+01:00 1126 [Note] Multi-threaded slave statistics for channel 'xxxx3': seconds elapsed = 120; events assigned = 2031204353; worker queues filled over overrun level = 1677832; waited due a Worker queue full = 362997; waited due the total size = 0; waited at clock conflicts = 1950431725811100 waited (count) when Workers occupied = 28010722 waited when Workers occupied = 228867762790300
2022-01-04T05:09:47.866659+01:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 13095ms. The settings might not be optimal. (flushed=2000 and evicted=0, during the time.)
2022-01-04T05:10:31.166753+01:00 1126 [Note] Multi-threaded slave statistics for channel 'xxxx3': seconds elapsed = 120; events assigned = 2031332353; worker queues filled over overrun level = 1677832; waited due a Worker queue full = 362997; waited due the total size = 0; waited at clock conflicts = 1950524595327900 waited (count) when Workers occupied = 28012440 waited when Workers occupied = 228882816756300
2022-01-04T05:11:03.758176+01:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 16802ms. The settings might not be optimal. (flushed=2000 and evicted=0, during the time.)
2022-01-04T05:12:32.588940+01:00 1126 [Note] Multi-threaded slave statistics for channel 'xxxx3': seconds elapsed = 121; events assigned = 2031461377; worker queues filled over overrun level = 1677832; waited due a Worker queue full = 362997; waited due the total size = 0; waited at clock conflicts = 1950614429599300 waited (count) when Workers occupied = 28014169 waited when Workers occupied = 228904267514200
2022-01-04T05:14:35.470387+01:00 1126 [Note] Multi-threaded slave statistics for channel 'xxxx3': seconds elapsed = 123; events assigned = 2031603713; worker queues filled over overrun level = 1677832; waited due a Worker queue full = 362997; waited due the total size = 0; waited at clock conflicts = 1950710663557100 waited (count) when Workers occupied = 28015904 waited when Workers occupied = 228919686483500
2022-01-04T05:16:35.165712+01:00 1126 [Note] Multi-threaded slave statistics for channel 'xxxx3': seconds elapsed = 120; events assigned = 2031737857; worker queues filled over overrun level = 1677832; waited due a Worker queue full = 362997; waited due the total size = 0; waited at clock conflicts = 1950803558697000 waited (count) when Workers occupied = 28017510 waited when Workers occupied = 228935603900500
2022-01-04T05:18:35.083805+01:00 1126 [Note] Multi-threaded slave statistics for channel 'xxxx3': seconds elapsed = 120; events assigned = 2031856641; worker queues filled over overrun level = 1677832; waited due a Worker queue full = 362997; waited due the total size = 0; waited at clock conflicts = 1950899298017900 waited (count) when Workers occupied = 28019057 waited when Workers occupied = 228948804980700
『分析』
脏页产生太快,页面清理不过来,系统忙于刷脏页,把内存里的数据写入磁盘。多线程复制同步中,发生大量事件等候,队伍超长,处理任务性能下降。
MySQL脏页:当内存数据页跟磁盘数据页内容不一致的时候,称这个内存页为脏页
Multi-threaded slave statistics段字符串解释:
seconds elapsed :上一次统计跟这一次统计的时间间隔
events assigned:总共有多少个event被分配执行,计的是总数。
worker queues filled over overrun level:多线程同步中,worker 的私有队列长度超长的次数,计的是总数。
waited due a Worker queue full :因为worker的队列超长而产生等待的次数,计的是总数。
waited due the total size :超过最大size的次数,这个由参数slave_pending_jobs_size_max 指定。
waited at clock conflicts :因为逻辑时间产生冲突的等待时间,单位是纳秒。
waited (count) when Workers occupied :因为workder被占用而出现等待的次数。(总计值)。
waited when Workers occupied :因为workder被占用而出现等待的总时间,总计值,单位是纳秒。
page_cleaner段字符串解释:
1000ms :每1秒钟做一次刷新页的操作。
loop took 13095ms:刷新循环的实际经历时间。
flushed=2000 :脏页数值。
evicted=0:lru 列表尾部刷新的页数
刷脏页的时机:
(1)redo log写满时,没有空间了,此时需要将checkpoint向前推进,推进的这部分日志对应的脏页刷入到磁盘,此时所有的更新全部阻塞,此时写的性能变为0,必须待刷一部分脏页后才能更新。
(2)系统内存不足时,需要将一部分数据页淘汰掉,如果淘汰的是脏页,需要先将脏页同步到磁盘。
(3)MySQL认为空闲的时间,这种没有性能问题。
(4) mysql正常关闭之前,会把所有脏页刷入磁盘,不存在性能问题。
『解决』
在线修改参数
Set global innodb_lru_scan_depth=256; ----------单个innodb_buffer_pool_instance刷新脏页的最大数量(深度),Innodb_io_capacity>innodb_lru_scan_depth*innodb_buffer_pool_instance,默认是1024,优化减少值
Set global innodb_max_dirty_pages_pct=50;----------最大脏页百分比,默认是75,优化减少值
Set global innodb_adaptive_hash_index=off;----------关闭.默认打开
Set global innodb_io_capacity = 2000;----------默认是200,单位是页。该参数设置的大小取决于硬盘的IOPS,即每秒的输入输出量(或读写次数)。
Set global innodb_io_max_capacity = 4000;
若是要永久生效,要在my.cnf里面修改参数。




