导读:
作者:郑松华,知数堂SQL优化班老师,网名:骑龟的兔子
今天给大家分享下,关于not in ,not exists相关的文章。其实这个可以归纳为exists to in的一类,mysql以前的版本中对in和exists的处理是完全不一样的,直到8.0.16版本。
16版本之前in可以优化成semi join那些关于semi join的几个优化,如:loosescan=on,firstmatch=on,duplicateweedout=on,materialization=on,都是对于in的 , exists只有一种运算就是DEPENDENT SUBQUERY,直到16版本开始exists可以和in等价了,也可以享受到semi join,这回最新版本中mysqlservertiam.com网站中新出现了关于anti join的文章。
Antijoin in MySQL 8 mysqlserverteam.com
复制
我们可以简单理解为not exists和not in在直白点就是子查询变成半连接了,下面我举个5.7和8.0.18版本同一种sql的两种不同的执行计划
首先5.7
--------------mysql Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using EditLine wrapper Connection id: 2 Current database: employees Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 5.7.14-log MySQL Community Server (GPL) Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8 Db characterset: utf8 Client characterset: utf8 Conn. characterset: utf8 UNIX socket: /tmp/mysql3306.sock Uptime: 5 min 4 sec Threads: 1 Questions: 21 Slow queries: 0 Opens: 116 Flush tables: 1 Open tables: 109 Queries per second avg: 0.069 -------------- root@mysql3306.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+ | 1 | PRIMARY | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using index | +----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
复制
如上述5.7.14版本中not exists执行计划是DEPENDENT SUBQUERY
我们在8.0.18 版本中看下同一个sql:
root@mysql3308.sock>[employees]>\s -------------- /usr/local/mysql8/bin/mysql Ver 8.0.18 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL) Connection id: 7 Current database: employees Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 8.0.18 MySQL Community Server - GPL Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8 mb4Db characterset: utf8 mb4Client characterset: utf8 mb4Conn. characterset: utf8 mb4UNIX socket: /tmp/mysql3308.sock Uptime: 19 min 24 sec Threads: 1 Questions: 46 Slow queries: 0 Opens: 175 Flush tables: 3 Open tables: 95 Queries per second avg: 0.039 root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using where; Not exists; Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
复制
我们可以看到变成了anti join
root@mysql3308.sock>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Nested loop anti-join (cost=12.25 rows=10) -> Table scan on t (cost=1.25 rows=10) -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no) (cost=1.01 rows=1) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set, 1 warning (0.00 sec)
复制
因为5.7中没有8.0.18版本新增的explain format=tree查看执行计划的方法,我们比较show warnings\G
5.7
root@mysql3306.sock>[employees]>show warnings\G *************************** 1. row *************************** Level: Note Code: 1276 Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1 *************************** 2. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` where (not(exists(/* select#2 */ select 1 from `employees`.`employees` `e` where (`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)))) 2 rows in set (0.00 sec)
复制
8.0
root@mysql3308.sock>[employees]>show warnings\G *************************** 1. row *************************** Level: Note Code: 1276 Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1 *************************** 2. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` anti join (`employees`.`employees` `e`) on((`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)) where true 2 rows in set (0.00 sec)
复制
可以看出5.7中是还是not exists而8.0.18版本中变成了anti join,那有人问了 DEPENDENT SUBQUERY也好,anti join也好到底有啥用?
简单来说DEPENDENT SUBQUERY类似于函数调用 ,需要1个重要的前提:必须接受参数 !也就是说只能是nested loop join方式。但是anti join就不一定了它也可以是nested loop join方式,也可以是hash join方式;而hash join就是 8.0.18版本的新特性!
关于hash join的文章 可以看如下:
hash join mysqlserverteam.com
复制
而且DEPENDENT SUBQUERY随着外层表的结果集的数据量的增大而执行时间增大,以前如果碰到这种极端的问题,我们可以利用 left join来解决 ,具体的方法可以看我的公开课。
现在随着新版的anti join这些我们就可以省略了。
最后我想说的是,优化不能离开版本,离开版本谈优化,是不行的。
还有在这次的8.0.18 版本新引入的
explain format=tree / analyze 查看执行计划的方式中发现了一个很严重的问题
root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ | 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using where; Not exists; Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+ root@mysql3308.sock>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Nested loop anti-join (cost=12.17 rows=10) -> Table scan on t (cost=1.25 rows=10) -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no) (cost=1.00 rows=1) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set, 1 warning (0.00 sec)
复制
如上所述,可以看到发生了nested loop anti-join,现在,我要把这个改成如下:
root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e ignore index(pri) where e.emp_no = t.emp_no) ; +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+ | 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL | | 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 299246 | 100.00 | Using where; Not exists; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+root@mysql3308.sock>[employees]>desc analyze select * from t_group t where not exists (select 1 from employees e ignore index(pri) where e.emp_no = t.emp_no) ;ERROR 1235 (42000): This version of MySQL doesn't yet support 'EXPLAIN ANALYZE on this query'
复制
如上述所示,这回desc会出现执行计划,但是desc analyze就会发生错误!
我的推测这个有可能是一个bug ! 或者是还没来得及支持,毕竟新特性,希望以后版本中更改。
水平有限,如有错误,欢迎纠正。
谢谢大家~ 欢迎转发~
我是知数堂SQL 优化班老师~ ^^
如有关于SQL优化方面疑问和一起交流的请加:
高性能MySQL,SQL优化群
并且 @兔子@知数堂SQL优化
欢迎加入 知数堂大家庭。