01
前言
02
准备内容
CREATE TABLE `t1` (
`id` INT ( 11 ) NOT NULL,
`t1_a` INT ( 11 ) DEFAULT NULL,
`t1_b` INT ( 11 ) DEFAULT NULL,
PRIMARY KEY ( `id` ),
KEY `idx_a` ( `t1_a` )) ENGINE = INNODB;
CREATE TABLE `t2` (
`id` INT ( 11 ) NOT NULL,
`t2_a` INT ( 11 ) DEFAULT NULL,
`t2_b` INT ( 11 ) DEFAULT NULL,
PRIMARY KEY ( `id` ),
KEY `idx_a` ( `t2_a` )) ENGINE = INNODB;
CREATE TABLE `t3` (
`id` INT ( 11 ) NOT NULL,
`t3_a` INT ( 11 ) DEFAULT NULL,
`t3_b` INT ( 11 ) DEFAULT NULL,
PRIMARY KEY ( `id` ),
KEY `idx_a` ( `t3_a` )) ENGINE = INNODB;
-- 向t1添加100条数据
-- drop procedure idata;
delimiter ;;
create procedure idata()
begin
declare i int;
set i=1;
while(i<=100)do
insert into t1 values(i, i, i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
-- 向t2添加1000条数据
drop procedure idata;
delimiter ;;
create procedure idata()
begin
declare i int;
set i=101;
while(i<=1100)do
insert into t2 values(i, i, i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
-- 向t2添加1000条数据,且t3_a列的值为倒叙
drop procedure idata;
delimiter ;;
create procedure idata()
begin
declare i int;
set i=101;
while(i<=1100)do
insert into t3 values(i, 1101-i, i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
03
子查询的语法形式和分类
3.1 语法形式
3.1.2 WHERE或IN子句中
其他的还有 SELECT 子句中,ORDER BY 子句中,GROUP BY 子句中,虽然语法支持,但没啥意义,就不唠叨这些情况了。
3.2 分类
3.2.1 按返回的结果集区分
标量子查询,只返回一个单一值的子查询称之为标量子查询,比如: SELECT * FROM t1 WHERE m1 = (SELECT m1 FROM t1 LIMIT 1); 行子查询,就是只返回一条记录的子查询,不过这条记录需要包含多个列(只包含一个列就成了标量子查询了)。比如:SELECT * FROM t1 WHERE (m1, n1) = (SELECT m2, n2 FROM t2 LIMIT 1); 列子查询,就是只返回一个列的数据,不过这个列的数据需要包含多条记录(只包含一条记录就成了标量子查询了)。比如:SELECT * FROM t1 WHERE m1 IN (SELECT m2 FROM t2); 表子查询,就是子查询的结果既包含很多条记录,又包含很多个列,比如: SELECT * FROM t1 WHERE (m1, n1) IN (SELECT m2, n2 FROM t2); 其中的 (SELECT m2, n2 FROM t2) 就是一个表子查询,这里需要和行子查询对比一下,行子查询中我们用了 LIMIT 1 来保证子查询的结果只有一条记录。
3.2.2 按与外层查询关系来区分
04
子查询在MySQL中是怎么执行的
4.1 标量子查询、行子查询的执行方式
4.1.1 不相关子查询
mysql root@localhost:test> explain select * from t1 where t1_a = (select t2_a from t2 limit 1);
+----+-------------+-------+-------+---------------+-------+---------+--------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+--------+------+-------------+
| 1 | PRIMARY | t1 | ref | idx_a | idx_a | 5 | const | 1 | Using where |
| 2 | SUBQUERY | t2 | index | <null> | idx_a | 5 | <null> | 1000 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+--------+------+-------------+
先单独执行 (select t2_a from t2 limit 1) 这个子查询。 然后在将上一步子查询得到的结果当作外层查询的参数再执行外层查询 select * from t1 where t1_a = ...。
4.1.2 相关的子查询
mysql root@localhost:test> explain select * from t1 where t1_a = (select t2_a from t2 where t1.t1_b=t2.t2_b limit 1);
+----+--------------------+-------+------+---------------+--------+---------+--------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+--------+---------+--------+------+-------------+
| 1 | PRIMARY | t1 | ALL | <null> | <null> | <null> | <null> | 100 | Using where |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | <null> | <null> | <null> | <null> | 1000 | Using where |
+----+--------------------+-------+------+---------------+--------+---------+--------+------+-------------+
先从外层查询中获取一条记录,本例中也就是先从 t1 表中获取一条记录。 然后从上一步骤中获取的那条记录中找出子查询中涉及到的值,就是 t1 表中找出 t1.t1_b 列的值,然后执行子查询。 最后根据子查询的查询结果来检测外层查询 WHERE 子句的条件是否成立,如果成立,就把外层查询的那条记录加入到结果集,否则就丢弃。 然后重复以上步骤,直到 t1 中的记录全部匹配完。
4.2 IN子查询
4.2.1 物化
结果集太多,可能内存中都放不下~ 对于外层查询来说,如果子查询的结果集太多,那就意味着 IN 子句中的参数特别多,这就导致:
该临时表的列就是子查询结果集中的列。 写入临时表的记录会被去重,让临时表变得更小,更省地方。 一般情况下子查询结果集不大时,就会为它建立基于内存的使用 Memory 存储引擎的临时表,而且会为该表建立哈希索引。
如果子查询的结果集非常大,超过了系统变量 tmp_table_size或者 max_heap_table_size,临时表会转而使用基于磁盘的存储引擎来保存结果集中的记录,索引类型也对应转变为 B+ 树索引。
mysql root@localhost:test> explain select * from t3 where t3_a in (select t2_a from t2);
+----+--------------+-------------+--------+---------------+------------+---------+--------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+---------------+------------+---------+--------------+------+-------------+
| 1 | SIMPLE | t3 | ALL | idx_a | <null> | <null> | <null> | 1000 | Using where |
| 1 | SIMPLE | <subquery2> | eq_ref | <auto_key> | <auto_key> | 5 | test.t3.t3_a | 1 | <null> |
| 2 | MATERIALIZED | t2 | index | idx_a | idx_a | 5 | <null> | 1000 | Using index |
+----+--------------+-------------+--------+---------------+------------+---------+--------------+------+-------------+
其实上边的查询就相当于表 t3 和子查询物化表进行内连接:
mysql root@localhost:test> explain select * from t3 left join t2 on t3.t3_a=t2.t2_a;
+----+-------------+-------+------+---------------+--------+---------+--------------+------+--------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+--------------+------+--------+
| 1 | SIMPLE | t3 | ALL | <null> | <null> | <null> | <null> | 1000 | <null> |
| 1 | SIMPLE | t2 | ref | idx_a | idx_a | 5 | test.t3.t3_a | 1 | <null> |
+----+-------------+-------+------+---------------+--------+---------+--------------+------+--------+
-- 这里我们先构造了3条记录,其实也是构造不唯一的普通索引
+------+------+------+
| id | t2_a | t2_b |
+------+------+------+
| 1100 | 1000 | 1000 |
| 1101 | 1000 | 1000 |
| 1102 | 1000 | 1000 |
+------+------+------+
-- 加限制条件where t2.id>=1100是为了减少要显示的数据
mysql root@localhost:test> select * from t3 where t3_a in (select t2_a from t2 where t2.id>=1100);
+-----+------+------+
| id | t3_a | t3_b |
+-----+------+------+
| 101 | 1000 | 101 |
+-----+------+------+
1 row in set
Time: 0.016s
mysql root@localhost:test> select * from t3 left join t2 on t3.t3_a=t2.t2_a where t2.id>=1100;
+-----+------+------+------+------+------+
| id | t3_a | t3_b | id | t2_a | t2_b |
+-----+------+------+------+------+------+
| 101 | 1000 | 101 | 1100 | 1000 | 1000 |
| 101 | 1000 | 101 | 1101 | 1000 | 1000 |
| 101 | 1000 | 101 | 1102 | 1000 | 1000 |
+-----+------+------+------+------+------+
3 rows in set
Time: 0.018s
4.2.2 半联接的实现:
Table pullout (子查询中的表上拉)
mysql root@localhost:test> select * from t3 where t3_a in (select t2_a from t2 where t2.id=999)
+-----+------+------+
| id | t3_a | t3_b |
+-----+------+------+
| 102 | 999 | 102 |
+-----+------+------+
1 row in set
Time: 0.024s
mysql root@localhost:test> select * from t3 join t2 on t3.t3_a=t2.t2_a where t2.id=999;
+-----+------+------+-----+------+------+
| id | t3_a | t3_b | id | t2_a | t2_b |
+-----+------+------+-----+------+------+
| 102 | 999 | 102 | 999 | 999 | 999 |
+-----+------+------+-----+------+------+
1 row in set
Time: 0.028s
mysql root@localhost:test> explain select * from t3 where t3_a in (select t2_a from t2 where t2.id=999)
+----+-------------+-------+-------+---------------+---------+---------+-------+------+--------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+--------+
| 1 | SIMPLE | t2 | const | PRIMARY,idx_a | PRIMARY | 4 | const | 1 | <null> |
| 1 | SIMPLE | t3 | ref | idx_a | idx_a | 5 | const | 1 | <null> |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+--------+
FirstMatch execution strategy (首次匹配)
mysql root@localhost:test> explain select * from t3 where t3_a in (select t2_a from t2 where t2.t2_a=1000)
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------+
| 1 | SIMPLE | t3 | ref | idx_a | idx_a | 5 | const | 1 | <null> |
| 1 | SIMPLE | t2 | ref | idx_a | idx_a | 5 | const | 4 | Using index; FirstMatch(t3) |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------+
DuplicateWeedout execution strategy (重复值消除)
LooseScan execution strategy (松散扫描)
4.2.3 半联接的适用条件
SELECT ... FROM outer_tables
WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...
--
或者这样的形式也可以:
SELECT ... FROM outer_tables
WHERE (oe1, oe2, ...) IN (SELECT ie1, ie2, ... FROM inner_tables ...) AND ...
该子查询必须是和IN语句组成的布尔表达式,并且在外层查询的 WHERE 或者 ON 子句中出现
外层查询也可以有其他的搜索条件,只不过和 IN 子查询的搜索条件必须使用AND 连接起来
该子查询必须是一个单一的查询,不能是由若干查询由 UNION 连接起来的形式
该子查询不能包含 GROUP BY 或者 HAVING 语句或者聚集函数
4.2.4 转为 EXISTS 子查询
outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
-- 可以被转换为:
EXISTS (SELECT inner_expr FROM ... WHERE subquery_where AND outer_expr=inner_expr)
mysql root@localhost:test> SELECT NULL IN (1, 2, 3);
+-------------------+
| NULL IN (1, 2, 3) |
+-------------------+
| <null> |
+-------------------+
1 row in set
mysql root@localhost:test> SELECT 1 FROM s1 WHERE NULL;
+---+
| 1 |
+---+
0 rows in set
Time: 0.016s
mysql root@localhost:test> SELECT 1 FROM s1 WHERE FALSE;
+---+
| 1 |
+---+
0 rows in set
Time: 0.033s
mysql root@localhost:test> explain select * from t3 where t3_a in (select t2_a from t2 where t2.t2_a>=999) or t3_b > 1000;
+----+-------------+-------+-------+---------------+--------+---------+--------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+--------+------+--------------------------+
| 1 | PRIMARY | t3 | ALL | <null> | <null> | <null> | <null> | 1000 | Using where |
| 2 | SUBQUERY | t2 | range | idx_a | idx_a | 5 | <null> | 107 | Using where; Using index |
+----+-------------+-------+-------+---------------+--------+---------+--------+------+--------------------------+
mysql root@localhost:test> explain select * from t3 where exists (select 1 from t2 where t2.t2_a>=999 and t2.t2_a=t3.t3_a) or t3_b > 1000;
+----+--------------------+-------+------+---------------+--------+---------+--------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+--------+---------+--------------+------+--------------------------+
| 1 | PRIMARY | t3 | ALL | <null> | <null> | <null> | <null> | 1000 | Using where |
| 2 | DEPENDENT SUBQUERY | t2 | ref | idx_a | idx_a | 5 | test.t3.t3_a | 1 | Using where; Using index |
+----+--------------------+-------+------+---------------+--------+---------+--------------+------+--------------------------+
需要注意的是,如果 IN 子查询不满足转换为 semi-join 的条件,又不能转换为物化表或者转换为物化表的成本太大,那么它就会被转换为 EXISTS 查询。或者转换为物化表的成本太大,那么它就会被转换为 EXISTS 查询。
05
总结
1. 如果IN子查询符合转换为 semi-join 的条件,查询优化器会优先把该子查询转换为 semi-join,然后再考虑下边执行半连接的策略中哪个成本最低,
1)Table pullout
2)DuplicateWeedout
3)LooseScan
4)FirstMatch
选择成本最低的那种执行策略来执行子查询。
2. 如果IN子查询不符合转换为 semi-join 的条件,那么查询优化器会从下边两种策略中找出一种成本更低的方式执行子查询:
1)先将子查询物化之后再执行查询
2)执行 IN to EXISTS 转换