搞清楚 MySQL 派生表、物化表、临时表

一树一溪 2022-02-24

621

以前只知道 MySQL 里有临时表，后来随着花在 MySQL 上的时间越来越多，奇怪的知识增加了不少，比如：物化、半连接、首次匹配……

刚开始看到一些新名词的时候，充满了疑惑，也傻傻的分不清楚，经过一段时间的折腾，对这些概念多了一些了解。

今天先来说说 3 种表：派生表、物化表、临时表，刚开始看到派生表、物化表的时候，虽然官方文档和一些书籍上都有介绍，但并不十分清楚它们都是干嘛的，会用在什么地方？

派生表、物化表，看起来高大上，实际上它们两个也是临时表，只是官方给某些场景下使用的临时表
取了个名字而已。

1. 派生表

派生表
，是用于存储子查询产生的结果的临时表，这个子查询
特指 FROM 子句
里的子查询，如果是出现在其它地方的子查询，就不叫这个名字了，所以本质上来说，派生表也是临时表。

explain select * from t1 inner join (
    select distinct i1 from t3 where id in (3, 666, 990)
) as a on t1.i1 = a.i1

复制

+----+-------------+------------+------------+-------+---------------+---------+---------+--------+------+----------+----------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref    | rows | filtered | Extra                                              |
+----+-------------+------------+------------+-------+---------------+---------+---------+--------+------+----------+----------------------------------------------------+
| 1  | PRIMARY     | <derived2> | <null>     | ALL   | <null>        | <null>  | <null>  | <null> | 3    | 100.0    | <null>                                             |
| 1  | PRIMARY     | t1         | <null>     | ALL   | <null>        | <null>  | <null>  | <null> | 8    |  12.5    | Using where; Using join buffer (Block Nested Loop) |
| 2  | DERIVED     | t3         | <null>     | range | PRIMARY       | PRIMARY | 4       | <null> | 3    | 100.0    | Using where; Using temporary                       |
+----+-------------+------------+------------+-------+---------------+---------+---------+--------+------+----------+----------------------------------------------------+

复制

通过上面的 explain 结果可以看到，select 中的子查询，会产生一个派生表，存储子查询的查询结果，然后用 t3 表和派生表
（derived2）进行连接操作

2. 物化表

物化表
，也是用于存储子查询产生的结果的临时表，这个子查询
特指 WHERE
子句中查询条件里的子查询。

物化表有两种使用场景：

对子查询进行半连接
优化时，使用物化
策略
IN 子查询转换为 SUBQUERY、UNCACHEABLE SUBQUERY 的 exists 相关子查询
时，把子查询的结果物化，避免对于主查询符合条件的每一条记录，子查询都要执行一次从原表里读取数据

explain select * from t1 where t1.i1 in (
    select i1 from t3 where id in (3, 666, 990, 887, 76) and i2 > 16384
)

复制

+----+--------------+-------------+------------+-------+---------------+---------+---------+--------+--------+----------+----------------------------------------------------+
| id | select_type  | table       | partitions | type  | possible_keys | key     | key_len | ref    | rows   | filtered | Extra                                              |
+----+--------------+-------------+------------+-------+---------------+---------+---------+--------+--------+----------+----------------------------------------------------+
| 1  | SIMPLE       | <subquery2> | <null>     | ALL   | <null>        | <null>  | <null>  | <null> | <null> | 100.0    | <null>                                             |
| 1  | SIMPLE       | t1          | <null>     | ALL   | <null>        | <null>  | <null>  | <null> | 8      |  12.5    | Using where; Using join buffer (Block Nested Loop) |
| 2  | MATERIALIZED | t3          | <null>     | range | PRIMARY       | PRIMARY | 4       | <null> | 5      |  33.33   | Using where                                        |
+----+--------------+-------------+------------+-------+---------------+---------+---------+--------+--------+----------+----------------------------------------------------+

复制

通过上面的 explain 结果可以看到，IN 子查询的结果物化之后（select_type = MATERIALIZED）和 t1 表进行连接操作

子查询有 5 种优化策略：子查询表上拉（table pullout）、重复值消除（duplicate weedout）、首次匹配（first match）、松散扫描（loose scan）、物化连接（materialization），我在执行上面的 select 语句的 explain 时，通过计算成本选择的是重复值消除
策略，为了演示，我用 set optimizer_switch='duplicateweedout=off'
禁用了重复值消除策略

explain 
select * from t1 where t1.i1 in (
    select i1 from t3 where id in (3, 666, 990, 887, 76) and i2 > rand() * 100
) and t1.str1 > 'abc';

复制

+----+----------------------+-------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| id | select_type          | table | partitions | type  | possible_keys | key     | key_len | ref    | rows | filtered | Extra       |
+----+----------------------+-------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| 1  | PRIMARY              | t1    | <null>     | ALL   | <null>        | <null>  | <null>  | <null> | 8    | 33.33    | Using where |
| 2  | UNCACHEABLE SUBQUERY | t3    | <null>     | range | PRIMARY       | PRIMARY | 4       | <null> | 5    | 33.33    | Using where |
+----+----------------------+-------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+

复制

通过上面的 explain 结果，实际上没有体现出来使用了物化表，这需要用上另一个杀手锏了（optimizer_trace
）

先执行以下命令，开启 optimizer_trace

set optimizer_trace="enabled=on";
set optimizer_trace_max_mem_size=1048576;

复制

然后执行 SQL 语句

-- 注意：
-- 查询 information_schema.optimizer_trace 表的 SQL 要和前面的 SQL 一起执行
-- 不然查不到执行过程的信息
select * from t1 where t1.i1 in (
    select i1 from t3 where id in (3, 666, 990, 887, 76) and i2 > rand() * 100
) and t1.str1 > 'abc';
select * from information_schema.optimizer_trace;

复制

{
    "transformation": {
        "select#": 2,
        "from": "IN (SELECT)",
        "to": "materialization",
        "chosen": true,
        "unknown_key_1": {
            "creating_tmp_table": {
                "tmp_table_info": {
                    "row_length": 5,
                    "key_length": 4,
                    "unique_constraint": false,
                    "location": "memory (heap)",
                    "row_limit_estimate": 3355443
                }
            }
        }
    }
}

复制

上面的 JSON 是摘取了 trace 结果中的一段，可以看到是把 IN 子查询的结果集物化
到临时表了，并且使用的是 Memory 引擎的临时表

3. 临时表

临时表
，除了派生表、物化表之外，其它会用到临时表的地方，都是为了用空间换时间的，主要有以下使用场景：

group by 不能使用索引时
distinct 不能使用索引时
unioin
...

文章转载自一树一溪，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

搞清楚 MySQL 派生表、物化表、临时表

1. 派生表

2. 物化表

3. 临时表

评论

相关阅读