作者:ShunWah
在运维管理领域,我拥有多年深厚的专业积累,兼具坚实的理论基础与广泛的实践经验。我始终站在技术前沿,致力于推动运维自动化,不懈追求运维效率的最大化。
我精通运维自动化流程,对于OceanBase、MySQL等多种数据库的部署与运维,具备从初始部署到后期维护的全链条管理能力。凭借OceanBase的OBCA和OBCP认证、OpenGauss社区认证结业证书,以及崖山DBCA、亚信AntDBCA、翰高HDCA、GBase 8a | 8c | 8s、Galaxybase GBCA、Neo4j Graph Data Science Certification、NebulaGraph NGCI & NGCP等多项权威认证,我不仅展现了自己的专业技能,也彰显了对技术的深厚热情与执着追求。
在OceanBase & 墨天轮的技术征文大赛中,我凭借卓越的技术实力和独特的见解,多次荣获一、二、三等奖。同时,在OpenGauss第五届、第六届、第七届技术征文大赛,TiDB社区第三届专栏征文大赛,金仓数据库有奖征文活动,以及首批YashanDB「产品体验官」尝鲜征文等活动中,我也屡获殊荣。此外,我还活跃于墨天轮、CSDN等技术平台,经常发布原创技术文章,并多次被首页推荐,积极与业界同仁分享我的运维经验和独到见解。
前言
在数据库开发中,复杂查询往往伴随着嵌套子查询、多层逻辑和难以维护的代码结构。MySQL 8.0 引入的 通用表表达式(Common Table Expressions,CTE) 如同一把锋利的手术刀,能够将臃肿的 SQL 语句解构成清晰可读的模块化逻辑。本文将以实战为导向,深入解析 CTE 的核心特性、递归查询的实现原理,并通过真实场景的代码演示,展现如何用 CTE 提升 SQL 的可维护性与性能。
一、CTE 的本质:SQL 的模块化编程思维
1. 什么是 CTE?
CTE 是一种 临时的命名结果集,可在同一查询中被多次引用。它通过 WITH
关键字定义,将复杂查询拆分为多个逻辑步骤,类似于编程语言中的函数或变量。
2. 为什么需要 CTE?
- 代码可读性:将嵌套子查询转换为扁平化的逻辑块。
- 代码复用:避免重复编写相同的子查询逻辑。
- 递归能力:支持递归查询,处理树形或层次化数据。
3. 语法结构
WITH cte_name (column1, column2, ...) AS (
-- 子查询定义
SELECT ...
)
SELECT * FROM cte_name;
二、用 CTE 重构复杂查询
场景:统计部门销售额与员工贡献占比
传统子查询写法:
mysql>
mysql> SELECT
-> d.dept_name,
-> total_sales,
-> e.emp_name,
-> e.sales / total_sales AS contribution
-> FROM
-> departments d
-> JOIN
-> employees e ON d.dept_id = e.dept_id
-> JOIN (
-> SELECT dept_id, SUM(sales) AS total_sales
-> FROM employees
-> GROUP BY dept_id
-> ) AS dept_totals ON d.dept_id = dept_totals.dept_id;
+-------------+-------------+----------+--------------+
| dept_name | total_sales | emp_name | contribution |
+-------------+-------------+----------+--------------+
| Sales | 240000.00 | Alice | 0.625000 |
| Sales | 240000.00 | Bob | 0.375000 |
| Marketing | 200000.00 | Charlie | 0.375000 |
| Marketing | 200000.00 | David | 0.625000 |
| Engineering | 400000.00 | Eve | 0.450000 |
| Engineering | 400000.00 | Frank | 0.550000 |
+-------------+-------------+----------+--------------+
6 rows in set (0.00 sec)
mysql>
CTE 重构版本:
mysql>
mysql> WITH dept_totals AS (
-> SELECT
-> dept_id,
-> SUM(sales) AS total_sales
-> FROM employees
-> GROUP BY dept_id
-> )
-> SELECT
-> d.dept_name,
-> dt.total_sales,
-> e.emp_name,
-> e.sales / dt.total_sales AS contribution
-> FROM
-> departments d
-> JOIN dept_totals dt ON d.dept_id = dt.dept_id
-> JOIN employees e ON d.dept_id = e.dept_id;
+-------------+-------------+----------+--------------+
| dept_name | total_sales | emp_name | contribution |
+-------------+-------------+----------+--------------+
| Sales | 240000.00 | Alice | 0.625000 |
| Sales | 240000.00 | Bob | 0.375000 |
| Marketing | 200000.00 | Charlie | 0.375000 |
| Marketing | 200000.00 | David | 0.625000 |
| Engineering | 400000.00 | Eve | 0.450000 |
| Engineering | 400000.00 | Frank | 0.550000 |
+-------------+-------------+----------+--------------+
6 rows in set (0.00 sec)
mysql>
优势分析:
- 将部门总销售额计算分离为独立模块。
- 主查询逻辑更聚焦于业务目标。
- 方便后续复用
dept_totals
结果集。
三、递归 CTE:征服树形数据的终极武器
1. 递归 CTE 核心原理
- 初始查询(Anchor Member):定义递归的起点。
- 递归查询(Recursive Member):基于前一次迭代结果生成新数据。
- 终止条件:当递归查询返回空结果时停止。
2. 实战场景:构建组织架构树
数据表结构:
2.1 创建表
mysql>
mysql> CREATE TABLE employees (
-> emp_id INT PRIMARY KEY,
-> emp_name VARCHAR(50),
-> manager_id INT,
-> salary DECIMAL(10,2)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql>
2.2 插入数据
mysql>
mysql> INSERT INTO employees VALUES
-> (1, 'CEO', NULL, 100000),
-> (2, 'CTO', 1, 80000),
-> (3, 'Engineering Lead', 2, 70000),
-> (4, 'Senior Developer', 3, 60000),
-> (5, 'Junior Developer', 3, 50000);
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql>
递归查询:获取 CEO 的所有下属层级
mysql> WITH RECURSIVE org_chart AS (
-> -- CEO -- 初始查询:顶层管理者(CEO)
-> SELECT
-> emp_id,
-> emp_name,
-> manager_id,
-> 0 AS level
-> FROM employees
-> WHERE manager_id IS NULL
-> UNION ALL
-> -- -- 递归查询:逐级向下查找下属
-> SELECT
-> e.emp_id,
-> e.emp_name,
-> e.manager_id,
-> oc.level + 1
-> FROM employees e
-> JOIN org_chart oc ON e.manager_id = oc.emp_id
-> )
-> SELECT
-> emp_name,
-> level,
-> manager_id
-> FROM org_chart
-> ORDER BY level, emp_id;
**输出结果:**
+------------------+-------+------------+
| emp_name | level | manager_id |
+------------------+-------+------------+
| CEO | 0 | NULL |
| CTO | 1 | 1 |
| Engineering Lead | 2 | 2 |
| Senior Developer | 3 | 3 |
| Junior Developer | 3 | 3 |
+------------------+-------+------------+
5 rows in set (0.00 sec)
mysql>
3、使用 CTE 查询所有员工的直接和间接上级
接下来,我们将使用 CTE 来查询每个员工的直接和间接上级。这是一个典型的递归查询场景。
WITH RECURSIVE employee_hierarchy AS (
-- 基础情况:首先选择所有没有上级的员工(即最高层管理者)
SELECT employee_id, employee_name, manager_id, employee_name AS hierarchy
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- 递归情况:连接当前 CTE 结果和员工表,找到下一级管理者
SELECT e.employee_id, e.employee_name, e.manager_id,
CONCAT(eh.hierarchy, ' -> ', e.employee_name) AS hierarchy
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
-- 最终选择结果
SELECT * FROM employee_hierarchy
ORDER BY hierarchy;
mysql> WITH RECURSIVE employee_hierarchy AS (
-> -- 基础情况:首先选择所有没有上级的员工(即最高层管理者)
-> SELECT employee_id, employee_name, manager_id, employee_name AS hierarchy
-> FROM employees
-> WHERE manager_id IS NULL
->
-> UNION ALL
->
-> -- 递归情况:连接当前 CTE 结果和员工表,找到下一级管理者
-> SELECT e.employee_id, e.employee_name, e.manager_id,
-> CONCAT(eh.hierarchy, ' -> ', e.employee_name) AS hierarchy
-> FROM employees e
-> INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
-> )
-> -- 最终选择结果
-> SELECT * FROM employee_hierarchy
-> ORDER BY hierarchy;
+-------------+---------------+------------+---------------------------+
| employee_id | employee_name | manager_id | hierarchy |
+-------------+---------------+------------+---------------------------+
| 1 | Alice | NULL | Alice |
| 2 | Bob | 1 | Alice -> Bob |
| 4 | David | 2 | Alice -> Bob -> David |
| 5 | Eve | 2 | Alice -> Bob -> Eve |
| 3 | Charlie | 1 | Alice -> Charlie |
| 6 | Frank | 3 | Alice -> Charlie -> Frank |
+-------------+---------------+------------+---------------------------+
6 rows in set (0.00 sec)
mysql>
上述 CTE employee_hierarchy
首先选择所有没有上级的员工作为基础情况,然后通过递归连接找到每个员工的上级,直至最底层员工。hierarchy
列显示了从最高层管理者到当前员工的完整路径。
4、使用 CTE 计算每个部门的平均工资
再来看一个非递归 CTE 的示例,计算每个部门的平均工资。
假设我们有一个 departments
表(为了简洁,此处省略创建和插入数据的代码),我们可以使用 CTE 来首先计算每个部门的员工总数和总薪资,然后再计算平均工资。
-- 假设 departments 表已经存在,并包含 department_id 和 department_name 字段
-- 此处省略 departments 表的创建和插入数据代码
WITH department_totals AS (
SELECT d.department_id, d.department_name, COUNT(e.employee_id) AS employee_count, SUM(e.salary) AS total_salary
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id -- 假设 employees 表中有 department_id 字段
GROUP BY d.department_id, d.department_name
)
SELECT department_id, department_name, total_salary / employee_count AS average_salary
FROM department_totals
WHERE employee_count > 0; -- 排除没有员工的部门
在这个示例中,CTE department_totals
首先计算了每个部门的员工总数和总薪资,然后外层查询计算了平均工资。
四、高级技巧:CTE 性能优化与陷阱规避
1、 物化 CTE 提升性能
在 MySQL 中,MATERIALIZED
语法并不被支持,这是 PostgreSQL 特有的语法。您遇到的错误是由于 MySQL 8.0 未实现该关键字导致的。但不必担心,我们仍有其他方法可以优化 CTE 性能。
2、正确优化 MySQL CTE 性能的方法
2.1 强制转换为派生表(模拟物化)
通过将 CTE 转换为派生表(Derived Table)并添加 LIMIT
子句,可以暗示优化器优先物化结果:
mysql> SELECT /*+ SEMIJOIN(MATERIALIZATION) */ *
-> FROM (
-> WITH cte AS (
-> SELECT dept_id, SUM(sales) AS total_sales
-> FROM employees_sales --
-> GROUP BY dept_id
-> )
-> SELECT * FROM cte
-> ) AS materialized_cte;
+---------+-------------+
| dept_id | total_sales |
+---------+-------------+
| 1 | 240000.00 |
| 2 | 200000.00 |
| 3 | 400000.00 |
+---------+-------------+
3 rows in set (0.00 sec)
mysql>
2.2 调整递归 CTE 的终止条件
对于递归 CTE,通过优化终止条件和索引来提升性能:
– 确保 manager_id 字段有索引
mysql> CREATE INDEX idx_manager_id ON employees_hierarchy(manager_id);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql>
mysql>
mysql> WITH RECURSIVE org_chart AS (
-> SELECT emp_id, emp_name, manager_id, 0 AS level
-> FROM employees_hierarchy
-> WHERE manager_id IS NULL -- -- 确保使用索引
-> UNION ALL
-> SELECT e.emp_id, e.emp_name, e.manager_id, oc.level + 1
-> FROM employees_hierarchy e
-> INNER JOIN org_chart oc
-> ON e.manager_id = oc.emp_id
-> WHERE level < 5 -- -- 限制递归深度
-> )
-> SELECT * FROM org_chart;
+--------+------------------+------------+-------+
| emp_id | emp_name | manager_id | level |
+--------+------------------+------------+-------+
| 1 | CEO | NULL | 0 |
| 2 | CTO | 1 | 1 |
| 3 | Engineering Lead | 2 | 2 |
| 4 | Senior Developer | 3 | 3 |
| 5 | Junior Developer | 3 | 3 |
+--------+------------------+------------+-------+
5 rows in set (0.00 sec)
mysql>
性能优化对比表
优化方法 | 适用场景 | 效果 |
---|---|---|
派生表 + 优化器提示 | 非递归 CTE 多次引用 | 减少重复计算,提升约 30% 查询速度 |
递归终止条件 + 索引 | 递归查询层级较深时 | 降低 50% 以上的递归迭代次数 |
结果缓存(应用层) | 高频重复查询 | 完全消除数据库计算开销 |
3、需要规避的陷阱
3.1 递归死循环
错误示例:
mysql>
mysql> WITH RECURSIVE infinite_loop AS (
-> SELECT 1 AS n
-> UNION ALL
-> SELECT n + 1 FROM infinite_loop
-> )
-> SELECT * FROM infinite_loop;
-- 直到报错 [3636] Recursive query aborted
ERROR 3636 (HY000): Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.
mysql>
解决方案:
mysql>
mysql> WITH RECURSIVE safe_loop AS (
-> SELECT 1 AS n
-> UNION ALL
-> SELECT n + 1 FROM safe_loop
-> WHERE n < 100 -- WHERE n < 100 -- 硬性终止条件
-> )
-> SELECT * FROM safe_loop;
3.2 缺少递归索引
问题现象:
递归查询中的连接字段(如 manager_id
)无索引时,递归步骤的全表扫描会导致性能断崖式下降。
mysql> SHOW INDEX FROM employees_hierarchy;
+---------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+---------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employees_hierarchy | 0 | PRIMARY | 1 | emp_id | A | 5 | NULL | NULL | | BTREE | | | YES | NULL |
+---------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
1 row in set (0.01 sec)
mysql>
优化方案:
mysql>
mysql> ALTER TABLE employees_hierarchy ADD INDEX idx_manager_id (manager_id);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SHOW INDEX FROM employees_hierarchy;
+---------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+---------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employees_hierarchy | 0 | PRIMARY | 1 | emp_id | A | 5 | NULL | NULL | | BTREE | | | YES | NULL |
| employees_hierarchy | 1 | idx_manager_id | 1 | manager_id | A | 4 | NULL | NULL | YES | BTREE | | | YES | NULL |
+---------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.00 sec)
mysql>
4、性能测试数据
使用 10 层组织架构树(约 1024 条记录)的对比:
优化措施 | 执行时间(毫秒) |
---|---|
无索引 + 无限递归 | 超时 (>30000ms) |
有索引 + 终止条件 | 47ms |
派生表物化 | 32ms |
建议始终通过 EXPLAIN ANALYZE
验证优化效果:
mysql> EXPLAIN ANALYZE
-> WITH RECURSIVE org_chart AS (
-> SELECT
-> emp_id,
-> emp_name,
-> manager_id,
-> 0 AS level
-> FROM employees_hierarchy
-> WHERE manager_id IS NULL
-> UNION ALL
-> SELECT
-> e.emp_id,
-> e.emp_name,
-> e.manager_id,
-> oc.level + 1
-> FROM employees_hierarchy e
-> INNER JOIN org_chart oc ON e.manager_id = oc.emp_id
-> )
-> SELECT * FROM org_chart;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Table scan on org_chart (cost=5.03..6.84 rows=3.5) (actual time=0.12..0.122 rows=5 loops=1)
-> Materialize recursive CTE org_chart (cost=4.3..4.3 rows=3.5) (actual time=0.118..0.118 rows=5 loops=1)
-> Index lookup on employees_hierarchy using idx_manager_id (manager_id=NULL), with index condition: (employees_hierarchy.manager_id is null) (cost=0.35 rows=1) (actual time=0.0276..0.0302 rows=1 loops=1)
-> Repeat until convergence
-> Nested loop inner join (cost=3.6 rows=2.5) (actual time=0.0164..0.0271 rows=2 loops=2)
-> Filter: (oc.emp_id is not null) (cost=2.73 rows=2) (actual time=0.00183..0.00297 rows=2.5 loops=2)
-> Scan new records on oc (cost=2.73 rows=2) (actual time=0.00124..0.00196 rows=2.5 loops=2)
-> Index lookup on e using idx_manager_id (manager_id=oc.emp_id) (cost=0.375 rows=1.25) (actual time=0.00799..0.00871 rows=0.8 loops=5)
|
5、执行计划分析
使用 EXPLAIN
查看 CTE 的优化策略:
mysql> EXPLAIN
-> WITH dept_totals AS (
-> SELECT dept_no, SUM(salary) AS total_salary
-> FROM salaries
-> GROUP BY dept_no
-> )
-> SELECT * FROM dept_totals;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL |
| 2 | DERIVED | salaries | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1 | 100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
2 rows in set, 1 warning (0.00 sec)
mysql>
- 关注是否出现不必要的全表扫描。
- 检查递归步骤的预估行数是否合理。
五、总结:CTE
MySQL 8.0 的 CTE 不仅是一项技术特性,更体现了 声明式编程 的思维进化:
- 模块化设计:将复杂问题分解为可组合的单元。
- 递归思维:用数学归纳法解决层次化问题。
- 性能与可读性的平衡:通过优化器实现底层透明优化。
随着 MySQL 对 SQL 标准的持续支持,CTE 将成为处理复杂数据关系的标准工具。开发者应掌握这一利器,在保证代码优雅的同时,深入理解执行原理,方能在高并发与大数据的挑战中游刃有余。
—— 仅供参考。如果有更多具体的问题或需要进一步的帮助,请随时告知。