在之前的文章里介绍了,外连接条件有or只能走nl的优化,当连接条件有or时,执行计划只能走nl,此时会存在性能问题,因此需要进行等价改写达到改变执行计划的目的。本文主要是针对前面文章的进一步完善
如下SQL进行查询,执行计划中是nl:
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T2.ID T2_ID
,T2.NAME T2_NAME
FROM T1
LEFT JOIN T2
ON (T1.ID = T2.ID OR T1.AGE = T2.ID)
ORDER BY 1;
T1_ID T1_NAME T1_AGE T2_ID T2_NAME
---------- ---------- ---------- ---------- ----------
1 a 1 1 a
1 a 1 1 a
1 a 1 1 a
1 a 1 1 a
2 b 4 2 b
3 c 6 3 c
4 d 5
7 rows selected.
Plan hash value: 3004654521
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 |00:00:00.01 | 42 | | | |
| 1 | SORT ORDER BY | | 1 | 5 | 7 |00:00:00.01 | 42 | 2048 | 2048 | 2048 (0)|
| 2 | NESTED LOOPS OUTER | | 1 | 5 | 7 |00:00:00.01 | 42 | | | |
| 3 | TABLE ACCESS FULL | T1 | 1 | 5 | 5 |00:00:00.01 | 7 | | | |
| 4 | VIEW | | 5 | 1 | 6 |00:00:00.01 | 35 | | | |
|* 5 | TABLE ACCESS FULL| T2 | 5 | 1 | 6 |00:00:00.01 | 35 | | | |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(("T1"."ID"="T2"."ID" OR "T1"."AGE"="T2"."ID"))
前面文章提到过,可以用union all加row_number()函数的方式进行等价改写,但是在改写时需要注意被驱动表的连接字段id是否有重复值(之前的方法是基于id没有重复值的),考虑到改写时的方便与准确,所以可以采用以下改写方式进行改写,同样适用于内连接,适用性更广。
SELECT TMP3.*
FROM (SELECT TMP2.*
,COUNT(*) OVER(PARTITION BY TMP2.T1_RID) CNT
FROM (SELECT TMP1.*
,ROW_NUMBER() OVER(PARTITION BY TMP1.T1_RID, TMP1.T2_RID ORDER BY TMP1.T2_RID) RN
FROM (SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID) TMP1) TMP2
WHERE RN = 1) TMP3
WHERE (TMP3.CNT > 1 AND TMP3.T2_RID IS NOT NULL)
OR TMP3.CNT = 1
T1_ID T1_NAME T1_AGE T1_RID T2_ID T2_NAME T2_RID RN CNT
---------- ---------- ---------- ------------------ ---------- ---------- ------------------ ---------- ----------
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAA 1 2
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAB 1 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAA 1 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAB 1 2
2 b 4 AAAVwOAAEAAABydAAC 2 b AAAVwPAAEAAABy9AAC 1 2
3 c 6 AAAVwOAAEAAABydAAD 3 c AAAVwPAAEAAABy9AAD 1 2
4 d 5 AAAVwOAAEAAABydAAE 1 1
7 rows selected.
Plan hash value: 55908890
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 |00:00:00.01 | 28 | | | |
|* 1 | VIEW | | 1 | 10 | 7 |00:00:00.01 | 28 | | | |
| 2 | WINDOW BUFFER | | 1 | 10 | 9 |00:00:00.01 | 28 | 2048 | 2048 | 2048 (0)|
|* 3 | VIEW | | 1 | 10 | 9 |00:00:00.01 | 28 | | | |
|* 4 | WINDOW SORT PUSHED RANK| | 1 | 10 | 14 |00:00:00.01 | 28 | 2048 | 2048 | 2048 (0)|
| 5 | VIEW | | 1 | 10 | 14 |00:00:00.01 | 28 | | | |
| 6 | UNION-ALL | | 1 | | 14 |00:00:00.01 | 28 | | | |
|* 7 | HASH JOIN OUTER | | 1 | 5 | 7 |00:00:00.01 | 14 | 1321K| 1321K| 769K (0)|
| 8 | TABLE ACCESS FULL | T1 | 1 | 5 | 5 |00:00:00.01 | 7 | | | |
| 9 | TABLE ACCESS FULL | T2 | 1 | 4 | 4 |00:00:00.01 | 7 | | | |
|* 10 | HASH JOIN OUTER | | 1 | 5 | 7 |00:00:00.01 | 14 | 1321K| 1321K| 648K (0)|
| 11 | TABLE ACCESS FULL | T1 | 1 | 5 | 5 |00:00:00.01 | 7 | | | |
| 12 | TABLE ACCESS FULL | T2 | 1 | 4 | 4 |00:00:00.01 | 7 | | | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((("TMP3"."CNT">1 AND "TMP3"."T2_RID" IS NOT NULL) OR "TMP3"."CNT"=1))
3 - filter("RN"=1)
4 - filter(ROW_NUMBER() OVER ( PARTITION BY "TMP1"."T1_RID","TMP1"."T2_RID" ORDER BY NULL )<=1)
7 - access("T1"."ID"="T2"."ID")
10 - access("T1"."AGE"="T2"."ID")
接下来解释解释下为什么这样改写。
上图是表t1和表t2的连接情况,其中t1是有重复值的,t2也是有重复值的,相当于N:N的情况,这时候的连接会出现多条记录。
(假设还是从t1的第一行记录和t2表匹配)
第一步:t1的第一行和t2的第一行进行比较,发现满足,返回记录;然后继续和t2的第二行记录比较,满足条件,返回记录(红色表示t1的第一行记录)。
第二步:t1的第二行和t2的第一行进行比较,发现满足,返回记录;然后继续和t2的第二行记录比较,满足条件,返回记录(蓝色表示t1的第二行记录)。
如果还用之前的写法去改写,则会发现结果不正确,原因是把满足条件的记录也去除了:
SELECT *
FROM (SELECT T.*
,ROW_NUMBER() OVER(PARTITION BY T.T1_RID ORDER BY T.T2_ID) RN
FROM (SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T2.ID T2_ID
,T2.NAME T2_NAME
,T1.ROWID T1_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T2.ID T2_ID
,T2.NAME T2_NAME
,T1.ROWID T1_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID) T)
WHERE RN = 1
ORDER BY 1;
T1_ID T1_NAME T1_AGE T2_ID T2_NAME T1_RID RN
---------- ---------- ---------- ---------- ---------- ------------------ ----------
1 a 1 1 a AAAVwOAAEAAABydAAA 1
1 a 1 1 a AAAVwOAAEAAABydAAB 1
2 b 4 2 b AAAVwOAAEAAABydAAC 1
3 c 6 3 c AAAVwOAAEAAABydAAD 1
4 d 5 AAAVwOAAEAAABydAAE 1
采用之前的改写方法则把满足条件的重复记录也去除了,因此考虑换一种改写方式。
改写共分三步:
改写第一步:把条件里的or两边分别查询,并且用union all连接
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T2.ID T2_ID
,T2.NAME T2_NAME
,T1.ROWID T1_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T2.ID T2_ID
,T2.NAME T2_NAME
,T1.ROWID T1_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID
分析两部分的结果:
在进行union all之后,出现了重复的记录,而且也产生了不满足条件的记录,因此需要考虑去重(红色)以及把不满足条件(红色块)的记录去除。
改写第二步:对t1表的rowid进行分组,并且按照t2的id进行排序(之前思路)
如果还按照之前的t1的rowid进行分组,根据t2的id进行排序,然后去出每个分组里的第一行记录,那么会存在满足条件的记录也被筛出去了。
第一组里共有4条记录,满足条件的记录是2条,如果只取第一条,则少了记录;
第二组里也是共有4条记录,满足条件的记录也是2条,如果只取第一条,则少了记录。
由于之前是只对t1的rowid进行分组,因此对于满足条件的记录也无法取到,因此考虑根据t1的rowid和t2的rowid共同进行分组,这样也可以唯一确定一行记录,同时取出第一行记录则达到了去重。
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID
T1_ID T1_NAME T1_AGE T1_RID T2_ID T2_NAME T2_RID
---------- ---------- ---------- ------------------ ---------- ---------- ------------------
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAA
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAA
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAB
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAB
2 b 4 AAAVwOAAEAAABydAAC 2 b AAAVwPAAEAAABy9AAC
3 c 6 AAAVwOAAEAAABydAAD 3 c AAAVwPAAEAAABy9AAD
4 d 5 AAAVwOAAEAAABydAAE
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAA
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAA
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAB
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAB
3 c 6 AAAVwOAAEAAABydAAD
4 d 5 AAAVwOAAEAAABydAAE
2 b 4 AAAVwOAAEAAABydAAC
14 rows selected.
进行分组和排序:
SELECT TMP1.*
,ROW_NUMBER() OVER(PARTITION BY TMP1.T1_RID, TMP1.T2_RID ORDER BY TMP1.T2_RID) RN
FROM (SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID) TMP1
T1_ID T1_NAME T1_AGE T1_RID T2_ID T2_NAME T2_RID RN
---------- ---------- ---------- ------------------ ---------- ---------- ------------------ ----------
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAA 1
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAA 2
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAB 1
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAB 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAA 1
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAA 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAB 1
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAB 2
2 b 4 AAAVwOAAEAAABydAAC 2 b AAAVwPAAEAAABy9AAC 1
2 b 4 AAAVwOAAEAAABydAAC 1
3 c 6 AAAVwOAAEAAABydAAD 3 c AAAVwPAAEAAABy9AAD 1
3 c 6 AAAVwOAAEAAABydAAD 1
4 d 5 AAAVwOAAEAAABydAAE 1
4 d 5 AAAVwOAAEAAABydAAE 2
14 rows selected.
这样通过rowid进行分组,然后取每组第一行记录,则可以得到不重复的记录。
取出每一组的第一行记录:
SELECT TMP2.*
FROM (SELECT TMP1.*
,ROW_NUMBER() OVER(PARTITION BY TMP1.T1_RID, TMP1.T2_RID ORDER BY TMP1.T2_RID) RN
FROM (SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID) TMP1) TMP2
WHERE RN = 1
通过结果可以看到,目前是没有重复的记录,但是存在不满足条件的记录,因此需要想办法把不满足条件的记录去除。
上面结果中(红框内),是通过age=id得出的记录,是不满足条件的记录,因此需要想办法去除这些记录。
想办法去除不满足条件的记录:
SELECT TMP2.*
,COUNT(*) OVER(PARTITION BY TMP2.T1_RID) CNT
FROM (SELECT TMP1.*
,ROW_NUMBER() OVER(PARTITION BY TMP1.T1_RID, TMP1.T2_RID ORDER BY TMP1.T2_RID) RN
FROM (SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID) TMP1) TMP2
WHERE RN = 1
根据T1的rowid进行聚合,计算出每一组的行数,然后根据行数进行过滤,达到去除不满足条件的记录的目的
通过上图可以明显的看到,不满足条件的记录的T2_RID是空的,所以可以根据此情况作为过滤条件。
根据每一组的行数,找到满足条件的记录。
根据行数进行过滤:
第一部分数据:当cnt>1时,并且t2_rid不为空,得到满足条件的记录
第二部分数据:当cnt=1时,得到不匹配的数据
当TMP3.CNT > 1 AND TMP3.T2_RID IS NOT NULL
T1_ID T1_NAME T1_AGE T1_RID T2_ID T2_NAME T2_RID RN CNT
---------- ---------- ---------- ------------------ ---------- ---------- ------------------ ---------- ----------
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAA 1 2
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAB 1 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAA 1 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAB 1 2
2 b 4 AAAVwOAAEAAABydAAC 2 b AAAVwPAAEAAABy9AAC 1 2
3 c 6 AAAVwOAAEAAABydAAD 3 c AAAVwPAAEAAABy9AAD 1 2
6 rows selected.
当TMP3.CNT = 1
T1_ID T1_NAME T1_AGE T1_RID T2_ID T2_NAME T2_RID RN CNT
---------- ---------- ---------- ------------------ ---------- ---------- ------------------ ---------- ----------
4 d 5 AAAVwOAAEAAABydAAE 1 1
经过过滤,得到了最终满足条件的记录
最终改写:此写法目前来说是适用于多数场景
SELECT TMP3.*
FROM (SELECT TMP2.*
,COUNT(*) OVER(PARTITION BY TMP2.T1_RID) CNT
FROM (SELECT TMP1.*
,ROW_NUMBER() OVER(PARTITION BY TMP1.T1_RID, TMP1.T2_RID ORDER BY TMP1.T2_RID) RN
FROM (SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
UNION ALL
SELECT T1.ID T1_ID
,T1.NAME T1_NAME
,T1.AGE T1_AGE
,T1.ROWID T1_RID
,T2.ID T2_ID
,T2.NAME T2_NAME
,T2.ROWID T2_RID
FROM T1
LEFT JOIN T2
ON T1.AGE = T2.ID) TMP1) TMP2
WHERE RN = 1) TMP3
WHERE (TMP3.CNT > 1 AND TMP3.T2_RID IS NOT NULL)
OR TMP3.CNT = 1
T1_ID T1_NAME T1_AGE T1_RID T2_ID T2_NAME T2_RID RN CNT
---------- ---------- ---------- ------------------ ---------- ---------- ------------------ ---------- ----------
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAA 1 2
1 a 1 AAAVwOAAEAAABydAAA 1 a AAAVwPAAEAAABy9AAB 1 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAA 1 2
1 a 1 AAAVwOAAEAAABydAAB 1 a AAAVwPAAEAAABy9AAB 1 2
2 b 4 AAAVwOAAEAAABydAAC 2 b AAAVwPAAEAAABy9AAC 1 2
3 c 6 AAAVwOAAEAAABydAAD 3 c AAAVwPAAEAAABy9AAD 1 2
4 d 5 AAAVwOAAEAAABydAAE 1 1
7 rows selected.
Plan hash value: 55908890
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 |00:00:00.01 | 28 | | | |
|* 1 | VIEW | | 1 | 10 | 7 |00:00:00.01 | 28 | | | |
| 2 | WINDOW BUFFER | | 1 | 10 | 9 |00:00:00.01 | 28 | 2048 | 2048 | 2048 (0)|
|* 3 | VIEW | | 1 | 10 | 9 |00:00:00.01 | 28 | | | |
|* 4 | WINDOW SORT PUSHED RANK| | 1 | 10 | 14 |00:00:00.01 | 28 | 2048 | 2048 | 2048 (0)|
| 5 | VIEW | | 1 | 10 | 14 |00:00:00.01 | 28 | | | |
| 6 | UNION-ALL | | 1 | | 14 |00:00:00.01 | 28 | | | |
|* 7 | HASH JOIN OUTER | | 1 | 5 | 7 |00:00:00.01 | 14 | 1321K| 1321K| 919K (0)|
| 8 | TABLE ACCESS FULL | T1 | 1 | 5 | 5 |00:00:00.01 | 7 | | | |
| 9 | TABLE ACCESS FULL | T2 | 1 | 4 | 4 |00:00:00.01 | 7 | | | |
|* 10 | HASH JOIN OUTER | | 1 | 5 | 7 |00:00:00.01 | 14 | 1321K| 1321K| 798K (0)|
| 11 | TABLE ACCESS FULL | T1 | 1 | 5 | 5 |00:00:00.01 | 7 | | | |
| 12 | TABLE ACCESS FULL | T2 | 1 | 4 | 4 |00:00:00.01 | 7 | | | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((("TMP3"."CNT">1 AND "TMP3"."T2_RID" IS NOT NULL) OR "TMP3"."CNT"=1))
3 - filter("RN"=1)
4 - filter(ROW_NUMBER() OVER ( PARTITION BY "TMP1"."T1_RID","TMP1"."T2_RID" ORDER BY NULL )<=1)
7 - access("T1"."ID"="T2"."ID")
10 - access("T1"."AGE"="T2"."ID")
最终完成了SQL的等价改写,并且执行计划已经调整,逻辑读由42降到24。