优化器转换：连接因式分解

原创赵勇 2022-11-26

348

我们用一个介绍连接因式分解的转换，来继续我们优化器转换的系列。连接因式分解是在Oracle 11g Release 2时引入的，并被用于UNION ALL查询。UNION ALL常被用于数据库应用程序，特别是数据整合的应用程序中。在很多场景中，UNION ALL查询中的分支会共用一个处理过程。例如，引用同样的表。在当下的Oracle执行策略中，UNION ALL查询中的每一个分支是被单独评估的，这导致了重复的处理，包括数据的访问和连接。连接因式分解提供了一个机会在UNION ALL的分支中共享共同的算法。目前，连接因式分解还仅能分解对共同基表的访问，而不能是视图。
请看一个简单示例查询Q1

Q1:

  select t1.c1, t2.c2
   from t1, t2, t3
   where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2 
 union all
   select t1.c1, t2.c2
   from t1, t2, t4
   where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3;


复制

表T1出现在两个分支中。T1表上的过滤谓词（t1.c1 > 1) 和涉及的连接谓词（t1.c1 = t2.c1)也是一样的。然而，不做转换时，T1表上的扫描（以及过滤）必须做两次，每个分支一次。这样的查询是可以从连接因式分解中受益的，就像下面从Q1转换为Q2：

Q2:

  select t1.c1, VW_JF_1.item_2
   from t1, (select t2.c1 item_1, t2.c2 item_2
                  from t2, t3
                  where t2.c2 = t3.c2 and t2.c2 = 2                 
                union all
                  select t2.c1 item_1, t2.c2 item_2
                  from t2, t4 
                  where t2.c3 = t4.c3) VW_JF_1
   where t1.c1 = VW_JF_1.item_1 and t1.c1 > 1;


复制

在Q2中，T1是“因式”，并因此在T1上的表扫描和过滤只做了一次（即共享）。假如T1很大，那么避免在T1上额外的一次扫描，会产生极大的性能改善。

连接因式分解的另一个好处是可以开放更多的连接顺序，让我们来看一下查询Q3。

Q3:

 select *
   from t5, (select t1.c1, t2.c2
                 from t1, t2, t3
                 where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2 
               union all
                 select t1.c1, t2.c2
                 from t1, t2, t4
                 where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3) V;
  where t5.c1 = V.c1

复制

在Q3中，视图V和Q1是一样的。没有连接因式分解时，T1,T2和T3必须先连接，然后才能和T5做连接。但是，如果连接因式分解中来自于视图V的因式T1,可以和T5做连接，这就开放了新的连接顺序。如开始所说，连接因式分解施加了特定的连接顺序。例如Q2，在视图VW_JF_1中，T2和T3出现在UNION ALL查询的第一个分支中。T2在可以和视图VW_JF_1外边的T1连接前，必须先和T3连接。该连接顺序的施行，可能并不是最优的连接顺序。出于这个原因，连接因式分解是基于成本的转换架构；这意味着，我们会评估有或没有连接因式分解的执行计划的成本，并选择成本低的执行计划。

请注意，如果UNION ALL分支中有DISTINCT子句，连接因式分解并不生效。比如，Q4和Q5就不是在语义上等价的。

Q4:

    select distinct t1.* 
     from t1, t2
     where t1.c1 = t2.c1
  union all
     select distinct t1.*
     from t1, t2
     where t1.c1 = t2.c1


复制

Q5:

    select distinct t1.* 
    from t1, (select t2.c1 item_1 
                  from t2
                union all 
                  select t2.c1 item_1
                  from t2) VW_JF_1 
    where t1.c1 = VW_JF_1.item_1


复制

Q4可能比Q5返回更多的行。Q5的结果保证无重复，因为DISTINCT关键字出现在顶层。而Q4的结果可能含有重复。

到目前为止的示例中，还只涉及内连接。连接因式分解也支持外连接，反连接和半连接。但只有在外连接，反连接和半连接的右表可以被因式分解。对外连接，反连接和半连接中的左表做连接因式分解，在语义上是不正确的。比如，Q6和Q7在语义上就是不等价的。

Q6:

  select t1.c1, t2.c2
   from t1, t2
   where t1.c1 = t2.c1(+) and t2.c2 (+) = 2 
union all
   select t1.c1, t2.c2
   from t1, t2 
   where t1.c1 = t2.c1(+) and t2.c2 (+) = 3


复制

Q7:

    select t1.c1, VW_JF_1.item_2
    from t1, (select t2.c1 item_1, t2.c2 item_2
                  from t2
                  where t2.c2 = 2
                union all
                  select t2.c1 item_1, t2.c2 item_2
                  from t2                                                                                  
                  where t2.c2 = 3) VW_JF_1     
  where t1.c1 = VW_JF_1.item_1(+)                                                                 


复制

然而，外连接中的右边是可以因式的。比如，连接因式分解可以通过因式T2，其是外连接中的右侧表，从而转换Q8为Q9：

Q8:

  select t1.c2, t2.c2
   from t1, t2
   where t1.c1 = t2.c1 (+) and t1.c1 = 1
union all
   select t1.c2, t2.c2
   from t1, t2
   where t1.c1 = t2.c1(+) and t1.c1 = 2


复制

Q9:

 select VW_JF_1.item_2, t2.c2
  from t2,
          (select t1.c1 item_1, t1.c2 item_2
           from t1
           where t1.c1 = 1
          union all
           select t1.c1 item_1, t1.c2 item_2
           from t1
           where t1.c1 = 2) VW_JF_1
  where VW_JF_1.item_1 = t2.c1(+)


复制

本文中的全部示例均是从两个分支中因式一个单表。这仅仅是为了方便演示，连接因式分解是可以因式多个表和多于两个的UNION ALL分支的。

总结

连接因式分解是基于成本的转换。它可以从UNION ALL查询的分支中因式共同的运算，从而产生极大的性能改善。

原文链接：https://blogs.oracle.com/optimizer/post/optimizer-transformations-join-factorization
Optimizer Transformations: Join Factorization
January 1, 2020 | 4 minute read
Hong Su

We continue our series on optimizer transformations with a post that describes the Join Factorization transformation. The Join Factorization transformation was introduced in Oracle 11g Release 2 and applies to UNION ALL queries. Union all queries are commonly used in database applications, especially in data integration applications. In many scenarios the branches in a UNION All query share a common processing, i.e, refer to the same tables. In the current Oracle execution strategy, each branch of a UNION ALL query is evaluated independently, which leads to repetitive processing, including data access and join. The join factorization transformation offers an opportunity to share the common computations across the UNION ALL branches. Currently, join factorization only factorizes common references to base tables only, i.e, not views.

Consider a simple example of query Q1.

Q1:

  select t1.c1, t2.c2
   from t1, t2, t3
   where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2 
 union all
   select t1.c1, t2.c2
   from t1, t2, t4
   where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3;


复制

Table t1 appears in both the branches. As does the filter predicates on t1 (t1.c1 > 1) and the join predicates involving t1 (t1.c1 = t2.c1). Nevertheless, without any transformation, the scan (and the filtering) on t1 has to be done twice, once per branch. Such a query may benefit from join factorization which can transform Q1 into Q2 as follows:

Q2:

  select t1.c1, VW_JF_1.item_2
   from t1, (select t2.c1 item_1, t2.c2 item_2
                  from t2, t3
                  where t2.c2 = t3.c2 and t2.c2 = 2                 
                union all
                  select t2.c1 item_1, t2.c2 item_2
                  from t2, t4 
                  where t2.c3 = t4.c3) VW_JF_1
   where t1.c1 = VW_JF_1.item_1 and t1.c1 > 1;


复制

In Q2, t1 is “factorized” and thus the table scan and the filtering on t1 is done only once (it’s shared). If t1 is large, then avoiding one extra scan of t1 can lead to a huge performance improvement.

Another benefit of join factorization is that it can open up more join orders. Let’s look at query Q3.

Q3:

 select *
   from t5, (select t1.c1, t2.c2
                 from t1, t2, t3
                 where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2 
               union all
                 select t1.c1, t2.c2
                 from t1, t2, t4
                 where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3) V;
  where t5.c1 = V.c1

复制

In Q3, view V is same as Q1. Before join factorization, t1, t2 and t3 must be joined first before they can be joined with t5. But if join factorization factorizes t1 from view V, t1 can then be joined with t5. This opens up new join orders. That being said, join factorization imposes certain join orders. For example, in Q2, t2 and t3 appear in the first branch of the UNION ALL query in view VW_JF_1. T2 must be joined with t3 before it can be joined with t1 which is outside of the VW_JF_1 view. The imposed join order may not necessarily be the best join order. For this reason, join factorization is performed under cost-based transformation framework; this means that we cost the plans with and without join factorization and choose the cheapest plan.

Note that if the branches in UNION ALL have DISTINCT clauses, join factorization is not valid. For example, Q4 is NOT semantically equivalent to Q5.

Q4:

    select distinct t1.* 
     from t1, t2
     where t1.c1 = t2.c1
  union all
     select distinct t1.*
     from t1, t2
     where t1.c1 = t2.c1


复制

Q5:

    select distinct t1.* 
    from t1, (select t2.c1 item_1 
                  from t2
                union all 
                  select t2.c1 item_1
                  from t2) VW_JF_1 
    where t1.c1 = VW_JF_1.item_1


复制

Q4 might return more rows than Q5. Q5’s results are guaranteed to be duplicate free because of the DISTINCT key word at the top level while Q4’s results might contain duplicates.

The examples given so far involve inner joins only. Join factorization is also supported in outer join, anti join and semi join. But only the right tables of outer join, anti join and semi joins can be factorized. It is not semantically correct to factorize the left table of outer join, anti join or semi join. For example, Q6 is NOT semantically equivalent to Q7.

Q6:

  select t1.c1, t2.c2
   from t1, t2
   where t1.c1 = t2.c1(+) and t2.c2 (+) = 2 
union all
   select t1.c1, t2.c2
   from t1, t2 
   where t1.c1 = t2.c1(+) and t2.c2 (+) = 3


复制

Q7:

    select t1.c1, VW_JF_1.item_2
    from t1, (select t2.c1 item_1, t2.c2 item_2
                  from t2
                  where t2.c2 = 2
                union all
                  select t2.c1 item_1, t2.c2 item_2
                  from t2                                                                                  
                  where t2.c2 = 3) VW_JF_1     
  where t1.c1 = VW_JF_1.item_1(+)                                                                 


复制

However, the right side of an outer join can be factorized. For example, join factorization can transform Q8 to Q9 by factorizing t2, which is the right table of an outer join.

Q8:

  select t1.c2, t2.c2
   from t1, t2
   where t1.c1 = t2.c1 (+) and t1.c1 = 1
union all
   select t1.c2, t2.c2
   from t1, t2
   where t1.c1 = t2.c1(+) and t1.c1 = 2


复制

Q9:

 select VW_JF_1.item_2, t2.c2
  from t2,
          (select t1.c1 item_1, t1.c2 item_2
           from t1
           where t1.c1 = 1
          union all
           select t1.c1 item_1, t1.c2 item_2
           from t1
           where t1.c1 = 2) VW_JF_1
  where VW_JF_1.item_1 = t2.c1(+)


复制

All of the examples in this blog show factorizing a single table from two branches. This is just for ease of illustration. Join factorization can factorize multiple tables and from more than two UNION ALL branches.

Summary

Join factorization is a cost-based transformation. It can factorize common computations from branches in a UNION ALL query which can lead to huge performance improvement.