Fetch First vs. Rownum

原创赵勇 2020-08-30

1320

Filed under: Execution plans,Oracle,Partitioning,Performance,Problem Solving,Tuning — Jonathan Lewis @ 10:11 am BST Jul 15,2020
我曾经多次指出，如果你使用的是标准版，但又想使用分区功能的优势，那么你可以利用分区视图（我刚刚搜索了一下我的博客想找到一个适合的文章链接，但我发现我没有发表过任何有关分区视图的文章）来做一些适当的操作。

我还指出，12c支持使用“fetch first N rows”来替代“where rownum <= N”，但使用这个功能会有一个隐藏的风险。因为“fetch first N”隐藏了对分析函数row_number() over()的使用。

今天这篇文章汇集了这两个主题，是由一个关于解决某个特定编码问题的请求引起的，这个问题在Oracle开发人员论坛上出现过几次，并且可能被许多人所遇到。

我会从一个非常简单的模型来演示对“fetch first N”的最简单的使用：

rem
rem     Script:         fetch_first_union.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t_odd
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_odd modify(flag not null, class not null, id not null);
alter table t_odd add constraint to_chk_odd check (flag = 'O');

create index to_i1 on t_odd(class, id);
复制

使用该数据，我想写一个查询，选择id > 9500且class为A的行，按ID排序输出。但是，只要前两行。下面这个非常简单的查询就可以得到我想要的的结果，随后的是来自于12.2.0.1库的执行计划（其A-Rows 和 E-Rows的内容与19.3的库有细微的差别）：

set serveroutput off
set linesize 180

alter session set statistics_level = all;

select  /*+ index(t_odd (class, id)) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    95 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  VIEW                         |       |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       6 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_ODD |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       6 |
|*  4 |     INDEX RANGE SCAN          | TO_I1 |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_ODD"."CLASS","T_ODD"."ID")<=2)
   4 - access("CLASS"='A' AND "ID">9500)
复制

这个带有执行时统计数据的执行计划，向我们表明使用索引的索引范围扫描，索引会按我们希望的精准的顺序返回数据。并且 “fetch first 2 rows”如我们所预料地，被转换为row_number() over() ；令我们欣喜的是，“window sort stopkey”使处理很早就停止了，因为Oracle认识到基础数据是以正确的顺序到达的，因此不必获取所有这些数据并对其进行排序。A-Rows列证实了对所发生事情的这种解释。

顺便说一句，您可能会注意到，优化器已经计算了查询的成本，就好像它在获取所有行一样，即使它“知道”它只会获取前两行。这就是为什么我必须包含index提示以使优化器使用明显的索引–另外一种常用的替代方法是使用/+ first_rows（N）/提示，其中N与要获取的行数匹配或接近。如果我忽略了这个提示，优化器将完成一个全表扫描，然后应用一个“window sort push rank”操作来排序并将结果限制在2行。

所以现在我们来看看真正的问题：用户有一个“current”表和一个完全相同的“history”表，并希望对于运行标准版的客户机，用对union all视图的引用替换表引用，或者对运行企业版的客户机，使用分区表的引用来替换表引用，而他们实际上不想执行任何其他代码变化。所以让我们看看当我们建立UNION ALL模型时会发生什么。我从一个名为t_odd的表开始，它只保存id的奇数值，所以我要添加一个名为t_even的表，它只保存id的偶数值。

create table t_even
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_even modify(flag not null, class not null, id not null);
alter table t_even add constraint te_chk_even check (flag = 'E');

create index te_i1 on t_even(class, id);

create or replace view v_bare 
as
select * from t_odd
union all
select * from t_even
/

select
        /*+ 
                index(vw.t_odd  (class, id)) 
                index(vw.t_even (class, id)) 
        */
        *
from
        v_bare vw
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;
复制

如你所见，T_EVEN表（与T_ODD表）有相同的结构和类似的数据。同时，我已经在这两个表上创建了一个UNION ALL的视图，改变查询引用新视图，而不是某一个表。由于成本异常（加上表的尺寸小），我不得不提供两个“全局”提示，告诉优化器使用索引来访问这两个表。那么，当我们有一个union all视图时，优化器的工作做得如何呢？

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |        |      1 |        |   192 (100)|      2 |00:00:00.01 |     190 |       |       |          |
|*  1 |  VIEW                                   |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |     190 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK               |        |      1 |    404 |   192   (2)|      2 |00:00:00.01 |     190 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                                 | V_BARE |      1 |    404 |   191   (1)|    404 |00:00:00.01 |     190 |       |       |          |
|   4 |     UNION-ALL                           |        |      1 |        |            |    404 |00:00:00.01 |     190 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_ODD  |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | TO_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_EVEN |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | TE_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "VW"."CLASS","VW"."ID")<=2)
   6 - access("CLASS"='A' AND "ID">9500)
   8 - access("CLASS"='A' AND "ID">9500)
复制

答案是：很不幸，优化器还没有聪明到为这个查询找到一个低成本的方法。它在进行窗口排序（相对于pushed rank会更有效）产生正确的结果前，要从两张表上获取全部的相关数据。正如你从A_ROWS列上看到的，它必须从每个表中查询约200行，然后才能得到我们想要的2行。

分区表
因此，让我们尝试通过购买分区选项来解决这个问题，并创建一个具有两个分区的列表分区表，一个标记当前数据，一个标记历史数据，或者像在我的模型中创建的那样，创建奇数和偶数数据。

create table t_pt (
        flag,
        class,
        id,
        v1,
        padding
)
partition by list (flag) (
        partition pO values('O'),
        partition pE values('E')
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

insert into t_pt
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;


create index tp_i1 on t_pt(class, id) local;
alter table t_pt modify (flag not null, class not null, id not null);

execute dbms_stats.gather_table_stats(user,'t_pt',method_opt=>'for all columns size 1', cascade=>true, granularity=>'ALL')
复制

特别要注意的是，我已经在这个分区表上创建了一个本地索引，因此上一个示例中的两个表与本例中的两个分区之间有非常接近的对应关系。以下是我从分区表查询前两行的计划：

select  /*+ index(t_pt (class, id)) */
        *
from
        t_pt
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;


--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |       |      1 |        |   190 (100)|      2 |00:00:00.01 |     189 |       |       |          |
|*  1 |  VIEW                                        |       |      1 |      2 |   190   (2)|      2 |00:00:00.01 |     189 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK                    |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |     189 |  2048 |  2048 | 2048  (0)|
|   3 |    PARTITION LIST ALL                        |       |      1 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|*  5 |      INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|    404 |00:00:00.01 |       4 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_PT"."CLASS","T_PT"."ID")<=2)
   5 - access("CLASS"='A' AND "ID">9500)
复制

优化器又让我们失望了。这个计划告诉我们，在应用row_number（）分析函数之前，我们必须从这两个分区中获取所有相关的数据，并丢弃我们想要的两行之外的所有数据。（同样，我们可以检查A-Rows列，看到我们已经从表中获取了总共404行）

但是，如果我们回到传统的（非标准）rownum方法，会发生什么：

select
        *
from    (
        select  /*+ index(t_pt (class, id)) */
                *
        from
                t_pt
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        )
where
        rownum <= 2
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |       |      1 |        |   190 (100)|      2 |00:00:00.01 |       6 |       |       |          |
|*  1 |  COUNT STOPKEY                                 |       |      1 |        |            |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   VIEW                                         |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                       |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   4 |     PARTITION LIST ALL                         |       |      1 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  5 |      COUNT STOPKEY                             |       |      2 |        |            |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  7 |        INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|      4 |00:00:00.01 |       4 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=2)
   3 - filter(ROWNUM<=2)
   5 - filter(ROWNUM<=2)
   7 - access("CLASS"='A' AND "ID">9500)
复制

Oracle确实知道如何用好rownum–请注意，在partition list all操作的子操作中有一个count stopkey操作，就是我们的rownum<=2谓词首次应用的地方。对于每个分区，Oracle都会找到“前两行”，在从每个分区收集到两行之后，它会对这些行进行排序（又一次使用stopkey）以找到该子集中的前两行。检查A-Rows列，我们从表中总共选择了4行（每个分区2行），并在操作3中将其减少到2行。

结论
在某些处理方式中，分区表比分区视图对编码人员更友好；如果您必须坚持使用标准版，则通常可以得到您想要的结果，但编码方面的投资可能要高得多。不过，即使使用分区表，也有一些“老式”的Oracle方法比一些新的“ANSI”机制做得更好。

附注
这里出现的问题的一部分，解决了保留已经存在的SQL生成模式的愿望，只不过是用视图（或分区表）名称替换表名。
如前所述，如果从一个简单的堆表开始，并尝试用分区表替换它，则必须使用rownum机制，而不是fetch first N rows机制。

如果您使用的是标准版，那么就不能简单地用union all视图的名称替换表名；您必须更改代码生成器，以应用所有谓词两次（每个表一次），然后在union all之外再次应用rownum谓词或fetch first指令。换句话说，您必须精确地模拟oraclee对分区表和rownum的处理方式。

select
        flag, class, id, v1
from
        (
        select
                /*+ index(t_odd (class, id)) */
                flag, class, id, v1
        from
                t_odd
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
        union all
        (
        select
                /*+ index(t_even (class, id)) */
                flag, class, id, v1
        from
                t_even
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
order by
        class, id
fetch 
        first 2 rows only
;

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |        |      1 |        |   192 (100)|      2 |00:00:00.01 |       8 |       |       |          |
|*  1 |  VIEW                              |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |       8 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK          |        |      1 |      4 |   192   (2)|      2 |00:00:00.01 |       8 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                            |        |      1 |      4 |   191   (1)|      4 |00:00:00.01 |       8 |       |       |          |
|   4 |     UNION-ALL                      |        |      1 |        |            |      4 |00:00:00.01 |       8 |       |       |          |
|   5 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  6 |       VIEW                         |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY       |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|   8 |         TABLE ACCESS BY INDEX ROWID| T_ODD  |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|*  9 |          INDEX RANGE SCAN          | TO_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
|* 10 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|* 11 |       WINDOW NOSORT STOPKEY        |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|  12 |        TABLE ACCESS BY INDEX ROWID | T_EVEN |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|* 13 |         INDEX RANGE SCAN           | TE_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_007"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "from$_subquery$_006"."CLASS","from$_subquery$_006"."ID")<=2)
   6 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=2)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
   9 - access("CLASS"='A' AND "ID">9500)
  10 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
  11 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
  13 - access("CLASS"='A' AND "ID">9500)

复制

如上所示，E-Rows仍然预测会有很多工作，但是A-Rows告诉我们工作如我们所愿地保持了最小。

原文链接：https://jonathanlewis.wordpress.com/2020/07/15/fetch-first-vs-rownum/
原文内容：
Filed under: Execution plans,Oracle,Partitioning,Performance,Problem Solving,Tuning — Jonathan Lewis @ 10:11 am BST Jul 15,2020

I’ve pointed out fairly frequently that if you’re running Standard Edition but would like to take advantage of a few features of the Partitioning option then you might be able to do something appropriate with Partition Views (but I’ve just discovered while searching my blog for a suitable item to link to that I haven’t published any of my PV notes on the blog).

I’ve also pointed out that while 12c allows you to use “fetch first N rows” instead of “where rownum <= N” there’s a hidden threat to using the feature because “fetch first N” turns into a hidden row_number() over() analytic function.

Today’s note is a combination of these two topics, prompted by a request to solve a particular coding issue that has appeared a few times on the Oracle Developer Forum and is likely to be recognisable to a number of people.

I’ll start with a very simple model demonstrateing the simplest use of “fetch first N”:

rem
rem     Script:         fetch_first_union.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t_odd
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_odd modify(flag not null, class not null, id not null);
alter table t_odd add constraint to_chk_odd check (flag = 'O');

create index to_i1 on t_odd(class, id);
复制

With this data set I want to write a query that selects rows for class A where id > 9500, ordered by id – but I only want the first two rows. Here’s a very simple query that gets the result I want, followed by the execution plan from 12.2.0.1 (the A-Rows and E-Rows from 19.3 are slightly different):

set serveroutput off
set linesize 180

alter session set statistics_level = all;

select  /*+ index(t_odd (class, id)) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    95 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  VIEW                         |       |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       6 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_ODD |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       6 |
|*  4 |     INDEX RANGE SCAN          | TO_I1 |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_ODD"."CLASS","T_ODD"."ID")<=2)
   4 - access("CLASS"='A' AND "ID">9500)
复制

The plan – with its stats – shows us that we’ve done an index range scan of an index which will return the data in exactly the order we want, and the “fetch first 2 rows” has been translated into the row_number() over() that we expected; but to our great joy the “window sort stopkey” makes the processing stop very early because Oracle recognises that the base data is arriving in the right order so it isn’t necessary to fetch all of it and sort it. The A-Rows column confirms this interpretation of what has happened.

You might notice, by the way, that the optimizer has costed the query as if it were fetching all the rows even though it “knows” that it’s going to fetch only the first two rows. That’s why I had to include the index hint to make the optimizer use the obvious index – a popular alternative is to use the /*+ first_rows(N) */ hint where N matches, or is similar to, the number of rows you want to fetch. If I had omitted the hint the optimizer would have done a full tablescan and then applied a “window sort pushed rank” operation to sort and limit the result to 2 rows.

So now we come to the real problem: the user has a “current” table and an identical “history” table, and would like to replace the table reference with a reference to a union all view for their clients on Standard Edition, or to a partitioned table for clients running Enterprise Edition – and they don’t really want to do any other code changes. So let’s see what happens when we model the union all. I started with a table called t_odd that held only odd values for id, so I’m going to add a table called t_even that holds only even values for id.

create table t_even
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_even modify(flag not null, class not null, id not null);
alter table t_even add constraint te_chk_even check (flag = 'E');

create index te_i1 on t_even(class, id);

create or replace view v_bare 
as
select * from t_odd
union all
select * from t_even
/

select
        /*+ 
                index(vw.t_odd  (class, id)) 
                index(vw.t_even (class, id)) 
        */
        *
from
        v_bare vw
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;
复制

As you can see t_even is an identically structured table with similar data, and I’ve created a union all view on top of the two tables, changing the query to reference the view rather than referencing a table. Thanks to the costing anomaly (combined with the small size of the tables) I’ve had to supply a couple of “global” hints to tell the optimizer to use the indexes to access the two tables. So how well does the optimizer do its job when we have a union all view?

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |        |      1 |        |   192 (100)|      2 |00:00:00.01 |     190 |       |       |          |
|*  1 |  VIEW                                   |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |     190 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK               |        |      1 |    404 |   192   (2)|      2 |00:00:00.01 |     190 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                                 | V_BARE |      1 |    404 |   191   (1)|    404 |00:00:00.01 |     190 |       |       |          |
|   4 |     UNION-ALL                           |        |      1 |        |            |    404 |00:00:00.01 |     190 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_ODD  |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | TO_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_EVEN |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | TE_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "VW"."CLASS","VW"."ID")<=2)
   6 - access("CLASS"='A' AND "ID">9500)
   8 - access("CLASS"='A' AND "ID">9500)
复制

Answer: Bad luck, the optimizer isn’t smart enought to find a cheap way through this query. It’s fetched all the relevant data from the two tables before applying the window sort (which it does with some efficiency – the pushed rank) to produce the right answer. As you can see from the A-Rows column, though, it’s had to acquire a couple of hundred rows from each table before getting down to the 2 rows we wanted.
Partitioned Tables

So let’s try to solve the problem by buying into the partitioning option and creating a list-partitioned table with two partitions, one flagged for current data and one flagged for history data – or ‘O’dd and ‘E’ven data as I’ve created them in my model.

create table t_pt (
        flag,
        class,
        id,
        v1,
        padding
)
partition by list (flag) (
        partition pO values('O'),
        partition pE values('E')
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

insert into t_pt
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;


create index tp_i1 on t_pt(class, id) local;
alter table t_pt modify (flag not null, class not null, id not null);

execute dbms_stats.gather_table_stats(user,'t_pt',method_opt=>'for all columns size 1', cascade=>true, granularity=>'ALL')
复制

Note particularly that I have created a local index on this partitioned table – so there’s a very close correspondance between the two tables in the previous example and the two partitions in this example. Here’s the plan when I query the partitioned table for the first two rows:

select  /*+ index(t_pt (class, id)) */
        *
from
        t_pt
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;


--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |       |      1 |        |   190 (100)|      2 |00:00:00.01 |     189 |       |       |          |
|*  1 |  VIEW                                        |       |      1 |      2 |   190   (2)|      2 |00:00:00.01 |     189 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK                    |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |     189 |  2048 |  2048 | 2048  (0)|
|   3 |    PARTITION LIST ALL                        |       |      1 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|*  5 |      INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|    404 |00:00:00.01 |       4 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_PT"."CLASS","T_PT"."ID")<=2)
   5 - access("CLASS"='A' AND "ID">9500)
复制

The optimizer has let us down again. The plan shows us that we have to acquire all the relevant data from the two partitions before applying the row_number() analytic function and discarding all but the two rows we want. (Again we can check the A-Rows column to see that we have started by fetching a total of 404 rows from the table.)

But what happens if we fall back to the good old-fashioned (non-standard) rownum method:

select
        *
from    (
        select  /*+ index(t_pt (class, id)) */
                *
        from
                t_pt
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        )
where
        rownum <= 2
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |       |      1 |        |   190 (100)|      2 |00:00:00.01 |       6 |       |       |          |
|*  1 |  COUNT STOPKEY                                 |       |      1 |        |            |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   VIEW                                         |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                       |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   4 |     PARTITION LIST ALL                         |       |      1 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  5 |      COUNT STOPKEY                             |       |      2 |        |            |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  7 |        INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|      4 |00:00:00.01 |       4 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=2)
   3 - filter(ROWNUM<=2)
   5 - filter(ROWNUM<=2)
   7 - access("CLASS"='A' AND "ID">9500)
复制

Oracle really knows how to use rownum well – notice how there is a count stopkey operation as a child to the partition list all operation, and that’s where our rownum <= 2 predicate is first applied. For each partition Oracle finds “the first two rows” and after it has collected two rows from every partition it sorts them (again with a stopkey) to find the top two in that subset. Check the A-Rows column – we selected a total of 4 rows from the table (2 per partition) and reduced that to 2 rows at operation 3.
Conclusion

There are some patterns of processing where partitioned tables can be a lot friendlier to coders than partition views; if you do have to stick with Standard Edition you can usually get what you want but the coding investment may be significantly higher. Even with partitioned tables, though, there are some “old-fashioned” Oracle methods that do a much nicer job than some of the new-fangled “ANSI” mechanisms.
Footnote

Part of the problem presented here revolves around the desire to keep a pattern of SQL generation that already exists, doing nothing more than replacing a table name with a view (or partitioned table) name.

As we’ve seen, if you start with a simple heap table and try to replace it with a partitioned table you have to use the rownum mechanism rather than the fetch first N rows mechanism.

If you’re running with Standard Edition you can’t do anything simple to replace a table name with the name of a union all view; you’d have to change your code generator to apply all the predicates twice (once for each table) and then apply the rownum predicate or fetch first directive again outside the union all. In other words you have to emulate exactly what Oracle EE manages to do with partitioned tables and rownum.

select
        flag, class, id, v1
from
        (
        select
                /*+ index(t_odd (class, id)) */
                flag, class, id, v1
        from
                t_odd
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
        union all
        (
        select
                /*+ index(t_even (class, id)) */
                flag, class, id, v1
        from
                t_even
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
order by
        class, id
fetch 
        first 2 rows only
;

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |        |      1 |        |   192 (100)|      2 |00:00:00.01 |       8 |       |       |          |
|*  1 |  VIEW                              |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |       8 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK          |        |      1 |      4 |   192   (2)|      2 |00:00:00.01 |       8 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                            |        |      1 |      4 |   191   (1)|      4 |00:00:00.01 |       8 |       |       |          |
|   4 |     UNION-ALL                      |        |      1 |        |            |      4 |00:00:00.01 |       8 |       |       |          |
|   5 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  6 |       VIEW                         |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY       |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|   8 |         TABLE ACCESS BY INDEX ROWID| T_ODD  |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|*  9 |          INDEX RANGE SCAN          | TO_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
|* 10 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|* 11 |       WINDOW NOSORT STOPKEY        |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|  12 |        TABLE ACCESS BY INDEX ROWID | T_EVEN |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|* 13 |         INDEX RANGE SCAN           | TE_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_007"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "from$_subquery$_006"."CLASS","from$_subquery$_006"."ID")<=2)
   6 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=2)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
   9 - access("CLASS"='A' AND "ID">9500)
  10 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
  11 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
  13 - access("CLASS"='A' AND "ID">9500)

复制

As you can see, the E-Rows still predicts a lot of work, but the A-Rows tells us the work was kept to the minimum we want.

最后修改时间：2020-08-31 10:34:08

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

1人已赞赏

Fetch First vs. Rownum

评论

相关阅读