实际上,在介绍近期的新特性之前,让我先介绍一下Oracle现存的用于聚合的方法的另一方面。此前用到的例子都是基于向子分区插入新行,尽管这是一些表和一些系统所采用和适合的方法,但在一个近实时的数据仓库上,你在某些时刻,可能会使用分区交换。这意味着,我们需要理解统计信息,是如何被收集并聚合向上到分区和表一级。
虽然可以有其它方法,但我要说,你更可能会使用以下两种方法:
1)创建一个临时的中间表,用它加载数据,收集它的统计信息,然后与最终表中的子分区做分区交换。
2)创建一个临时的中间表,用它加载数据,与相关的子分区做分区交换,然后在子分区上收集统计信息。
Pete Scott在前一篇贴子中发表评论,称他很少使用方法1 。所以,不用怀疑,他会在这时发表另一个评论,阐述他的理由。:-)我想向你展示的是,如果你使用方法1 会发生什么。并介绍隐藏参数 _minimal_stats_aggregation,一个从8i时就存在的参数。其默认设置是TRUE,这意味着Oracle会最小化自动统计信息聚合的活动。让我们看看实际的表现。
首先,我会重建存在于本系列贴开始时的TEST_TAB1表,并添加一个新的分区(隐含创建相应的子分区),以及创建一个独立的表,用它来加载数据。
SQL> ALTER TABLE TEST_TAB1 2 ADD PARTITION P_20100209 VALUES LESS THAN (20100210); Table altered. SQL> DROP TABLE LOAD_TAB1; Table dropped. SQL> CREATE TABLE LOAD_TAB1 2 AS SELECT * FROM TEST_TAB1 WHERE 1=0; Table created. SQL> CREATE UNIQUE INDEX LOAD_TAB1_IX1 ON LOAD_TAB1 2 (REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID) 3 NOPARALLEL COMPRESS 1; Index created.
复制
现在,我会为LOAD_TAB1中4个不同的子分区重复同样的处理过程–插入数据到LOAD_TAB1,收集其统计信息,然后与TEST_TAB1中的相应子分区做交换
SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 400, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 600, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 900, 'Z'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_GROT WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_GROT REBUILD UNUSABLE LOCAL INDEXES; Table altered. SQL> TRUNCATE TABLE LOAD_TAB1; Table truncated. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 400, 'U'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 600, 'U'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 900, 'U'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_JUNE WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_JUNE REBUILD UNUSABLE LOCAL INDEXES; Table altered. SQL> TRUNCATE TABLE LOAD_TAB1; Table truncated. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 400, 'N'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 600, 'N'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 900, 'N'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_HALO WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_HALO REBUILD UNUSABLE LOCAL INDEXES; Table altered. SQL> TRUNCATE TABLE LOAD_TAB1; Table truncated. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 400, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 600, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 900, 'Z'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_OTHERS WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_OTHERS REBUILD UNUSABLE LOCAL INDEXES; Table altered.
复制
P_20100209的全部子分区都有了统计信息,该统计信息是在分区交换操作过程中被交换进来的。所以,我们对将会聚合全局统计信息抱有期望。
SQL> select table_name, global_stats, last_analyzed, num_rows 2 from dba_tables 3 where table_name='TEST_TAB1' 4 and owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ --- -------------------- ---------- TEST_TAB1 NO SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_partitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131 NO TEST_TAB1 P_20100201 NO TEST_TAB1 P_20100202 NO TEST_TAB1 P_20100203 NO TEST_TAB1 P_20100204 NO TEST_TAB1 P_20100205 NO TEST_TAB1 P_20100206 NO TEST_TAB1 P_20100207 NO TEST_TAB1 P_20100209 NO 9 rows selected.
复制
哦,看起来好象没有作用。也许LOAD_TAB1的统计信息没有被正确收集,或者不能出现在分区交换操作中?
SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_subpartitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131_GROT NO TEST_TAB1 P_20100131_HALO NO TEST_TAB1 P_20100131_JUNE NO TEST_TAB1 P_20100131_OTHERS NO <<output snipped>> TEST_TAB1 P_20100209_GROT NO 28-FEB-2010 21:41:47 3 TEST_TAB1 P_20100209_HALO NO 28-FEB-2010 21:41:49 3 TEST_TAB1 P_20100209_JUNE NO 28-FEB-2010 21:41:49 3 TEST_TAB1 P_20100209_OTHERS NO 28-FEB-2010 21:41:50 3 36 rows selected.
复制
子分区的统计信息是好的,但聚合过程并没有发生是由于_miminal_stats_aggregation参数设为了TRUE(默认值),这会让Oracle最小化聚合操作。而最小化聚合操作的方法之一,就是不对分区交换操作的统计信息做聚合。但你可以手动在表分区上手动收集统计信息。如果我们修改该参数为非默认值(对于下划线打头的参数,风险自负。。。),将会看到不同的行为。我运行同样的脚本,只是在会话级增加了修改参数的变化。
SQL> alter session set "_minimal_stats_aggregation"=FALSE; Session altered.
复制
这会让结果变为如下这样:
SQL> select table_name, global_stats, last_analyzed, num_rows 2 from dba_tables 3 where table_name='TEST_TAB1' 4 and owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ --- -------------------- ---------- TEST_TAB1 NO SQL> SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_partitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131 NO TEST_TAB1 P_20100201 NO TEST_TAB1 P_20100202 NO TEST_TAB1 P_20100203 NO TEST_TAB1 P_20100204 NO TEST_TAB1 P_20100205 NO TEST_TAB1 P_20100206 NO TEST_TAB1 P_20100207 NO TEST_TAB1 P_20100209 NO 28-FEB-2010 21:41:53 12 9 rows selected.
复制
我们可以看到,在表级别仍然没有统计信息,这是因为还没有在所有分区上都有统计信息,因此聚合不会发生。但是在P_20100209分区上发生了聚合,因为所有相关的子分区上已经有了统计信息。
你需要记住的是,_minimal_stats_aggregation的默认设置,意味着除非你在刚刚交换的分区上显示收集统计信息,否则聚合不会发生。实际上,拷贝统计信息同样也涉及聚合操作,但我会在下一篇中讨论它。
(晚些的更新:最后一句也许不能成真了,我刚刚在家中尝试了一下,得到了不同的结果,所以,需要做更多的研究了)
嗯,在 Randolf Geist’s的博客中,有更多相关的内容。
原文链接:http://oracledoug.com/serendipity/index.php?/archives/1570-Statistics-on-Partitioned-Tables-Part-5.html
原文内容:
Actually, before looking at any recent features, let me introduce one more aspect of the existing aggregation approach used by Oracle. The examples used to date have been based on INSERTing new rows into subpartitions and, although that’s the approach used for some of our tables and will suit some systems, the likelihood is that in a near-real-time DW you will be using partition exchange at some point. Which means we need to understand how the stats might be gathered and then aggregated up to the partition and table-level stats.
Although there might be other approaches, I’d say that there are two distinct approaches you are likely to use.
-
Create a temporary load table, load it with data, gather statistics on it and then exchange it with the relevant subpartition in the real table.
-
Create a temporary load table, load it with data, exchange it with the relevant subpartition and then gather stats on the subpartition.
Pete Scott left a comment on a previous post stating that he rarely uses approach 1 so no doubt he’ll leave another comment here expanding on his reasons 😉 What I want to show you is what happens if you do use approach 1 and introduce the _minimal_stats_aggregation hidden parameter that’s been kicking around since Oracle 8i. The default setting of the parameter is TRUE, which means that Oracle minimises automatic stats aggregation activity. Let’s see that in action.
First of all I’ll recreate TEST_TAB1 as it was at the start of the series and add a new partition (and, by implication, the related subpartitions) and create a seperate table that I’ll load the data into.
SQL> ALTER TABLE TEST_TAB1 2 ADD PARTITION P_20100209 VALUES LESS THAN (20100210); Table altered. SQL> DROP TABLE LOAD_TAB1; Table dropped. SQL> CREATE TABLE LOAD_TAB1 2 AS SELECT * FROM TEST_TAB1 WHERE 1=0; Table created. SQL> CREATE UNIQUE INDEX LOAD_TAB1_IX1 ON LOAD_TAB1 2 (REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID) 3 NOPARALLEL COMPRESS 1; Index created.
复制
Now I’ll use LOAD_TAB1 to repeat the same process for the four different subpartitions - INSERT data into LOAD_TAB1, gather stats on it and then exchange it with the relevant subpartition of TEST_TAB1.
SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 400, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 600, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'GROT', 900, 'Z'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_GROT WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_GROT REBUILD UNUSABLE LOCAL INDEXES; Table altered. SQL> TRUNCATE TABLE LOAD_TAB1; Table truncated. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 400, 'U'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 600, 'U'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'JUNE', 900, 'U'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_JUNE WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_JUNE REBUILD UNUSABLE LOCAL INDEXES; Table altered. SQL> TRUNCATE TABLE LOAD_TAB1; Table truncated. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 400, 'N'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 600, 'N'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'HALO', 900, 'N'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_HALO WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_HALO REBUILD UNUSABLE LOCAL INDEXES; Table altered. SQL> TRUNCATE TABLE LOAD_TAB1; Table truncated. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 400, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 600, 'P'); 1 row created. SQL> INSERT INTO LOAD_TAB1 VALUES (20100209, 'ZZZZ', 900, 'Z'); 1 row created. SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1'); PL/SQL procedure successfully completed. SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20100209_OTHERS WITH TABLE load_tab1; Table altered. SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20100209_OTHERS REBUILD UNUSABLE LOCAL INDEXES; Table altered.
复制
All of the P_20100209 subpartitions have stats that were swapped in as part of the partition exchange operation so hopefully there’ll be some aggregated global statistics.
SQL> select table_name, global_stats, last_analyzed, num_rows 2 from dba_tables 3 where table_name='TEST_TAB1' 4 and owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ --- -------------------- ---------- TEST_TAB1 NO SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_partitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131 NO TEST_TAB1 P_20100201 NO TEST_TAB1 P_20100202 NO TEST_TAB1 P_20100203 NO TEST_TAB1 P_20100204 NO TEST_TAB1 P_20100205 NO TEST_TAB1 P_20100206 NO TEST_TAB1 P_20100207 NO TEST_TAB1 P_20100209 NO 9 rows selected.
复制
Oh, well, that doesn’t seem to have worked. Maybe the LOAD_TAB1 stats weren’t gathered correctly or didn’t appear as part of the subpartition exchange operation?
SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_subpartitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131_GROT NO TEST_TAB1 P_20100131_HALO NO TEST_TAB1 P_20100131_JUNE NO TEST_TAB1 P_20100131_OTHERS NO <<output snipped>> TEST_TAB1 P_20100209_GROT NO 28-FEB-2010 21:41:47 3 TEST_TAB1 P_20100209_HALO NO 28-FEB-2010 21:41:49 3 TEST_TAB1 P_20100209_JUNE NO 28-FEB-2010 21:41:49 3 TEST_TAB1 P_20100209_OTHERS NO 28-FEB-2010 21:41:50 3 36 rows selected.
复制
The subpartition stats are ok, then, but the aggregation process hasn’t happened and that’s because _miminal_stats_aggregation is set to TRUE (the default) which instructs Oracle to minimise aggregation operations and one of the ways it does so is to not aggregate statistics as a result of a partition exchange operation but to leave you to do that manually by gathering stats on the table partition. If we were to modify the parameter to a non-default value (and, being an underscore parameter, that’s your own choice at your own risk …), we would see different behaviour. I ran the same script, but with this small addition that changes the parameter setting at the session level.
SQL> alter session set "_minimal_stats_aggregation"=FALSE; Session altered. Which will change the end result to this ... SQL> select table_name, global_stats, last_analyzed, num_rows 2 from dba_tables 3 where table_name='TEST_TAB1' 4 and owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ --- -------------------- ---------- TEST_TAB1 NO SQL> SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows 2 from dba_tab_partitions 3 where table_name='TEST_TAB1' 4 and table_owner='TESTUSER' 5 order by 1, 2, 4 desc nulls last; TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS ------------------------------ ------------------------------ --- -------------------- ---------- TEST_TAB1 P_20100131 NO TEST_TAB1 P_20100201 NO TEST_TAB1 P_20100202 NO TEST_TAB1 P_20100203 NO TEST_TAB1 P_20100204 NO TEST_TAB1 P_20100205 NO TEST_TAB1 P_20100206 NO TEST_TAB1 P_20100207 NO TEST_TAB1 P_20100209 NO 28-FEB-2010 21:41:53 12 9 rows selected.
复制
Note that there are still no statistics at the table level because not all of the partitions have stats yet, so aggregation can’t take place, but there are aggregated statistics on the P_20100209 partition, because all of the relevant subpartitions do have stats.
All you need to remember is that the default setting of _minimal_stats_aggregation means that, unless you explicitly gather statistics on the partitions you’ve just exchanged, aggregation will not take place! Actually, copying stats will also invoke the aggregation process too, but I’ll deal with that in the next post. (Updated later. That last sentence might not be true. I’ve just tried something at home and I’m seeing different results at work, so more investigation needed.)
Oh, and there’s much more on this subject over on Randolf Geist’s blog post.
评论
