暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle多列统计信息与直方图对有关联多列查询影响

原创 你好我是李白 2021-05-18
1223

最近在阅读崔华老师《基于Oracle的SQL优化》,测试其中动态采样以及多列统计信息用例时,在自己测试环境结果与书中有出入,进行了一些研究,发现直方图是对测试影响的原因,特将研究过程以及基于Oracle的多列关联查询相关知识整理一下,分享出来。

  1. 测试版本为11.2.0.4.190115 PSU版本
  2. 测试用例以及一些知识来自崔华老师《基于Oracle的SQL优化》
  3. 下文主要涉及Oracle多列统计信息、基数反馈。

1.创建测试表收集统计信息

SQL> create table t2 (c1 varchar2(1),c2 char(2000),n1 number, n2 number); SQL> insert into t2 select 'a','a',trunc(dbms_random.value(0,20)),trunc(dbms_random. value(0,25)) from dba_objects where rownum<10001; SQL> commit; SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100);
复制

2 执行SQL,查看cardinality

2.1 T2表统计信息

*********** Table Level *********** Table Number Empty Average Chain Average Global User Sample Date Name of Rows Blocks Blocks Space Count Row Len Stats Stats Size MM-DD-YYYY --------------- ------------------ -------- ------------ ------- -------- ------- ------ ------ ------------------ ---------- T2 10,000 3,394 0 0 0 #### YES NO 10,000 04-16-2021 Elapsed: 00:00:00.03 Column Column Distinct Number Number Global User Sample Date Name Details Values Density Buckets Nulls Stats Stats Size MM-DD-YYYY ------------------------- ------------------------ ------------ ------- ------- ---------- ------ ------ ------------------ ---------- C1 VARCHAR2(1) 1 1 1 0 YES NO 10,000 04-16-2021 C2 CHAR(2000) 1 1 1 0 YES NO 10,000 04-16-2021 N1 NUMBER(22) 20 0 1 0 YES NO 10,000 04-16-2021 N2 NUMBER(22) 25 0 1 0 YES NO 10,000 04-16-2021
复制

2.2 执行SQL,查看cardinality

SQL> select * from t2 where n1=3 and n2=3 and c1='a'; Execution Plan ---------------------------------------------------------- Plan hash value: 1513984157 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 40180 | 922 (1)| 00:00:12 | |* 1 | TABLE ACCESS FULL| T2 | 20 | 40180 | 922 (1)| 00:00:12 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("N2"=3 AND "N1"=3 AND "C1"='a') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 3360 consistent gets 0 physical reads 0 redo size 5069 bytes sent via SQL*Net to client 534 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 20 rows processed --可以看到cardinality为20,计算公式为cardinality = num_rows*selectivity= t2_num_rows(n1_selectivity*n2_selectivity)=10000*(1/20*1/25)=20,这在n1 n2列没有任何联系,即不存在任何可以大量筛选数据的关联条件。
复制

2.3 关联列创造联系,再次测试SQL,计算cardinality

SQL> update t2 set n2=n1; SQL> commit; --经过上述update,n1与n2产生了联系,即n1列与n2列使用and关联查询时,两列组合筛选条件已经不能独立使用两列各自selectivity相乘的方法了,会造成cardinality会比实际返回结果小很多。 --再次收集统计信息,确保统计信息准确性。 SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100,method_opt=>'for all columns size 1'); --清空shared pool,防止之前执行计划干扰测试 SQL> alter system flush shared_pool; SQL> alter session set statistics_level=all; SQL> select * from t2 where n1=3 and n2=3 and c1='a'; SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID 92nytgpv0mb76, child number 0 ------------------------------------- select * from t2 where n1=3 and n2=3 and c1='a' Plan hash value: 1513984157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 3383 | |* 1 | TABLE ACCESS FULL| T2 | 1 | 25 | 500 |00:00:00.01 | 3383 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("N1"=3 AND "N2"=3 AND "C1"='a')) --可以看到cardinality为25,即cardinality = t2_num_rows (n1_selectivity*n2_selectivity)=10000(1/20*1/20)=25 --实际返回501行,显然CBO预估已经严重不准了,因为CBO并不知道两列其实是有联系的。
复制

3. 解决办法

3.1 创建组合索引

SQL> create index idx_t2 on t2(n1,n2); --不收集直方图 SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100,method_opt=>'for all columns size 1'); SQL> alter system flush shared_pool; SQL> alter session set statistics_level=all; SQL> select * from t2 where n1=3 and n2=3 and c1='a'; SQL > select * from table(dbms_xplan.display_cursor(format=>'allstats last')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID 92nytgpv0mb76, child number 0 ------------------------------------- select * from t2 where n1=3 and n2=3 and c1='a' Plan hash value: 2008370210 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 514 | |* 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 500 | 500 |00:00:00.01 | 514 | |* 2 | INDEX RANGE SCAN | IDX_T2 | 1 | 500 | 500 |00:00:00.01 | 37 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("C1"='a') 2 - access("N1"=3 AND "N2"=3) --可以看到cardinality已经正确,估算是按照idx_t2的distinct值参考的selectivity,即cardinality = t2_num_rows * (1/20) = 500 --估算已经非常准确了。
复制

3.2 收集T2表直方图,再次测试

SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100,method_opt=>'for all columns size auto'); SQL> select table_name,column_name,HISTOGRAM from user_tab_columns where table_name='T2'; Table Column Name Name HISTOGRAM ---------------------------------------- ------------------------- --------------------------------------------- T2 C1 FREQUENCY T2 C2 NONE T2 N1 FREQUENCY T2 N2 FREQUENCY SQL> alter system flush shared_pool; SQL> select * from t2 where n1=3 and n2=3 and c1='a'; SQL> select * from t2 where n1=3 and n2=3 and c1='a'; PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID 92nytgpv0mb76, child number 0 ------------------------------------- select * from t2 where n1=3 and n2=3 and c1='a' Plan hash value: 2008370210 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 514 | |* 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 25 | 500 |00:00:00.01 | 514 | |* 2 | INDEX RANGE SCAN | IDX_T2 | 1 | 25 | 500 |00:00:00.01 | 37 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("C1"='a') 2 - access("N1"=3 AND "N2"=3) --可以看到列上有了直方图之后,cbo无法参考复合索引统计信息了,所以cardinality已经不准确了,所以要注意直方图对这种情况的影响。
复制

3.3 创建多列统计信息

--创建虚拟列组合 set serveroutput on declare cg_name varchar2(30); begin cg_name := sys.dbms_stats.create_extended_stats('HR','T2','(n1,n2)'); dbms_output.put_line(cg_name); end; / --查看虚拟列组合 SQL> select extension_name, extension from dba_stat_extensions where table_name=upper('&tbl_name'); EXTENSION_NAME EXTENSION ------------------------------ ------------- SYS_STUBZH0IHA7K$KEBJVXO5LOHAS ("N1","N2") SQL> select sys.dbms_stats.show_extended_stats_name('HR','T2','(n1,n2)') col_group_name from dual; COL_GROUP_NAME -------------------------------- SYS_STUBZH0IHA7K$KEBJVXO5LOHAS --查看虚拟列统计信息 SQL> select e.extension col_group, t.num_distinct, t.histogram from user_stat_extensions e, user_tab_col_statistics t where e.extension_name = t.column_name and e.table_name = t.table_name and t.table_name = 'T2'; Distinct COL_GROUP Values HISTOGRAM -------------------- ------------ --------------------------------------------- ("N1","N2") 20 FREQUENCY --重新收集统计信息(只单独针对n1,n2组合列),重新执行SQL,查看cardinality SQL> alter system flush shared_pool; SQL> select * from t2 where n1=3 and n2=3 and c1='a'; SQL> exec dbms_stats.gather_table_stats(ownname => 'HR',tabname => 'T2',method_opt => 'for columns(n1,n2) size auto',estimate_percent => 100,no_invalidate=>false); SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID 92nytgpv0mb76, child number 0 ------------------------------------- select * from t2 where n1=3 and n2=3 and c1='a' Plan hash value: 2008370210 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 514 | |* 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 500 | 500 |00:00:00.01 | 514 | |* 2 | INDEX RANGE SCAN | IDX_T2 | 1 | 500 | 500 |00:00:00.01 | 37 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("C1"='a') 2 - access("N1"=3 AND "N2"=3) --删除虚拟列 SQL> exec dbms_stats.drop_extended_stats('hr','t2','(n1,n2)');
复制

3.4 测试结论

--经过测试,细分为以下几种情形: (1)当列n1,n2组合未被使用记录如col_usage$时,不会收集n1,n2组合列直方图信息,此时如果n1,n2列各自有直方图信息时,第一次执行会参考n1,n2各自列selectivity,导致第一次执行上面SQL时,cardinality=10000*(1/20*1/20)=25,第二次执行时,cbo会自动使用cardinality feedback修正cardinality。 (2)当列n1,n2组合使用过一次col_usage$有虚拟组合列使用记录时,再收集统计信息,n1,n2组合虚拟列会收集直方图,此时cardinality会参考n1,n2组合列直方图以及统计信息,cbo预估cardinality基本准确。 (3)当表缺失统计信息时,会采用动态采样,cardinality预估也会较为准确。 (4)当表有统计信息,没有虚拟列时,也会有cardinality feedback特性修正cardinality,当SQL反复执行时,cardinality也不会太偏离准确结果。
复制
最后修改时间:2021-05-18 17:46:54
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论