最近在阅读崔华老师《基于Oracle的SQL优化》,测试其中动态采样以及多列统计信息用例时,在自己测试环境结果与书中有出入,进行了一些研究,发现直方图是对测试影响的原因,特将研究过程以及基于Oracle的多列关联查询相关知识整理一下,分享出来。
- 测试版本为11.2.0.4.190115 PSU版本
- 测试用例以及一些知识来自崔华老师《基于Oracle的SQL优化》
- 下文主要涉及Oracle多列统计信息、基数反馈。
1.创建测试表收集统计信息
SQL> create table t2 (c1 varchar2(1),c2 char(2000),n1 number, n2 number);
SQL> insert into t2 select 'a','a',trunc(dbms_random.value(0,20)),trunc(dbms_random. value(0,25)) from dba_objects where rownum<10001;
SQL> commit;
SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100);
复制
2 执行SQL,查看cardinality
2.1 T2表统计信息
***********
Table Level
***********
Table Number Empty Average Chain Average Global User Sample Date
Name of Rows Blocks Blocks Space Count Row Len Stats Stats Size MM-DD-YYYY
--------------- ------------------ -------- ------------ ------- -------- ------- ------ ------ ------------------ ----------
T2 10,000 3,394 0 0 0 #### YES NO 10,000 04-16-2021
Elapsed: 00:00:00.03
Column Column Distinct Number Number Global User Sample Date
Name Details Values Density Buckets Nulls Stats Stats Size MM-DD-YYYY
------------------------- ------------------------ ------------ ------- ------- ---------- ------ ------ ------------------ ----------
C1 VARCHAR2(1) 1 1 1 0 YES NO 10,000 04-16-2021
C2 CHAR(2000) 1 1 1 0 YES NO 10,000 04-16-2021
N1 NUMBER(22) 20 0 1 0 YES NO 10,000 04-16-2021
N2 NUMBER(22) 25 0 1 0 YES NO 10,000 04-16-2021
复制
2.2 执行SQL,查看cardinality
SQL> select * from t2 where n1=3 and n2=3 and c1='a';
Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 40180 | 922 (1)| 00:00:12 |
|* 1 | TABLE ACCESS FULL| T2 | 20 | 40180 | 922 (1)| 00:00:12 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N2"=3 AND "N1"=3 AND "C1"='a')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3360 consistent gets
0 physical reads
0 redo size
5069 bytes sent via SQL*Net to client
534 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
20 rows processed
--可以看到cardinality为20,计算公式为cardinality = num_rows*selectivity= t2_num_rows(n1_selectivity*n2_selectivity)=10000*(1/20*1/25)=20,这在n1 n2列没有任何联系,即不存在任何可以大量筛选数据的关联条件。
复制
2.3 关联列创造联系,再次测试SQL,计算cardinality
SQL> update t2 set n2=n1;
SQL> commit;
--经过上述update,n1与n2产生了联系,即n1列与n2列使用and关联查询时,两列组合筛选条件已经不能独立使用两列各自selectivity相乘的方法了,会造成cardinality会比实际返回结果小很多。
--再次收集统计信息,确保统计信息准确性。
SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100,method_opt=>'for all columns size 1');
--清空shared pool,防止之前执行计划干扰测试
SQL> alter system flush shared_pool;
SQL> alter session set statistics_level=all;
SQL> select * from t2 where n1=3 and n2=3 and c1='a';
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 92nytgpv0mb76, child number 0
-------------------------------------
select * from t2 where n1=3 and n2=3 and c1='a'
Plan hash value: 1513984157
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 3383 |
|* 1 | TABLE ACCESS FULL| T2 | 1 | 25 | 500 |00:00:00.01 | 3383 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("N1"=3 AND "N2"=3 AND "C1"='a'))
--可以看到cardinality为25,即cardinality = t2_num_rows (n1_selectivity*n2_selectivity)=10000(1/20*1/20)=25
--实际返回501行,显然CBO预估已经严重不准了,因为CBO并不知道两列其实是有联系的。
复制
3. 解决办法
3.1 创建组合索引
SQL> create index idx_t2 on t2(n1,n2);
--不收集直方图
SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100,method_opt=>'for all columns size 1');
SQL> alter system flush shared_pool;
SQL> alter session set statistics_level=all;
SQL> select * from t2 where n1=3 and n2=3 and c1='a';
SQL > select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 92nytgpv0mb76, child number 0
-------------------------------------
select * from t2 where n1=3 and n2=3 and c1='a'
Plan hash value: 2008370210
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 514 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 500 | 500 |00:00:00.01 | 514 |
|* 2 | INDEX RANGE SCAN | IDX_T2 | 1 | 500 | 500 |00:00:00.01 | 37 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C1"='a')
2 - access("N1"=3 AND "N2"=3)
--可以看到cardinality已经正确,估算是按照idx_t2的distinct值参考的selectivity,即cardinality = t2_num_rows * (1/20) = 500
--估算已经非常准确了。
复制
3.2 收集T2表直方图,再次测试
SQL> exec dbms_stats.gather_table_stats(ownname=>'HR',tabname=>'T2',CASCADE=>true, estimate_percent=>100,method_opt=>'for all columns size auto');
SQL> select table_name,column_name,HISTOGRAM from user_tab_columns where table_name='T2';
Table Column
Name Name HISTOGRAM
---------------------------------------- ------------------------- ---------------------------------------------
T2 C1 FREQUENCY
T2 C2 NONE
T2 N1 FREQUENCY
T2 N2 FREQUENCY
SQL> alter system flush shared_pool;
SQL> select * from t2 where n1=3 and n2=3 and c1='a';
SQL> select * from t2 where n1=3 and n2=3 and c1='a';
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 92nytgpv0mb76, child number 0
-------------------------------------
select * from t2 where n1=3 and n2=3 and c1='a'
Plan hash value: 2008370210
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 514 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 25 | 500 |00:00:00.01 | 514 |
|* 2 | INDEX RANGE SCAN | IDX_T2 | 1 | 25 | 500 |00:00:00.01 | 37 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C1"='a')
2 - access("N1"=3 AND "N2"=3)
--可以看到列上有了直方图之后,cbo无法参考复合索引统计信息了,所以cardinality已经不准确了,所以要注意直方图对这种情况的影响。
复制
3.3 创建多列统计信息
--创建虚拟列组合
set serveroutput on
declare
cg_name varchar2(30);
begin
cg_name := sys.dbms_stats.create_extended_stats('HR','T2','(n1,n2)');
dbms_output.put_line(cg_name);
end;
/
--查看虚拟列组合
SQL> select extension_name, extension from dba_stat_extensions where table_name=upper('&tbl_name');
EXTENSION_NAME EXTENSION
------------------------------ -------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS ("N1","N2")
SQL> select sys.dbms_stats.show_extended_stats_name('HR','T2','(n1,n2)') col_group_name
from dual;
COL_GROUP_NAME
--------------------------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
--查看虚拟列统计信息
SQL> select e.extension col_group, t.num_distinct, t.histogram
from user_stat_extensions e, user_tab_col_statistics t
where e.extension_name = t.column_name
and e.table_name = t.table_name
and t.table_name = 'T2';
Distinct
COL_GROUP Values HISTOGRAM
-------------------- ------------ ---------------------------------------------
("N1","N2") 20 FREQUENCY
--重新收集统计信息(只单独针对n1,n2组合列),重新执行SQL,查看cardinality
SQL> alter system flush shared_pool;
SQL> select * from t2 where n1=3 and n2=3 and c1='a';
SQL> exec dbms_stats.gather_table_stats(ownname => 'HR',tabname => 'T2',method_opt => 'for columns(n1,n2) size auto',estimate_percent => 100,no_invalidate=>false);
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 92nytgpv0mb76, child number 0
-------------------------------------
select * from t2 where n1=3 and n2=3 and c1='a'
Plan hash value: 2008370210
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 500 |00:00:00.01 | 514 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 500 | 500 |00:00:00.01 | 514 |
|* 2 | INDEX RANGE SCAN | IDX_T2 | 1 | 500 | 500 |00:00:00.01 | 37 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C1"='a')
2 - access("N1"=3 AND "N2"=3)
--删除虚拟列
SQL> exec dbms_stats.drop_extended_stats('hr','t2','(n1,n2)');
复制
3.4 测试结论
--经过测试,细分为以下几种情形:
(1)当列n1,n2组合未被使用记录如col_usage$时,不会收集n1,n2组合列直方图信息,此时如果n1,n2列各自有直方图信息时,第一次执行会参考n1,n2各自列selectivity,导致第一次执行上面SQL时,cardinality=10000*(1/20*1/20)=25,第二次执行时,cbo会自动使用cardinality feedback修正cardinality。
(2)当列n1,n2组合使用过一次col_usage$有虚拟组合列使用记录时,再收集统计信息,n1,n2组合虚拟列会收集直方图,此时cardinality会参考n1,n2组合列直方图以及统计信息,cbo预估cardinality基本准确。
(3)当表缺失统计信息时,会采用动态采样,cardinality预估也会较为准确。
(4)当表有统计信息,没有虚拟列时,也会有cardinality feedback特性修正cardinality,当SQL反复执行时,cardinality也不会太偏离准确结果。
复制
最后修改时间:2021-05-18 17:46:54
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
【纯干货】Oracle 19C RU 19.27 发布,如何快速升级和安装?
Lucifer三思而后行
710次阅读
2025-04-18 14:18:38
Oracle RAC 一键安装翻车?手把手教你如何排错!
Lucifer三思而后行
639次阅读
2025-04-15 17:24:06
Oracle数据库一键巡检并生成HTML结果,免费脚本速来下载!
陈举超
551次阅读
2025-04-20 10:07:02
【ORACLE】你以为的真的是你以为的么?--ORA-38104: Columns referenced in the ON Clause cannot be updated
DarkAthena
502次阅读
2025-04-22 00:13:51
【活动】分享你的压箱底干货文档,三篇解锁进阶奖励!
墨天轮编辑部
497次阅读
2025-04-17 17:02:24
【ORACLE】记录一些ORACLE的merge into语句的BUG
DarkAthena
492次阅读
2025-04-22 00:20:37
一页概览:Oracle GoldenGate
甲骨文云技术
473次阅读
2025-04-30 12:17:56
火焰图--分析复杂SQL执行计划的利器
听见风的声音
421次阅读
2025-04-17 09:30:30
3月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
373次阅读
2025-04-15 14:48:05
OR+DBLINK的关联SQL优化思路
布衣
363次阅读
2025-05-05 19:28:36