Top Frequency直方图是在oracle12C中出现现的,官网上对Top Frequency直方图的描述如下:
A top frequency histogram is a variation on a frequency histogram that ignores nonpopular values that are statistically insignificant.
For example, if a pile of 1000 coins contains only a single penny, then you can ignore the penny when sorting the coins into buckets. A top frequency histogram can produce a better histogram for highly popular values.
11.6.1 Criteria For Top Frequency Histograms
If a small number of values occupies most of the rows, then creating a frequency histogram on this small set of values is useful even when the NDV is greater than the number of requested histogram buckets. To create a better quality histogram for popular values, the optimizer ignores the nonpopular values and creates a top frequency histogram.
As shown in the logic diagram in “How Oracle Database Chooses the Histogram Type”, the database creates a top frequency histogram when the following criteria are met:
NDV is greater than n, where n is the number of histogram buckets (default 254).
The percentage of rows occupied by the top n frequent values is equal to or greater than threshold p, where p is (1-(1/n))*100.
The estimate_percent parameter in the DBMS_STATS statistics gathering procedure is set to AUTO_SAMPLE_SIZE.
产生Top Frequency 直方图的条件
NDV(重复值数量)大于n,其中n是柱状图存储桶的数量(默认为254)。
前n个频繁值占行的百分比等于或大于阈值p,其中p为(1-(1/n))*100。
DBMS_STATS统计信息收集过程中的estimate_percent参数设置为AUTO_SAMPLE_SIZE。
实验-如何产生Top Frequency Histogram
创建表并插入合适的数据
创建表
create table topfreq_t(id number);
复制
插入数据
insert into topfreq_t values(52792); insert into topfreq_t values(52793); insert into topfreq_t values(52793); insert into topfreq_t values(52793); insert into topfreq_t values(52793); insert into topfreq_t values(52793); insert into topfreq_t values(52794); insert into topfreq_t values(52794); insert into topfreq_t values(52795); insert into topfreq_t values(52796); insert into topfreq_t values(52797); insert into topfreq_t values(52797); insert into topfreq_t values(52798); insert into topfreq_t values(52798); insert into topfreq_t values(52799); insert into topfreq_t values(52799); insert into topfreq_t values(52799); insert into topfreq_t values(52799); insert into topfreq_t values(52799); insert into topfreq_t values(52799); insert into topfreq_t values(52799); insert into topfreq_t values(52799); insert into topfreq_t values(52799); commit;
复制
表topfreq_t中有23条数据,与8种不同的值。如下:
SQL> select t.id, count(*) from TOPFREQ_T t group by t.id order by 1,2 ; ID COUNT(*) ---------- ---------- 52792 1 52793 5 52794 2 52795 1 52796 1 52797 2 52798 2 52799 9 8 rows selected.
复制
生成 top frequency histogram
收集统计信息
BEGIN DBMS_STATS.GATHER_TABLE_STATS ( ownname => 'LEI' , tabname => 'TOPFREQ_T' , method_opt => 'FOR COLUMNS id SIZE 7' ); END; / PL/SQL procedure successfully completed.
复制
注意:id为表的列名,7是分组数要小于不同值的总数(8)。
查看id列的统计信息
SQL> col table_name for a10 SQL> col COLUMN_NAME for a20 SQL> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME='TOPFREQ_T' AND COLUMN_NAME='ID'; TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM ---------------------- ----------------- -------------- TOPFREQ_T ID 8 TOP-FREQUENCY
复制
产生的原因:首先肯定是满足了前面的条件分组数7小于了不同值的总数8
由于n=7,所以p=85.7,而7个出现最频繁的值所占的比例是95.65>85.7
查看Oracle所收集的7个频率最高的值
SQL> SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE FROM USER_HISTOGRAMS WHERE TABLE_NAME='TOPFREQ_T' AND COLUMN_NAME='ID'; ENDPOINT_NUMBER ENDPOINT_VALUE --------------- -------------- 1 52792 6 52793 8 52794 9 52796 11 52797 13 52798 22 52799 7 rows selected.
复制
可以看到,少了52795(只有一条记录),ENDPOINT_VALUE表示值出现频率的总和。
官方文档地址:
https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/histograms.html#GUID-DA1B97DA-DFE5-47CA-B8A0-57AB248B10EF