ORACLE 中的Top Frequency直方图

原创唐祖亮 2020-07-26

1821

Top Frequency直方图是在oracle12C中出现现的，官网上对Top Frequency直方图的描述如下：
A top frequency histogram is a variation on a frequency histogram that ignores nonpopular values that are statistically insignificant.

For example, if a pile of 1000 coins contains only a single penny, then you can ignore the penny when sorting the coins into buckets. A top frequency histogram can produce a better histogram for highly popular values.

11.6.1 Criteria For Top Frequency Histograms
If a small number of values occupies most of the rows, then creating a frequency histogram on this small set of values is useful even when the NDV is greater than the number of requested histogram buckets. To create a better quality histogram for popular values, the optimizer ignores the nonpopular values and creates a top frequency histogram.

As shown in the logic diagram in “How Oracle Database Chooses the Histogram Type”, the database creates a top frequency histogram when the following criteria are met:

NDV is greater than n, where n is the number of histogram buckets (default 254).

The percentage of rows occupied by the top n frequent values is equal to or greater than threshold p, where p is (1-(1/n))*100.

The estimate_percent parameter in the DBMS_STATS statistics gathering procedure is set to AUTO_SAMPLE_SIZE.

产生Top Frequency 直方图的条件

NDV(重复值数量)大于n，其中n是柱状图存储桶的数量(默认为254)。
前n个频繁值占行的百分比等于或大于阈值p，其中p为(1-(1/n))*100。
DBMS_STATS统计信息收集过程中的estimate_percent参数设置为AUTO_SAMPLE_SIZE。

实验-如何产生Top Frequency Histogram

创建表并插入合适的数据
创建表

create table topfreq_t(id number);
复制

插入数据

insert into topfreq_t values(52792);
insert into topfreq_t values(52793);
insert into topfreq_t values(52793);
insert into topfreq_t values(52793);
insert into topfreq_t values(52793);
insert into topfreq_t values(52793);
insert into topfreq_t values(52794);
insert into topfreq_t values(52794);
insert into topfreq_t values(52795);
insert into topfreq_t values(52796);
insert into topfreq_t values(52797);
insert into topfreq_t values(52797);
insert into topfreq_t values(52798);
insert into topfreq_t values(52798);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
insert into topfreq_t values(52799);
commit;
复制

表topfreq_t中有23条数据，与8种不同的值。如下：

SQL> select t.id, count(*) from TOPFREQ_T t group by t.id order by 1,2  ;
 
ID   COUNT(*)
---------- ----------
     52792	    1
     52793	    5
     52794	    2
     52795	    1
     52796	    1
     52797	    2
     52798	    2
     52799	    9
 
8 rows selected.
复制

生成 top frequency histogram

收集统计信息

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    ownname    => 'LEI'
,   tabname    => 'TOPFREQ_T'
,   method_opt => 'FOR COLUMNS id SIZE 7'    
);
END;
/

PL/SQL procedure successfully completed.
复制

注意：id为表的列名，7是分组数要小于不同值的总数(8)。

查看id列的统计信息

SQL> col table_name for a10
SQL> col COLUMN_NAME for a20
SQL> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME='TOPFREQ_T'
AND    COLUMN_NAME='ID';    

TABLE_NAME COLUMN_NAME	NUM_DISTINCT HISTOGRAM
----------------------  ----------------- --------------
TOPFREQ_T  	ID	   8 TOP-FREQUENCY
复制

产生的原因：首先肯定是满足了前面的条件分组数7小于了不同值的总数8
由于n=7，所以p=85.7，而7个出现最频繁的值所占的比例是95.65>85.7

查看Oracle所收集的7个频率最高的值

SQL> SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE
FROM   USER_HISTOGRAMS
WHERE  TABLE_NAME='TOPFREQ_T'
AND    COLUMN_NAME='ID'; 

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
      1  52792
      6  52793
      8  52794
      9  52796
     11  52797
     13  52798
     22  52799
7 rows selected.
复制

可以看到，少了52795(只有一条记录)，ENDPOINT_VALUE表示值出现频率的总和。
官方文档地址：
https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/histograms.html#GUID-DA1B97DA-DFE5-47CA-B8A0-57AB248B10EF

oracle

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

ORACLE 中的Top Frequency直方图

产生Top Frequency 直方图的条件

实验-如何产生Top Frequency Histogram

生成 top frequency histogram

查看Oracle所收集的7个频率最高的值

评论

相关阅读