使用过DBMS_STATS 的人会知道,你可以指定数据库采样的行的比例,来计算优化器的统计信息。参数ESTIMATE_PERCENT就是来控制它的。
Oracle的参考资料建议你使用默认的ESTIMATE_PERCENT,换句话说,不设置(如果你没有覆盖其默认值)该参数或者显式设置其为DBMS_STATS.AUTO_SAMPLE_SIZE(即自动确定采样大小)。
尽管如此,我曾听到一些人使用值100来采样所有行。我可以理解这样做的动机,毕竟,还有比读取全部行并检查全部的值更好的吗?我们首先要意识到,AUTO_SAMPLE_SIZE也会读取全部的行!其精巧的部分是优化了如何计算唯一值数量。其使用的算法避免了需要记住或排序指定值的清单(如果唯一值的数量很高,这可能会是一个值得关注的开销)。auto的算法节省了时间和系统资源。当然,得到的NDV(唯一值数量)是一个概略值,但其准确度是非常高的。
使用默认设置还有以下一些原因:
- 混合和TOP频率直方图需要你使用自动采样。如果你不这样做,数据库会创建准确度低一些的高度平衡直方图。
- 增量统计信息收集 需要自动采样。
总之: 你应该使用自动采样,除非你有一个非常确定的原因不这样做。
更多有关自动采样的信息可以看这里.
原文链接:https://blogs.oracle.com/optimizer/post/setting-estimate-percent
What should I set ESTIMATE_PERCENT to?
September 24, 2021 | 1 minute read
Nigel Bayliss
Product Manager
Those who have used the DBMS_STATS API will know that you can specify the proportion of rows the database will sample to calculate optimizer statistics. The parameter ESTIMATE_PERCENT controls this.
Oracle’s collateral recommends that you use the default ESTIMATE_PERCENT. In other words, leave the parameter unset (if you have not overridden the default) or explicitly set it to DBMS_STATS.AUTO_SAMPLE_SIZE.
Nevertheless, I have heard that some people use a value of 100 to sample all rows. I can understand the motivation to do this because, after all, what is better than reading all rows and checking all values? The first thing to realize is that AUTO_SAMPLE_SIZE reads all rows too! The clever part is that it optimizes how the number of distinct values (NDV) is calculated. It uses an algorithm that avoids the need to remember or sort a list of specific column values (which can otherwise be a considerable overhead if the number of distinct values is high). The auto algorithm saves both time and system resources. Yes, the resulting NDV is an approximation, but the accuracy is very high.
There are other reasons to use the default setting:
- Hybrid and top frequency histograms require you to use auto sample size. If you don’t, the database will create less accurate height-balanced histograms.
- Incremental statistics gathering requires auto sample size.
In summary: you should use auto sample size unless you have a very specific reason not to.
More information on auto sample size can be found here.