为改善优优器生成好的执行计划的能力,引入了动态采样(Dynamic sampling,简写为DS)。该特性在 Oracle Database 12c中得到了增强,并改名为动态统计(Dynamic Statistics)。最常见的错误认识是认为DS可以做为统计信息的替代,而DS的目标却是增强统计信息;当常规的统计信息不足以得到高质量的评估基数时,可以使用DS。
那么,如何以及何时来使用动态统计呢?在SQL语句的编译期间,优化通过考虑可用的统计信息是否足够生成良好的执行计划,来决定是否使用动态统计。如果可用的统计信息不足,将会使用DS。它可以补偿缺失或不充分的统计信息,否则,就会导致一个差的执行计划。对于一个查询中有一个或多个表没有统计信息的情况,DS会被优化器在优化语句前,用于在这些表上收集基础统计信息。此时收集的统计信息,不如使用DBMS_STATS包收集的统计信息的质量好或者完整。这种权衡是为了在语句的编译时间产生有限的影响。
如果SQL语句包含有复杂的谓词表达式和扩展统计信息不可用时,DS也可以被使用。扩展统计信息是在Oracle Database 11g Release 1时引入的,其目标是帮助优化器针对复杂的谓词表达式,可以获得高质量的基数评估。比如,如果你有一个简单的查询,其WHERE子句在两个相关列上有谓词(译者注:意为在相关列上有过滤条件),标准统计信息是不够的。请看下面的语句及其执行计划:
EXPLAIN PLAN FOR SELECT * FROM CUSTOMERS WHERE cust_city = 'Los Angeles' AND cust_state_province = 'CA'; SELECT * FROM table(dbms_xplan.display(format=>'basic rows')); ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | 8 | | 1 | TABLE ACCESS FULL| CUSTOMERS | 8 | -----------------------------------------------
复制
使用标准统计信息,优化器评估基数为8行。而该查询实际返回的行数是916.没有扩展统计信息,优化器不知道在"Los Angeles" 和州"CA"之间存在完美的相关性。但是通过设置 optimizer_dynamic_sampling为级别4,优化器将使用动态采样,来收集有关该谓词表达式的额外信息。由动态采样提供的额外信息,允许优化器产生更精确的基数评估。
ALTER SESSION SET optimizer_dynamic_sampling=4; EXPLAIN PLAN FOR SELECT * FROM customers WHERE cust_city = 'Los Angeles' AND cust_state_province = 'CA'; SELECT * FROM table(dbms_xplan.display(format=>'basic rows')); ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | 916 | | 1 | TABLE ACCESS FULL| CUSTOMERS | 916 | -----------------------------------------------
复制
本例中改善后的基数评估,并没有影响SQL的执行计划,但对于更复杂的查询,一个更好的基数评估,往往会导致一个更好的执行计划,这又会导致更快的查询执行时间的结果。
你也许好奇我们为什么必须设置参数optimizer_dynamic_sampling为4。动态统计特性是由于参数optimizer_dynamic_sampling控制的,它可以被设置为不同的级别(0–11)。这些级别控制两个不同的事情;何时动态采样触发,以及使用多大的采样来收集统计信息。更大的采样大小会对查询的编译时间有更大的影响。
Level | 何时使用动态统计 | 采样尺寸 (块) |
---|---|---|
0 | 关闭动态统计 | N/A |
1 | 语句中至少有一个非分区表无统计信息 | 32 |
2 (default) | 语句中有一个或多个表无统计信息。该设置适用于大多数的系统 | 64 |
3 | 满足LEVEL 2的标准,并且在查询的WHERE子句中有一个或多个表达式,例如 Where substr(CUSTLASTNAME,1,3) 或 Where a + b =5 | 64 |
4 | 满足LEVEL 3的标准,并且在语句中有复杂谓词,在同一个表上的多个谓词上有OR或AND操作符。 | 64 |
5 | 满足LEVEL 4的标准 | 128 |
6 | 满足LEVEL 4的标准 | 256 |
7 | 满足LEVEL 4的标准 | 512 |
8 | 满足LEVEL 4的标准 | 1024 |
9 | 满足LEVEL 4的标准 | 4086 |
10 | 所有语句 | 全部块 |
11 | 需要动态统计时由数据库自动决定 | 自动确定 |
有关动态采样级别更多的信息,请参考 Chapter 12 of the SQL Tuning Guide for Oracle Database 12c Release 2.
在我们的示例中,在CUSTOMERS表上的两个简单谓词之间有一个AND操作符。从上表中,你可以明白为何我设置optimizer_dynamic_sampling为4,以便在我们的示例中触发动态采样。
从Oracle Database 11g Release 2起,优化器自动决定动态采样是否有用,以及对于并行执行的SQL语句使用何种动态采样的级别。这一决定是基于语句中表的大小和谓词的复杂程度。你可以通过查看执行计划的NOTE部分,来判断是否触发了动态采样。
auto_dynamic_sampling.png
串行执行的SQL语句的动态采样级别是由参数optimizer_dynamic_sampling来控制的,但需注意,自Oracle Database 12c Release 1起,SQL plan directives的存在,也可以在查询编译时,触发动态统计信息收集。这是一个自适应统计信息的特性,其是由Oracle Database 12c Release 1中的参数optimizer_adaptive_features (OAF),和Oracle Database 12c Release 2中的参数optimizer_adaptive_statistics (OAS)来控制的。换句话说,自Oracle Database 12c Release 1 起,如果特定自适应特性由相关参数设置为TRUE而启用,那么DS也将被使用。
总结:
数据库版本 | 控制相关自适应特性的数据库参数名称 | 默认值 |
---|---|---|
Oracle Database 12c Release 1 | optimizer_adaptive_features (OAF) | TRUE |
Oracle Database 12c Release 2 Onwards | optimizer_adaptive_statistics (OAS) | FALSE |
串行语句通常运行时间短,而编译时的DS的开销会对整个系统性能(如果语句频繁硬解析)上产生重大影响。对于匹配此种情形的系统,建议设置OAF=FALSE,对于Oracle Database 12c Release 2及之后的版本,建议使用OAS=FALSE的默认值。
并行语句通常耗费更多的资源(译者注:执行更长的时间),所以,是值得在编译时间上花费更多的开销,以找到潜在更优的执行计划。
我们最初的SQL语句例子,其是串行的,这也是为什么我们需要手动设置optimizer_dynamic_sampling的值(我们假设不存在相关的SQL plan directives)。如果我们在一个有并行属性设置的大表上,,发出一个类似风格的查询,我们会看到动态采样被触发。
何时应该使用动态采样呢?当你知道会因复杂谓词而得到一个差的执行计划时,DS通常是建议的。但是,你应该尝试并使用alter session语句设置optimizer_dynamic_sampling参数,因为在系统范围内进行设置可能是非常困难的。
何时使用动态采样不好呢?如果查询的编译时间需要尽可能的短,比如,不会重复执行的OLTP查询,这会使你不能从多次执行中,分摊其(译者注:DS所花费的)额外的代价。
原文链接:https://blogs.oracle.com/optimizer/post/dynamic-sampling-and-its-impact-on-the-optimizer
Dynamic sampling and its impact on the Optimizer
January 2, 2020 | 5 minute read
Nigel Bayliss
Product Manager
Dynamic sampling (DS) was introduced to improve the optimizer’s ability to generate good execution plans. This feature was enhanced and renamed Dynamic Statistics in Oracle Database 12c. The most common misconception is that DS can be used as a substitute for optimizer statistics, whereas the goal of DS is to augment optimizer statistics; it is used when regular statistics are not sufficient to get good quality cardinality estimates.
So how and when will dynamic statistics be used? During the compilation of a SQL statement, the optimizer decides whether to use DS or not by considering whether the available statistics are sufficient to generate a good execution plan. If the available statistics are not enough, DS will be used. It is typically used to compensate for missing or insufficient statistics that would otherwise lead to a very bad plan. For the case where one or more of the tables in the query does not have statistics, DS is used by the optimizer to gather basic statistics on these tables before optimizing the statement. The statistics gathered in this case are not as high a quality or as complete as the statistics gathered using the DBMS_STATS package. This trade off is made to limit the impact on the compile time of the statement.
DS may be used if a SQL statement contains a complex predicate expression and extended statistics are not available. Extended statistics were introduced in Oracle Database 11g Release 1 with the goal of helping the optimizer get good quality cardinality estimates for complex predicate expressions. For example, if you had a simple query that has where clause predicates on two correlated columns, standard statistics would not be sufficient. Consider the following SQL statement and its execution plan :
EXPLAIN PLAN FOR SELECT * FROM CUSTOMERS WHERE cust_city = 'Los Angeles' AND cust_state_province = 'CA'; SELECT * FROM table(dbms_xplan.display(format=>'basic rows')); ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | 8 | | 1 | TABLE ACCESS FULL| CUSTOMERS | 8 | -----------------------------------------------
复制
With standard statistics the optimizer estimates the cardinality as 8 rows. The actual number of rows returned by this query is 916. Without extended statistics, the optimizer does not know that there is a perfect correlation between “Los Angeles” and the state “CA” but by setting optimizer_dynamic_sampling to level 4, the optimizer will use dynamic sampling to gather additional information about this predicate expression. The additional information provided by dynamic sampling allows the optimizer to generate a more accurate cardinality estimate.
ALTER SESSION SET optimizer_dynamic_sampling=4; EXPLAIN PLAN FOR SELECT * FROM customers WHERE cust_city = 'Los Angeles' AND cust_state_province = 'CA'; SELECT * FROM table(dbms_xplan.display(format=>'basic rows')); ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | 916 | | 1 | TABLE ACCESS FULL| CUSTOMERS | 916 | -----------------------------------------------
复制
In this case the improved cardinality estimate has not affected our SQL execution plan, but for more complicated queries a better cardinality estimate will often result in a better SQL execution plan, which will in turn result in a faster query execution time.
You may now be wondering why we had to set the parameter optimizer_dynamic_sampling to 4 . The dynamic statistics feature is controlled by the parameter optimizer_dynamic_sampling, which can be set to different levels (0-11). These levels control two different things; when dynamic sampling kicks in and how large a sample size will be used to gather the statistics. The greater the sample size the bigger impact DS has on the compilation time of a query.
Level | When Dynamic Statistics will be used | Sample size (blocks) |
---|---|---|
0 | Switches off dynamic statistics | N/A |
1 | At least one non-partitioned table in the statement has no statistics | 32 |
2 (default) | One or more tables in the statement have no statistics This setting is suitable for the majority of systems | 64 |
3 | Any statement that meets level 2 criteria and any statement that has one or more expressions used in the where clause predicates e.g. Where substr(CUSTLASTNAME,1,3) or Where a + b =5 | 64 |
4 | Any statement that meets level 3 criteria and any statement that has complex predicates. An OR or AND operator between multiple predicates on the same table | 64 |
5 | Any statement that meets level 4 criteria | 128 |
6 | Any statement that meets level 4 criteria | 256 |
7 | Any statement that meets level 4 criteria | 512 |
8 | Any statement that meets level 4 criteria | 1024 |
9 | Any statement that meets level 4 criteria | 4086 |
10 | All statements | All Blocks |
11 | The database determines automatically if dynamic statistics are required | Automatically determined |
There more information on dynamic sampling levels, refer to the Chapter 12 of the SQL Tuning Guide for Oracle Database 12c Release 2.
In our example we had an AND operator between two simple predicates on the CUSTOMERS table. From the above table you can now see why I chose to set optimizer_dynamic_sampling to level 4 in order to have dynamic sampling kick in for our example.
From Oracle Database 11g Release 2 onwards, the optimizer automatically decides if dynamic sampling will be useful and what dynamic sampling level will be used for SQL statements executed in parallel. This decision is based on size of the tables in the statement and the complexity of the predicates. You can tell if dynamic sampling kicks in by looks in the note section of the execution plan.
auto_dynamic_sampling.png
For serial SQL statements the dynamic sampling level is controlled by the optimizer_dynamic_sampling parameter but note that from Oracle Database 12c Release 1 the existence of SQL plan directives can also initiate dynamic statistics gathering when a query is compiled. This is a feature of adaptive statistics and is controlled by the database parameter optimizer_adaptive_features (OAF) in Oracle Database 12c Release 1 and optimizer_adaptive_statistics (OAS) in Oracle Database 12c Release 2. In other words, from Oracle Database 12c Release 1 onwards, DS will be used if certain adaptive features are enabled by setting the relevant parameter to TRUE. To summarize:
Database Version | Name of Database Parameter Controlling Relevant Adaptive Features | Default Value |
---|---|---|
Oracle Database 12c Release 1 | optimizer_adaptive_features (OAF) | TRUE |
Oracle Database 12c Release 2 Onwards | optimizer_adaptive_statistics (OAS) | FALSE |
Serial statements are typically short running and any DS overhead at compile time can have a large impact on overall system performance (if statements are frequently hard parsed). For systems that match this profile, setting OAF=FALSE is recommended. For Oracle Database 12c Release 2 onwards, using the default OAS=FALSE is recommended.
Parallel statements are generally more resource intensive, so it’s often worth investing in additional overhead at compile time to potentially find a better SQL execution plan.
In our original example the SQL statement is serial, which is why we needed to manual set the value for optimizer_dynamic_sampling parameter (and we’ll assume that there were no relevant SQL plan directives). If we were to issue a similar style of query against a larger table that had the parallel attribute set we can see the dynamic sampling kicking in.
When should you use dynamic sampling? DS is typically recommended when you know you are getting a bad execution plan due to complex predicates. However, you should try and use an alter session statement to set the value for optimizer_dynamic_sampling parameter as it can be extremely difficult to come up with a system-wide setting.
When is it not a good idea to use dynamic sampling? If the queries compile times need to be as fast as possible, for example, unrepeated OLTP queries where you can’t amortize the additional cost of compilation over many executions.