Postgres execution plan cache

原创大表哥 All China Database Union 2023-12-18

862

大家好，今天聊聊 PG的执行计划的缓存机制。

熟悉oracle的小伙伴都知道，ORACLE采用shared pool这种global cache area的方式缓存SQL 语句的执行计划，有效的减少了 SQL 硬解析的次数。

熟悉MYSQL的小伙伴们都知道，mysql 并不会cache 执行计划。（query cache 从8.0开始已经废弃了，并且cache的也不是执行计划本身）

我们今天聊的PG的execution plan cache 恰恰介于上面2者之间，没有老大哥ORACLE 放到global cache中那么彻底，放到了session 级别的cache 中，

相对于同门兄弟mysql 来说，PG在连接池 + 高并发的场景下，session 级别的cache 也能有效的规避一些硬解析。

首先我们先看一下，同一个session 中一条SQL语句执行多少次计划会被cache住？

（有人说是5次，有人抱怨根本不会cache计划）

具体的原则需要看你的客户端是什么？ PSQL or Java client (很多图形工具实际上也是用的JDBC 的驱动连接) 是否采用了 prepare 或者 preparedstatement 的方式

服务器端参数 plan_cache_mode 是如何设置？这个参数是PG是version 12版本开始引入的

简单地说：
默认值auto: PG 自行逻辑判断是否被缓存（基于成本）
force_generic_plan：可以重用缓存的计划
force_custom_plan：每次连接刷新缓存，就是不被缓存

PG plan_cache_mode 默认的参数值 auto, 在源码中有较为详尽的解释：src/backend/utils/cache/plancache.c
有兴趣的朋友可以详细阅读一下，这里篇幅受限，只粘贴了一部分重点。

*
* plancache.c
*      Plan cache management.
*
* The plan cache manager has two principal responsibilities: deciding when
* to use a generic plan versus a custom (parameter-value-specific) plan,
* and tracking whether cached plans need to be invalidated because of schema
* changes in the objects they depend on.
*
* The logic for choosing generic or custom plans is in choose_custom_plan,
* which see for comments.

复制

plancache.c 文件实现了 plan cache的管理功能，
包含2大核心功能：

决定使用generic plan 还是 custom plan
维护已经被缓存plan 是否会失效，由于对象的定义发生了变化

choose_custom_plan这个函数逻辑是判断generic plan or custom plan :

主要考虑到如下几点：
1判断是否是第一次执行（is_oneshot）
2)判断例如java客户端传入的参数设置（boundParams）
3)判断是否收到transaction statemant control
4)数据库端是否设置了参数 PLAN_CACHE_MODE_FORCE_GENERIC_PLAN or PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN
5)用户session 级别设置了参数 PLAN_CACHE_MODE_FORCE_GENERIC_PLAN or PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN
6）执行的次数大于5次 (plansource->num_custom_plans < 5)
7）最后基于成本计算判断（generic_cost < avg_custom_cost）

/*
* choose_custom_plan: choose whether to use custom or generic plan
*
* This defines the policy followed by GetCachedPlan.
*/
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
   double        avg_custom_cost;


   /* One-shot plans will always be considered custom */
   if (plansource->is_oneshot)
       return true;


   /* Otherwise, never any point in a custom plan if there's no parameters */
   if (boundParams == NULL)
       return false;
   /* ... nor for transaction control statements */
   if (IsTransactionStmtPlan(plansource))
       return false;


   /* Let settings force the decision */
   if (plan_cache_mode == PLAN_CACHE_MODE_FORCE_GENERIC_PLAN)
       return false;
   if (plan_cache_mode == PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN)
       return true;


   /* See if caller wants to force the decision */
   if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
       return false;
   if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
       return true;


   /* Generate custom plans until we have done at least 5 (arbitrary) */
   if (plansource->num_custom_plans < 5)
       return true;


   avg_custom_cost = plansource->total_custom_cost / plansource->num_custom_plans;


   /*
    * Prefer generic plan if it's less expensive than the average custom
    * plan.  (Because we include a charge for cost of planning in the
    * custom-plan costs, this means the generic plan only has to be less
    * expensive than the execution cost plus replan cost of the custom
    * plans.)
    *
    * Note that if generic_cost is -1 (indicating we've not yet determined
    * the generic plan cost), we'll always prefer generic at this point.
    */
   if (plansource->generic_cost < avg_custom_cost)
       return false;


   return true;
}

复制

我们再来看看cache的plan在什么条件下会失效？

我们看一下RevalidateCachedQuery这个函数主要负责cache的有效性的检查（失效后会重新进行解析）

1.判断客户端的的变量search_path是否发生变化

/*
    * If the query is currently valid, we should have a saved search_path ---
    * check to see if that matches the current environment.  If not, we want
    * to force replan.
    */
   if (plansource->is_valid)
   {
       Assert(plansource->search_path != NULL);
       if (!OverrideSearchPathMatchesCurrent(plansource->search_path))
       {
           /* Invalidate the querytree and generic plan */
           plansource->is_valid = false;
           if (plansource->gplan)
               plansource->gplan->is_valid = false;
       }
   }

复制

2.行级别的安全保护和用户角色是否发生了改变

/*
    * If the query rewrite phase had a possible RLS dependency, we must redo
    * it if either the role or the row_security setting has changed.
    */
   if (plansource->is_valid && plansource->dependsOnRLS &&
       (plansource->rewriteRoleId != GetUserId() ||
        plansource->rewriteRowSecurity != row_security))
       plansource->is_valid = false;

复制

3.如果前2个步骤的验证都通过的话，会加上锁AcquirePlannerLocks 会结束验证并且返回当前的计划的valid

AcquirePlannerLocks(plansource->query_list, true);  
if (plansource->is_valid)
       {
           /* Successfully revalidated and locked the query. */
           return NIL;
       }

复制

4.接下来，是验证失败的情况，需要重新parse并生成执行计划。
先清除MemoryConext中的语句，清内存中的plansource 的对象

MemoryContextDelete(qcxt);
ReleaseGenericPlan(plansource);

复制

5.重新进行 parse analysis and rule rewriting

rawtree = copyObject(plansource->raw_parse_tree);
   if (rawtree == NULL)
       tlist = NIL;
   else if (plansource->parserSetup != NULL)
       tlist = pg_analyze_and_rewrite_params(rawtree,
                                             plansource->query_string,
                                             plansource->parserSetup,
                                             plansource->parserSetupArg,
                                             queryEnv);
   else
       tlist = pg_analyze_and_rewrite(rawtree,
                                      plansource->query_string,
                                      plansource->param_types,
                                      plansource->num_params,
                                      queryEnv);

复制

6.判断元祖的定义（表的DDL）是否需要更新

if (resultDesc == NULL && plansource->resultDesc == NULL)
    {
        /* OK, doesn't return tuples */
    }
    else if (resultDesc == NULL || plansource->resultDesc == NULL ||
             !equalTupleDescs(resultDesc, plansource->resultDesc))
    {
        /* can we give a better error message? */
        if (plansource->fixed_result)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("cached plan must not change result type")));
        oldcxt = MemoryContextSwitchTo(plansource->context);

复制

7.最后设置对象plansource 相关属性：roleID, Row Level Security , search_path,query_context,query_list,is_valid
返回查询树 tlist

plansource->rewriteRoleId = GetUserId();
plansource->rewriteRowSecurity = row_security;
plansource->search_path = GetOverrideSearchPath(querytree_context);
plansource->query_context = querytree_context;
plansource->query_list = qlist;
return tlist;

复制

我们结束源码的阅读，接下来是动手实验部分：

先来通过PSQL的客户端进行测试：（PSQL只有在prepare 命令下才能触发执行计划缓存）

关于PREPARE的使用可以参考： https://www.postgresql.org/docs/current/sql-prepare.html

实验步骤如下:

创建一张表，加入一些测试数据，并打开 auto_plan 插件，把SQL的执行计划打印到日志中迁移（auto_explain.log_min_duration = 0 设置为0 方便我们打印所有的）

postgres=# create table t1 (id int primary key, name varchar(100));
CREATE TABLE
postgres=# insert into t1 select 1, 'jason';
INSERT 0 1

postgres=# LOAD 'auto_explain';
LOAD
postgres=# set auto_explain.log_min_duration = 0;
SET

复制

我们设置参数 plan_cache_mode 由默认的auto-》 force_generic_plan：

postgres=# set plan_cache_mode = 'force_generic_plan';
SET
postgres=# show plan_cache_mode;
 plan_cache_mode  
--------------------
force_generic_plan
(1 row)

复制

我们采用PREPARE的方式执行一条查询语句：

postgres=# PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
PREPARE
postgres=# EXECUTE demo(1);
name  
-------
jason
(1 row)

复制

从PG的日志中观察执行计划：走的主键索引t1_pkey 访问方式

INFRA [postgres@ljzdccapp007 log]# tail -f postgresql-14.log | grep t1


   Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
   Index Scan using t1_pkey on t1  (cost=0.15..4.17 rows=1 width=218)

复制

我们继续执行同样的语句直到5次：

postgres=# EXECUTE demo(1);
name  
-------
jason
(1 row)

...
...

postgres=# EXECUTE demo(1);
name  
-------
jason
(1 row)

复制

根据我们的参数设置 force_generic_plan，语句SELECT name FROM t1 WHERE id = $1;的执行计划会被cache住：

我们尝试关闭一些GUC的参数，来影响一下SQL的执行计划：

postgres=# set enable_indexscan = off;
SET
postgres=# set enable_bitmapscan = off;
SET
postgres=#  select * from t1 where id = 1;
id | name  
----+-------
 1 | jason
(1 row)

复制

我们发现执行日志中的SQL语句： select * from t1 where id = 1; 注意这里采用的是硬解析的方式，而不是prepare的方式，
查看执行计划变成了 seqscan 证明GUC的参数生效: (红色部分)

Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)

Query Text: select * from t1 where id = 1;
Seq Scan on t1 (cost=0.00…14.00 rows=1 width=222)

我们再次执行 EXECUTE demo(1);

postgres=# EXECUTE demo(1);
name  
-------
jason
(1 row)

复制

发现执行计划依然是原来的：Index Scan using t1_pkey ，证明cache 已经生效了（最后的绿色部分）

进一步验证，我们打开一个新的session 窗口，这次设置同样的GUC参数，看看计划是否会改变：

postgres=# set plan_cache_mode = 'force_generic_plan';
SET
postgres=# LOAD 'auto_explain';
LOAD
postgres=# set auto_explain.log_min_duration = 0;
SET
postgres=# set enable_indexscan = off;
SET
postgres=# set enable_bitmapscan = off;
SET
postgres=# PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
PREPARE
postgres=# EXECUTE demo(1);
name  
-------
jason
(1 row)

复制

我们看到计划是GUC参数设置所预期的：Seq Scan on t1 （最后的红色部分）

所以我们通过动手实验得到结论，同样的的SQL 在不同的session中由于缓存的存在，可能执行计划不一样，站在应用系统的体验角度，会出现同一个功能，性能时快时慢的情况。

我们进入下一个实验，采用目前JAVA程序 spring boot 进行JDBC的程序测试：

Postgres JDBC 官方文档： https://jdbc.postgresql.org/documentation/use/

prepareThreshold (int) ：默认SQL执行5次被cache
preparedStatementCacheQueries (int) : 默认保存执行计划的SQL的条数是256
preparedStatementCacheSizeMiB (int) : 默认保存cache的大小是5MB，超过会通过LRU 抛弃。

写一个简单的测试小程序：

目的为了打印JAVA客户端设置的PrepareThreshold，以及缓存的pgstatment内容，以及是否会使用缓存isUseServerPrepare()

 @RequestMapping(value="/planCacheTest", method=RequestMethod.GET)
 public String planCacheTest() throws Exception {
     StringBuilder str_result = new StringBuilder();
     String sql = "SELECT name FROM t1 WHERE id = ?";
     PreparedStatement statement = jdbcTemplate.getDataSource().getConnection().prepareStatement(sql);
     org.postgresql.PGStatement pgstatment = statement.unwrap(org.postgresql.PGStatement.class);
     str_result.append("The default JDBC cache preparedstatement:" + pgstatment.getPrepareThreshold()+"
");
/*      CachedQuery cachedQuery = ReflectionUtils.getFieldValue(
             pgstatment ,  org.postgresql.PGStatement.class.getDeclaredField("preparedQuery")


     );*/


     str_result.append("The default JDBC cache Query:" + pgstatment.toString()+"
");
     for (int i=1; i<=10; i++)
     {
         statement.setInt(1,i);
         boolean usingServerPrepare = pgstatment.isUseServerPrepare();
         ResultSet rs = statement.executeQuery();
         rs.next();
         str_result.append("Execution: "+i+", Used server side: " + usingServerPrepare + ", Result: "+rs.getString(1) + "</br>");
         rs.close();
     }


     statement.close();




     return str_result.toString();


 }

复制

启动springboot后, 在浏览器中进行测试：一切符合我们的预期.

接下来我们讨论一种情况，由于表的数据分布不均匀，造成cache的执行时候往往不是有效的？这个时候如何解决？
（ORACLE 早在10几年前的11G版本中，引入了ACS Adapt cursor sharing 游标自适应特性，就是为了解决缓存低效执行计划的问题）

个人观点参考如下：

1.修改全局或者user或者database级别的参数 plan_cache_mode = force_custom_plan 每次生成新的计划
2.使用pg_hint_plan, 绑定有效的计划可以参考： https://www.modb.pro/db/609428
3.jdbc 客户端设置 setPrepareThreshold() = 0, 关闭缓存
4.代码改写：直接针对有问题的SQL，不用preparestatement绑定变量的方式，直接走SQL硬解析。

所以到底用不用plan cache是一个根据自己数据库系统特点来trade off的选择。

最后总结：
1.源码层面阅读了cacheplan manager , 核心2功能： choose_custom_plan 函数-》custom or generic plan? RevalidateCachedQuery 函数 => 验证缓存的计划是否失效
2.PSQL 客户端的prepare command 进行了缓存测试实验
3.JDBC 通过springboot 测试了JAVA 端缓存测试
4.讨论了数据分布不均，执行计划不稳定的情况下的解决方法。

Have a fun 🙂 !

参考链接：
src/backend/utils/cache/plancache.c
https://vladmihalcea.com/postgresql-jdbc-statement-caching/
https://jdbc.postgresql.org/documentation/use/
https://www.cnblogs.com/wangzhen3798/p/12206811.html
https://use-the-index-luke.com/blog/2011-07-16/planning-for-reuse

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

Postgres execution plan cache

评论