暂无图片
暂无图片
5
暂无图片
暂无图片
暂无图片

Postgres execution plan cache

Image.png
大家好,今天聊聊 PG的执行计划的缓存机制。

熟悉oracle的小伙伴都知道,ORACLE采用shared pool这种global cache area的方式缓存SQL 语句的执行计划,有效的减少了 SQL 硬解析的次数。

熟悉MYSQL的小伙伴们都知道,mysql 并不会cache 执行计划。(query cache 从8.0开始已经废弃了,并且cache的也不是执行计划本身)

我们今天聊的PG的execution plan cache 恰恰介于上面2者之间, 没有老大哥ORACLE 放到global cache中那么彻底 ,放到了session 级别的cache 中,

相对于同门兄弟mysql 来说,PG在连接池 + 高并发的场景下,session 级别的cache 也能有效的规避一些硬解析。

首先我们先看一下,同一个session 中一条SQL语句执行多少次计划会被cache住?

(有人说是5次,有人抱怨根本不会cache计划)

具体的原则需要看你的客户端是什么? PSQL or Java client (很多图形工具实际上也是用的JDBC 的驱动连接) 是否采用了 prepare 或者 preparedstatement 的方式

服务器端参数 plan_cache_mode 是如何设置? 这个参数是PG是version 12版本开始引入的

简单地说:
默认值auto: PG 自行逻辑判断是否被缓存(基于成本)
force_generic_plan:可以重用缓存的计划
force_custom_plan: 每次连接刷新缓存,就是不被缓存
Image.png
PG plan_cache_mode 默认的参数值 auto, 在源码中有较为详尽的解释:src/backend/utils/cache/plancache.c
有兴趣的朋友可以详细阅读一下,这里篇幅受限,只粘贴了一部分重点。

* * plancache.c * Plan cache management. * * The plan cache manager has two principal responsibilities: deciding when * to use a generic plan versus a custom (parameter-value-specific) plan, * and tracking whether cached plans need to be invalidated because of schema * changes in the objects they depend on. * * The logic for choosing generic or custom plans is in choose_custom_plan, * which see for comments.
复制

plancache.c 文件实现了 plan cache的管理功能,
包含2大核心功能:

  1. 决定使用generic plan 还是 custom plan
  2. 维护已经被缓存plan 是否会失效,由于对象的定义发生了变化

choose_custom_plan这个函数逻辑是判断generic plan or custom plan :

主要考虑到如下几点:
1判断是否是第一次执行 (is_oneshot)
2)判断例如java客户端传入的参数设置(boundParams)
3)判断是否收到transaction statemant control
4)数据库端是否设置了参数 PLAN_CACHE_MODE_FORCE_GENERIC_PLAN or PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN
5)用户session 级别设置了参数 PLAN_CACHE_MODE_FORCE_GENERIC_PLAN or PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN
6)执行的次数大于5次 (plansource->num_custom_plans < 5)
7)最后基于成本计算判断(generic_cost < avg_custom_cost)

/* * choose_custom_plan: choose whether to use custom or generic plan * * This defines the policy followed by GetCachedPlan. */ static bool choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams) { double avg_custom_cost; /* One-shot plans will always be considered custom */ if (plansource->is_oneshot) return true; /* Otherwise, never any point in a custom plan if there's no parameters */ if (boundParams == NULL) return false; /* ... nor for transaction control statements */ if (IsTransactionStmtPlan(plansource)) return false; /* Let settings force the decision */ if (plan_cache_mode == PLAN_CACHE_MODE_FORCE_GENERIC_PLAN) return false; if (plan_cache_mode == PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN) return true; /* See if caller wants to force the decision */ if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN) return false; if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN) return true; /* Generate custom plans until we have done at least 5 (arbitrary) */ if (plansource->num_custom_plans < 5) return true; avg_custom_cost = plansource->total_custom_cost / plansource->num_custom_plans; /* * Prefer generic plan if it's less expensive than the average custom * plan. (Because we include a charge for cost of planning in the * custom-plan costs, this means the generic plan only has to be less * expensive than the execution cost plus replan cost of the custom * plans.) * * Note that if generic_cost is -1 (indicating we've not yet determined * the generic plan cost), we'll always prefer generic at this point. */ if (plansource->generic_cost < avg_custom_cost) return false; return true; }
复制

我们再来看看cache的plan在什么条件下会失效?

我们看一下RevalidateCachedQuery这个函数主要负责cache的有效性的检查(失效后会重新进行解析)

1.判断客户端的的变量search_path是否发生变化

/* * If the query is currently valid, we should have a saved search_path --- * check to see if that matches the current environment. If not, we want * to force replan. */ if (plansource->is_valid) { Assert(plansource->search_path != NULL); if (!OverrideSearchPathMatchesCurrent(plansource->search_path)) { /* Invalidate the querytree and generic plan */ plansource->is_valid = false; if (plansource->gplan) plansource->gplan->is_valid = false; } }
复制

2.行级别的安全保护和用户角色是否发生了改变

/* * If the query rewrite phase had a possible RLS dependency, we must redo * it if either the role or the row_security setting has changed. */ if (plansource->is_valid && plansource->dependsOnRLS && (plansource->rewriteRoleId != GetUserId() || plansource->rewriteRowSecurity != row_security)) plansource->is_valid = false;
复制

3.如果前2个步骤的验证都通过的话,会加上锁AcquirePlannerLocks 会结束验证并且返回当前的计划的valid

AcquirePlannerLocks(plansource->query_list, true); if (plansource->is_valid) { /* Successfully revalidated and locked the query. */ return NIL; }
复制

4.接下来,是验证失败的情况,需要重新parse并生成执行计划。
先清除MemoryConext中的语句, 清内存中的plansource 的对象

MemoryContextDelete(qcxt); ReleaseGenericPlan(plansource);
复制

5.重新进行 parse analysis and rule rewriting

rawtree = copyObject(plansource->raw_parse_tree); if (rawtree == NULL) tlist = NIL; else if (plansource->parserSetup != NULL) tlist = pg_analyze_and_rewrite_params(rawtree, plansource->query_string, plansource->parserSetup, plansource->parserSetupArg, queryEnv); else tlist = pg_analyze_and_rewrite(rawtree, plansource->query_string, plansource->param_types, plansource->num_params, queryEnv);
复制

6.判断元祖的定义(表的DDL)是否需要更新

if (resultDesc == NULL && plansource->resultDesc == NULL) { /* OK, doesn't return tuples */ } else if (resultDesc == NULL || plansource->resultDesc == NULL || !equalTupleDescs(resultDesc, plansource->resultDesc)) { /* can we give a better error message? */ if (plansource->fixed_result) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cached plan must not change result type"))); oldcxt = MemoryContextSwitchTo(plansource->context);
复制

7.最后设置对象plansource 相关属性:roleID, Row Level Security , search_path,query_context,query_list,is_valid
返回查询树 tlist

plansource->rewriteRoleId = GetUserId(); plansource->rewriteRowSecurity = row_security; plansource->search_path = GetOverrideSearchPath(querytree_context); plansource->query_context = querytree_context; plansource->query_list = qlist; return tlist;
复制

我们结束源码的阅读,接下来是动手实验部分:

先来通过PSQL的客户端进行测试: (PSQL只有在prepare 命令下才能触发执行计划缓存)

关于PREPARE的使用可以参考: https://www.postgresql.org/docs/current/sql-prepare.html

实验步骤如下:

创建一张表,加入一些测试数据,并打开 auto_plan 插件,把SQL的执行计划打印到日志中迁移(auto_explain.log_min_duration = 0 设置为0 方便我们打印所有的)

postgres=# create table t1 (id int primary key, name varchar(100)); CREATE TABLE postgres=# insert into t1 select 1, 'jason'; INSERT 0 1 postgres=# LOAD 'auto_explain'; LOAD postgres=# set auto_explain.log_min_duration = 0; SET
复制

我们设置参数 plan_cache_mode 由默认的auto-》 force_generic_plan:

postgres=# set plan_cache_mode = 'force_generic_plan'; SET postgres=# show plan_cache_mode; plan_cache_mode -------------------- force_generic_plan (1 row)
复制

我们采用PREPARE的方式执行一条查询语句:

postgres=# PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1; PREPARE postgres=# EXECUTE demo(1); name ------- jason (1 row)
复制

从PG的日志中观察执行计划: 走的主键索引t1_pkey 访问方式

INFRA [postgres@ljzdccapp007 log]# tail -f postgresql-14.log | grep t1 Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1; Index Scan using t1_pkey on t1 (cost=0.15..4.17 rows=1 width=218)
复制

我们继续执行同样的语句直到5次:

postgres=# EXECUTE demo(1); name ------- jason (1 row) ... ... postgres=# EXECUTE demo(1); name ------- jason (1 row)
复制

根据我们的参数设置 force_generic_plan, 语句SELECT name FROM t1 WHERE id = $1;的执行计划会被cache住:

我们尝试关闭一些GUC的参数,来影响一下SQL的执行计划:

postgres=# set enable_indexscan = off; SET postgres=# set enable_bitmapscan = off; SET postgres=# select * from t1 where id = 1; id | name ----+------- 1 | jason (1 row)
复制

我们发现执行日志中的SQL语句: select * from t1 where id = 1; 注意这里采用的是硬解析的方式,而不是prepare的方式,
查看执行计划变成了 seqscan 证明GUC的参数生效: (红色部分)

Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)

Query Text: select * from t1 where id = 1;
Seq Scan on t1 (cost=0.00…14.00 rows=1 width=222)

我们再次执行 EXECUTE demo(1);

postgres=# EXECUTE demo(1); name ------- jason (1 row)
复制

发现执行计划依然是原来的:Index Scan using t1_pkey , 证明cache 已经生效了(最后的绿色部分)

Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)

Query Text: select * from t1 where id = 1;
Seq Scan on t1 (cost=0.00…14.00 rows=1 width=222)


Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)

进一步验证,我们打开一个新的session 窗口,这次设置同样的GUC参数,看看计划是否会改变:

postgres=# set plan_cache_mode = 'force_generic_plan'; SET postgres=# LOAD 'auto_explain'; LOAD postgres=# set auto_explain.log_min_duration = 0; SET postgres=# set enable_indexscan = off; SET postgres=# set enable_bitmapscan = off; SET postgres=# PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1; PREPARE postgres=# EXECUTE demo(1); name ------- jason (1 row)
复制

我们看到计划是GUC参数设置所预期的 :Seq Scan on t1 (最后的红色部分)

Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)
Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)

Query Text: select * from t1 where id = 1;
Seq Scan on t1 (cost=0.00…14.00 rows=1 width=222)


Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Index Scan using t1_pkey on t1 (cost=0.15…4.17 rows=1 width=218)


Query Text: PREPARE demo (int) AS SELECT name FROM t1 WHERE id = $1;
Seq Scan on t1 (cost=0.00…14.00 rows=1 width=218)

所以我们通过动手实验得到结论,同样的的SQL 在不同的session中由于缓存的存在,可能执行计划不一样,站在应用系统的体验角度,会出现同一个功能,性能时快时慢的情况。

我们进入下一个实验,采用目前JAVA程序 spring boot 进行JDBC的程序测试:

Postgres JDBC 官方文档: https://jdbc.postgresql.org/documentation/use/

prepareThreshold (int) : 默认SQL执行5次被cache
preparedStatementCacheQueries (int) : 默认保存执行计划的SQL的条数是256
preparedStatementCacheSizeMiB (int) : 默认保存cache的大小是5MB,超过会通过LRU 抛弃。

Image.png

写一个简单的测试小程序:

目的为了打印JAVA客户端设置的PrepareThreshold,以及缓存的pgstatment内容,以及是否会使用缓存isUseServerPrepare()

@RequestMapping(value="/planCacheTest", method=RequestMethod.GET) public String planCacheTest() throws Exception { StringBuilder str_result = new StringBuilder(); String sql = "SELECT name FROM t1 WHERE id = ?"; PreparedStatement statement = jdbcTemplate.getDataSource().getConnection().prepareStatement(sql); org.postgresql.PGStatement pgstatment = statement.unwrap(org.postgresql.PGStatement.class); str_result.append("The default JDBC cache preparedstatement:" + pgstatment.getPrepareThreshold()+" "); /* CachedQuery cachedQuery = ReflectionUtils.getFieldValue( pgstatment , org.postgresql.PGStatement.class.getDeclaredField("preparedQuery") );*/ str_result.append("The default JDBC cache Query:" + pgstatment.toString()+" "); for (int i=1; i<=10; i++) { statement.setInt(1,i); boolean usingServerPrepare = pgstatment.isUseServerPrepare(); ResultSet rs = statement.executeQuery(); rs.next(); str_result.append("Execution: "+i+", Used server side: " + usingServerPrepare + ", Result: "+rs.getString(1) + "</br>"); rs.close(); } statement.close(); return str_result.toString(); }
复制

启动springboot后, 在浏览器中进行测试: 一切符合我们的预期.

Image.png

接下来我们讨论一种情况, 由于表的数据分布不均匀,造成cache的执行时候往往不是有效的? 这个时候如何解决?
(ORACLE 早在10几年前的11G版本中,引入了ACS Adapt cursor sharing 游标自适应特性,就是为了解决缓存低效执行计划的问题)

个人观点参考如下:

1.修改全局或者user或者database级别的参数 plan_cache_mode = force_custom_plan 每次生成新的计划
2.使用pg_hint_plan, 绑定有效的计划 可以参考 : https://www.modb.pro/db/609428
3.jdbc 客户端设置 setPrepareThreshold() = 0, 关闭缓存
4.代码改写:直接针对有问题的SQL,不用preparestatement绑定变量的方式,直接走SQL硬解析。

所以到底用不用plan cache是一个根据自己数据库系统特点来trade off的选择。

最后总结:
1.源码层面阅读了cacheplan manager , 核心2功能: choose_custom_plan 函数-》custom or generic plan? RevalidateCachedQuery 函数 => 验证缓存的计划是否失效
2.PSQL 客户端的prepare command 进行了缓存测试实验
3.JDBC 通过springboot 测试了JAVA 端缓存测试
4.讨论了数据分布不均,执行计划不稳定的情况下的解决方法。

Have a fun 🙂 !

参考链接:
src/backend/utils/cache/plancache.c
https://vladmihalcea.com/postgresql-jdbc-statement-caching/
https://jdbc.postgresql.org/documentation/use/
https://www.cnblogs.com/wangzhen3798/p/12206811.html
https://use-the-index-luke.com/blog/2011-07-16/planning-for-reuse

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论