暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

关于PostgreSQL的绑定变量窥视的问题详解

中启乘数 2023-06-02
93

绑定变量窥视的原理说明

Oracle DBA都知道,绑定变量窥视功能是Oracle数据库的一个特性,自ORACLE9i版本开始引入,是可以通过参数数“_optim_peek_user_binds”来控制是否开启,默认是开启,即为TRUE。这就意味着,第一次以变量的方式执行某类SQL时,会生成第一个执行计划,后续执行该类SQL语句,即使变量的传入值不同,但因变量窥视的作用,依然会沿用第一次SQL语句执行时生成的执行计划,这种特性非常适用于业务表数据分布比较均匀的场景,执行计划比较稳定。但对于数据分布严重倾斜的业务表,可能会出现错误的执行计划,在极端情况下,会引发严重的性能问题。。

当”_optim_peek_user_binds”参数设置为FALSE。即,将绑定变量窥视参数特性禁用。那么已经执行过的某类值的执行计划将不会发生变化,一旦传入某个新值时,优化器会自动根据被访问对象的统计、直方图等信息,产生它认为效率最高、成本最低的执行计划。也就是说,在特性关闭的情况下,该类SQL语句可能会产生更优的执行计划。

所以为了让系统的性能不至于大起大落,在很多用户那里会关闭绑定变量窥视的功能。

那么PostgreSQL数据库在绑定变量的执行计划这一块的行为是什么呢?
PostgreSQL数据库的行为有一些复杂:

前5次执行时,每次都会根据实际传入的实际绑定变量新生成执行计划进行执行,即每次都是硬解析,同时会记录这5次的执行计划;

当第6次开始执行时,会生成一个通用的执行计划(generic plan),同时与前5次的执行计划进行比较,如果比较的结果是通用执行计划不比前5次的执行计划差,以后就会把这个通用的执行计划固定下来,这之后即使传入的值发生变化后,执行计划也不再变化。这就相当于Oracle打开了绑定变量窥视的功能。

当然,当第6次开始执行时,如果通用的执行计划(generic plan)比前5次的某一个执行计划差,则以后则每次都重新生成执行计划,即以后永远都是硬解析了。

从上面原理可以看出,PostgreSQL数据库能否不走硬解析,与前5次执行时传入的实际值有很大的关系,可以想象如果前5次执行时都是一个固定的值,第6次执行时的通用执行计划与前5次又一样,这时执行计划就会固定,如果以后传进来的值可以生成更好的执行计划,也不会生成了,这时可能会导致比较大的问题,这与Oracle打开了绑定变量窥视产生了一样的问题。只是因为PostgreSQL因为有先执行5次,然后第6次比较的机制,让这个问题出现的概率低了很多,但实际上还是会出现的。

当然,如果每次都是重新生成执行计划,对于高并发,会降低一些性能。实际上,对于一些重要的系统,每次重新生成执行计划,会更好一些,因为这种方式防止了系统的性能大起大落。

目前,网上很少有文章介绍这个原理,即使有也是把这个原理介绍的不清楚。

实际测试

造测试表和数据

下面我们实际测试,来更深的理解这个原理。

    create table test01(id serial, t text);
    insert into test01(t) select 'tang' from generate_series(1, 1000000);
    insert into test01(t) select 'osdba' from generate_series(1, 2);
    CREATE INDEX idx_test01_t ON test01(t);
    analyze test01;
    复制

    上面的例子中我们创建了不均匀的数据,即为“tang”的数据是100万,而为“osdba”的数据是2条。

    如果我们按常量来查,执行计划会走正确:

      osdba=# explain SELECT count(*) FROM test01 WHERE t = 'tang';
      QUERY PLAN
      ------------------------------------------------------------------------------------------
      Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
      -> Gather (cost=12656.01..12656.22 rows=2 width=8)
      Workers Planned: 2
      -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
      -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
      Filter: (t = 'tang'::text)
      (6 rows)
      Time: 1.532 ms
      osdba=# explain SELECT count(*) FROM test01 WHERE t = 'osdba';
      QUERY PLAN
      --------------------------------------------------------------------------------------
      Aggregate (cost=4.45..4.46 rows=1 width=8)
      -> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
      Index Cond: (t = 'osdba'::text)
      (3 rows)
      Time: 1.484 ms
      复制

      上面可以看到,当按“tang”来查是,走的是全表扫描,而按“osdba”查询时走的是索引,说明执行计划都是正确的。


      按绑定变量的第一次测试

      下面我们按绑定变量的方式执行:

        PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
        复制

        下面具体看:

          osdba-mac:~ osdba$ psql
          psql (10.5)
          Type "help" for help.
          osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
          PREPARE
          osdba=# explain EXECUTE myplan('tang');
          QUERY PLAN
          ------------------------------------------------------------------------------------------
          Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
          -> Gather (cost=12656.01..12656.22 rows=2 width=8)
          Workers Planned: 2
          -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
          -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
          Filter: (t = 'tang'::text)
          (6 rows)
          osdba=# explain EXECUTE myplan('tang');
          QUERY PLAN
          ------------------------------------------------------------------------------------------
          Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
          -> Gather (cost=12656.01..12656.22 rows=2 width=8)
          Workers Planned: 2
          -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
          -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
          Filter: (t = 'tang'::text)
          (6 rows)
          osdba=# explain EXECUTE myplan('tang');
          QUERY PLAN
          ------------------------------------------------------------------------------------------
          Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
          -> Gather (cost=12656.01..12656.22 rows=2 width=8)
          Workers Planned: 2
          -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
          -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
          Filter: (t = 'tang'::text)
          (6 rows)
          osdba=# explain EXECUTE myplan('tang');
          QUERY PLAN
          ------------------------------------------------------------------------------------------
          Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
          -> Gather (cost=12656.01..12656.22 rows=2 width=8)
          Workers Planned: 2
          -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
          -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
          Filter: (t = 'tang'::text)
          (6 rows)
          osdba=# explain EXECUTE myplan('tang');
          QUERY PLAN
          ------------------------------------------------------------------------------------------
          Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
          -> Gather (cost=12656.01..12656.22 rows=2 width=8)
          Workers Planned: 2
          -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
          -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
          Filter: (t = 'tang'::text)
          (6 rows)
          osdba=# explain EXECUTE myplan('tang');
          QUERY PLAN
          ------------------------------------------------------------------------------------------
          Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
          -> Gather (cost=12656.01..12656.22 rows=2 width=8)
          Workers Planned: 2
          -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
          -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
          Filter: (t = $1)
          (6 rows)
          复制

          注意上面执行中的第6次的执行计划和第5次的执行计划发生了变化,前5次都是“Filter: (t = ‘tang’::text)”,而第6次变成了“Filter: (t = $1)”,这说明执行计划变成了通用执行计划,这时我们把传进去的值改成“osdba”,发现也会是走全表扫描了,不会走索引了,这时的执行计划就错了:

            osdba=# explain analyze EXECUTE myplan('osdba');
            QUERY PLAN
            ----------------------------------------------------------------------------------------------------------------------------------------
            Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8) (actual time=114.069..114.069 rows=1 loops=1)
            -> Gather (cost=12656.01..12656.22 rows=2 width=8) (actual time=113.957..114.865 rows=3 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8) (actual time=106.088..106.088 rows=1 loops=3)
            -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0) (actual time=106.072..106.072 rows=1 loops=3)
            Filter: (t = $1)
            Rows Removed by Filter: 333333
            Planning time: 0.035 ms
            Execution time: 115.044 ms
            (10 rows)
            复制


            按绑定变量的第二次测试

            前面的测试时,我们前5次执行时传进去的值都是“tang”,我们这一次让前5次中前四次传进去的值是“tang”,最后一次是“osdba”:

              osdba-mac:~ osdba$ psql
              psql (10.5)
              Type "help" for help.
              osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
              PREPARE
              osdba=# explain EXECUTE myplan('tang');
              QUERY PLAN
              ------------------------------------------------------------------------------------------
              Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
              -> Gather (cost=12656.01..12656.22 rows=2 width=8)
              Workers Planned: 2
              -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
              -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
              Filter: (t = 'tang'::text)
              (6 rows)
              osdba=# explain EXECUTE myplan('tang');
              QUERY PLAN
              ------------------------------------------------------------------------------------------
              Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
              -> Gather (cost=12656.01..12656.22 rows=2 width=8)
              Workers Planned: 2
              -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
              -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
              Filter: (t = 'tang'::text)
              (6 rows)
              osdba=# explain EXECUTE myplan('tang');
              QUERY PLAN
              ------------------------------------------------------------------------------------------
              Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
              -> Gather (cost=12656.01..12656.22 rows=2 width=8)
              Workers Planned: 2
              -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
              -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
              Filter: (t = 'tang'::text)
              (6 rows)
              osdba=# explain EXECUTE myplan('tang');
              QUERY PLAN
              ------------------------------------------------------------------------------------------
              Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
              -> Gather (cost=12656.01..12656.22 rows=2 width=8)
              Workers Planned: 2
              -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
              -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
              Filter: (t = 'tang'::text)
              (6 rows)
              osdba=# explain EXECUTE myplan('osdba');
              QUERY PLAN
              --------------------------------------------------------------------------------------
              Aggregate (cost=4.45..4.46 rows=1 width=8)
              -> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
              Index Cond: (t = 'osdba'::text)
              (3 rows)
              复制

              后面我们无论再怎么执行固定的值,发现每次都是生成新的执行计划了:

                osdba=# explain EXECUTE myplan('tang');
                QUERY PLAN
                ------------------------------------------------------------------------------------------
                Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
                -> Gather (cost=12656.01..12656.22 rows=2 width=8)
                Workers Planned: 2
                -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
                -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
                Filter: (t = 'tang'::text)
                (6 rows)
                osdba=# explain EXECUTE myplan('tang');
                QUERY PLAN
                ------------------------------------------------------------------------------------------
                Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
                -> Gather (cost=12656.01..12656.22 rows=2 width=8)
                Workers Planned: 2
                -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
                -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
                Filter: (t = 'tang'::text)
                (6 rows)
                ...
                ...
                ...
                osdba=# explain EXECUTE myplan('osdba');
                QUERY PLAN
                --------------------------------------------------------------------------------------
                Aggregate (cost=4.45..4.46 rows=1 width=8)
                -> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
                Index Cond: (t = 'osdba'::text)
                (3 rows)
                ...
                ...
                osdba=# explain EXECUTE myplan('tang');
                QUERY PLAN
                ------------------------------------------------------------------------------------------
                Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
                -> Gather (cost=12656.01..12656.22 rows=2 width=8)
                Workers Planned: 2
                -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
                -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
                Filter: (t = 'tang'::text)
                (6 rows)
                复制

                上面的演示是前5次中前四次传进去的值是“tang”,最后一次是“osdba”,实际上只要前五次中,只要任意有1次或多次传进去的是“osdba”,不一定要求最后一次是“osdba”时,都不会走通用的执行计划,这个结果大家可以测试。

                这就验证了我们前面的理论。


                PostgreSQL 12的plan_cache_mode配置参数

                在PostgreSQL11及一下的版本中,因为绑定变量窥视,虽然比Oracle出现的概率低,但还是有一定的概率导致执行计划走错。那么在PostgreSQL中是否也有类似Oracle的隐含参数把绑定变量窥视关掉的功能?答案是PostgreSQL12提供了这个功能。

                在PostgreSQL 12提供了plan_cache_mode配置参数,可以取以下三个值:

                auto: 这时默认值,即默认情况下与PostgreSQL11及以下版本相同的行为。

                force_custom_plan: 相当于关闭绑定变量窥视,永远进行硬解析。

                force_generic_plan: 走通用的固定执行计划(generic plan)

                所以对于一些非常重要的系统,可以把“plan_cache_mode”配置成“force_custom_plan”,避免执行计划的错误,如下所示:

                  osdba-mac:pgdata12 osdba$ psql
                  psql (12.1)
                  Type "help" for help.
                  osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
                  PREPARE
                  osdba=# set plan_cache_mode to force_custom_plan;
                  SET
                  osdba=# explain EXECUTE myplan('osdba');
                  QUERY PLAN
                  --------------------------------------------------------------------------------------
                  Aggregate (cost=4.45..4.46 rows=1 width=8)
                  -> Index Only Scan using idx_test01_t on test01 (cost=0.42..4.44 rows=1 width=0)
                  Index Cond: (t = 'osdba'::text)
                  (3 rows)
                  osdba=# explain EXECUTE myplan('tang');
                  QUERY PLAN
                  ------------------------------------------------------------------------------------------
                  Finalize Aggregate (cost=12656.23..12656.24 rows=1 width=8)
                  -> Gather (cost=12656.01..12656.22 rows=2 width=8)
                  Workers Planned: 2
                  -> Partial Aggregate (cost=11656.01..11656.02 rows=1 width=8)
                  -> Parallel Seq Scan on test01 (cost=0.00..10614.34 rows=416668 width=0)
                  Filter: (t = 'tang'::text)
                  (6 rows)
                  复制


                  点击关注乘数科技

                  扫码添加乘数小助手微信号
                  邀您进入《PostgreSQL修炼之道:从小工到专家》
                  读者技术交流群



                  文章转载自中启乘数,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                  评论