关于PostgreSQL的分区表的历史及分区裁剪参数的区别

中启乘数 2021-09-17

778

1. 疑惑

我们知道控制分区裁剪的参数有两个：

enable_partition_pruning
constraint_exclusion

这两个参数有什么区别呢？

2. 解答

要说明这两个参数的区别需要先讲一讲PostgreSQL数据库中分区的历史，在PostgreSQL 10版本之前，PostgreSQL数据库实际上是没有单独的创建分区表的DDL语句，都是通过表继承的原理来创建分区表，这样使得在PostgreSQL中使用分区表不是很方便，到PostgreSQL 10之后，PostgreSQL扩展了创建表的DDL语句，可以用这个DDL语句来创建分区表，原先使用继承的方式还是可以创建分区表，但这两种分区表是不能混用的。于是PostgreSQL 10增加的分区表叫声明式分区（Declarative Partitioning），原先使用表继承的方式仍然可以实现分区表的功能。

而使用继承的方式实现的分区表的分区裁剪是靠设置参数“constraint_exclusion=partition”来实现的，而如果使用了声明式分区表，则需要使用参数“enable_partition_pruning”来控制是否使用分区裁剪功能。

3. 测试

创建声明式分区表:

CREATE TABLE ptab01 (
    id         int not null,
    tm         timestamptz not null
) PARTITION BY RANGE (tm);

create table ptab01_202001 partition of ptab01 for values from ('2020-01-01') to ('2020-02-01');
create table ptab01_202002 partition of ptab01 for values from ('2020-02-01') to ('2020-03-01');
create table ptab01_202003 partition of ptab01 for values from ('2020-03-01') to ('2020-04-01');
create table ptab01_202004 partition of ptab01 for values from ('2020-04-01') to ('2020-05-01');
create table ptab01_202005 partition of ptab01 for values from ('2020-05-01') to ('2020-06-01');

insert into ptab01 select extract(epoch from seq), seq from generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;

创建传统的继承式的分区表:



CREATE TABLE ptab02 (
    id         int not null,
    tm         timestamptz not null
);


CREATE TABLE ptab02_202001 (
    CHECK ( tm >= '2020-01-01'::timestamptz AND tm < '2020-02-01'::timestamptz )
) INHERITS (ptab02);

CREATE TABLE ptab02_202002 (
    CHECK ( tm >= '2020-02-01'::timestamptz AND tm < '2020-03-01'::timestamptz )
) INHERITS (ptab02);

CREATE TABLE ptab02_202003 (
    CHECK ( tm >= '2020-03-01'::timestamptz AND tm < '2020-04-01'::timestamptz )
) INHERITS (ptab02);

CREATE TABLE ptab02_202004 (
    CHECK ( tm >= '2020-04-01'::timestamptz AND tm < '2020-05-01'::timestamptz )
) INHERITS (ptab02);

CREATE TABLE ptab02_202005 (
    CHECK ( tm >= '2020-05-01'::timestamptz AND tm < '2020-06-01'::timestamptz )
) INHERITS (ptab02);

CREATE OR REPLACE FUNCTION ptab02_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.tm >= '2020-01-01'::timestamptz AND NEW.tm < '2020-02-01'::timestamptz THEN
        INSERT INTO ptab02_202001 VALUES (NEW.*);
    ELSIF  NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-03-01'::timestamptz THEN
        INSERT INTO ptab02_202002 VALUES (NEW.*);
    ELSIF  NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-04-01'::timestamptz THEN
        INSERT INTO ptab02_202003 VALUES (NEW.*);
    ELSIF  NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-05-01'::timestamptz THEN
        INSERT INTO ptab02_202004 VALUES (NEW.*);
    ELSIF  NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-06-01'::timestamptz THEN
        INSERT INTO ptab02_202005 VALUES (NEW.*);
    ELSE
        RAISE 'value % out of range ', NEW.tm;
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_ptab02_trigger
    BEFORE INSERT ON ptab02
    FOR EACH ROW EXECUTE FUNCTION ptab02_insert_trigger();


insert into ptab02 select extract(epoch from seq), seq from generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;

默认情况下constraint_exclusion为partition和enable_partition_pruning为on，不管是表继承方式实现的分区表还是声明式分区表都可以走到分区裁剪，如下所示：

postgres=# show constraint_exclusion;
 constraint_exclusion
----------------------
 partition
(1 row)

postgres=# show enable_partition_pruning;
 enable_partition_pruning
--------------------------
 on
(1 row)
postgres=# explain select * from ptab01 where tm='2020-01-07'::timestamptz;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Gather  (cost=1000.00..4417.51 rows=1 width=12)
   Workers Planned: 1
   ->  Parallel Seq Scan on ptab01_202001 ptab01  (cost=0.00..3417.41 rows=1 width=12)
         Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(4 rows)

postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..4417.62 rows=2 width=12)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..3417.42 rows=2 width=12)
         ->  Parallel Seq Scan on ptab02_202001 ptab02_2  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab02 ptab02_1  (cost=0.00..0.00 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(7 rows)

从上面可以看出，声明式分区表只扫描了包括指定时间实际的分区ptab01_202001，没有扫描其他时间段的分区，继承式的分区表只扫描了父表ptab02和包括指定时间实际的分区ptab02_202001。

当我们把参数enable_partition_pruning设置为off，这是可以看到查询声明式分区表时，会扫描所有分区，没有进行分区裁剪，但是继承式分区表还是进行了分区裁剪：

postgres=# set enable_partition_pruning to off;
SET
postgres=# explain select * from ptab01 where tm='2020-01-07'::timestamptz;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..17758.00 rows=5 width=12)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..16757.50 rows=5 width=12)
         ->  Parallel Seq Scan on ptab01_202001 ptab01_1  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab01_202003 ptab01_3  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab01_202005 ptab01_5  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab01_202004 ptab01_4  (cost=0.00..3307.88 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab01_202002 ptab01_2  (cost=0.00..3197.35 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(13 rows)

postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..4417.62 rows=2 width=12)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..3417.42 rows=2 width=12)
         ->  Parallel Seq Scan on ptab02_202001 ptab02_2  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab02 ptab02_1  (cost=0.00..0.00 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(7 rows)

当我们把参数constraint_exclusion设置为off，这是可以看到继承式分区表就没有再做分区裁剪了：

postgres=# set constraint_exclusion to off;
SET
postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..17758.10 rows=6 width=12)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..16757.50 rows=6 width=12)
         ->  Parallel Seq Scan on ptab02_202001 ptab02_2  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab02_202003 ptab02_4  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab02_202005 ptab02_6  (cost=0.00..3417.41 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab02_202004 ptab02_5  (cost=0.00..3307.88 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab02_202002 ptab02_3  (cost=0.00..3197.35 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
         ->  Parallel Seq Scan on ptab02 ptab02_1  (cost=0.00..0.00 rows=1 width=12)
               Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(15 rows)

由此可见：

enable_partition_pruning：只控制声明式分区表的分区裁剪，对继承式分区表没有影响。
constraint_exclusion：只控制继承式分区表的分区裁剪，对声明式分区表没有影响。

4. 区别

通常声明式分区比继承式分区表在更多的情况下能走到分区裁剪，如在select * from tab01 where tm = (select tm from tab02);
类似的SQL中：

postgres=# explain analyze select * from ptab01 where tm = (select tm  from tabtm);
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1001.01..17759.01 rows=5 width=12) (actual time=4.154..27.762 rows=1 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Seq Scan on tabtm  (cost=0.00..1.01 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
   ->  Parallel Append  (cost=0.00..16757.50 rows=5 width=12) (actual time=1.228..6.338 rows=0 loops=3)
         ->  Parallel Seq Scan on ptab01_202001 ptab01_1  (cost=0.00..3417.41 rows=1 width=12) (actual time=3.652..18.978 rows=1 loops=1)
               Filter: (tm = $0)
               Rows Removed by Filter: 267839
         ->  Parallel Seq Scan on ptab01_202003 ptab01_3  (cost=0.00..3417.41 rows=1 width=12) (never executed)
               Filter: (tm = $0)
         ->  Parallel Seq Scan on ptab01_202005 ptab01_5  (cost=0.00..3417.41 rows=1 width=12) (never executed)
               Filter: (tm = $0)
         ->  Parallel Seq Scan on ptab01_202004 ptab01_4  (cost=0.00..3307.88 rows=1 width=12) (never executed)
               Filter: (tm = $0)
         ->  Parallel Seq Scan on ptab01_202002 ptab01_2  (cost=0.00..3197.35 rows=1 width=12) (never executed)
               Filter: (tm = $0)
 Planning Time: 0.121 ms
 Execution Time: 27.858 ms
(20 rows)

postgres=# explain analyze select * from ptab02 where tm = (select tm  from tabtm);
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1001.01..17759.11 rows=6 width=12) (actual time=107.141..107.310 rows=1 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Seq Scan on tabtm  (cost=0.00..1.01 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=1)
   ->  Parallel Append  (cost=0.00..16757.50 rows=6 width=12) (actual time=59.127..81.720 rows=0 loops=3)
         ->  Parallel Seq Scan on ptab02_202001 ptab02_2  (cost=0.00..3417.41 rows=1 width=12) (actual time=5.747..39.632 rows=0 loops=2)
               Filter: (tm = $0)
               Rows Removed by Filter: 133920
         ->  Parallel Seq Scan on ptab02_202003 ptab02_4  (cost=0.00..3417.41 rows=1 width=12) (actual time=45.304..45.304 rows=0 loops=1)
               Filter: (tm = $0)
               Rows Removed by Filter: 267840
         ->  Parallel Seq Scan on ptab02_202005 ptab02_6  (cost=0.00..3417.41 rows=1 width=12) (actual time=21.245..21.245 rows=0 loops=2)
               Filter: (tm = $0)
               Rows Removed by Filter: 133920
         ->  Parallel Seq Scan on ptab02_202004 ptab02_5  (cost=0.00..3307.88 rows=1 width=12) (actual time=60.385..60.385 rows=0 loops=1)
               Filter: (tm = $0)
               Rows Removed by Filter: 259200
         ->  Parallel Seq Scan on ptab02_202002 ptab02_3  (cost=0.00..3197.35 rows=1 width=12) (actual time=17.671..17.671 rows=0 loops=1)
               Filter: (tm = $0)
               Rows Removed by Filter: 250560
         ->  Parallel Seq Scan on ptab02 ptab02_1  (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.001 rows=0 loops=1)
               Filter: (tm = $0)
 Planning Time: 0.171 ms
 Execution Time: 107.431 ms
(26 rows)

从上面可以看到声明式分区可以做动态的分区裁剪，不需要扫描的分区后面会有“(never executed)”的提示，表示把这些分区表都跳过了。

本站文章，未经作者同意，请勿转载，如需转载，请邮件customer@csudata.com.

postgresql

文章转载自中启乘数，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

关于PostgreSQL的分区表的历史及分区裁剪参数的区别

1. 疑惑

2. 解答

3. 测试

4. 区别

评论