PostgreSQL-分区表详解：何时该用分区？分区类型与优势全解析！

原创数据蛙 All China Database Union 2024-10-29

624

什么是分区表，何时需要分区？

将大表拆分成多个小的物理分块，从而减少数据库对这些表的查询时间及物理资源消耗来提升性能。

当表的大小应超过数据库服务器的物理内存时，就需要考虑将表进行分区。

分区表的优点

查询性能高
- 频繁查询的数据将会放在单个分区或者在少量的分区中。
- 当表分区后，频繁使用的索引将会存放在内存中。
- 当查询或更新单个分区的大量数据时，将会使用分区的顺序扫描而不是使用索引，从而提高性能，而且这将需要随机读取分布在整个表中的数据。
可以通过添加或移除分区来完成批量加载和删除。
减少批量 DELETE ，避免 VACUUM。
通过使用DROP TABLE或ALTER TABLE DETACH PARTITION删除单个分区比批量操作要快得多。
查询脚本可保持不变，对业务开发来说是透明的。
具有更高的可用性，比如一个分区的的数据页面损坏，不会影响其它分区。

分区类型

在PostgreSQL中，根据分区表的创建方式，可以将其分为声明式创建的分区表（在 10.x 以后才有）和继承式创建的分区表。在分区表的分区也可以是分区表，实现多级分区。在大多数生产环境中使用声明式分区都能满足，类似 Oracle 一样，有范围分区，列表分区、哈希分区和复合分区，但创建表语法比 Oracle 简单很多。以下是每种分区方式的使用场景：

声明式分区策略	常用场景
范围分区	适合时间序列数据等有顺序特征的数据。
列表分区	适用于有离散值的数据，如分类或状态。
哈希分区	适合均匀分布数据。
复合分区	结合两种或多种分区方式。可以先按范围分区，再在每个范围内按列表或哈希分区。

下面将演示如何创建与使用范围分区、列表分区、哈希分区。前提已安装好 postgresql ，并服务能正常运行。如未安装 postgresql 数据库可参考以下文章：
【PostgreSQL源码方式安装】

范围分区

范围分区是一种基于连续值范围（如日期或数值）将数据分成多个分区的方法。这种分区特别适用于时间序列数据或连续数值数据，能够在查询指定范围时显著提升性能。以下演示范围分区的创建、数据插入与查询过程：

# 开启执行时间，并测试查询数据时间
postgres=# \timing
Timing is on.

# 创建range分区表
postgres=# create table test_range_table(customer_id int,order_date date)partition by range(order_date);
CREATE TABLE
Time: 1.559 ms

# 创建范围分区并将分区与父表关联
postgres=# create table test_range_table_1 partition of test_range_table for values from('2024-01-01'::date) to ('2024-04-01'::date);
CREATE TABLE
Time: 1.627 ms
postgres=# create table test_range_table_2 partition of test_range_table for values from('2024-04-01'::date) to ('2024-07-01'::date);
CREATE TABLE
Time: 1.324 ms
postgres=# create table test_range_table_3 partition of test_range_table for values from('2024-07-01'::date) to ('2024-10-01'::date);
CREATE TABLE
Time: 1.376 ms
postgres=# create table test_range_table_4 partition of test_range_table for values from('2024-10-01'::date) to ('2025-01-01'::date);
CREATE TABLE
Time: 1.227 ms

# 查看分区表的表详细结构信息
postgres=# \d+ test_range_table
                                 Partitioned table "public.test_range_table"
   Column    |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 customer_id | integer |           |          |         | plain   |             |              |
 order_date  | date    |           |          |         | plain   |             |              |
Partition key: RANGE (order_date)
Partitions: test_range_table_1 FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'),
            test_range_table_2 FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'),
            test_range_table_3 FOR VALUES FROM ('2024-07-01') TO ('2024-10-01'),
            test_range_table_4 FOR VALUES FROM ('2024-10-01') TO ('2025-01-01')

# 生成从 2024-01-01 到 2024-10-29 的模拟数据
postgres=# insert into test_range_table values(floor(random()*1000000+1),generate_series('2024-01-01'::date,'2024-10-29'::date,'1 day'));
INSERT 0 303
Time: 3.555 ms

# 从以下查询计划可以看出，数据是通过分区扫描查询
postgres=# explain select customer_id,order_date from test_range_table where order_date >='2024-01-01'::date and order_date<='2024-01-31'::date;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Seq Scan on test_range_table_1 test_range_table  (cost=0.00..2.37 rows=1 width=8)
   Filter: ((order_date >= '2024-01-01'::date) AND (order_date <= '2024-01-31'::date))
(2 rows)

Time: 0.527 ms
复制

# 以下分区的创建不连续。
create table test_range_table_1 partition of test_range_table for values from('2024-01-01'::date) to ('2024-04-01'::date);
create table test_range_table_2 partition of test_range_table for values from('2024-04-01'::date) to ('2024-06-30'::date);

postgres=# insert into test_range_table values(floor(random()*1000000+1),generate_series('2024-01-01'::date,'2024-10-29'::date,'1 day'));
ERROR:  no partition of relation "test_range_table" found for row
DETAIL:  Partition key of the failing row contains (order_date) = (2024-03-31).
Time: 0.568 ms
复制

列表分区

列表分区是一种将数据按具体的枚举值划分到不同分区的分区方法。这种方法适用于具有离散值的列（例如分类、状态等），并可以显著优化基于这些离散值的查询性能。以下演示列表分区的创建、数据插入与查询过程：

# 创建list分区表
postgres=# create table test_list_table (id int,product_type varchar(20))partition by list(product_type);
CREATE TABLE

# 创建分区并将分区与父表关联
postgres=# create table test_list_table_a partition of test_list_table for values in('A');
CREATE TABLE
postgres=# create table test_list_table_b partition of test_list_table for values in('B');
CREATE TABLE
postgres=# create table test_list_table_c partition of test_list_table for values in('C');
CREATE TABLE
postgres=# create table test_list_table_d partition of test_list_table for values in('D');
CREATE TABLE
postgres=#

# 查看分区表的表详细结构信息
postgres=# \d+ test_list_table
                                         Partitioned table "public.test_list_table"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id           | integer               |           |          |         | plain    |             |              |
 product_type | character varying(20) |           |          |         | extended |             |              |
Partition key: LIST (product_type)
Partitions: test_list_table_a FOR VALUES IN ('A'),
            test_list_table_b FOR VALUES IN ('B'),
            test_list_table_c FOR VALUES IN ('C'),
            test_list_table_d FOR VALUES IN ('D')

# 生成模拟数据，将类型 A/B/C/D 进行分区插入
postgres=# insert into test_list_table values(generate_series(1,1000000),'A');
INSERT 0 1000000
postgres=# insert into test_list_table values(generate_series(1000001,2000000),'B');
INSERT 0 1000000
postgres=# insert into test_list_table values(generate_series(2000001,3000000),'C');
INSERT 0 1000000
postgres=# insert into test_list_table values(generate_series(3000001,4000000),'D');
INSERT 0 1000000
postgres=# select count(*) from test_list_table;
  count
---------
 4000000
(1 row)

# 开启执行时间，并测试查询数据时间
postgres=# \timing
Timing is on.
postgres=# select count(1) from test_list_table;
  count
---------
 4000000
(1 row)

Time: 116.986 ms
postgres=# select product_type,count(1) from test_list_table group by product_type;
 product_type |  count
--------------+---------
 A            | 1000000
 B            | 1000000
 C            | 1000000
 D            | 1000000
(4 rows)

Time: 386.041 ms

# 从以下查询计划可以看出，数据是通过分区扫描查询
postgres=# explain select product_type,count(1) from test_list_table group by product_type;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=52043.01..52093.68 rows=200 width=10)
   Group Key: test_list_table.product_type
   ->  Gather Merge  (cost=52043.01..52089.68 rows=400 width=10)
         Workers Planned: 2
         ->  Sort  (cost=51042.99..51043.49 rows=200 width=10)
               Sort Key: test_list_table.product_type
               ->  Partial HashAggregate  (cost=51033.35..51035.35 rows=200 width=10)
                     Group Key: test_list_table.product_type
                     ->  Parallel Append  (cost=0.00..42700.01 rows=1666668 width=2)
                           ->  Parallel Seq Scan on test_list_table_a test_list_table_1  (cost=0.00..8591.67 rows=416667 width=2)
                           ->  Parallel Seq Scan on test_list_table_b test_list_table_2  (cost=0.00..8591.67 rows=416667 width=2)
                           ->  Parallel Seq Scan on test_list_table_c test_list_table_3  (cost=0.00..8591.67 rows=416667 width=2)
                           ->  Parallel Seq Scan on test_list_table_d test_list_table_4  (cost=0.00..8591.67 rows=416667 width=2)
(13 rows)

Time: 0.368 ms

复制

哈希分区

哈希分区是一种基于哈希函数将数据分散到多个分区的方法。与范围分区和列表分区不同，哈希分区没有特定的范围或分类条件，而是通过对分区键进行哈希计算来将数据分布到不同的分区中。哈希分区特别适合于均匀分布的数据集，可以有效地分散负载和提高查询性能。以下演示哈希分区的创建、数据插入与查询过程：

# 创建hash分区表，根据 user_id 进行hash分区
postgres=# create table test_hash_table(user_id int,age int)partition by hash(user_id);
CREATE TABLE
Time: 1.354 ms

# 创建 hash 分区，modulus 表示分区总数，REMAINDER 表示当前分区的余数
postgres=# create table test_hash_table_1 partition of test_hash_table for values with (modulus 4,REMAINDER 0);
CREATE TABLE
Time: 1.407 ms
postgres=#  create table test_hash_table_2 partition of test_hash_table for values with (modulus 4,REMAINDER 1);
CREATE TABLE
Time: 1.162 ms
postgres=#  create table test_hash_table_3 partition of test_hash_table for values with (modulus 4,REMAINDER 2);
CREATE TABLE
Time: 1.130 ms
postgres=#  create table test_hash_table_4 partition of test_hash_table for values with (modulus 4,REMAINDER 3);
CREATE TABLE
Time: 1.321 ms

# 查看分区表的表详细结构信息
postgres=# \d+ test_hash_table
                               Partitioned table "public.test_hash_table"
 Column  |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 user_id | integer |           |          |         | plain   |             |              |
 age     | integer |           |          |         | plain   |             |              |
Partition key: HASH (user_id)
Partitions: test_hash_table_1 FOR VALUES WITH (modulus 4, remainder 0),
            test_hash_table_2 FOR VALUES WITH (modulus 4, remainder 1),
            test_hash_table_3 FOR VALUES WITH (modulus 4, remainder 2),
            test_hash_table_4 FOR VALUES WITH (modulus 4, remainder 3)

# 生成模拟数据，当插入数据时，会根据 user_id % 4 的结果将行放入对应的分区。例如，当 user_id = 5 时，5 % 4 = 1，所以该行数据将插入到 test_hash_table_2 中
postgres=# insert into test_hash_table values(generate_series(1,1000000),floor(random()*50+15));
INSERT 0 1000000
Time: 993.522 ms

postgres=# explain select * from test_hash_table where user_id = 33;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..3948.17 rows=1 width=8)
   Workers Planned: 1
   ->  Parallel Seq Scan on test_hash_table_4 test_hash_table  (cost=0.00..2948.07 rows=1 width=8)
         Filter: (user_id = 33)
(4 rows)

Time: 0.696 ms
复制