PostgreSQL 17增量排序

原创彭冲 2024-09-13

580

增量排序(Incremental Sort)的功能是从PostgreSQL 13引入的，它的使用场景是多字段组合排序，可以加速数据的排序。其原理是借助有序的索引来实现。

例如一张表中有两个字段a和b，其中a字段上建立了索引，缺省的B-tree索引是有序的，字段a已经预先排好了序，那同时对a和b的排序只需要在a的基础上再对b进行排序即可，这就是增量排序。

下面来观察PostgreSQL 13增量排序的示例，首先创建一张表，插入测试数据后对字段a建立B-tree索引。

CREATE TABLE incresort_test(a int,b int);

INSERT INTO incresort_test(a,b) 
  SELECT n,round(random()*100000000) 
    FROM generate_series(1,1000000) n;

CREATE INDEX idx_incresort_test ON incresort_test USING btree(a);
复制

然后执行下面的查询，按字段a和b同时进行排序。

postgres=# EXPLAIN  SELECT * FROM incresort_test ORDER BY a,b;
              QUERY PLAN                                      
---------------------------------
 Incremental Sort  (cost=0.47..75408.43 rows=1000000 width=8)
   Sort Key: a, b
   Presorted Key: a
   ->  Index Scan using idx_incresort_test on incresort_test  (cost=0.42..30408.42 rows=1000000 width=8)
(4 rows)
复制

从上面的执行计划重点关注Presorted Key:a和Incremental Sort这两行，可以看到PostgreSQL使用a字段的idx_incresort_test索引对b字段进行了增量排序。接着再观察真实执行计划，执行时间为222毫秒。

postgres=# EXPLAIN ANALYZE SELECT * FROM incresort_test ORDER BY a,b;
              QUERY PLAN                                      
---------------------------------
 Incremental Sort  (cost=0.47..75408.43 rows=1000000 width=8) (actual time==0.036..184.226 rows=1000000 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 31250  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
   ->  Index Scan using idx_incresort_test on incresort_test  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.019..81.695 rows=1000000 loops=1)
 Planning Time: 0.085 ms
 Execution Time: 222.630 ms
(7 rows)
复制

PostgreSQL 13增量排序的功能通过enable_incremental_sort参数控制，默认值为on，我们可以设置为off之后，观察PostgreSQL 13版本之前的行为。

postgres=# SET enable_incremental_sort TO off;
SET
复制

关闭增量排序之后，再观察执行计划。

postgres=# EXPLAIN ANALYZE SELECT * FROM incresort_test ORDER BY a,b;
              QUERY PLAN                                      
---------------------------------
 Sort  (cost=127757.34..130257.34 rows=1000000 width=8) (actual time=193.864..255.397 rows=1000000 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17696kB
   ->  Seq Scan on incresort_test  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.091..78.077 rows=1000000 loops=1)
 Planning Time: 0.780 ms
 Execution Time: 291.258 ms
(6 rows)
复制

执行计划使用常规的Sort排序机制，执行需要的时间为291毫秒。对比使用增量排序，执行时间有所节省，而且传统的Sort机制必须对整个数据集进行排序，内存使用较大。

在PostgreSQL 17里，增量排序不只局限于B-tree索引，增加了对GiST和SP-GiST索引的支持：对于查询语句中包含ORDER BY和LIMIT子句，排序的第一列具有GiST或SP-GiST索引，也能使用增量排序功能。

对GiST索引的增量排序测试，可以使用下面网址的模拟测试数据：

https://postgrespro.com/community/demodb

本文下载中等大小的数据文件demo-medium-en.zip，解压导入数据库后大约700多MB。

对airports表建立GiST索引：

CREATE INDEX ON airports_data USING gist (coordinates);
复制

模拟的场景：从离给定点最近的机场(例如，坐标为10和50)按出发日期排序找到最近的10个航班。

查看语句的执行计划：

EXPLAIN (costs off)
SELECT f.*
FROM flights f
JOIN airports a ON (f.departure_airport=a.airport_code)
ORDER BY point(10,50) <-> a.coordinates, f.scheduled_departure
LIMIT 10;
复制

执行计划显示如下：

              QUERY PLAN                                      
---------------------------------
 Limit
   ->  Incremental Sort
         Sort Key: (('(10,50)'::point <-> ml.coordinates)), f.scheduled_departure
         Presorted Key: (('(10,50)'::point <-> ml.coordinates))
         ->  Nested Loop
               Join Filter: (ml.airport_code = f.departure_airport)
               ->  Index Scan using airports_data_coordinates_idx on airports_data ml
                     Order By: (coordinates <-> '(10,50)'::point)
               ->  Materialize
                     ->  Seq Scan on flights f
(10 rows)
复制

从执行计划第2行能看到Incremental Sort算子。