PG的并行查询

进击的CJR 2024-11-22

PG学习笔记

PostgreSQL利用多CPU 让查询更快，这种特性被称为并行查询。

并行执行的优势

postgres=# explain analyze select count(*) from t3;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=964311.71..964311.72 rows=1 width=8) (actual time=13581.820..13583.875 rows=1 loops=1)
   ->  Gather  (cost=964311.50..964311.71 rows=2 width=8) (actual time=13567.259..13583.845 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=963311.50..963311.51 rows=1 width=8) (actual time=13559.285..13559.287 rows=1 loops=3)
               ->  Parallel Seq Scan on t3  (cost=0.00..859144.80 rows=41666680 width=0) (actual time=0.036..9740.796 rows=33333333 loops=3)
 Planning Time: 3.170 ms
 Execution Time: 13583.996 ms
(8 rows)

postgres=# set max_parallel_workers_per_gather =0;
SET
postgres=# explain analyze select count(*) from t3;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1692478.40..1692478.41 rows=1 width=8) (actual time=14258.661..14258.661 rows=1 loops=1)
   ->  Seq Scan on t3  (cost=0.00..1442478.32 rows=100000032 width=0) (actual time=0.012..9640.810 rows=100000000 loops=1)
 Planning Time: 0.038 ms
 Execution Time: 14258.685 ms
(4 rows)


复制

第一个sql进行了并行查询，Gather节点位于最顶端，说明整个查询都是并行，如果Gather或Gather Merge位于Plan Tree的中间层，则位于Plan Tree下方的都是并行。第二个sql关闭了并行，在有缓存的情况下，执行依然比第一个并行执行的sql还要慢一些。

并行成本值

postgres=# show parallel_setup_cost ;
 parallel_setup_cost 
---------------------
 1000
(1 row)

postgres=# show parallel_tuple_cost ;
 parallel_tuple_cost 
---------------------
 0.1
(1 row)
复制

并行执行需要的条件

（1）需要动态共享内存DSM

并行执行，需要将多个worker进程结果进行汇总，通信，因此需要一个共享内存。
DSM是Dynamic Shared Memory即动态共享内存。动态共享内存不需要系统启动时预先分配，在执行时动态创建，用于leader与worker间的通信，执行完成后释放。

postgres=# show dynamic_shared_memory_type ;
 dynamic_shared_memory_type 
----------------------------
 posix
(1 row)

复制

DSM的位置如下：posix模式下，使用的是tmpfs 位置在/dev/shm

**执行watch -n 1 ls /dev/shm/ 再psql中执行并行查询，可以看到此目录下的文件增删的情况。**如果这个目录满了，会报错

[root@VM-20-8-centos ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        989M     0  989M   0% /dev
tmpfs          1000M  1.3M  999M   1% /dev/shm
tmpfs          1000M  552K  999M   1% /run
tmpfs          1000M     0 1000M   0% /sys/fs/cgroup
/dev/vda1        50G  8.5G   39G  18% /
tmpfs           200M     0  200M   0% /run/user/0

复制

如果主机内存不够，可以将type改为mmap模式，使用文件的方式存储。

postgres=# select name,enumvals from pg_settings where name ='dynamic_shared_memory_type';
            name            |     enumvals      
----------------------------+-------------------
 dynamic_shared_memory_type | {posix,sysv,mmap}
(1 row)


复制

（2）参数限制

min_parallel_index_scan_size和min_parallel_table_scan_size是使用并行扫描的最小表大小和索引大小的条件。

show min_parallel_index_scan_size ;
 min_parallel_index_scan_size 
------------------------------
 512kB
(1 row)


postgres=# show min_parallel_table_scan_size;
 min_parallel_table_scan_size 
------------------------------
 8MB
(1 row)

复制

并行的参数

max_parallel_workers
设置系统支持的并行查询进程数。不过得小于max_worker_processes的值，因为并行worker是从该max_worker_processes建立的worker进程池中获取的。

parallel_workers
是一个表级的参数，和fillfactor等类似，很容易被忽略，该参数表示表级并行度，指定对表扫描可以启用的工作进程数，默认不设置，依据表的大小计算出一个值，alter table test set (parallel_workers =32);

max_parallel_workers_per_gather
控制一个并行的EXEC NODE最多能开启多少个并行worker，也就是单个Gather节点的并行度。

force_parallel_mode
表示强制开启并行，可以用来模仿Oracle的Hint。

parallel_leader_participation leader是否参与worker的工作。设置为off可以减少工作进程被阻塞的几率

如下设置max_parallel_workers_per_gather为8，则使用了6个并行

postgres=# show max_parallel_workers_per_gather ;
 max_parallel_workers_per_gather 
---------------------------------
 8
(1 row)

postgres=# explain analyze select count(*) from t3;
                                                                 QUERY PLAN       
                                                           
----------------------------------------------------------------------------------
-----------------------------------------------------------
 Finalize Aggregate  (cost=651812.03..651812.04 rows=1 width=8) (actual time=12371
.659..12371.809 rows=1 loops=1)
   ->  Gather  (cost=651811.40..651812.01 rows=6 width=8) (actual time=12363.417..
12371.792 rows=7 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         ->  Partial Aggregate  (cost=650811.40..650811.41 rows=1 width=8) (actual
 time=12347.425..12347.426 rows=1 loops=7)
               ->  Parallel Seq Scan on t3  (cost=0.00..609144.72 rows=16666672 wi
dth=0) (actual time=0.029..9900.161 rows=14285714 loops=7)
 Planning Time: 0.038 ms
 Execution Time: 12371.836 ms
(8 rows)

复制