Accelerating TPC-H with PolarDB-PG HTAP (Part 3)

PolarDB农夫山泉 2023-09-20

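PolarDB for PostgreSQL (hereafter PolarDB-PG) is an enterprise-grade database developed in-house by Alibaba Cloud. It separates compute from storage and is compatible with both PostgreSQL and Oracle. Storage and compute capacity can each scale out horizontally, and PolarDB-PG provides enterprise features such as high reliability, high availability and elastic scaling. It also supports large-scale parallel computing, so it can serve mixed OLTP and OLAP workloads. Multi-model features such as spatio-temporal, vector, search and graph support help it meet enterprises' fast-changing data-processing needs.
This section presents a best-practice example of using the HTAP capability of PolarDB for PostgreSQL to accelerate TPC-H execution. The example runs on a single machine with local storage.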

Running PolarDB HTAP Cross-Node Parallel Queries

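Having tried single-node parallel query, we now enable cross-node parallel query (PX) and rerun the queries on the same data to compare performance.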

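In psql, run the following command to turn on timing. A bare \timing toggles the setting and would switch it off if it were already on. The explicit on argument avoids this:

\timing on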

Run the following command to enable cross-node parallel query (PX):

set polar_enable_px=on;

Set the degree of parallelism per node to 1:

set polar_px_dop_per_node=1;
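Before looking at the plan, you can confirm that both parameters took effect in the current session. This sketch uses only PostgreSQL's standard SHOW command. It assumes both parameters are exposed on your PolarDB-PG build.

```sql
-- Expected to report "on" after the previous step
SHOW polar_enable_px;
-- Expected to report 1, the per-node degree of parallelism set above
SHOW polar_px_dop_per_node;
```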

Run the following command to view the execution plan:

\i queries/q18.explain.sql

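This cluster has 2 RO (read-only) nodes, so with PX enabled the degree of parallelism is 2 × 1 = 2. This is why every slice in the plan below shows segments: 2.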

                                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..93628.34 rows=100 width=47)
   ->  PX Coordinator 2:1  (slice1; segments: 2)  (cost=0.00..93628.33 rows=100 width=47)
         Merge Key: orders.o_totalprice, orders.o_orderdate
         ->  Limit  (cost=0.00..93628.31 rows=50 width=47)
               ->  GroupAggregate  (cost=0.00..93628.31 rows=11995940 width=47)
                     Group Key: orders.o_totalprice, orders.o_orderdate, customer.c_name, customer.c_custkey, orders.o_orderkey
                     ->  Sort  (cost=0.00..92784.19 rows=11995940 width=44)
                           Sort Key: orders.o_totalprice DESC, orders.o_orderdate, customer.c_name, customer.c_custkey, orders.o_orderkey
                           ->  Hash Join  (cost=0.00..22406.63 rows=11995940 width=44)
                                 Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                 ->  PX Hash 2:2  (slice2; segments: 2)  (cost=0.00..4301.49 rows=29989848 width=9)
                                       Hash Key: lineitem.l_orderkey
                                       ->  Partial Seq Scan on lineitem  (cost=0.00..2954.65 rows=29989848 width=9)
                                 ->  Hash  (cost=10799.35..10799.35 rows=83024 width=39)
                                       ->  PX Hash 2:2  (slice3; segments: 2)  (cost=0.00..10799.35 rows=83024 width=39)
                                             Hash Key: orders.o_orderkey
                                             ->  Hash Join  (cost=0.00..10789.21 rows=83024 width=39)
                                                   Hash Cond: (customer.c_custkey = orders.o_custkey)
                                                   ->  PX Hash 2:2  (slice4; segments: 2)  (cost=0.00..597.52 rows=750040 width=23)
                                                         Hash Key: customer.c_custkey
                                                         ->  Partial Seq Scan on customer  (cost=0.00..511.44 rows=750040 width=23)
                                                   ->  Hash  (cost=9993.50..9993.50 rows=83024 width=20)
                                                         ->  PX Hash 2:2  (slice5; segments: 2)  (cost=0.00..9993.50 rows=83024 width=20)
                                                               Hash Key: orders.o_custkey
                                                               ->  Hash Semi Join  (cost=0.00..9988.30 rows=83024 width=20)
                                                                     Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
                                                                     ->  Partial Seq Scan on orders  (cost=0.00..1020.90 rows=7500272 width=20)
                                                                     ->  Hash  (cost=7256.00..7256.00 rows=166047 width=4)
                                                                           ->  PX Broadcast 2:2  (slice6; segments: 2)  (cost=0.00..7256.00 rows=166047 width=4)
                                                                                 ->  Result  (cost=0.00..7238.62 rows=83024 width=4)
                                                                                       Filter: ((sum(lineitem_1.l_quantity)) > '313'::numeric)
                                                                                       ->  Finalize HashAggregate  (cost=0.00..7231.79 rows=207559 width=12)
                                                                                             Group Key: lineitem_1.l_orderkey
                                                                                             ->  PX Hash 2:2  (slice7; segments: 2)  (cost=0.00..7205.20 rows=207559 width=12)
                                                                                                   Hash Key: lineitem_1.l_orderkey
                                                                                                   ->  Partial HashAggregate  (cost=0.00..7197.41 rows=207559 width=12)
                                                                                                         Group Key: lineitem_1.l_orderkey
                                                                                                         ->  Partial Seq Scan on lineitem lineitem_1  (cost=0.00..2954.65 rows=29989848 width=9)
 Optimizer: PolarDB PX Optimizer
(39 rows)
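As a quick check on the figure quoted before the plan, the total number of PX workers can be read as the number of RO nodes multiplied by polar_px_dop_per_node. This is a sketch generalized from the 2 × 1 = 2 example in this section. It is not a statement of how PolarDB computes the value internally.

```latex
\[
\mathrm{DOP}_{\text{total}} = N_{\text{RO}} \times \texttt{polar\_px\_dop\_per\_node} = 2 \times 1 = 2
\]
```

On this 2-RO-node cluster, the same relationship maps the per-node values 1, 2 and 4 used in the parallelism sweep at the end of this section to 2, 4 and 8 workers in total.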

Run the query:

\i queries/q18.sql

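You should see part of the result (press q to skip the rest) and the elapsed time. The run takes about 1 minute, which is 27.71% less than the single-node parallel run. If you are interested, increase the degree of parallelism or the data volume to see how much further performance improves.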
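The percentage above can be restated as a speedup factor. Here T_single is the single-node parallel run time and T_PX is the cross-node run time; both symbols are introduced only for illustration. The time estimate takes T_PX ≈ 60 s from the rounded "about 1 minute" figure, so the implied single-node time is approximate.

```latex
\[
r = \frac{T_{\text{single}} - T_{\text{PX}}}{T_{\text{single}}} = 0.2771
\;\Rightarrow\;
\frac{T_{\text{single}}}{T_{\text{PX}}} = \frac{1}{1 - 0.2771} \approx 1.38,
\qquad
T_{\text{single}} \approx \frac{60\ \text{s}}{0.7229} \approx 83\ \text{s}
\]
```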

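Cross-node parallel query obtains a globally consistent snapshot, so the data it reads is consistent and correctness is not a concern. You can set the degree of parallelism of cross-node parallel query manually, for example: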

set polar_px_dop_per_node = 1;
\i queries/q18.sql

set polar_px_dop_per_node = 2;
\i queries/q18.sql

set polar_px_dop_per_node = 4;
\i queries/q18.sql
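To reproduce the comparison with single-node parallel query in the same session, you can switch PX off, rerun the query, and then restore the parameters. This psql sketch assumes the single-node parallel settings from the previous part of this series are still in effect. SET and RESET are standard PostgreSQL commands.

```sql
-- Turn off cross-node parallel query (PX) and rerun Q18 as the single-node baseline
SET polar_enable_px = off;
\i queries/q18.sql

-- Restore both PX parameters to their configured defaults
RESET polar_px_dop_per_node;
RESET polar_enable_px;
```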
