关于 PolarDB PostgreSQL 版
PolarDB PostgreSQL 版是一款阿里云自主研发的云原生关系型数据库产品,100% 兼容 PostgreSQL,高度兼容Oracle语法;采用基于 Shared-Storage 的存储计算分离架构,具有极致弹性、毫秒级延迟、HTAP 、Ganos全空间数据处理能力和高可靠、高可用、弹性扩展等企业级数据库特性。同时,PolarDB PostgreSQL 版具有大规模并行计算能力,可以应对 OLTP 与 OLAP 混合负载。
简介
外连接消除(left/right/full join)
以left join为例,left join(左连接) 返回包括左表中的所有记录和右表中连接字段相等的记录 ,如果右表没有匹配的记录,那么右表将会以NULL值代替,例如:
A表 B表
ID1 ID2
1 1
2
select * from A left join B on A.id1 = B.id2;
结果如下:
ID1 ID2
1 1
2 NULL复制
存在外连接left join
postgres=> explain select * from t1 left join t2 on true;
QUERY PLAN
-------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..11932.02 rows=952000 width=16)
-> Seq Scan on t1 (cost=0.00..14.52 rows=952 width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=8)
(4 rows)复制
消除外连接需要where和join条件保证右表不会有NULL值的行产生。
postgres=> explain select * from t1 left join t2 on t1.b1 = t2.b2 where t2.b2 is not NULL;
QUERY PLAN
---------------------------------------------------------------------
Nested Loop (cost=0.28..23.30 rows=1 width=16)
-> Seq Scan on t2 (cost=0.00..15.00 rows=1 width=8)
Filter: (b2 IS NOT NULL)
-> Index Scan using b1_1 on t1 (cost=0.28..8.29 rows=1 width=8)
Index Cond: (b1 = t2.b2)
(5 rows)复制
条件下推
条件下推的目的为了连接前,元组数组尽量少,如下示例,条件已经下推到每个表上面了。
postgres=> explain select * from t1,t2 where t1.a1 < 10 and t2.a2 > 900;
QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop (cost=0.55..31.20 rows=1000 width=16)
-> Index Scan using t2_a2_key on t2 (cost=0.28..10.03 rows=100 width=8)
Index Cond: (a2 > 900)
-> Materialize (cost=0.28..8.70 rows=10 width=8)
-> Index Scan using t1_a1_key on t1 (cost=0.28..8.65 rows=10 width=8)
Index Cond: (a1 < 10)复制
语义优化
当表中字段存在约束键时,PostgreSQL将会对其进行语义优化,因为查询条件有可能已经隐含满足或者不满足,例如:
create table tt1(id int not null);
postgres=> explain select * from tt1 where id is null;
QUERY PLAN
--------------------------------------------------------
Seq Scan on tt1 (cost=0.00..15407.02 rows=1 width=15)
Filter: (id IS NULL)
set constraint_exclusion = on;
postgres=> explain select * from tt1 where id is null;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false复制
表tt1的id字段已经隐含了不为NULL,所以id=null这种条件可以直接返回false,PostgreSQL数据库默认并没有开启约束优化,需要设置constraint_exclusion这个参数。
MIN/MAX优化
min/max函数在应用的使用中是非常广泛的,数据库有必要对其进行特殊优化,比如索引中已经将数据排好序了,最大最小值可以直接获取到,所以PostgreSQL对min/max函数做了一步转化。
select min(a1) from t1 转化为 select a1 from t1 order by a1 limit 1;
如果a1没有索引,那么将会是顺序扫描,不进行转化。
postgres=> explain select min(a1) from t1;
QUERY PLAN
------------------------------------------------------------------------------------------
Result (cost=0.32..0.33 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.28..0.32 rows=1 width=4)
-> Index Only Scan using t1_a1_key on t1 (cost=0.28..45.09 rows=952 width=4)
Index Cond: (a1 IS NOT NULL)复制
group by优化
如果不对group by优化,那么将会需要对结果进行Sort或者Hash,但是如果表中数据已经是排序好的,那么将可以对其进行优化。
create index tt1_id_key on tt1 using btree ( id);
postgres=> explain select id from tt1 group by id;
QUERY PLAN
-------------------------------------------------------------------------------------------
Group (cost=0.42..33891.21 rows=1000102 width=4)
Group Key: id
-> Index Only Scan using tt1_id_key on tt1 (cost=0.42..31390.96 rows=1000102 width=4)
postgres=> explain select name from tt1 group by name;
QUERY PLAN
--------------------------------------------------------------------------
Group (cost=132169.76..137170.27 rows=1000102 width=11)
Group Key: name
-> Sort (cost=132169.76..134670.02 rows=1000102 width=11)
Sort Key: name
-> Seq Scan on tt1 (cost=0.00..15407.02 rows=1000102 width=11)复制
order by优化
利用索引消除order by
postgres=> explain select * from t1 order by a1;
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using t1_a1_key on t1 (cost=0.28..42.71 rows=952 width=8)
(1 row)复制
order by下推,利用merge join实现更快的连接
postgres=> explain select * from t1,t2 where t1.b1=t2.b2 order by b1;
QUERY PLAN
------------------------------------------------------------------
Merge Join (cost=126.45..136.22 rows=1 width=16)
Merge Cond: (t1.b1 = t2.b2)
-> Sort (cost=61.62..64.00 rows=952 width=8)
Sort Key: t1.b1
-> Seq Scan on t1 (cost=0.00..14.52 rows=952 width=8)
-> Sort (cost=64.83..67.33 rows=1000 width=8)
Sort Key: t2.b2
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=8)
(8 rows)复制
distinct优化
类似于group by优化,distinct将会从Sort和Hash中选择最优的,如果字段中有索引,Sort代价可能会更低。
postgres=> explain select distinct(a1) from t1;
QUERY PLAN
-----------------------------------------------------------
HashAggregate (cost=16.90..26.42 rows=952 width=4)
Group Key: a1
-> Seq Scan on t1 (cost=0.00..14.52 rows=952 width=4)
(3 rows)
postgres=> explain select distinct(name) from tt1;
QUERY PLAN
--------------------------------------------------------------------------
Unique (cost=132169.76..137170.27 rows=1000102 width=11)
-> Sort (cost=132169.76..134670.02 rows=1000102 width=11)
Sort Key: name
-> Seq Scan on tt1 (cost=0.00..15407.02 rows=1000102 width=11)复制
集合操作优化
集合操作union被转换成Append方式。
postgres=> explain select a1 from t1 where a1 < 10 union select a2 from t2;
QUERY PLAN
--------------------------------------------------------------------------------------
HashAggregate (cost=36.28..46.38 rows=1010 width=4)
Group Key: t1.a1
-> Append (cost=0.28..33.75 rows=1010 width=4)
-> Index Only Scan using t1_a1_key on t1 (cost=0.28..8.65 rows=10 width=4)
Index Cond: (a1 < 10)
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4)
postgres=> explain select a1 from t1 where a1 < 10 union all select a2 from t2;
QUERY PLAN
--------------------------------------------------------------------------------
Append (cost=0.28..23.75 rows=1010 width=4)
-> Index Only Scan using t1_a1_key on t1 (cost=0.28..8.65 rows=10 width=4)
Index Cond: (a1 < 10)
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4)复制
总结
以上介绍了几种常见的PostgreSQL优化器对SQL优化的方法,这些方法更着重于SQL逻辑优化,也就是尽量对SQL进行等价或者推倒变换,以达到更有的执行计划。PostgreSQL优化器原理远不止这些,比如表的扫描方式选择、多表组合方式、多表组合顺序等,这些内容将会在后续的月报中继续呈现。