暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

感受greenplum下semi join和inner join性能差异

原创 姚崇 2023-09-18
316

greenplum下的慢SQL

SQL如下:
其中JYDB.MF_FundNetValueReTrans B, JYDB.MF_FundManagerRank_V_TMP A 两个表分别按照 InnerCode和Code做分布式键
最后分析是其中的子查询慢

SELECT InnerCode, PersonalCode, AccessionDate, RANK AS TypeRank, SL AS TypeNumber, Ret as Performance FROM ( SELECT * FROM ( SELECT RANK() OVER (PARTITION BY A.InnerCode,A.PersonalCode ORDER BY (B.UnitNVRestored/ NULLIF(C.UnitNVRestored, 0) - 1) DESC) AS RANK, COUNT(A.Code) OVER (PARTITION BY A.InnerCode,A.PersonalCode) AS SL, A.* FROM JYDB.MF_FundManagerRank_V_TMP A left JOIN ( SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode FROM JYDB.MF_FundNetValueReTrans B, JYDB.MF_FundManagerRank_V_TMP A WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay GROUP BY B.InnerCode ) B ON B.InnerCode = A.Code left JOIN ( SELECT MAX(C.UnitNVRestored) AS UnitNVRestored, C.InnerCode FROM JYDB.MF_FundNetValueReTrans C, JYDB.MF_FundManagerRank_V_TMP A WHERE C.InnerCode = A.Code and C.TradingDay <= A.EndDay GROUP BY C.InnerCode ) C ON C.InnerCode = A.Code WHERE B.UnitNVRestored IS NOT NULL AND C.UnitNVRestored IS NOT NULL ) D where Code = InnerCode ) E;

数据分布如下

jydb=# select count(*) from JYDB.MF_FundManagerRank_V_TMP A; count ----------- 112295234 (1 row) jydb=# select count(*) from jydb.MF_FundNetValueReTrans; count ---------- 17404235 (1 row) jydb=# select count(*),InnerCode from JYDB.MF_FundNetValueReTrans group by InnerCode; count | innercode -------+----------- 158 | 62015 107 | 327622 123 | 427851 182 | 427243 3354 | 7654 ... ... 796 | 231901 627 | 76371 230 | 393340 1013 | 78029 ... ...

使用inner join时的执行计划(跑不出结果)

explain SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode FROM JYDB.MF_FundNetValueReTrans B, JYDB.MF_FundManagerRank_V_TMP A WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay GROUP BY B.InnerCode; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Gather Motion 16:1 (slice1; segments: 16) (cost=51550238.44..51550392.51 rows=13695 width=36) -> HashAggregate (cost=51550238.44..51550247.00 rows=856 width=36) Group Key: b.innercode -> Hash Join (cost=28251.94..36589535.58 rows=2992140571 width=11) Hash Cond: (a.code = b.innercode) Join Filter: (b.tradingday >= a.beginday) -> Seq Scan on mf_fundmanagerrank_v_tmp a (cost=0.00..129050.94 rows=7019894 width=12) -> Hash (cost=14654.75..14654.75 rows=1087775 width=19) -> Seq Scan on mf_fundnetvalueretrans b (cost=0.00..14654.75 rows=1087775 width=19) Optimizer: Postgres query optimizer

semi join的执行计划(10s出结果)

执行计划没太大区别

explain SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode FROM JYDB.MF_FundNetValueReTrans B WHERE exists(select 1 from JYDB.MF_FundManagerRank_V_TMP A WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay) GROUP BY B.InnerCode; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Gather Motion 16:1 (slice1; segments: 16) (cost=239896.91..240050.98 rows=13695 width=36) -> HashAggregate (cost=239896.91..239905.47 rows=856 width=36) Group Key: b.innercode -> Hash Semi Join (cost=216799.61..238495.01 rows=280380 width=11) Hash Cond: (b.innercode = a.code) Join Filter: (b.tradingday >= a.beginday) -> Seq Scan on mf_fundnetvalueretrans b (cost=0.00..14654.75 rows=1087775 width=19) -> Hash (cost=129050.94..129050.94 rows=7019894 width=12) -> Seq Scan on mf_fundmanagerrank_v_tmp a (cost=0.00..129050.94 rows=7019894 width=12) Optimizer: Postgres query optimizer (10 rows) set optimizer =on; jydb=# explain SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode FROM JYDB.MF_FundNetValueReTrans B WHERE exists(select 1 from JYDB.MF_FundManagerRank_V_TMP A WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay) GROUP BY B.InnerCode; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Gather Motion 16:1 (slice1; segments: 16) (cost=0.00..7212.44 rows=9386 width=12) -> HashAggregate (cost=0.00..7212.16 rows=587 width=12) Group Key: mf_fundnetvalueretrans.innercode -> Hash Semi Join (cost=0.00..7207.89 rows=35157 width=11) Hash Cond: (mf_fundnetvalueretrans.innercode = mf_fundmanagerrank_v_tmp.code) Join Filter: (mf_fundnetvalueretrans.tradingday >= mf_fundmanagerrank_v_tmp.beginday) -> Seq Scan on mf_fundnetvalueretrans (cost=0.00..494.42 rows=1087776 width=19) -> Hash (cost=1660.32..1660.32 rows=7019894 width=12) -> Seq Scan on mf_fundmanagerrank_v_tmp (cost=0.00..1419.40 rows=7019894 width=12) Optimizer: Pivotal Optimizer (GPORCA) (10 rows)

结论

应该是快在了semi join本身只要A表匹配到一条数据即可,无需像inner join A和B两个表都要取出来

最后修改时间:2023-09-19 13:44:49
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

文章被以下合辑收录

评论