greenplum下的慢SQL
SQL如下:
其中JYDB.MF_FundNetValueReTrans B, JYDB.MF_FundManagerRank_V_TMP A 两个表分别按照 InnerCode和Code做分布式键
最后分析是其中的子查询慢
SELECT InnerCode, PersonalCode, AccessionDate, RANK AS TypeRank, SL AS TypeNumber, Ret as Performance
FROM (
SELECT * FROM (
SELECT
RANK() OVER (PARTITION BY A.InnerCode,A.PersonalCode ORDER BY (B.UnitNVRestored/ NULLIF(C.UnitNVRestored, 0) - 1) DESC) AS RANK,
COUNT(A.Code) OVER (PARTITION BY A.InnerCode,A.PersonalCode) AS SL,
A.* FROM JYDB.MF_FundManagerRank_V_TMP A
left JOIN (
SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode
FROM JYDB.MF_FundNetValueReTrans B, JYDB.MF_FundManagerRank_V_TMP A
WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay GROUP BY B.InnerCode
) B ON B.InnerCode = A.Code
left JOIN (
SELECT MAX(C.UnitNVRestored) AS UnitNVRestored, C.InnerCode
FROM JYDB.MF_FundNetValueReTrans C, JYDB.MF_FundManagerRank_V_TMP A
WHERE C.InnerCode = A.Code and C.TradingDay <= A.EndDay GROUP BY C.InnerCode
) C ON C.InnerCode = A.Code
WHERE B.UnitNVRestored IS NOT NULL AND C.UnitNVRestored IS NOT NULL
) D where Code = InnerCode
) E;
数据分布如下
jydb=# select count(*) from JYDB.MF_FundManagerRank_V_TMP A;
count
-----------
112295234
(1 row)
jydb=# select count(*) from jydb.MF_FundNetValueReTrans;
count
----------
17404235
(1 row)
jydb=# select count(*),InnerCode from JYDB.MF_FundNetValueReTrans group by InnerCode;
count | innercode
-------+-----------
158 | 62015
107 | 327622
123 | 427851
182 | 427243
3354 | 7654
... ...
796 | 231901
627 | 76371
230 | 393340
1013 | 78029
... ...
使用inner join时的执行计划(跑不出结果)
explain
SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode
FROM JYDB.MF_FundNetValueReTrans B, JYDB.MF_FundManagerRank_V_TMP A
WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay GROUP BY B.InnerCode;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Gather Motion 16:1 (slice1; segments: 16) (cost=51550238.44..51550392.51 rows=13695 width=36)
-> HashAggregate (cost=51550238.44..51550247.00 rows=856 width=36)
Group Key: b.innercode
-> Hash Join (cost=28251.94..36589535.58 rows=2992140571 width=11)
Hash Cond: (a.code = b.innercode)
Join Filter: (b.tradingday >= a.beginday)
-> Seq Scan on mf_fundmanagerrank_v_tmp a (cost=0.00..129050.94 rows=7019894 width=12)
-> Hash (cost=14654.75..14654.75 rows=1087775 width=19)
-> Seq Scan on mf_fundnetvalueretrans b (cost=0.00..14654.75 rows=1087775 width=19)
Optimizer: Postgres query optimizer
semi join的执行计划(10s出结果)
执行计划没太大区别
explain
SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode
FROM JYDB.MF_FundNetValueReTrans B WHERE exists(select 1 from JYDB.MF_FundManagerRank_V_TMP A
WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay) GROUP BY B.InnerCode;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Gather Motion 16:1 (slice1; segments: 16) (cost=239896.91..240050.98 rows=13695 width=36)
-> HashAggregate (cost=239896.91..239905.47 rows=856 width=36)
Group Key: b.innercode
-> Hash Semi Join (cost=216799.61..238495.01 rows=280380 width=11)
Hash Cond: (b.innercode = a.code)
Join Filter: (b.tradingday >= a.beginday)
-> Seq Scan on mf_fundnetvalueretrans b (cost=0.00..14654.75 rows=1087775 width=19)
-> Hash (cost=129050.94..129050.94 rows=7019894 width=12)
-> Seq Scan on mf_fundmanagerrank_v_tmp a (cost=0.00..129050.94 rows=7019894 width=12)
Optimizer: Postgres query optimizer
(10 rows)
set optimizer =on;
jydb=# explain
SELECT MIN(B.UnitNVRestored) AS UnitNVRestored, B.InnerCode
FROM JYDB.MF_FundNetValueReTrans B WHERE exists(select 1 from JYDB.MF_FundManagerRank_V_TMP A
WHERE B.InnerCode = A.Code and B.TradingDay >= A.BeginDay) GROUP BY B.InnerCode;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Gather Motion 16:1 (slice1; segments: 16) (cost=0.00..7212.44 rows=9386 width=12)
-> HashAggregate (cost=0.00..7212.16 rows=587 width=12)
Group Key: mf_fundnetvalueretrans.innercode
-> Hash Semi Join (cost=0.00..7207.89 rows=35157 width=11)
Hash Cond: (mf_fundnetvalueretrans.innercode = mf_fundmanagerrank_v_tmp.code)
Join Filter: (mf_fundnetvalueretrans.tradingday >= mf_fundmanagerrank_v_tmp.beginday)
-> Seq Scan on mf_fundnetvalueretrans (cost=0.00..494.42 rows=1087776 width=19)
-> Hash (cost=1660.32..1660.32 rows=7019894 width=12)
-> Seq Scan on mf_fundmanagerrank_v_tmp (cost=0.00..1419.40 rows=7019894 width=12)
Optimizer: Pivotal Optimizer (GPORCA)
(10 rows)
结论
应该是快在了semi join本身只要A表匹配到一条数据即可,无需像inner join A和B两个表都要取出来
最后修改时间:2023-09-19 13:44:49
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




