postgres Rows估算错误导致SQL性能偏差

godba 2024-03-11

362

背景：

在日常隐患巡检中发现的可疑SQL。隐患比较隐藏，其实源SQL更长，隐藏的更深。特作文记录。为了方便读者阅读，已把SQL简化，突出其隐藏部分。

还是一贯作风：现象说明，官方理论，数据证明，实施效果，一一说明。希望方便读者阅读。

SELECT
ROUND( SUM (COALESCE(station_month.maintain_labor_cost,0) + ............
COALESCE(station_month.employee_compensation,0)), 2) AS totalCost,
ROUND( SUM ( info.ast_original_value ), 2 ) AS totalAstValue
FROM
powerX.cost_XXXXX_pl_info info
INNER JOIN powerX.cost_XXXXX_pl_month station_month ON info.psr_id = station_month.psr_id
AND info.ast_id = station_month.ast_id
where  info.professional_type = '01'
AND info.equip_type = 'xl'
AND station_month.sun_year_month in ( '202301', '202302','202303','202304'  )

执行计划：

Aggregate (cost=4938.02..4938.03 rows=1 width=64) (actual time=85.167..85.169 rows=1 loops=1)
Buffers: shared hit=44893
->  Nested Loop  (cost=13.38..4937.97 rows=1 width=126) (actual time=0.720..82.459 rows=1343 loops=1)
Buffers: shared hit=44893
->  Bitmap Heap Scan on cost_XXXXX_pl_info info  (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)
Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Heap Blocks: exact=764
Buffers: shared hit=786
->  Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional  (cost=0.00..12.85 rows=443 width=0) (actual time=0.614..0.614 rows=7013 loops=1)
Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Buffers: shared hit=22
->  Index Scan using idx_cost_XXXXX_pl_month_idx_prsid_astid on cost_XXXXX_pl_month station_month  (cost=0.42..7.93 rows=1 width=176) (actual time=0.011..0.011 rows=0 loops=7013)
Index Cond: (((psr_id)::text = (info.psr_id)::text) AND ((ast_id)::text = (info.ast_id)::text))
Filter: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[]))
Rows Removed by Filter: 3
Buffers: shared hit=44107
Planning Time: 0.667 ms

发现“隐患”

Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)
Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Heap Blocks: exact=764
Buffers: shared hit=786

其中 rows=443 VS rows=7013，隐藏比较深。毕竟就是2个数字很难直接定位到。其实源SQL执行计划更长，需要再一屏甚至更长的执行计划中发现隐患还是比较困难的。

疑问：为什么估算的rows会差异如此之多？是不是本次统计信息过期导致rows评估偏差过多？经过排查后发现统计信息并没有过期。根据SQL条件测试分别单独测试两个条件：

(((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

执行计划1

Bitmap Heap Scan on cost_XXXXX_pl_info info  (cost=129.15..7439.87 rows=6804 width=723) (actual time=219.819..221.361 rows=7013 loops=1)
Recheck Cond: ((equip_type)::text = 'xl'::text)
Heap Blocks: exact=764
Buffers: shared hit=764 read=22
I/O Timings: read=219.260
->  Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional  (cost=0.00..127.45 rows=6804 width=0) (actual time=219.735..219.736 rows=7013 loops=1)
Index Cond: ((equip_type)::text = 'xl'::text)
Buffers: shared read=22
I/O Timings: read=219.260
Planning Time: 0.126 ms
Execution Time: 221.680 ms

执行计划2

Seq Scan on cost_XXXXX_pl_info info  (cost=0.00..8328.08 rows=6804 width=723) (actual time=0.007..22.252 rows=7014 loops=1)
Filter: ((professional_type)::text = '01'::text)
Rows Removed by Filter: 97392
Buffers: shared hit=7023
Planning Time: 0.117 ms
Execution Time: 22.569 ms

发现估算的rows 精确度都比较高，为啥两个一起组合rows偏差如此之大呢？

查看rows计算逻辑：

with taba as (
select   schemaname, tablename, attname, null_frac,     array_to_string(  most_common_vals  ,',')   most_common_vals ,
array_to_string(  most_common_freqs  ,',')   most_common_freqs
from   pg_stats where tablename = 'cost_XXXXX_pl_info' and attname in( 'professional_type','equip_type')
) select  schemaname, tablename, t.reltuples,  attname, null_frac,    
   regexp_split_to_table(most_common_vals, ',') vals,  regexp_split_to_table(most_common_freqs, ',')  cardinality,
    round( regexp_split_to_table(most_common_freqs, ',')::float *  (t.reltuples  - null_frac))   rowss 
    from  taba  
inner join pg_class t on taba.tablename = t.relname  and  t.relnamespace::regnamespace::text = taba.schemaname;

查询结果：

Schemaname  | tablename  |reltuples    |attname   |     null_frac |   vals |  most_common_freqs  |rowss 
powerX  cost_XXXXX_pl_info  104406  equip_type      0  dytq  0.7334    76571
powerX  cost_XXXXX_pl_info  104406  equip_type      0  dkx  0.1727    18031
powerX  cost_XXXXX_pl_info  104406  equip_type      0  xl  0.06516667  6804
powerX  cost_XXXXX_pl_info  104406  equip_type      0  zf01  0.019         1984
powerX  cost_XXXXX_pl_info  104406  equip_type      0  zf06  0.0059666666  623
powerX  cost_XXXXX_pl_info  104406  equip_type      0  zf08  0.0025    261
powerX  cost_XXXXX_pl_info  104406  equip_type      0  zf04  0.0010333334  108
powerX  cost_XXXXX_pl_info  104406  professional_type  0  03  0.91566664  95601
powerX  cost_XXXXX_pl_info  104406  professional_type  0  01  0.06516667  6804
powerX  cost_XXXXX_pl_info  104406  professional_type  0  02  0.0191    1994

而结果 6804 正是单一条件的执行计划中估算rows。

计算方式：(reltuples - null_frac ) * most_common_freqs

104406 * 0.06516667 = 6804

同理两个条件的rows估算逻辑

104406 * 0.06516667 * 0.06516667 = 443 正是两个过滤条件的rows

Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)
Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Heap Blocks: exact=764
Buffers: shared hit=786

优化方式：

如何纠正rows评估差异问题？根据官方文档通过创建扩展统计信息创建扩展统计信息：

create statistics  stat_p_e  ( dependencies  )  on   professional_type, equip_type   from  powerX.cost_XXXXX_pl_info;

收集统计信息：Analyse powerX.cost_XXXXX_pl_info;

查看两个过滤条件的执行计划：之后发现执行计划符合预期。

---- 正确的执行计划 
Bitmap Heap Scan on cost_stationline_pl_info info (cost=146.09..7472.62 rows=6797 width=723) (actual time=0.730..2.339 rows=7013 loops=1)
Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Heap Blocks: exact=764
Buffers: shared hit=786
->  Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional  (cost=0.00..144.39 rows=6797 width=0) (actual time=0.648..0.649 rows=7013 loops=1)
Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Buffers: shared hit=22
Planning Time: 0.144 ms
Execution Time: 2.657 ms

优化成果： Hash Join

Aggregate  (cost=14662.17..14662.18 rows=1 width=64) (actual time=16.336..16.339 rows=1 loops=1)
Buffers: shared hit=1236 
->  Hash Join  (cost=7574.99..14662.12 rows=1 width=126) (actual time=11.282..13.865 rows=1343 loops=1)
Hash Cond: (((station_month.psr_id)::text = (info.psr_id)::text) AND ((station_month.ast_id)::text = (info.ast_id)::text))
Buffers: shared hit=1236 
->  Index Scan using cost_stationline_pl_month_idx_sun_year_month on cost_stationline_pl_month station_month  (cost=0.42..7070.73 rows=3202 width=176) (actual time=0.063..1.184 rows=3235 loops=1)
Index Cond: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[]))
Buffers: shared hit=450
I/O Timings: read=290.886
->  Hash  (cost=7472.62..7472.62 rows=6797 width=78) (actual time=11.190..11.191 rows=7013 loops=1)
Buckets: 8192  Batches: 1  Memory Usage: 620kB
Buffers: shared hit=786
->  Bitmap Heap Scan on cost_stationline_pl_info info  (cost=146.09..7472.62 rows=6797 width=78) (actual time=2.371..8.550 rows=7013 loops=1)
Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Heap Blocks: exact=764
Buffers: shared hit=786
->  Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional  (cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)
Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
Buffers: shared hit=22
Planning Time: 1.853 ms
Execution Time: 16.453 ms

效果对比：

优化前：(cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)

优化后：(cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)

优化前：执行计划：Nested Loop

优化后：执行计划：Hash Join

优化前：执行成本： shared hit=44893 85ms

优化后：执行计划： shared hit=1236 16ms

	Rows评估	执行计划	执行成本
优化前	(cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)	Nested Loop	shared hit=44893 85ms
优化后	(cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)	Hash join	shared hit=1236 16ms

文章转载自godba，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

postgres Rows估算错误导致SQL性能偏差

发现“隐患”

优化方式：

优化成果： Hash Join

评论