暂无图片
暂无图片
2
暂无图片
暂无图片
暂无图片

postgres Rows估算错误导致SQL性能偏差

godba 2024-03-11
362

背景:

在日常隐患巡检中发现的可疑SQL。 隐患比较隐藏,其实源SQL更长,隐藏的更深。 特作文记录。为了方便读者阅读,已把SQL简化,突出其隐藏部分。
还是一贯作风:现象说明,官方理论,数据证明,实施效果,一一说明。希望方便读者阅读。 
    SELECT
    ROUND( SUM (COALESCE(station_month.maintain_labor_cost,0) + ............
    COALESCE(station_month.employee_compensation,0)), 2) AS totalCost,
    ROUND( SUM ( info.ast_original_value ), 2 ) AS totalAstValue
    FROM
    powerX.cost_XXXXX_pl_info info
    INNER JOIN powerX.cost_XXXXX_pl_month station_month ON info.psr_id = station_month.psr_id
    AND info.ast_id = station_month.ast_id
    where  info.professional_type = '01'
    AND info.equip_type = 'xl'
    AND station_month.sun_year_month in ( '202301', '202302','202303','202304'  )

    执行计划:

      Aggregate (cost=4938.02..4938.03 rows=1 width=64) (actual time=85.167..85.169 rows=1 loops=1)
      Buffers: shared hit=44893
      -> Nested Loop (cost=13.38..4937.97 rows=1 width=126) (actual time=0.720..82.459 rows=1343 loops=1)
      Buffers: shared hit=44893
      -> Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)
      Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
      Heap Blocks: exact=764
      Buffers: shared hit=786
      -> Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional (cost=0.00..12.85 rows=443 width=0) (actual time=0.614..0.614 rows=7013 loops=1)
      Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
      Buffers: shared hit=22
      -> Index Scan using idx_cost_XXXXX_pl_month_idx_prsid_astid on cost_XXXXX_pl_month station_month (cost=0.42..7.93 rows=1 width=176) (actual time=0.011..0.011 rows=0 loops=7013)
      Index Cond: (((psr_id)::text = (info.psr_id)::text) AND ((ast_id)::text = (info.ast_id)::text))
      Filter: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[]))
      Rows Removed by Filter: 3
      Buffers: shared hit=44107
      Planning Time: 0.667 ms

      发现“隐患”

        Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)
        Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
        Heap Blocks: exact=764
        Buffers: shared hit=786

        其中  rows=443 VS rows=7013,隐藏比较深。毕竟就是2个数字很难直接定位到。其实源SQL执行计划更长,需要再一屏甚至更长的执行计划中发现隐患还是比较困难的。

        疑问:为什么估算的rows会差异如此之多?是不是本次统计信息过期导致rows评估偏差过多?经过排查后发现统计信息并没有过期。根据SQL条件测试分别单独测试两个条件:

          (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))

          执行计划1

            Bitmap Heap Scan on cost_XXXXX_pl_info info  (cost=129.15..7439.87 rows=6804 width=723) (actual time=219.819..221.361 rows=7013 loops=1)
            Recheck Cond: ((equip_type)::text = 'xl'::text)
            Heap Blocks: exact=764
            Buffers: shared hit=764 read=22
            I/O Timings: read=219.260
            -> Bitmap Index Scan on idx_cost_XXXXX_pl_info_equip_type_professional (cost=0.00..127.45 rows=6804 width=0) (actual time=219.735..219.736 rows=7013 loops=1)
            Index Cond: ((equip_type)::text = 'xl'::text)
            Buffers: shared read=22
            I/O Timings: read=219.260
            Planning Time: 0.126 ms
            Execution Time: 221.680 ms

            执行计划2

              Seq Scan on cost_XXXXX_pl_info info  (cost=0.00..8328.08 rows=6804 width=723) (actual time=0.007..22.252 rows=7014 loops=1)
              Filter: ((professional_type)::text = '01'::text)
              Rows Removed by Filter: 97392
              Buffers: shared hit=7023
              Planning Time: 0.117 ms
              Execution Time: 22.569 ms

              发现估算的rows 精确度都比较高,为啥两个一起组合rows偏差如此之大呢?

              查看rows计算逻辑:

                with taba as (
                select schemaname, tablename, attname, null_frac, array_to_string( most_common_vals ,',') most_common_vals ,
                array_to_string( most_common_freqs ,',') most_common_freqs
                from pg_stats where tablename = 'cost_XXXXX_pl_info' and attname in( 'professional_type','equip_type')
                ) select schemaname, tablename, t.reltuples, attname, null_frac,
                regexp_split_to_table(most_common_vals, ',') vals, regexp_split_to_table(most_common_freqs, ',') cardinality,
                round( regexp_split_to_table(most_common_freqs, ',')::float * (t.reltuples - null_frac)) rowss
                from taba
                inner join pg_class t on taba.tablename = t.relname and t.relnamespace::regnamespace::text = taba.schemaname;

                查询结果:

                  Schemaname  | tablename  |reltuples    |attname   |     null_frac |   vals |  most_common_freqs  |rowss 
                  powerX cost_XXXXX_pl_info 104406 equip_type 0 dytq 0.7334 76571
                  powerX cost_XXXXX_pl_info 104406 equip_type 0 dkx 0.1727 18031
                  powerX cost_XXXXX_pl_info 104406 equip_type 0 xl 0.06516667 6804
                  powerX cost_XXXXX_pl_info 104406 equip_type 0 zf01 0.019 1984
                  powerX cost_XXXXX_pl_info 104406 equip_type 0 zf06 0.0059666666 623
                  powerX cost_XXXXX_pl_info 104406 equip_type 0 zf08 0.0025 261
                  powerX cost_XXXXX_pl_info 104406 equip_type 0 zf04 0.0010333334 108
                  powerX cost_XXXXX_pl_info 104406 professional_type 0 03 0.91566664 95601
                  powerX cost_XXXXX_pl_info 104406 professional_type 0 01 0.06516667 6804
                  powerX cost_XXXXX_pl_info 104406 professional_type 0 02 0.0191 1994

                  而结果 6804 正是单一条件的执行计划中 估算rows。

                  计算方式:(reltuples  -   null_frac )  * most_common_freqs 

                   104406  * 0.06516667  =  6804  

                  同理两个条件的rows估算逻辑

                  104406 * 0.06516667 * 0.06516667  =  443 正是两个过滤条件的rows

                    Bitmap Heap Scan on cost_XXXXX_pl_info info (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)
                    Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
                    Heap Blocks: exact=764
                    Buffers: shared hit=786

                    优化方式:

                    如何纠正rows评估差异问题?根据官方文档通过创建扩展统计信息 创建扩展统计信息:

                      create statistics  stat_p_e  ( dependencies  )  on   professional_type, equip_type   from  powerX.cost_XXXXX_pl_info;

                      收集统计信息:Analyse powerX.cost_XXXXX_pl_info;

                      查看两个过滤条件的执行计划:之后发现执行计划符合预期。 

                        ---- 正确的执行计划 
                        Bitmap Heap Scan on cost_stationline_pl_info info (cost=146.09..7472.62 rows=6797 width=723) (actual time=0.730..2.339 rows=7013 loops=1)
                        Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
                        Heap Blocks: exact=764
                        Buffers: shared hit=786
                        -> Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional (cost=0.00..144.39 rows=6797 width=0) (actual time=0.648..0.649 rows=7013 loops=1)
                        Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
                        Buffers: shared hit=22
                        Planning Time: 0.144 ms
                        Execution Time: 2.657 ms

                        优化成果: Hash Join

                          Aggregate  (cost=14662.17..14662.18 rows=1 width=64) (actual time=16.336..16.339 rows=1 loops=1)
                          Buffers: shared hit=1236
                          -> Hash Join (cost=7574.99..14662.12 rows=1 width=126) (actual time=11.282..13.865 rows=1343 loops=1)
                          Hash Cond: (((station_month.psr_id)::text = (info.psr_id)::text) AND ((station_month.ast_id)::text = (info.ast_id)::text))
                          Buffers: shared hit=1236
                          -> Index Scan using cost_stationline_pl_month_idx_sun_year_month on cost_stationline_pl_month station_month (cost=0.42..7070.73 rows=3202 width=176) (actual time=0.063..1.184 rows=3235 loops=1)
                          Index Cond: ((sun_year_month)::text = ANY ('{202301,202302,202303,202304}'::text[]))
                          Buffers: shared hit=450
                          I/O Timings: read=290.886
                          -> Hash (cost=7472.62..7472.62 rows=6797 width=78) (actual time=11.190..11.191 rows=7013 loops=1)
                          Buckets: 8192 Batches: 1 Memory Usage: 620kB
                          Buffers: shared hit=786
                          -> Bitmap Heap Scan on cost_stationline_pl_info info (cost=146.09..7472.62 rows=6797 width=78) (actual time=2.371..8.550 rows=7013 loops=1)
                          Recheck Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
                          Heap Blocks: exact=764
                          Buffers: shared hit=786
                          -> Bitmap Index Scan on idx_cost_stationline_pl_info_equip_type_professional (cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)
                          Index Cond: (((equip_type)::text = 'xl'::text) AND ((professional_type)::text = '01'::text))
                          Buffers: shared hit=22
                          Planning Time: 1.853 ms
                          Execution Time: 16.453 ms

                          效果对比:

                          优化前:(cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)

                          优化后:(cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)

                          优化前:执行计划:Nested Loop

                          优化后执行计划:Hash Join

                          优化前:执行成本: shared hit=44893    85ms

                          优化后执行计划: shared hit=1236   16ms


                          Rows评估

                          执行计划

                          执行成本

                          优化前

                          (cost=12.96..1420.40 rows=443 width=78) (actual time=0.695..2.500 rows=7013 loops=1)

                          Nested Loop

                          shared hit=44893  85ms

                          优化后

                          (cost=0.00..144.39 rows=6797 width=0) (actual time=0.615..0.615 rows=7013 loops=1)

                          Hash join

                          shared hit=1236  16ms









                          文章转载自godba,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                          评论