暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

循序渐进丨MogDB 数据库查询重写规则uniquecheck详解

MogDB 2024-07-04
105

在 MogDB 和 openGauss 中,参数rewrite_rule用于控制查询重写,本文则介绍查询重写规则uniquecheck。uniquecheck表示提升无aggwhere子查询。

  参考示例 

现在有如下例子:
    orcl=> explain analyze select count(*)
    from test01 t1
    where t1.data_object_id =
    (select data_object_id
    from test02 t2
    where t1.object_id = t2.object_id);
    QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Aggregate (cost=369500968.48..369500968.49 rows=1 width=20) (actual time=122494.785..122494.785 rows=1 loops=1)
    -> Seq Scan on test01 t1 (cost=0.00..369500411.84 rows=222656 width=12) (actual time=0.912..121981.180 rows=4696064 loops=1)
    Filter: (data_object_id = (SubPlan 1))
    Rows Removed by Filter: 39841280
    SubPlan 1
    -> Index Scan using idx_test02_objectid on test02 t2 (cost=0.00..8.27 rows=1 width=6) (actual time=94019.753..99841.735 rows=44537344 loops=44537344)
    Index Cond: (t1.object_id = object_id)
    Total runtime: 122494.942 ms
    (8 rows)
    复制
    从执行计划上可以看到:
    • test01 t1是大表,有4000多万行数据;
    • 子查询test02 t2没有被提升(Oracle的说法是没有被展开),走了filter,被扫描了4000多万次;
    • 整个SQL查询耗时122秒,性能较差。
    现在设置rewrite_rule=uniquecheck,我们再来看一下执行计划:
      orcl=> explain analyze select *+ set(rewrite_rule uniquecheck) */  count(*)
      from test01 t1
      where t1.data_object_id =
      (select data_object_id
      from test02 t2
      where t1.object_id = t2.object_id);
      QUERY PLAN
      ---------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=1467229.99..1467230.00 rows=1 width=8) (actual time=10758.577..10758.577 rows=1 loops=1)
      -> Hash Join (cost=5660.88..1467225.97 rows=1608 width=0) (actual time=66.574..10424.858 rows=4696064 loops=1)
      Hash Cond: ((t1.object_id = subquery."?column?") AND (t1.data_object_id = subquery.data_object_id))
      -> Seq Scan on test01 t1 (cost=0.00..1227776.53 rows=44531153 width=12) (actual time=0.003..3561.282 rows=44537344 loops=1)
      -> Hash (cost=4356.08..4356.08 rows=86987 width=12) (actual time=65.462..65.462 rows=9172 loops=1)
      Buckets: 131072 Batches: 1 Memory Usage: 1427kB
      -> Subquery Scan on subquery (cost=2616.34..4356.08 rows=86987 width=12) (actual time=33.027..54.531 rows=86987 loops=1)
      -> HashAggregate (cost=2616.34..3486.21 rows=86987 width=12) (actual time=33.025..47.518 rows=86987 loops=1)
      Group By Key: t2.object_id
      Unique Check Required
      -> Seq Scan on test02 t2 (cost=0.00..2398.87 rows=86987 width=12) (actual time=0.005..9.467 rows=86987 loops=1)
      Total runtime: 10758.989 ms
      (12 rows)
      复制
      设置rewrite_rule=uniquecheck之后,子查询被提升了,t1和t2走了hash join,整个SQL耗时10秒,性能较好。
      执行计划中Unique Check Required表示检查子查询JOIN列(t2.object_id)是否唯一,如果子查询JOIN列(t2.object_id)不唯一,SQL会报错:
        orcl=> insert into test02 select * from test02;
        INSERT 0 86987
        orcl=> explain analyze select /*+ set(rewrite_rule uniquecheck) */ count(*)
        from test01 t1
        where t1.data_object_id =
        (select data_object_id
        from test02 t2
        where t1.object_id = t2.object_id);
        ERROR: more than one row returned by a subquery used as an expression
        复制
        如果子查询有agg函数,无需设置查询重写参数,可以自动提升:
          orcl=> explain analyze select count(*)
          from test01 t1
          where t1.data_object_id =
          (select max(data_object_id)
          from test02 t2
          where t1.object_id = t2.object_id);
          QUERY PLAN
          ------------------------------------------------------------------------------------------------------------------------------------------------
          Aggregate (cost=1470319.09..1470319.10 rows=1 width=8) (actual time=10345.027..10345.027 rows=1 loops=1)
          -> Hash Join (cost=8749.93..1470315.02 rows=1628 width=0) (actual time=122.192..10044.758 rows=4696064 loops=1)
          Hash Cond: ((t1.object_id = subquery."?column?") AND (t1.data_object_id = subquery.max))
          -> Seq Scan on test01 t1 (cost=0.00..1227776.53 rows=44531153 width=12) (actual time=0.007..3708.561 rows=44537344 loops=1)
          -> Hash (cost=7428.51..7428.51 rows=88095 width=38) (actual time=120.853..120.853 rows=9172 loops=1)
          Buckets: 131072 Batches: 1 Memory Usage: 1427kB
          -> Subquery Scan on subquery (cost=5666.61..7428.51 rows=88095 width=38) (actual time=79.485..110.452 rows=86987 loops=1)
          -> HashAggregate (cost=5666.61..6547.56 rows=88095 width=44) (actual time=79.483..103.440 rows=86987 loops=1)
          Group By Key: t2.object_id
          -> Seq Scan on test02 t2 (cost=0.00..4796.74 rows=173974 width=12) (actual time=0.007..20.196 rows=173974 loops=1)
          Total runtime: 10345.491 ms
          (11 rows)
          复制


          关于作者

          罗炳森,云和恩墨数据库架构师,15年SQL优化&数据库性能优化经验,擅长超大型,超复杂SQL&存储过程优化;国内首位专注于SQL优化实战案例博客作者;5年+数据库/ETL/BI/大数据培训经验,累计为中国培养了1000多名DBA,2000多名ETL/BI/大数据人才;2015年出版《Oracle查询优化改写技巧与案例》一书,累计销量达6万余册;2018年出版《SQL优化核心思想》,被翻译为繁体版在台湾发行。

          END

          MogDB 是云和恩墨基于 openGauss 开源内核进行增强提升,推出的一款安稳易用的企业级关系型数据库。其具备金融级高可用和全密态计算的极致安全、面向多核处理器的极致性能、AI自诊断调优的极致智能能力,能够满足从核心交易到复杂计算的企业级业务需求。

          访问官网了解更多:www.mogdb.io

          产品兼容适配申请:partner@enmotech.com

          进入交流群:Roger_database

          文章转载自MogDB,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

          评论