同样的SQL语句，为什么执行时间有较大差异

原创杨天成 2023-08-08

679

适用范围

操作系统版本：CentOS Linux release 7.6.1810 (Core)
主机环境：虚拟机
数据库版本：MogDB 5.0.0 build 503a9ef7

问题概述

SQL语句，先后执行耗时差异很大

select
    count(*) as low_stock
from
    (
    select
        s_w_id,
        s_i_id,
        s_quantity
    from
        bmsql_stock
    where
        s_w_id = 16
        and s_quantity < 16
        and s_i_id in (
        select
            /*+ TIDB_INLJ(bmsql_order_line) */
            ol_i_id
        from
            bmsql_district
        join bmsql_order_line on
            ol_w_id = d_w_id
            and ol_d_id = d_id
            and ol_o_id >= d_next_o_id - 20
            and ol_o_id < d_next_o_id
        where
            d_w_id = 16
            and d_id = 1 ) ) as L;

对应的执行计划如下：

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=22190.42..22190.43 rows=1 width=8)
   ->  Nested Loop  (cost=19075.91..22177.37 rows=5220 width=0)
         ->  HashAggregate  (cost=19075.91..19077.81 rows=190 width=4)
               Group By Key: bmsql_order_line.ol_i_id
               ->  Nested Loop  (cost=203.52..19063.26 rows=5059 width=4)
                     ->  Index Scan using bmsql_district_pkey on bmsql_district  (cost=0.00..8.27 rows=1 width=12)
                           Index Cond: ((d_w_id = 16) AND (d_id = 1))
                     ->  Bitmap Heap Scan on bmsql_order_line  (cost=203.52..18599.67 rows=45532 width=16)
                           Recheck Cond: ((ol_w_id = 16) AND (ol_d_id = 1) AND (ol_o_id >= (bmsql_district.d_next_o_id - 20)) AND (ol_o_id < bmsql_district.d_next_o_id))
                           ->  Bitmap Index Scan on bmsql_order_line_pkey  (cost=0.00..192.14 rows=5059 width=0)
                                 Index Cond: ((ol_w_id = 16) AND (ol_d_id = 1) AND (ol_o_id >= (bmsql_district.d_next_o_id - 20)) AND (ol_o_id < bmsql_district.d_next_o_id))
         ->  Index Scan using bmsql_stock_pkey on bmsql_stock  (cost=0.00..8.08 rows=823 width=4)
               Index Cond: ((s_w_id = 16) AND (s_i_id = bmsql_order_line.ol_i_id))
               Filter: (s_quantity < 16)
(14 rows)

执行时间如下：
第一次：

tpccdb=> \i 11.sql
 low_stock 
-----------
        16
(1 row)

Time: 151.586 ms

第二次：

tpccdb=> \i 11.sql
 low_stock 
-----------
        15
(1 row)

Time: 13.722 ms

相差10倍以上

分析过程

使用perf针对线程进行分析，先找到会话对应的线程：18302

tpccdb=> select pg_backend_pid();
 pg_backend_pid 
----------------
 47560644040448
(1 row)

Time: 1.254 ms
tpccdb=> select lwpid from pg_os_threads where pid=pg_backend_pid();
 lwpid 
-------
 18302
(1 row)

Time: 2.188 ms
tpccdb=>

perf分析如下：
第一次：

[root@mogdb244 ~]# perf trace -t 18302 -s
^C
 Summary of events:

 worker (18302), 158 events, 98.8%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   pread64               30   134.959     0.004     4.499    30.972     34.53%
   poll                   1     8.233     8.233     8.233     8.233      0.00%
   semop                  3     0.677     0.005     0.226     0.451     57.13%
   sendto                17     0.138     0.005     0.008     0.016      7.85%
   lseek                  9     0.021     0.002     0.002     0.005     15.37%
   madvise                1     0.020     0.020     0.020     0.020      0.00%
   rt_sigprocmask         6     0.010     0.001     0.002     0.002      5.67%
   tgkill                 2     0.008     0.003     0.004     0.005     13.30%
   clock_gettime          3     0.007     0.002     0.002     0.003     13.41%
   read                   2     0.006     0.003     0.003     0.003      1.96%
   fadvise64              2     0.005     0.002     0.003     0.003     17.57%
   write                  1     0.004     0.004     0.004     0.004      0.00%
   gettid                 1     0.002     0.002     0.002     0.002      0.00%
   recvfrom               1     0.000     0.000     0.000     0.000      0.00%

第二次：

[root@mogdb244 ~]# perf trace -t 18302 -s
^C
 Summary of events:

 worker (18302), 117 events, 100.0%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   semop                  1     5.108     5.108     5.108     5.108      0.00%
   pread64               30     3.575     0.004     0.119     3.349     93.47%
   sendto                12     0.112     0.003     0.009     0.021     15.70%
   lseek                  9     0.021     0.002     0.002     0.006     18.87%
   madvise                1     0.016     0.016     0.016     0.016      0.00%
   clock_gettime          3     0.009     0.002     0.003     0.004     26.45%
   fadvise64              1     0.003     0.003     0.003     0.003      0.00%
   gettid                 1     0.002     0.002     0.002     0.002      0.00%
   recvfrom               1     0.000     0.000     0.000     0.000      0.00%

执行差异最大的发生在pread64函数上。pread64是操作系统函数，解释如下：

NOTES
       The  pread() and pwrite() system calls are especially useful in multithreaded applications.  They allow multiple threads to perform I/O on the same file descriptor without being
       affected by changes to the file offset by other threads.

       On Linux, the underlying system calls were renamed in kernel 2.6: pread() became pread64(), and pwrite() became pwrite64().  The system call  numbers  remained  the  same.   The
       glibc pread() and pwrite() wrapper functions transparently deal with the change.

       On some 32-bit architectures, the calling signature for these system calls differ, for the reasons described in syscall(2).

从该函数可以了解到，影响差异主要是在io方面。
通过了解，该物理机上有30个虚拟机在使用，通过分析定位到虚拟机相互之间io的影响比较大，导致SQL的执行耗时有较大差异。

小结

通过以上的分析，了解到虚拟机相互之间io的影响比较大，导致SQL的执行耗时有较大差异。

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

同样的SQL语句，为什么执行时间有较大差异

适用范围

问题概述

分析过程

小结

评论