
MySQL Performance Troubleshooting, Part 1: System-Level Checks

DBA 杂谈笔记 2021-05-31

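Key points

    1. CPU: %user, %sys, %idle, %iowait
    2. Memory: available, cache, swap, plus memory leaks and OOM
    3. I/O: IOPS, throughput, latency, utilization
    4. Network: throughput (pay particular attention to the send/receive rate of small packets)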

    CPU 

    top

      top - 09:28:09 up 79 days, 15:24,  2 users,  load average: 0.02, 0.08, 0.04
      Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
      %Cpu(s): 2.1 us, 0.5 sy, 0.0 ni, 97.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
      KiB Mem : 4046300 total, 2422244 free, 91868 used, 1532188 buff/cache
      KiB Swap: 4194300 total, 4156212 free, 38088 used. 3661064 avail Mem


      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      761 root 20 0 80264 1428 1264 S 0.3 0.0 76:46.85 gapd
      1 root 20 0 38208 5392 3368 S 0.0 0.1 7:31.33 systemd
      2 root 20 0 0 0 0 S 0.0 0.0 0:00.05 kthreadd
      3 root 20 0 0 0 0 S 0.0 0.0 5:49.22 ksoftirqd/0
      5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
      7 root 20 0 0 0 0 S 0.0 0.0 14:40.66 rcu_sched
      8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
      9 root rt 0 0 0 0 S 0.0 0.0 3:10.08 migration/0
      10 root rt 0 0 0 0 S 0.0 0.0 0:23.44 watchdog/0
      11 root rt 0 0 0 0 S 0.0 0.0 0:17.68 watchdog/1
      12 root rt 0 0 0 0 S 0.0 0.0 3:09.57 migration/1
      13 root 20 0 0 0 0 S 0.0 0.0 5:50.96 ksoftirqd/1
      15 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
      16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
      17 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 netns
      18 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 perf
      19 root 20 0 0 0 0 S 0.0 0.0 0:02.70 khungtaskd
      20 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 writeback
      21 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
      23 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 crypto
      24 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kintegrityd
      25 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 bioset
      26 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kblockd
      27 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ata_sff
      28 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 md
      29 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 devfreq_wq
      33 root 20 0 0 0 0 S 0.0 0.0 0:30.66 kswapd0
      34 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 vmstat
      35 root 20 0 0 0 0 S 0.0 0.0 0:00.01 fsnotify_mark
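        Main items to watch
        1. load average: the load over the last 1, 5, and 15 minutes, and how it is trending
        2. CPU utilization: check the overall breakdown (pressing 1 inside top shows each CPU separately)
        If %user is high, likely causes are queries with no usable index, GROUP BY or ORDER BY that cannot use an index, or a large number of slow SQL statements
        If %sys is high, NUMA may not have been disabled, or there may be too many connections
        If %wa is high, I/O may be the bottleneck; work out where exactly the I/O is slow
        As a rule of thumb, %id above 90 is normal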
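The checks above can be sketched in a few lines of Python that read the summary lines printed by top. This is only an illustration: the function names and the thresholds (50 / 20 / 10) are assumptions made for the example, not values from the original article, and should be tuned to the workload.

```python
import re

def parse_top_cpu(line):
    """Parse a '%Cpu(s): 2.1 us, 0.5 sy, ...' line from top into a dict of floats."""
    pairs = re.findall(r"([\d.]+)\s+(us|sy|ni|id|wa|hi|si|st)\b", line)
    return {name: float(value) for value, name in pairs}

def load_per_cpu(line, ncpu):
    """Return the 1, 5 and 15 minute load averages divided by the CPU count."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
    return tuple(float(value) / ncpu for value in match.groups())

def cpu_hints(cpu, user_max=50.0, sys_max=20.0, wait_max=10.0):
    """Return the hints that apply. The thresholds are illustrative assumptions."""
    hints = []
    if cpu["us"] > user_max:
        hints.append("high %user: look for missing indexes and slow SQL")
    if cpu["sy"] > sys_max:
        hints.append("high %sys: check NUMA settings and the connection count")
    if cpu["wa"] > wait_max:
        hints.append("high %wa: find out where I/O is slow")
    return hints

# Lines copied from the top output above (a 2-CPU host).
uptime_line = "top - 09:28:09 up 79 days, 15:24,  2 users,  load average: 0.02, 0.08, 0.04"
cpu_line = "%Cpu(s): 2.1 us, 0.5 sy, 0.0 ni, 97.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st"

cpu = parse_top_cpu(cpu_line)
print(load_per_cpu(uptime_line, ncpu=2))
print(cpu_hints(cpu))
```

Dividing the load average by the CPU count makes hosts of different sizes comparable; a per-CPU value that stays well below 1 suggests the run queue is not backing up.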

        Memory

          ubuntu@i-p8v1yk5a:~$ free -h
          total used free shared buff/cache available
          Mem: 3.9G 88M 2.3G 4.5M 1.5G 3.5G
          Swap: 4.0G
            1. When total - (free + buffers + cache) > 1/3 of total, a memory leak is possible: more than a third of memory is held and not being reclaimed
            2. When memory runs out, the kernel OOM (out of memory) killer fires. grep -i "out of memory" /var/log/messages* shows whether that has happened; if it has, check whether the database memory parameters are set sensibly
            3. free also shows swap usage. Swap is generally not recommended for a database. If swap is in use, NUMA may not have been disabled, or memory ran short at some point while the server was running; once the database touches swap, its response time increases
            grep swap /etc/sysctl.conf
            vm.swappiness=1
            vm.swappiness is a weight for how readily the kernel swaps: the higher the value, the more likely swap is used
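The 1/3 rule from item 1 is simple arithmetic, so a small Python sketch may help. It uses the "KiB Mem" line from the top output earlier in this article, because that line carries exact KiB values; the function names are hypothetical, chosen for the example.

```python
import re

def parse_top_mem(line):
    """Parse 'KiB Mem : N total, N free, N used, N buff/cache' into a dict of ints (KiB)."""
    pairs = re.findall(r"(\d+)\s+(total|free|used|buff/cache)", line)
    return {name: int(value) for value, name in pairs}

def held_fraction(mem):
    """Fraction of memory that is neither free nor buffer/cache."""
    held = mem["total"] - (mem["free"] + mem["buff/cache"])
    return held / mem["total"]

mem_line = "KiB Mem : 4046300 total, 2422244 free, 91868 used, 1532188 buff/cache"
mem = parse_top_mem(mem_line)
fraction = held_fraction(mem)

# The article's rule of thumb: look closer once more than 1/3 of memory is held.
print(f"held: {fraction:.1%}, needs a closer look: {fraction > 1 / 3}")
```

On this host the held fraction is far below one third, which matches the mostly idle picture in the rest of the output.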



            IO

              ubuntu@i-p8v1yk5a:~$ iostat -m -x 1 5 
              Linux 4.4.0-116-generic (i-p8v1yk5a) 05/31/2021 _x86_64_ (2 CPU)


              avg-cpu: %user %nice %system %iowait %steal %idle
              1.68 0.00 0.78 0.04 0.01 97.49


              Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
              vda 0.00 0.26 0.01 0.26 0.00 0.00 28.52 0.00 15.27 1.78 16.02 0.23 0.01
              vdb 0.00 0.03 0.00 0.00 0.00 0.00 66.97 0.00 0.72 0.14 0.93 0.55 0.00
              vdc 0.00 0.58 0.05 0.87 0.01 0.04 111.18 0.05 57.42 7.85 60.00 0.93 0.09


              avg-cpu: %user %nice %system %iowait %steal %idle
              1.51 0.00 2.01 0.00 0.00 96.48


              Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
              vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
              vdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
              vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


              avg-cpu: %user %nice %system %iowait %steal %idle
              3.48 0.00 1.00 0.00 0.00 95.52


              Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
              vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
              vdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
              vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


              avg-cpu: %user %nice %system %iowait %steal %idle
              0.00 0.00 0.00 0.00 0.00 100.00


              Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
              vda 0.00 0.00 0.00 3.00 0.00 0.01 8.00 0.00 0.00 0.00 0.00 0.00 0.00
              vdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
              vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00




                For details, see man iostat
                The iostat command generates two types of reports, the CPU Utilization report and the Device Utilization report.


                CPU Utilization Report
                The first report generated by the iostat command is the CPU Utilization Report. For multiprocessor systems, the CPU values are global averages among all processors. The report has the following format:


                %user
                Show the percentage of CPU utilization that occurred while executing at the user level (application).


                %nice
                Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.


                %system
                Show the percentage of CPU utilization that occurred while executing at the system level (kernel).


                %iowait
                Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.


                %steal
                Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.


                %idle
                Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.


                Blk_wrtn/s (kB_wrtn/s, MB_wrtn/s)
                Indicate the amount of data written to the device expressed in a number of blocks (kilobytes, megabytes) per second.


                Blk_read (kB_read, MB_read)
                The total number of blocks (kilobytes, megabytes) read.


                Blk_wrtn (kB_wrtn, MB_wrtn)
                The total number of blocks (kilobytes, megabytes) written.


                rrqm/s
                The number of read requests merged per second that were queued to the device.


                wrqm/s
                The number of write requests merged per second that were queued to the device.


                r/s
                The number (after merges) of read requests completed per second for the device.


                w/s
                The number (after merges) of write requests completed per second for the device.


                rsec/s (rkB/s, rMB/s)
                The number of sectors (kilobytes, megabytes) read from the device per second.


                wsec/s (wkB/s, wMB/s)
                The number of sectors (kilobytes, megabytes) written to the device per second.


                avgrq-sz
                The average size (in sectors) of the requests that were issued to the device.


                avgqu-sz
                The average queue length of the requests that were issued to the device.


                await
                The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.


                r_await
                The average time (in milliseconds) for read requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.


                w_await
                The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.


                svctm
                The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.


                %util
                Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.


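                Key points
                IOPS = r/s + w/s
                Disk throughput = rMB/s + wMB/s
                await is the response time, in milliseconds
                %util is how busy the I/O device is; above 50% deserves attention, because the average may be hiding peaks
                The TID column from iotop can also be matched against THREAD_OS_ID in MySQL's performance_schema.threads table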
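To make the IOPS and throughput formulas concrete, here is a minimal Python sketch that parses one device report from iostat -m -x and derives the figures listed above. The helper names are hypothetical, and the column layout is assumed to match the sysstat version shown in this article (newer versions rename several columns).

```python
def parse_iostat_devices(header, rows):
    """Map each device name to a dict of column name -> float value."""
    names = header.split()[1:]  # drop the leading "Device:" label
    devices = {}
    for row in rows:
        parts = row.split()
        devices[parts[0]] = dict(zip(names, map(float, parts[1:])))
    return devices

def summarize(stats, util_watch=50.0):
    """Derive the figures worth watching from one device's counters."""
    return {
        "iops": stats["r/s"] + stats["w/s"],
        "throughput_mb_s": stats["rMB/s"] + stats["wMB/s"],
        "await_ms": stats["await"],
        "watch_util": stats["%util"] > util_watch,
    }

# Header and first-report rows copied from the iostat output above.
header = ("Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")
rows = [
    "vda 0.00 0.26 0.01 0.26 0.00 0.00 28.52 0.00 15.27 1.78 16.02 0.23 0.01",
    "vdc 0.00 0.58 0.05 0.87 0.01 0.04 111.18 0.05 57.42 7.85 60.00 0.93 0.09",
]

devices = parse_iostat_devices(header, rows)
for name, stats in devices.items():
    print(name, summarize(stats))
```

Keep in mind that the first report iostat prints covers the time since boot, so later reports are the ones that reflect current activity.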

                Network

                  [root@zgs ~]# sar -n DEV  1
                  Linux 3.10.0-957.el7.x86_64 (zgs) 11/29/2020 _x86_64_ (1 CPU)


                  02:41:53 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
                  02:41:54 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                  02:41:54 PM virbr0-nic 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                  02:41:54 PM virbr0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                  02:41:54 PM ens33 3.00 1.00 0.26 0.17 0.00 0.00 0.00


                  02:41:54 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
                  02:41:55 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                  02:41:55 PM virbr0-nic 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                  02:41:55 PM virbr0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                  02:41:55 PM ens33 3.03 3.03 0.22 0.74 0.00 0.00 0.00
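                    Key points
                    rxpck/s: total number of packets received per second.
                    txpck/s: total number of packets transmitted per second.
                    rxkB/s: total number of kilobytes received per second.
                    txkB/s: total number of kilobytes transmitted per second.
                    rxcmp/s: number of compressed packets received per second (for cslip etc.).
                    txcmp/s: number of compressed packets transmitted per second.
                    rxmcst/s: number of multicast packets received per second.
                    Mainly watch the network throughput per second, i.e. rxkB/s + txkB/s. As a rule of thumb, the NIC is full once this figure in bytes reaches roughly 1/10 of its rated speed in bits (for example, around 100 MB/s on a 1 Gbit/s NIC)
                    To check TCP connection activity (active/s, passive/s, retrans/s):
                    sar -n TCP,ETCP 1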
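The throughput check can be sketched in Python as follows. The 1000 Mbit/s link speed is an assumption for the example (the article does not state the NIC speed), and the function names are hypothetical. sar reports kB as 1024 bytes, while link speeds are quoted in decimal bits.

```python
def parse_sar_dev(header, row):
    """Return (interface, counters) from one 'sar -n DEV' row with a 12-hour clock."""
    names = header.split()[3:]   # skip time, AM/PM and the IFACE label
    parts = row.split()
    return parts[2], dict(zip(names, map(float, parts[3:])))

def nic_utilization(counters, link_mbit_s):
    """Fraction of the link's rated speed used by rx + tx, as the article sums them."""
    bytes_per_s = (counters["rxkB/s"] + counters["txkB/s"]) * 1024
    link_bytes_per_s = link_mbit_s * 1_000_000 / 8
    return bytes_per_s / link_bytes_per_s

# Header and ens33 row copied from the sar output above.
header = "02:41:53 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s"
row = "02:41:54 PM ens33 3.00 1.00 0.26 0.17 0.00 0.00 0.00"

iface, counters = parse_sar_dev(header, row)
util = nic_utilization(counters, link_mbit_s=1000)  # assumed 1 Gbit/s link
print(iface, f"{util:.6%}")
```

Summing both directions follows the article; on a full-duplex link each direction has the full rated speed to itself, so this figure errs on the cautious side.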

                    Overall system

                      ubuntu@i-p8v1yk5a:~$ vmstat 1 10 
                      procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
                      r b swpd free buff cache si so bi bo in cs us sy id wa st
                      2 0 38088 2423024 334008 1198220 0 0 5 23 2 3 2 1 97 0 0
                      0 0 38088 2423072 334008 1198220 0 0 0 0 79 91 0 0 100 0 0
                      0 0 38088 2422900 334008 1198220 0 0 0 0 107 182 1 0 100 0 0
                      0 0 38088 2422692 334008 1198220 0 0 0 0 148 261 2 1 97 0 0
                      0 0 38088 2422856 334008 1198212 0 0 0 0 58 76 0 0 100 0 0
                      0 0 38088 2422664 334008 1198212 0 0 0 0 62 82 0 0 100 0 0
                      0 0 38088 2422756 334008 1198212 0 0 0 0 89 140 0 1 99 0 0
                      0 0 38088 2422468 334008 1198212 0 0 0 0 134 311 4 2 93 0 0
                      0 0 38088 2422604 334008 1198240 0 0 0 0 72 106 0 0 100 0 0
                      0 0 38088 2422604 334008 1198240 0 0 0 12 74 95 0 0 100 0 0


                        For details, see man vmstat
                        FIELD DESCRIPTION FOR VM MODE
                        Procs
                        r: The number of runnable processes (running or waiting for run time).
                        b: The number of processes in uninterruptible sleep.


                        Memory
                        swpd: the amount of virtual memory used.
                        free: the amount of idle memory.
                        buff: the amount of memory used as buffers.
                        cache: the amount of memory used as cache.
                        inact: the amount of inactive memory. (-a option)
                        active: the amount of active memory. (-a option)


                        Swap
                        si: Amount of memory swapped in from disk (/s).
                        so: Amount of memory swapped to disk (/s).


                        IO
                        bi: Blocks received from a block device (blocks/s).
                        bo: Blocks sent to a block device (blocks/s).


                        System
                        in: The number of interrupts per second, including the clock.
                        cs: The number of context switches per second.


                        CPU
                        These are percentages of total CPU time.
                        us: Time spent running non-kernel code. (user time, including nice time)
                        sy: Time spent running kernel code. (system time)
                        id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
                        wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
                        st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
                               
                        Key points:
                        vmstat gives a view of CPU, memory, I/O, and swap usage, along with the number of running processes
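As a rough illustration of reading vmstat samples programmatically, the Python sketch below turns the output into one dict per sample and flags the conditions discussed in this article (run queue longer than the CPU count, processes blocked in uninterruptible sleep, swap activity). The function names and the choice of flags are assumptions made for the example.

```python
def parse_vmstat(text):
    """Turn 'vmstat N M' output into a list of dicts keyed by column name."""
    lines = [line.split() for line in text.strip().splitlines()]
    header = lines[1]  # line 0 is the group banner (procs, memory, ...)
    return [dict(zip(header, map(int, row))) for row in lines[2:]]

def flags(sample, ncpu):
    """Return warning labels for one sample."""
    found = []
    if sample["r"] > ncpu:
        found.append("run queue longer than CPU count")
    if sample["b"] > 0:
        found.append("processes blocked in uninterruptible sleep")
    if sample["si"] > 0 or sample["so"] > 0:
        found.append("swap in/out activity")
    return found

# First two samples copied from the vmstat output above.
output = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 38088 2423024 334008 1198220 0 0 5 23 2 3 2 1 97 0 0
0 0 38088 2423072 334008 1198220 0 0 0 0 79 91 0 0 100 0 0
"""

samples = parse_vmstat(output)
for sample in samples:
    print(sample["r"], sample["swpd"], flags(sample, ncpu=2))
```

Note that a non-zero swpd with si and so at zero, as in this output, means swap was used at some point in the past but is not being actively paged right now.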


                        This article is reprinted from DBA 杂谈笔记.
