暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

ORACLE数据库性能抖动问题分析报告

IT那活儿 2024-09-09
104
点击上方“IT那活儿”公众号--专注于企业全栈运维技术分享,不管IT什么活儿,干就完了!!!   

  
事件背景:
数据库在4月8日10:23监控告警,数据库非空闲等待数量超过告警阈值,分析原因。



故障描述与概要分析

2024年4月8日10:23数据库非空闲等待数量超过告警阈值,未进行人工干预,自行恢复。业务主要运行在实例1。
数据库版本信息:
  • $ cd $ORACLE_HOME/OPatch
  • $ ./opatch lspatches
  • 27223075;
  • 27674384;Database Apr 2018 Release Update : 12.2.0.1.180417 (27674384)
  • 27464465;OCW APR 2018 RELEASE UPDATE 12.2.0.1.0(180129) (27464465)

1.1 ASH 分析

确认问题发生的准确时间:
INSTANCE_NUMBER  SAMPLE_ID SAMPLE_TIME                    SESSION_COUNT
--------------- ---------- ------------------------------ -------------
              1 177955505 08-APR-24 XXX.X7.00.725 AM                 22
              1 177955515 08-APR-24 XXX.X7.10.825 AM                  9
              1 177955525 08-APR-24 XXX.X7.20.926 AM                 21
              1 177955535 08-APR-24 XXX.X7.31.026 AM                 13
              1 177955545 08-APR-24 XXX.X7.41.126 AM                 11
              1 177955555 08-APR-24 XXX.X7.51.226 AM                 11
              1 177955565 08-APR-24 XXX.X8.01.326 AM                 14
              1 177955575 08-APR-24 XXX.X8.11.426 AM                 10
              1 177955585 08-APR-24 XXX.X8.21.526 AM                 20
              1 177955595 08-APR-24 XXX.X8.31.626 AM                 21
              1 177955605 08-APR-24 XXX.X8.41.726 AM                 14
              1 177955615 08-APR-24 XXX.X8.51.827 AM                 11
              1 177955625 08-APR-24 XXX.X9.01.926 AM                 14
              1 177955635 08-APR-24 XXX.X9.12.027 AM                 12
              1 177955645 08-APR-24 XXX.X9.22.127 AM                 12
              1 177955655 08-APR-24 XXX.X9.32.227 AM                  3
              1 177955665 08-APR-24 XXX.X9.42.327 AM                  8
              1 177955675 08-APR-24 XXX.X9.52.427 AM                  7
              1 177955685 08-APR-24 XXX.X0.02.527 AM                 11
              1 177955695 08-APR-24 XXX.X0.12.627 AM                  8
              1 177955705 08-APR-24 XXX.X0.22.727 AM                  6
              1 177955715 08-APR-24 XXX.X0.32.827 AM                  9
              1 177955725 08-APR-24 XXX.X0.42.927 AM                 13
              1 177955735 08-APR-24 XXX.X0.53.027 AM                147 <<<< 第一次抖动 出现时间
              1 177955745 08-APR-24 XXX.X1.03.198 AM                231
              1 177955755 08-APR-24 XXX.X1.13.398 AM                227
              1 177955765 08-APR-24 XXX.X1.23.598 AM                252
              1 177955775 08-APR-24 XXX.X1.33.798 AM                351
              1 177955785 08-APR-24 XXX.X1.43.978 AM                 32
              1 177955795 08-APR-24 XXX.X1.54.088 AM                139
              1 177955805 08-APR-24 XXX.X2.04.238 AM                194
              1 177955815 08-APR-24 XXX.X2.14.398 AM                 18 <<<<<<< 第一次抖动 结束时间
              1 177955825 08-APR-24 XXX.X2.24.498 AM                 12
              1 177955835 08-APR-24 XXX.X2.34.598 AM                 14
              1 177955845 08-APR-24 XXX.X2.44.698 AM                 19
              1 177955855 08-APR-24 XXX.X2.54.799 AM                  9
              1 177955865 08-APR-24 XXX.X3.04.898 AM                  8
              1 177955875 08-APR-24 XXX.X3.14.999 AM                 20
              1 177955885 08-APR-24 XXX.X3.25.099 AM                 14
              1 177955895 08-APR-24 XXX.X3.35.199 AM                 15
              1 177955905 08-APR-24 XXX.X3.45.299 AM                 15
              1 177955915 08-APR-24 XXX.X3.55.399 AM                 10
              1 177955925 08-APR-24 XXX.X4.05.499 AM                 13
              1 177955935 08-APR-24 XXX.X4.15.599 AM                 13
              1 177955945 08-APR-24 XXX.X4.25.699 AM                 11
              1 177955955 08-APR-24 XXX.X4.35.799 AM                  8
              1 177955965 08-APR-24 XXX.X4.45.900 AM                 10
              1 177955975 08-APR-24 XXX.X4.56.000 AM                 13
              1 177955985 08-APR-24 XXX.X5.06.100 AM                 14
              1 177955995 08-APR-24 XXX.X5.16.200 AM                  7
              1 177956005 08-APR-24 XXX.X5.26.300 AM                  7
              1 177956015 08-APR-24 XXX.X5.36.400 AM                 11
              1 177956025 08-APR-24 XXX.X5.46.500 AM                 12
              1 177956035 08-APR-24 XXX.X5.56.600 AM                 14
              1 177956045 08-APR-24 XXX.X6.06.700 AM                 14
              1 177956055 08-APR-24 XXX.X6.16.800 AM                 15
              1 177956065 08-APR-24 XXX.X6.26.910 AM                 10
              1 177956075 08-APR-24 XXX.X6.37.010 AM                 15
              1 177956085 08-APR-24 XXX.X6.47.111 AM                 11
              1 177956095 08-APR-24 XXX.X6.57.211 AM                 18
              1 177956105 08-APR-24 XXX.X7.07.311 AM                 10
              1 177956115 08-APR-24 XXX.X7.17.411 AM                 21
              1 177956125 08-APR-24 XXX.X7.27.511 AM                 12
              1 177956135 08-APR-24 XXX.X7.37.611 AM                 11
              1 177956145 08-APR-24 XXX.X7.47.711 AM                  6
              1 177956155 08-APR-24 XXX.X7.57.811 AM                 17
              1 177956165 08-APR-24 XXX.X8.07.911 AM                 18
              1 177956175 08-APR-24 XXX.X8.18.011 AM                  9
              1 177956185 08-APR-24 XXX.X8.28.111 AM                 14
              1 177956195 08-APR-24 XXX.X8.38.211 AM                 10
              1 177956205 08-APR-24 XXX.X8.48.312 AM                  8
              1 177956215 08-APR-24 XXX.X8.58.411 AM                 22
              1 177956225 08-APR-24 XXX.X9.08.512 AM                  4
              1 177956235 08-APR-24 XXX.X9.18.611 AM                 10
              1 177956245 08-APR-24 XXX.X9.28.712 AM                  9
              1 177956255 08-APR-24 XXX.X9.38.812 AM                  7
              1 177956265 08-APR-24 XXX.X9.48.912 AM                 16
              1 177956275 08-APR-24 XXX.X9.59.012 AM                 13
              1 177956285 08-APR-24 XXX.X20.09.113 AM                 14
              1 177956295 08-APR-24 XXX.X20.19.212 AM                 12
              1 177956305 08-APR-24 XXX.X20.29.312 AM                 12
              1 177956315 08-APR-24 XXX.X20.39.412 AM                  6
              1 177956325 08-APR-24 XXX.X20.49.512 AM                 12
              1 177956335 08-APR-24 XXX.X20.59.612 AM                  4
              1 177956345 08-APR-24 XXX.X21.09.713 AM                 19
              1 177956355 08-APR-24 XXX.X21.19.813 AM                 19
              1 177956365 08-APR-24 XXX.X21.29.913 AM                 15
              1 177956375 08-APR-24 XXX.X21.40.013 AM                 10
              1 177956385 08-APR-24 XXX.X21.50.113 AM                  7
              1 177956395 08-APR-24 XXX.X22.00.213 AM                 17
              1 177956405 08-APR-24 XXX.X22.10.314 AM                 15
              1 177956415 08-APR-24 XXX.X22.20.413 AM                  5
              1 177956425 08-APR-24 XXX.X22.30.513 AM                  9
              1 177956435 08-APR-24 XXX.X22.40.613 AM                  8
              1 177956445 08-APR-24 XXX.X22.50.714 AM                 10
              1 177956455 08-APR-24 XXX.X23.00.813 AM                232 <<<< 第二次 抖动开始时间
              1 177956465 08-APR-24 XXX.X23.10.944 AM                276
              1 177956475 08-APR-24 XXX.X23.21.134 AM                272
              1 177956485 08-APR-24 XXX.X23.31.334 AM                368
              1 177956495 08-APR-24 XXX.X23.41.534 AM                387
              1 177956505 08-APR-24 XXX.X23.51.734 AM                351
              1 177956515 08-APR-24 XXX.X24.01.904 AM                 21 <<<<< 第二次 抖动结束时间
              1 177956525 08-APR-24 XXX.X24.12.004 AM                 14
              1 177956535 08-APR-24 XXX.X24.22.104 AM                 14
              1 177956545 08-APR-24 XXX.X24.32.204 AM                 10
              1 177956555 08-APR-24 XXX.X24.42.304 AM                  8
              1 177956565 08-APR-24 XXX.X24.52.405 AM                 15
              1 177956575 08-APR-24 XXX.X25.02.506 AM                 16
              1 177956585 08-APR-24 XXX.X25.12.605 AM                  7
              1 177956595 08-APR-24 XXX.X25.22.705 AM                  9
              1 177956605 08-APR-24 XXX.X25.32.806 AM                 14
              1 177956615 08-APR-24 XXX.X25.42.905 AM                 12
              1 177956625 08-APR-24 XXX.X25.53.005 AM                  6
              1 177956635 08-APR-24 XXX.X26.03.105 AM                 20
              1 177956645 08-APR-24 XXX.X26.13.205 AM                 17
              1 177956655 08-APR-24 XXX.X26.23.305 AM                 18
              1 177956665 08-APR-24 XXX.X26.33.405 AM                 15
              1 177956675 08-APR-24 XXX.X26.43.505 AM                 11
              1 177956685 08-APR-24 XXX.X26.53.606 AM                  7
              1 177956695 08-APR-24 XXX.X27.03.706 AM                  6
              1 177956705 08-APR-24 XXX.X27.13.806 AM                 14
              1 177956715 08-APR-24 XXX.X27.23.906 AM                  9
              1 177956725 08-APR-24 XXX.X27.34.006 AM                  6
              1 177956735 08-APR-24 XXX.X27.44.106 AM                 19
              1 177956745 08-APR-24 XXX.X27.54.206 AM                  4
              1 177956755 08-APR-24 XXX.X28.04.306 AM                  7
              1 177956765 08-APR-24 XXX.X28.14.406 AM                 14
              1 177956775 08-APR-24 XXX.X28.24.506 AM                 11
              1 177956785 08-APR-24 XXX.X28.34.606 AM                 12
              1 177956795 08-APR-24 XXX.X28.44.707 AM                 15

分析:
  • 在4/8 10:00~10:30 期间,数据库出现两次性能抖动
    第一次抖动时间段 08-APR-24 10.10.53.027 AM ~  08-APR-24 10.12.14.398 AM
    第二次抖动时间段 08-APR-24 10.23.00.813 AM ~  08-APR-24 10.24.01.904 AM
检查故障时段的top event:
SAMPLE_TIME                    INSTANCE_NUMBER EVENT                           SESSION SESSION_COUNT
 ------------------------------ --------------- ------------------------------- ------- -------------
08-APR-24 XXX.X9.22.127 AM         1 ON CPU              7
08-APR-24 XXX.X9.22.127 AM         1 db file sequential read         WAITING             5
08-APR-24 XXX.X9.32.227 AM         1 ON CPU              2
08-APR-24 XXX.X9.32.227 AM         1 db file sequential read         WAITING             1
08-APR-24 XXX.X9.42.327 AM         1 ON CPU              4
08-APR-24 XXX.X9.42.327 AM         1 LGWR wait for redo copy         WAITING             1
08-APR-24 XXX.X9.42.327 AM         1 gc cr grant 2-way               WAITING             1
08-APR-24 XXX.X9.42.327 AM         1 gc current grant 2-way          WAITING             1
08-APR-24 XXX.X9.42.327 AM         1 log file sync                   WAITING             1
08-APR-24 XXX.X9.52.427 AM         1 ON CPU              7
08-APR-24 XXX.X0.02.527 AM         1 ON CPU              8
08-APR-24 XXX.X0.02.527 AM         1 Sync ASM rebalance              WAITING             2
08-APR-24 XXX.X0.12.627 AM         1 ON CPU              6
08-APR-24 XXX.X0.12.627 AM         1 db file sequential read         WAITING             2
08-APR-24 XXX.X0.22.727 AM         1 db file sequential read         WAITING             3
08-APR-24 XXX.X0.22.727 AM         1 ON CPU              3
08-APR-24 XXX.X0.32.827 AM         1 ON CPU              6
08-APR-24 XXX.X0.32.827 AM         1 db file sequential read         WAITING             3
08-APR-24 XXX.X0.42.927 AM         1 ON CPU              7
08-APR-24 XXX.X0.42.927 AM         1 db file sequential read         WAITING             4
08-APR-24 XXX.X0.53.027 AM         1 gc current grant 2-way          WAITING            86
08-APR-24 XXX.X0.53.027 AM         1 gc cr grant 2-way               WAITING            18
08-APR-24 XXX.X1.03.198 AM         1 gc current grant 2-way          WAITING           160
08-APR-24 XXX.X1.03.198 AM         1 gc cr grant 2-way               WAITING            17
08-APR-24 XXX.X1.13.398 AM         1 gc current grant 2-way          WAITING           146
08-APR-24 XXX.X1.13.398 AM         1 log file sync                   WAITING            20
08-APR-24 XXX.X1.23.598 AM         1 gc current grant 2-way          WAITING           172
08-APR-24 XXX.X1.23.598 AM         1 log file sync                   WAITING            28
08-APR-24 XXX.X1.33.798 AM         1 gc current grant 2-way          WAITING           203
08-APR-24 XXX.X1.33.798 AM         1 gc cr grant 2-way               WAITING            39
08-APR-24 XXX.X1.43.978 AM         1 ON CPU             19
08-APR-24 XXX.X1.43.978 AM         1 db file sequential read         WAITING             7
08-APR-24 XXX.X1.54.088 AM         1 gc current grant 2-way          WAITING           105
08-APR-24 XXX.X1.54.088 AM         1 log file sync                   WAITING            13
08-APR-24 XXX.X2.04.238 AM         1 gc current grant 2-way          WAITING           125
08-APR-24 XXX.X2.04.238 AM         1 log file sync                   WAITING            28
08-APR-24 XXX.X2.14.398 AM         1 ON CPU              8
08-APR-24 XXX.X2.14.398 AM         1 db file sequential read         WAITING             3
08-APR-24 XXX.X2.24.498 AM         1 ON CPU              6
08-APR-24 XXX.X2.24.498 AM         1 db file sequential read         WAITING             2
08-APR-24 XXX.X2.24.498 AM         1 gc current grant 2-way          WAITING             2
08-APR-24 XXX.X2.34.598 AM         1 db file sequential read         WAITING             8
08-APR-24 XXX.X2.34.598 AM         1 ON CPU              5
08-APR-24 XXX.X2.44.698 AM         1 ON CPU             11
08-APR-24 XXX.X2.44.698 AM         1 db file sequential read         WAITING             5
08-APR-24 XXX.X2.54.799 AM         1 ON CPU              4
08-APR-24 XXX.X2.54.799 AM         1 db file sequential read         WAITING             3
08-APR-24 XXX.X3.04.898 AM         1 ON CPU              7
08-APR-24 XXX.X3.04.898 AM         1 db file sequential read         WAITING             1
08-APR-24 XXX.X3.14.999 AM         1 ON CPU              9
08-APR-24 XXX.X3.14.999 AM         1 db file sequential read         WAITING             5
08-APR-24 XXX.X3.25.099 AM         1 db file sequential read         WAITING             5
08-APR-24 XXX.X3.25.099 AM         1 log file sync                   WAITING             4
08-APR-24 XXX.X3.35.199 AM         1 ON CPU             11
……
08-APR-24 XXX.X21.29.913 AM         1 ON CPU              8
08-APR-24 XXX.X21.29.913 AM         1 db file sequential read         WAITING             6
08-APR-24 XXX.X21.40.013 AM         1 ON CPU              8
08-APR-24 XXX.X21.40.013 AM         1 gc current grant 2-way          WAITING             2
08-APR-24 XXX.X21.50.113 AM         1 db file sequential read         WAITING             5
08-APR-24 XXX.X21.50.113 AM         1 ON CPU              2
08-APR-24 XXX.X22.00.213 AM         1 ON CPU             13
08-APR-24 XXX.X22.00.213 AM         1 gc current grant 2-way          WAITING             2
08-APR-24 XXX.X22.10.314 AM         1 ON CPU             10
08-APR-24 XXX.X22.10.314 AM         1 PGA memory operation            WAITING             1
08-APR-24 XXX.X22.10.314 AM         1 db file parallel write          WAITING             1
08-APR-24 XXX.X22.10.314 AM         1 db file sequential read         WAITING             1
08-APR-24 XXX.X22.10.314 AM         1 gc cr block busy                WAITING             1
08-APR-24 XXX.X22.10.314 AM         1 gc current grant 2-way          WAITING             1
08-APR-24 XXX.X22.20.413 AM         1 ON CPU              2
08-APR-24 XXX.X22.20.413 AM         1 db file parallel write          WAITING             1
08-APR-24 XXX.X22.20.413 AM         1 db file sequential read         WAITING             1
08-APR-24 XXX.X22.20.413 AM         1 gc current grant 2-way          WAITING             1
08-APR-24 XXX.X22.30.513 AM         1 ON CPU              5
08-APR-24 XXX.X22.30.513 AM         1 db file sequential read         WAITING             3
08-APR-24 XXX.X22.40.613 AM         1 ON CPU              6
08-APR-24 XXX.X22.40.613 AM         1 db file sequential read         WAITING             2
08-APR-24 XXX.X22.50.714 AM         1 ON CPU              7
08-APR-24 XXX.X22.50.714 AM         1 db file sequential read         WAITING             2
08-APR-24 XXX.X23.00.813 AM         1 gc current grant 2-way          WAITING           145
08-APR-24 XXX.X23.00.813 AM         1 gc cr grant 2-way               WAITING            29
08-APR-24 XXX.X23.10.944 AM         1 gc current grant 2-way          WAITING           167
08-APR-24 XXX.X23.10.944 AM         1 log file sync                   WAITING            30
08-APR-24 XXX.X23.21.134 AM         1 gc current grant 2-way          WAITING           170
08-APR-24 XXX.X23.21.134 AM         1 gc cr grant 2-way               WAITING            30
08-APR-24 XXX.X23.31.334 AM         1 gc current grant 2-way          WAITING           210
08-APR-24 XXX.X23.31.334 AM         1 enq: TX - index contention      WAITING            34
08-APR-24 XXX.X23.41.534 AM         1 gc current grant 2-way          WAITING           246
08-APR-24 XXX.X23.41.534 AM         1 gc cr grant 2-way               WAITING            44
08-APR-24 XXX.X23.51.734 AM         1 gc current grant 2-way          WAITING           238
08-APR-24 XXX.X23.51.734 AM         1 log file sync                   WAITING            26
08-APR-24 XXX.X24.01.904 AM         1 ON CPU             16
08-APR-24 XXX.X24.01.904 AM         1 db file sequential read         WAITING             2
08-APR-24 XXX.X24.12.004 AM         1 ON CPU              6
08-APR-24 XXX.X24.12.004 AM         1 db file sequential read         WAITING             5
08-APR-24 XXX.X24.22.104 AM         1 ON CPU             11
08-APR-24 XXX.X24.22.104 AM         1 db file sequential read         WAITING             3
08-APR-24 XXX.X24.32.204 AM         1 ON CPU              4
08-APR-24 XXX.X24.32.204 AM         1 db file sequential read         WAITING             2
08-APR-24 XXX.X24.42.304 AM         1 ON CPU              7
08-APR-24 XXX.X24.42.304 AM         1 db file sequential read         WAITING             1
08-APR-24 XXX.X24.52.405 AM         1 db file sequential read         WAITING             5
08-APR-24 XXX.X24.52.405 AM         1 ON CPU              4
08-APR-24 XXX.X25.02.506 AM         1 ON CPU             10
08-APR-24 XXX.X25.02.506 AM         1 Sync ASM rebalance              WAITING             2
08-APR-24 XXX.X25.12.605 AM         1 db file sequential read         WAITING             3
08-APR-24 XXX.X25.12.605 AM         1 ON CPU              3
08-APR-24 XXX.X25.22.705 AM         1 ON CPU              8

分析:
数据库两次抖动发生时,top 等待为 gc current grant 2-way和 log file sync
==========================================
参考WAITEVENT: "gc current/cr grant 2-way" Reference Note ( Doc ID 2365429.1 ) 对该等待的解释:
Check the following if the average wait time is high:
Network bandwidth saturation
Private interconnect configuration
Run queue length and CPU utilization
Check if lms processes are running in real time or elevated priority
参考 Top 5 Database and/or Instance Performance Issues in RAC Environment (Doc ID 1373500.1),对log file sync 解释
High log file sync waits
A typical life cycle of 'log file sync' wait
1. A user sessions issues a commit or rollback and starts waiting on log file sync.
2. LGWR gather redo information to be written to redo log files, issues IO and queues BOC to an LMS process andposts LMS process.
3. LGWR waits for redo buffers to be flushed to disk and SCN to be ACK'd
4. Completion of IO and receiving ACK from LMS signal write complete. LGWR then posts foreground process to continue.
5. Foreground process wakes up and log file sync wait ends.
==========================================
对上述两个等待事件的阻塞连进行分析:
1)两个故障时段的相应等待事件阻塞链相同
with ash as
 (select /*+ materialize */
   *
 from ash2024040810
 where sample_time between to_date('20240408 10:22:00','yyyymmdd hh24:mi:ss')
   AND to_date('20240408 10:25:00','yyyymmdd hh24:mi:ss')),
 chains as
   (select inst_id,
   session_id,
    level lvl,
    sys_connect_by_path('inst ' || inst_id || ' ' ||
    event,
    ' -> ') path,
    connect_by_isleaf isleaf
    from ash
    start with event = 'log file sync'
    connect by nocycle(prior blocking_session = session_id
    and prior blocking_session_serial# = session_serial#
    and prior blocking_inst_id = inst_id
    and prior sample_id = sample_id
    /*and ((prior sample_time) - sample_time between interval '-1'
    second and interval '1' second)*/
))
    select inst_id,
    lpad(round(ratio_to_report(count(*)) over() * 100) || '%', 5, ' ') "%this",
    count(*) samples,
    path
    from chains
    where isleaf = 1
    group by inst_id, path
   order by samples desc
    
INST_ID %this SAMPLES PATH
---------- ------ ---------- ----------------------------------------------------------------
1   56% 3186  -> inst 1 log file sync -> inst 1 wait for scn ack
1   30% 1709  -> inst 1 log file sync
1    8% 437  -> inst 1 log file sync -> inst 1 log file parallel write
1    2% 85  -> inst 1 log file sync -> inst 1
1    1% 63  -> inst 1 log file sync -> inst 1 latch: redo allocation -> inst 1 buffer busy waits
2    1% 48  -> inst 2 log file sync
1    1% 43  -> inst 1 log file sync -> inst 1 LGWR wait for redo copy
1    1% 35  -> inst 1 log file sync -> inst 1 latch: redo allocation
1    1% 29  -> inst 1 log file sync -> inst 1 LGWR wait for redo copy -> inst 1
1    0% 14  -> inst 1 log file sync -> inst 1 latch: redo allocation -> inst 1 gc current grant 2-way
1    0% 5  -> inst 1 log file sync -> inst 1 LGWR wait for redo copy -> inst 1 db file sequential read
1    0% 3  -> inst 1 log file sync -> inst 1 LGWR wait for redo copy -> inst 1 gc current grant 2-way

12 rows selected.
INST_ID %this SAMPLES PATH
---------- -------------------- ---------- --------------------------------------------------------
    1   96% 33808  -> inst 1 gc current grant 2-way
    2    4% 1388  -> inst 2 gc current grant 2-way
    1    0% 3  -> inst 1 gc current grant 2-way -> inst 1 read by other session
    1    0% 1  -> inst 1 gc current grant 2-way -> inst 1 gc current grant 2-way

分析:
  • log file sync 被wait for scn ack阻塞,wait for scn ack这涉及LGWR 进程将commit SCN 广播和LMS 进程同步SCN。
  • gc current grant 2-way 没有捕获到阻塞源。
2)检查故障时段是否存在 gc block lost
SQL> select INSTANCE_NUMBER,SAMPLE_TIME,event,count(1) from dba_hist_active_sess_history where
2   SAMPLE_TIME between TO_TIMESTAMP('2024/04/08 9:00', 'yyyy/mm/dd hh24:mi') and TO_TIMESTAMP('2024/04/08 11:00', 'yyyy/mm/dd hh24:mi')
3 and event like '%lost%'
4 group by SAMPLE_TIME,event,INSTANCE_NUMBER
5 order by SAMPLE_TIME,event
;

no rows selected

SQL> select SAMPLE_TIME,INST_ID,SESSION_ID,PROGRAM,SQL_ID,event,count(1) from gv$active_session_history
2   where   event like '%lost%'
3   group by SAMPLE_TIME,event,INST_ID,SESSION_ID,PROGRAM,SQL_ID
4   order by SAMPLE_TIME,event;

no rows selected

分析:
  • 故障时段,数据库未出现gc block lost 等待。

1.2 Trace 和 alert日志分析

>>>>> db1_lms1_23112_20240408.trc

*** 2024-04-08T10:21:43.495539+08:00
Warning: log write broadcast wait time 512ms (SCN 0xa76ac2d5343)

*** 2024-04-08T11:00:43.195616+08:00
Warning: log write broadcast wait time 950ms (SCN 0xa76ad117eff)

*** 2024-04-08T11:00:46.512182+08:00
Warning: log write broadcast wait time 1091ms (SCN 0xa76ad117fa4)


>>>> db1_lms2_23114_20240408.trc

*** 2024-04-08T10:10:56.321719+08:00
Warning: log write broadcast wait time 565ms (SCN 0xa76abfa69e5)

*** 2024-04-08T10:11:50.437075+08:00
Warning: log write broadcast wait time 822ms (SCN 0xa76abfb427e)
2024-04-08 10:11:50.553 : GSIPC:BSEND: deq msg 0xc000000b7731e0c8 type 44 qtm 46428 us seq 0.1449369548 from opid 33 to 2.3
2024-04-08 10:11:50.553 : GSIPC:BSEND: last proc to post me: pid 36.23179 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg

*** 2024-04-08T10:59:34.846358+08:00
Warning: log write broadcast wait time 616ms (SCN 0xa76ad105039)


*** 2024-04-08T11:00:43.195517+08:00
Warning: log write broadcast wait time 959ms (SCN 0xa76ad117efc)

分析:
  • Trace 显示在故障发生前,出现了 log write broadcast  的warning。Trace中出现该waring 的时间点多伴随性能抖动故障。
>>>>  db1_lms0_23110_20240408.trc
*** 2024-04-08T10:10:49.602685+08:00
2024-04-08 10:10:49.602 : GSIPC:BSEND: deq msg 0xc000000b76e861c8 type 44 qtm 38225 us seq 0.1449537396 from opid 31 to 2.1
2024-04-08 10:10:49.602 : GSIPC:BSEND: last proc to post me: pid 33.23173 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
2024-04-08 10:10:49.602 : GSIPC:BSEND: deq msg 0xc000000b76e50ac8 type 44 qtm 38194 us seq 0.1449537398 from opid 31 to 2.1
2024-04-08 10:10:49.602 : GSIPC:BSEND: last proc to post me: pid 33.23173 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
2024-04-08 10:10:49.602 : GSIPC:BSEND: deq msg 0xc000000b76f40fc8 type 44 qtm 38177 us seq 0.1449537399 from opid 31 to 2.1
2024-04-08 10:10:49.602 : GSIPC:BSEND: last proc to post me: pid 33.23173 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
……

*** 2024-04-08T10:21:42.705368+08:00
Warning: log write broadcast wait time 775ms (SCN 0xa76ac2d52cb)

*** 2024-04-08T10:21:42.990402+08:00
2024-04-08 10:21:42.990 : GSIPC:BSEND: deq msg 0xc000000b76ee9cc8 type 44 qtm 36250 us seq 0.1451036005 from opid 31 to 2.1
2024-04-08 10:21:42.990 : GSIPC:BSEND: last proc to post me: pid 36.23179 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
2024-04-08 10:21:42.990 : GSIPC:BSEND: deq msg 0xc000000b76e22e48 type 44 qtm 36235 us seq 0.1451036006 from opid 31 to 2.1
2024-04-08 10:21:42.990 : GSIPC:BSEND: last proc to post me: pid 36.23179 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
2024-04-08 10:21:43.548 : GSIPC:BSEND: deq msg 0xc000000b76e89dc8 type 44 qtm 43563 us seq 0.1451036055 from opid 33 to 2.1
2024-04-08 10:21:43.548 : GSIPC:BSEND: last proc to post me: pid 36.23179 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
2024-04-08 10:21:43.548 : GSIPC:BSEND: deq msg 0xc000000b76f0b3c8 type 44 qtm 41557 us seq 0.1451036059 from opid 33 to 2.1
2024-04-08 10:21:43.548 : GSIPC:BSEND: last proc to post me: pid 36.23179 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
2024-04-08 10:21:43.548 : GSIPC:BSEND: deq msg 0xc000000b76eb36c8 type 44 qtm 40307 us seq 0.1451036061 from opid 33 to 2.1
2024-04-08 10:21:43.548 : GSIPC:BSEND: last proc to post me: pid 36.23179 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg
2024-04-08 10:21:43.548 : GSIPC:BSEND: deq msg 0xc000000b76f34a48 type 44 qtm 40280 us seq 0.1451036062 from opid 33 to 2.1
2024-04-08 10:21:43.548 : GSIPC:BSEND: last proc to post me: pid 36.23179 in kjc.h LINE:2308 ID:KJCS Post snd proxy to flush msg

分析:
  • Lms 进程持续输出如上含GSIPC:BSEND 的信息,这些信息是lms队列超过一定时间而产生的,每一条信息对应一个队列,可能会导致lms 繁忙而无法及时处理gc 消息。
对lms trace 中的 GSIPC:BSEND 信息进行统计,每一条信息对应一个队列 按时间进行统计队列情况,截取了部分信息,说明lms进程队列比较严重:
4 2024-04-0810:11:01.387
     11 2024-04-0810:11:01.957
     15 2024-04-0810:11:01.958
     18 2024-04-0810:11:01.997
     30 2024-04-0810:11:01.998
     19 2024-04-0810:11:02.037
     31 2024-04-0810:11:02.038
     13 2024-04-0810:11:02.087
     47 2024-04-0810:11:02.088
      8 2024-04-0810:11:02.089
     17 2024-04-0810:11:02.127
     15 2024-04-0810:11:02.128
     11 2024-04-0810:11:02.587
     49 2024-04-0810:11:02.588
     49 2024-04-0810:11:02.589
     51 2024-04-0810:11:02.590
     50 2024-04-0810:11:02.591
     51 2024-04-0810:11:02.592
     33 2024-04-0810:11:02.593
      6 2024-04-0810:11:02.660
     14 2024-04-0810:11:02.695
     28 2024-04-0810:11:02.742
      4 2024-04-0810:11:02.743
     24 2024-04-0810:11:02.977
     18 2024-04-0810:11:02.978

      1 2024-04-0810:22:59.720
      1 2024-04-0810:22:59.721
      4 2024-04-0810:22:59.824
     47 2024-04-0810:22:59.825
      1 2024-04-0810:22:59.826
      1 2024-04-0810:23:00.167
      5 2024-04-0810:23:00.168
      4 2024-04-0810:23:00.293
     26 2024-04-0810:23:00.518
     20 2024-04-0810:23:00.519
      6 2024-04-0810:23:00.557
     42 2024-04-0810:23:00.558
     35 2024-04-0810:23:00.603
     13 2024-04-0810:23:00.604
      4 2024-04-0810:23:00.943
     12 2024-04-0810:23:01.683
     29 2024-04-0810:23:02.123
     46 2024-04-0810:23:02.124
     45 2024-04-0810:23:02.125
     45 2024-04-0810:23:02.126
     23 2024-04-0810:23:02.127
     23 2024-04-0810:23:02.163
     29 2024-04-0810:23:02.164
     38 2024-04-0810:23:02.203
      4 2024-04-0810:23:02.204
      1 2024-04-0810:23:02.310
      5 2024-04-0810:23:02.311
     32 2024-04-0810:23:02.353
     30 2024-04-0810:23:02.354
     12 2024-04-0810:23:02.403
      4 2024-04-0810:23:02.783
      1 2024-04-0810:23:03.070
     43 2024-04-0810:23:03.071
      3 2024-04-0810:23:03.104

1.3 OSW 分析

1)检查CPU 负载
//节点1:
//节点2:
分析:
  • 两个节点主机CPU 负载在故障时段没有明显波动。
分析:
  • 两个节点主机内存使用情况在故障时段free 内存有所减少,到整体free 内存仍然充裕。
分析:
  • 两个节点磁盘busy 繁忙度正常。节点2 disk1 的avg wait time 在非故障时段出现两个小突刺,分别为2ms,7ms。
2)检查oswprivate 数据
-bash-4.2$ grep -A 1 " db02-priv (XXX.XX..22.112) " db01_prvtnet_24.04.08.1000.dat
--

 1 dbdb02-priv (XXX.XX..22.112) 0.161 ms 0.085 ms 0.076 ms
zzz ***Mon Apr 8 10:10:45 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.116 ms 0.065 ms 0.060 ms
zzz ***Mon Apr 8 10:11:00 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.147 ms 0.061 ms 0.058 ms
zzz ***Mon Apr 8 10:11:16 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.132 ms 0.066 ms 0.065 ms
zzz ***Mon Apr 8 10:11:31 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.175 ms 0.090 ms 0.083 ms
zzz ***Mon Apr 8 10:11:46 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.116 ms 0.066 ms 0.062 ms
zzz ***Mon Apr 8 10:12:01 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.121 ms 0.065 ms 0.059 ms
zzz ***Mon Apr 8 10:12:16 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.185 ms 0.098 ms 0.084 ms
zzz ***Mon Apr 8 10:12:31 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.112 ms 0.074 ms 0.061 ms
zzz ***Mon Apr 8 10:12:46 EAT 2024
......

--
 1 db02-priv (XXX.XX..22.112) 0.157 ms 0.081 ms 0.075 ms
zzz ***Mon Apr 8 10:22:51 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.108 ms 0.066 ms 0.060 ms
zzz ***Mon Apr 8 10:23:06 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.164 ms 0.080 ms 0.076 ms
zzz ***Mon Apr 8 10:23:21 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.138 ms 0.066 ms 0.057 ms
zzz ***Mon Apr 8 10:23:37 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.114 ms 0.065 ms 0.059 ms
zzz ***Mon Apr 8 10:23:52 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.127 ms 0.063 ms 0.057 ms
zzz ***Mon Apr 8 10:24:07 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.155 ms 0.085 ms 0.074 ms
zzz ***Mon Apr 8 10:24:22 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.141 ms 0.086 ms 0.067 ms
zzz ***Mon Apr 8 10:24:37 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.131 ms 0.066 ms 0.058 ms
zzz ***Mon Apr 8 10:24:52 EAT 2024
--

 1 db02-priv (XXX.XX..22.112) 0.129 ms 0.066 ms 0.060 ms
zzz ***Mon Apr 8 10:25:07 EAT 2024
--

-bash-4.2$ grep -B 2 " db01-priv ( *.*.*.*) " db02_prvtnet_24.04.08.1000.dat
--

zzz ***Mon Apr 8 10:10:38 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.384 ms 0.256 ms 0.252 ms
--

zzz ***Mon Apr 8 10:10:53 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.421 ms 0.252 ms 0.234 ms
--

zzz ***Mon Apr 8 10:11:08 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.356 ms 0.250 ms 0.246 ms
--

zzz ***Mon Apr 8 10:11:23 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.414 ms 0.238 ms 0.236 ms
--

zzz ***Mon Apr 8 10:11:39 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.392 ms 0.254 ms 0.252 ms
--

zzz ***Mon Apr 8 10:11:54 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.337 ms 0.259 ms 0.256 ms
--

zzz ***Mon Apr 8 10:12:09 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.425 ms 0.231 ms 0.228 ms
--

zzz ***Mon Apr 8 10:12:24 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.423 ms 0.253 ms 0.273 ms
--

zzz ***Mon Apr 8 10:12:39 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.410 ms 0.274 ms 0.271 ms
......
--

zzz ***Mon Apr 8 10:22:58 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.474 ms 0.257 ms 0.253 ms
--

zzz ***Mon Apr 8 10:23:13 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.396 ms 0.234 ms 0.223 ms
--

zzz ***Mon Apr 8 10:23:28 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.411 ms 0.247 ms 0.235 ms
--

zzz ***Mon Apr 8 10:23:44 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.413 ms 0.233 ms 0.230 ms
--

zzz ***Mon Apr 8 10:23:59 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.494 ms 0.278 ms 0.235 ms
--

zzz ***Mon Apr 8 10:24:14 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.483 ms 0.268 ms 0.254 ms
--

zzz ***Mon Apr 8 10:24:29 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.457 ms 0.247 ms 0.238 ms
--

zzz ***Mon Apr 8 10:24:44 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.463 ms 0.260 ms 0.244 ms
--

zzz ***Mon Apr 8 10:24:59 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.454 ms 0.245 ms 0.288 ms
--

zzz ***Mon Apr 8 10:25:14 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.413 ms 0.268 ms 0.249 ms
--

zzz ***Mon Apr 8 10:25:29 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.479 ms 0.254 ms 0.260 ms
--

zzz ***Mon Apr 8 10:25:44 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.322 ms 0.242 ms 0.238 ms
--

zzz ***Mon Apr 8 10:25:59 EAT 2024
traceroute to db01-priv ( *.*.*.*), 30 hops max, 40 byte packets
 1 db01-priv ( *.*.*.*) 0.419 ms 0.255 ms 0.244 ms

分析:
  • 两个节点的私网traceroute 延迟正常。
3)检查此前  *.*.*.*部署的ping 脚本日志
>>> rac_pingcheck_111_privip_202404081006.log

Mon Apr 8 10:10:47 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=236. time=0. ms
Mon Apr 8 10:10:48 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=237. time=0. ms
Mon Apr 8 10:10:49 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=238. time=0. ms
Mon Apr 8 10:10:50 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=239. time=0. ms
Mon Apr 8 10:10:51 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=240. time=0. ms
Mon Apr 8 10:10:52 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=241. time=0. ms
Mon Apr 8 10:10:53 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=242. time=0. ms
Mon Apr 8 10:10:54 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=243. time=0. ms
Mon Apr 8 10:10:55 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=244. time=0. ms
Mon Apr 8 10:10:56 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=245. time=0. ms
Mon Apr 8 10:10:57 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=246. time=0. ms
Mon Apr 8 10:10:58 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=247. time=0. ms
Mon Apr 8 10:10:59 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=248. time=0. ms
Mon Apr 8 10:11:00 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=249. time=0. ms
Mon Apr 8 10:11:01 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=250. time=0. ms
Mon Apr 8 10:11:02 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=251. time=0. ms
Mon Apr 8 10:11:03 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=252. time=0. ms
Mon Apr 8 10:11:04 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=253. time=0. ms
Mon Apr 8 10:11:05 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=254. time=0. ms
Mon Apr 8 10:11:06 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=255. time=0. ms
Mon Apr 8 10:11:07 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=256. time=0. ms
Mon Apr 8 10:11:08 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=257. time=0. ms
Mon Apr 8 10:11:09 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=258. time=0. ms
Mon Apr 8 10:11:10 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=259. time=0. ms
Mon Apr 8 10:11:11 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=260. time=0. ms
Mon Apr 8 10:11:12 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=261. time=0. ms
Mon Apr 8 10:11:13 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=262. time=0. ms
Mon Apr 8 10:11:14 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=263. time=0. ms
Mon Apr 8 10:11:15 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=264. time=0. ms
Mon Apr 8 10:11:16 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=265. time=0. ms
Mon Apr 8 10:11:17 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=266. time=0. ms
Mon Apr 8 10:11:18 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=267. time=0. ms
Mon Apr 8 10:11:19 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=268. time=0. ms
Mon Apr 8 10:11:20 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=269. time=0. ms
Mon Apr 8 10:11:22 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=270. time=0. ms
Mon Apr 8 10:11:23 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=271. time=0. ms
Mon Apr 8 10:11:24 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=272. time=0. ms
Mon Apr 8 10:11:25 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=273. time=0. ms
Mon Apr 8 10:11:26 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=274. time=0. ms
Mon Apr 8 10:11:27 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=275. time=0. ms
Mon Apr 8 10:11:28 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=276. time=0. ms
Mon Apr 8 10:11:29 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=277. time=0. ms
Mon Apr 8 10:11:30 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=278. time=0. ms
Mon Apr 8 10:11:31 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=279. time=0. ms
Mon Apr 8 10:11:32 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=280. time=0. ms
Mon Apr 8 10:11:33 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=281. time=0. ms
Mon Apr 8 10:11:34 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=282. time=0. ms
Mon Apr 8 10:11:35 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=283. time=0. ms
Mon Apr 8 10:11:36 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=284. time=0. ms
Mon Apr 8 10:11:37 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=285. time=0. ms
Mon Apr 8 10:11:38 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=286. time=0. ms
Mon Apr 8 10:11:39 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=287. time=0. ms
Mon Apr 8 10:11:40 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=288. time=0. ms
Mon Apr 8 10:11:41 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=289. time=0. ms
Mon Apr 8 10:11:42 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=290. time=0. ms
Mon Apr 8 10:11:43 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=291. time=0. ms
Mon Apr 8 10:11:44 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=292. time=0. ms
Mon Apr 8 10:11:45 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=293. time=0. ms
Mon Apr 8 10:11:46 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=294. time=0. ms
Mon Apr 8 10:11:47 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=295. time=0. ms
Mon Apr 8 10:11:48 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=296. time=0. ms
Mon Apr 8 10:11:49 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=297. time=0. ms
Mon Apr 8 10:11:50 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=298. time=0. ms
Mon Apr 8 10:11:51 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=299. time=0. ms
Mon Apr 8 10:11:52 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=300. time=0. ms
Mon Apr 8 10:11:53 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=301. time=0. ms
Mon Apr 8 10:11:54 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=302. time=0. ms
Mon Apr 8 10:11:55 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=303. time=0. ms
Mon Apr 8 10:11:56 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=304. time=0. ms
Mon Apr 8 10:11:57 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=305. time=0. ms
Mon Apr 8 10:11:58 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=306. time=0. ms
Mon Apr 8 10:11:59 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=307. time=0. ms
Mon Apr 8 10:12:00 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=308. time=0. ms
Mon Apr 8 10:12:01 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=309. time=0. ms
Mon Apr 8 10:12:02 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=310. time=0. ms
Mon Apr 8 10:12:03 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=311. time=0. ms
Mon Apr 8 10:12:04 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=312. time=0. ms
Mon Apr 8 10:12:05 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=313. time=0. ms
Mon Apr 8 10:12:06 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=314. time=0. ms
Mon Apr 8 10:12:07 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=315. time=0. ms
Mon Apr 8 10:12:08 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=316. time=0. ms
......
Mon Apr 8 10:22:58 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=959. time=0. ms
Mon Apr 8 10:22:59 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=960. time=0. ms
Mon Apr 8 10:23:00 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=961. time=0. ms
Mon Apr 8 10:23:01 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=962. time=0. ms
Mon Apr 8 10:23:02 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=963. time=0. ms
Mon Apr 8 10:23:03 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=964. time=0. ms
Mon Apr 8 10:23:04 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=965. time=0. ms
Mon Apr 8 10:23:05 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=966. time=0. ms
Mon Apr 8 10:23:06 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=967. time=0. ms
Mon Apr 8 10:23:07 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=968. time=0. ms
Mon Apr 8 10:23:08 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=969. time=0. ms
Mon Apr 8 10:23:09 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=970. time=0. ms
Mon Apr 8 10:23:10 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=971. time=0. ms
Mon Apr 8 10:23:11 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=972. time=0. ms
Mon Apr 8 10:23:12 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=973. time=0. ms
Mon Apr 8 10:23:13 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=974. time=0. ms
Mon Apr 8 10:23:14 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=975. time=0. ms
Mon Apr 8 10:23:15 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=976. time=0. ms
Mon Apr 8 10:23:16 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=977. time=0. ms
Mon Apr 8 10:23:17 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=978. time=0. ms
Mon Apr 8 10:23:18 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=979. time=0. ms
Mon Apr 8 10:23:19 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=980. time=0. ms
Mon Apr 8 10:23:20 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=981. time=0. ms
Mon Apr 8 10:23:21 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=982. time=0. ms
Mon Apr 8 10:23:22 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=983. time=0. ms
Mon Apr 8 10:23:23 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=984. time=0. ms
Mon Apr 8 10:23:24 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=985. time=0. ms
Mon Apr 8 10:23:25 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=986. time=0. ms
Mon Apr 8 10:23:26 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=987. time=0. ms
Mon Apr 8 10:23:27 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=988. time=0. ms
Mon Apr 8 10:23:28 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=989. time=0. ms
Mon Apr 8 10:23:29 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=990. time=0. ms
Mon Apr 8 10:23:30 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=991. time=0. ms
Mon Apr 8 10:23:31 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=992. time=0. ms
Mon Apr 8 10:23:32 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=993. time=0. ms
Mon Apr 8 10:23:33 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=994. time=0. ms
Mon Apr 8 10:23:34 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=995. time=0. ms
Mon Apr 8 10:23:35 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=996. time=0. ms
Mon Apr 8 10:23:36 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=997. time=0. ms
Mon Apr 8 10:23:37 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=998. time=0. ms
Mon Apr 8 10:23:38 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=999. time=0. ms
Mon Apr 8 10:23:39 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1000. time=0. ms
Mon Apr 8 10:23:40 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1001. time=0. ms
Mon Apr 8 10:23:41 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1002. time=0. ms
Mon Apr 8 10:23:42 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1003. time=0. ms
Mon Apr 8 10:23:43 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1004. time=0. ms
Mon Apr 8 10:23:44 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1005. time=0. ms
Mon Apr 8 10:23:45 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1006. time=0. ms
Mon Apr 8 10:23:46 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1007. time=0. ms
Mon Apr 8 10:23:47 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1008. time=0. ms
Mon Apr 8 10:23:48 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1009. time=0. ms
Mon Apr 8 10:23:49 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1010. time=0. ms
Mon Apr 8 10:23:50 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1011. time=0. ms
Mon Apr 8 10:23:51 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1012. time=0. ms
Mon Apr 8 10:23:52 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1013. time=0. ms
Mon Apr 8 10:23:53 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1014. time=0. ms
Mon Apr 8 10:23:54 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1015. time=0. ms
Mon Apr 8 10:23:55 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1016. time=0. ms
Mon Apr 8 10:23:56 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1017. time=0. ms
Mon Apr 8 10:23:57 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1018. time=0. ms
Mon Apr 8 10:23:58 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1019. time=0. ms
Mon Apr 8 10:23:59 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1020. time=0. ms
Mon Apr 8 10:24:00 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1021. time=0. ms
Mon Apr 8 10:24:01 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1022. time=0. ms
Mon Apr 8 10:24:02 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1023. time=0. ms
Mon Apr 8 10:24:03 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1024. time=0. ms
Mon Apr 8 10:24:04 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1025. time=0. ms
Mon Apr 8 10:24:05 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1026. time=0. ms
Mon Apr 8 10:24:06 EAT 2024: 500 bytes from *.*.*.*: icmp_seq=1027. time=0. ms

分析:
  • 节点2 ping 节点1 私网延迟正常。
4)检查OSWNETSTAT 数据
-bash-4.2$ grep "fragments dropped after timeout"   db02_netstat_24.04.08.1000.dat
        1279 fragments dropped after timeout
        1279 fragments dropped after timeout
        1279 fragments dropped after timeout
        1279 fragments dropped after timeout
        1279 fragments dropped after timeout
……
        1279 fragments dropped after timeout
        1279 fragments dropped after timeout
        1279 fragments dropped after timeout

-bash-4.2$ grep "fragments dropped (dup or out of space)"   db02_netstat_24.04.08.1000.dat
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
…….
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)

-bash-4.2$ grep "bad IP headers"   db02_netstat_24.04.08.1000.dat
        0 bad IP headers
        0 bad IP headers
        0 bad IP headers
        0 bad IP headers
……
        0 bad IP headers
        0 bad IP headers
        0 bad IP headers

-bash-4.2$ grep "incomplete headers"   db02_netstat_24.04.08.1000.dat
        11568 incomplete headers
        11568 incomplete headers
        11568 incomplete headers
        11568 incomplete headers
……
        11568 incomplete headers
        11568 incomplete headers

-bash-4.2$ grep "socket overflows"   db02_netstat_24.04.08.1000.dat
        11568 socket overflows
        11568 socket overflows
        11568 socket overflows
        11568 socket overflows
        11568 socket overflows
……
        11568 socket overflows
        11568 socket overflows
        11568 socket overflows





-bash-4.2$ grep "fragments dropped after timeout"   db01_netstat_24.04.08.1000.dat
        0 fragments dropped after timeout
        0 fragments dropped after timeout
        0 fragments dropped after timeout
        0 fragments dropped after timeout
        0 fragments dropped after timeout
……
        0 fragments dropped after timeout
        0 fragments dropped after timeout

-bash-4.2$ grep "fragments dropped (dup or out of space)"   db01_netstat_24.04.08.1000.dat
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
……
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)
        0 fragments dropped (dup or out of space)

-bash-4.2$ grep "bad IP headers"   db01_netstat_24.04.08.1000.dat
        0 bad IP headers
        0 bad IP headers
        0 bad IP headers
        0 bad IP headers
……
        0 bad IP headers
        0 bad IP headers

-bash-4.2$ grep "incomplete headers"   db01_netstat_24.04.08.1000.dat
        329 incomplete headers
        329 incomplete headers
        329 incomplete headers
……
        329 incomplete headers
        329 incomplete headers

-bash-4.2$ grep "socket overflows"   db01_netstat_24.04.08.1000.dat
        329 socket overflows
        329 socket overflows
        329 socket overflows
        329 socket overflows
……
        329 socket overflows
        329 socket overflows

分析:
  • 两个节点的 网卡分片和 udp 指标没有变化, 正常。

1.4 AWR 分析

分析:
  • 与上一业务周期相比,问题时段的数据库负载各项指标低于同期正常时段。
分析:
  • 与上一业务周期相比:
    问题时段的gc相关的等待基本在4~6ms左右,log file sync在4ms左右;
    正常时段的gc等待事件在200us左右,log file sync在800us左右。
  • 问题时段的读写性能,db file sequential read db file parallel write为500us,log file parallel write 为378us,说明I/O性能为正常的。
分析:
  • Wait for scn ack 平均等待时间达到了215ms。
分析:
  • 问题时段和正常时段的集群私网流量没有明显差异。

1.5 其他信息分析

归档切换频率检查,切换频率正常:
DAY 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
---------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
2024-04-09 14 42 36 18 8 1 9 7 4 11 22 16 12 7 17 20 19 3 0 0 0 0 0 0
2024-04-08 12 30 16 9 4 2 1 3 5 15 24 22 12 11 20 21 20 15 16 9 13 11 14 8
2024-04-07 12 25 11 9 0 1 0 2 4 10 24 16 11 7 14 18 14 16 12 18 8 12 11 11
2024-04-06 16 23 9 12 0 1 2 1 1 3 5 6 4 3 2 4 4 4 3 5 4 9 9 8
2024-04-05 14 32 12 9 0 1 2 2 2 2 4 5 4 3 2 3 3 4 4 3 4 5 11 6
2024-04-04 19 34 16 9 1 2 9 8 3 2 6 4 4 2 4 3 7 2 4 5 5 9 11 8
2024-04-03 15 36 27 13 6 5 8 1 9 12 20 18 12 7 13 39 14 14 9 10 10 8 10 11
2024-04-02 20 48 40 23 6 1 1 3 5 14 21 18 10 8 16 18 14 16 12 9 10 10 11 9
2024-04-01 17 36 25 26 5 1 11 13 10 18 34 31 29 11 17 23 17 19 24 14 11 11 10 8
2024-03-31 10 34 27 21 7 2 7 3 1 5 12 21 10 7 7 9 7 6 7 7 8 12 11 11
2024-03-30 22 46 52 67 30 10 19 5 4 7 8 8 7 5 5 7 7 6 7 7 11 10 11 5
2024-03-29 19 41 43 28 15 12 12 3 4 10 18 16 15 9 15 18 26 22 20 17 20 19 22 20
2024-03-28 21 47 38 27 22 18 12 2 4 10 20 17 11 6 12 16 44 16 14 13 12 15 12 12
2024-03-27 18 46 28 10 5 5 11 4 4 13 23 20 14 8 14 19 19 13 17 26 10 17 16 15
2024-03-26 26 30 24 19 8 2 10 2 6 13 21 20 12 8 13 16 26 22 13 12 12 10 11 9
2024-03-25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 14 10 8 17 11 9

归档生成量检查,归档生成量正常。
LOGTIME               COUNT(*) GBSIZE
------------------- ---------- ----------
2024-04-09 17:00:00          5 7
2024-04-09 16:00:00         19 25
2024-04-09 15:00:00         20 27
2024-04-09 14:00:00         17 22
2024-04-09 13:00:00          7 9
2024-04-09 12:00:00         12 15
2024-04-09 11:00:00         16 20
2024-04-09 10:00:00         22 29
2024-04-09 09:00:00         11 14
2024-04-09 08:00:00          4 5
2024-04-09 07:00:00          7 8
2024-04-09 06:00:00          9 11
2024-04-09 05:00:00          1 2
2024-04-09 04:00:00          8 10
2024-04-09 03:00:00         18 20
2024-04-09 02:00:00         36 44
2024-04-09 01:00:00         42 51
2024-04-09 00:00:00         14 18
2024-04-08 23:00:00          8 11
2024-04-08 22:00:00         14 13
2024-04-08 21:00:00         11 13
2024-04-08 20:00:00         13 14
2024-04-08 19:00:00          9 11
2024-04-08 18:00:00         16 20
2024-04-08 17:00:00         15 19
2024-04-08 16:00:00         20 26
2024-04-08 15:00:00         21 27
2024-04-08 14:00:00         20 26
2024-04-08 13:00:00         11 13
2024-04-08 12:00:00         12 15
2024-04-08 11:00:00         22 26
2024-04-08 10:00:00         24 31 <<<<<<<<
2024-04-08 09:00:00         15 19
2024-04-08 08:00:00          5 7
2024-04-08 07:00:00          3 3
2024-04-08 06:00:00          1 2
2024-04-08 05:00:00          2 2
2024-04-08 04:00:00          4 6
2024-04-08 03:00:00          9 10
2024-04-08 02:00:00         16 20
2024-04-08 01:00:00         30 38
2024-04-08 00:00:00         12 15
2024-04-07 23:00:00         11 13
2024-04-07 22:00:00         11 13
2024-04-07 21:00:00         12 11
2024-04-07 20:00:00          8 10
2024-04-07 19:00:00         18 21
2024-04-07 18:00:00         12 15
2024-04-07 17:00:00         16 21
2024-04-07 16:00:00         14 19
2024-04-07 15:00:00         18 24
2024-04-07 14:00:00         14 19
2024-04-07 13:00:00          7 9
2024-04-07 12:00:00         11 13
2024-04-07 11:00:00         16 20
2024-04-07 10:00:00         24 30
2024-04-07 09:00:00         10 14
2024-04-07 08:00:00          4 5
2024-04-07 07:00:00          2 2
2024-04-07 05:00:00          1 2
2024-04-07 03:00:00          9 9
2024-04-07 02:00:00         11 13
2024-04-07 01:00:00         25 32
2024-04-07 00:00:00         12 15
2024-04-06 23:00:00          8 10
2024-04-06 22:00:00          9 11
2024-04-06 21:00:00          9 7
2024-04-06 20:00:00          4 5
2024-04-06 19:00:00          5 5
2024-04-06 18:00:00          3 3
2024-04-06 17:00:00          4 5
2024-04-06 16:00:00          4 5
2024-04-06 15:00:00          4 5
2024-04-06 14:00:00          2 3
2024-04-06 13:00:00          3 4
2024-04-06 12:00:00          4 5
2024-04-06 11:00:00          6 4
2024-04-06 10:00:00          5 6
2024-04-06 09:00:00          3 3
2024-04-06 08:00:00          1 2
2024-04-06 07:00:00          1 2
2024-04-06 06:00:00          2 2
2024-04-06 05:00:00          1 2
2024-04-06 03:00:00         12 13
2024-04-06 02:00:00          9 12
2024-04-06 01:00:00         23 28
2024-04-06 00:00:00         16 19
2024-04-05 23:00:00          6 8
2024-04-05 22:00:00         11 13
2024-04-05 21:00:00          5 6
2024-04-05 20:00:00          4 5
2024-04-05 19:00:00          3 3
2024-04-05 18:00:00          4 5
2024-04-05 17:00:00          4 5
2024-04-05 16:00:00          3 3
2024-04-05 15:00:00          3 3
2024-04-05 14:00:00          2 3
2024-04-05 13:00:00          3 3
2024-04-05 12:00:00          4 5
2024-04-05 11:00:00          5 4
2024-04-05 10:00:00          4 5
2024-04-05 09:00:00          2 3
2024-04-05 08:00:00          2 2
2024-04-05 07:00:00          2 3
2024-04-05 06:00:00          2 2
2024-04-05 05:00:00          1 2
2024-04-05 03:00:00          9 11
2024-04-05 02:00:00         12 15
2024-04-05 01:00:00         32 39
2024-04-05 00:00:00         14 18
2024-04-04 23:00:00          8 10
2024-04-04 22:00:00         11 16
2024-04-04 21:00:00          9 6
2024-04-04 20:00:00          5 6
2024-04-04 19:00:00          5 5
2024-04-04 18:00:00          4 5
2024-04-04 17:00:00          2 3
2024-04-04 16:00:00          7 4
2024-04-04 15:00:00          3 3
2024-04-04 14:00:00          4 5
2024-04-04 13:00:00          2 3
2024-04-04 12:00:00          4 5
2024-04-04 11:00:00          4 4
2024-04-04 10:00:00          6 7
2024-04-04 09:00:00          2 3
2024-04-04 08:00:00          3 3
2024-04-04 07:00:00          8 10
2024-04-04 06:00:00          9 11
2024-04-04 05:00:00          2 2
2024-04-04 04:00:00          1 2
2024-04-04 03:00:00          9 9
2024-04-04 02:00:00         16 19
2024-04-04 01:00:00         34 42
2024-04-04 00:00:00         19 23
2024-04-03 23:00:00         11 13
2024-04-03 22:00:00         10 13
2024-04-03 21:00:00          8 10
2024-04-03 20:00:00         10 11
2024-04-03 19:00:00         10 11
2024-04-03 18:00:00          9 12
2024-04-03 17:00:00         14 17
2024-04-03 16:00:00         14 19
2024-04-03 15:00:00         39 49
2024-04-03 14:00:00         13 17
2024-04-03 13:00:00          7 9
2024-04-03 12:00:00         12 15
2024-04-03 11:00:00         18 22
2024-04-03 10:00:00         20 26
2024-04-03 09:00:00         12 15
2024-04-03 08:00:00          9 7
2024-04-03 07:00:00          1 2
2024-04-03 06:00:00          8 10
2024-04-03 05:00:00          5 6
2024-04-03 04:00:00          6 7
2024-04-03 03:00:00         13 15
2024-04-03 02:00:00         27 33
2024-04-03 01:00:00         36 44
2024-04-03 00:00:00         15 18
2024-04-02 23:00:00          9 11
2024-04-02 22:00:00         11 13
2024-04-02 21:00:00         10 13
2024-04-02 20:00:00         10 11
2024-04-02 19:00:00          9 9
2024-04-02 18:00:00         12 15
2024-04-02 17:00:00         16 21
2024-04-02 16:00:00         14 19
2024-04-02 15:00:00         18 23
2024-04-02 14:00:00         16 20
2024-04-02 13:00:00          8 10
2024-04-02 12:00:00         10 13
2024-04-02 11:00:00         18 21
2024-04-02 10:00:00         21 27
2024-04-02 09:00:00         14 17
2024-04-02 08:00:00          5 7
2024-04-02 07:00:00          3 3
2024-04-02 06:00:00          1 2
2024-04-02 05:00:00          1 2
2024-04-02 04:00:00          6 7
2024-04-02 03:00:00         23 27
2024-04-02 02:00:00         40 49
2024-04-02 01:00:00         48 59
2024-04-02 00:00:00         20 20

检查参数发现,存在_lm_lms_priority_dynamic 为非默认值
NAME DISPLAY_VALUE DEFAULT_VALUE ISDEFAULT DESCRIPTION
----------------------------------------- -------------- --------- ------------------------------------------------------------
_gc_policy_time 0              0              FALSE     how often to make object policy decisions in minutes
_gc_undo_affinity FALSE          TRUE           FALSE     if TRUE, enable undo affinity
_lm_drm_disable 7              0              FALSE     disable drm in different level
_lm_lms_priority_dynamic FALSE          TRUE           FALSE     enable lms priority modification

分析:
  • 参考Auto-Adjustment of LMS Process Priority in Oracle RAC with 11.2.0.3 and later ( Doc ID 1392248.1 )
    _lm_lms_priority_dynamic 是一个 undocument 的参数,它的作用是动态设定 LMS 优先级。
    通常我们不建议将该参数设定为非默认值。
进一步检查与lms 进程相关的参数:
NAME VALUE ISDEFA
----------------------------------- ------------------------------ ------
_high_priority_processes LMS*|LM1*|LM2*|LM3*|LM4*|LM5*| TRUE
      LM6*|LM7*|LM8*|LM9*|LM*|LCK0|G
                                  CR*|CKPT|DBRM|RMS0|LGWR|CR*|RMV*

_highest_priority_processes VKTM TRUE
_lm_lms_priority_check_frequency 60000                          TRUE
_lm_lms_priority_dynamic FALSE                          FALSE 
_lm_lms_rt_threshold TRUE
_os_sched_high_priority 1                              TRUE
_os_sched_highest_priority 1                              TRUE
hpux_sched_noage 178                            TRUE

检查lms 进程优先级,当前为noage 级别178,优先级正常:
$ hostname; ps -elf |head -1; ps -elf | egrep "lms|ora_vktm|ora_lgwr|ora_dbw" | grep -v ASM |grep -v grep | sort
  db01
  F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME COMD
3401 S oracle 23045 1 3 -2 20 e000000e75392040 183856 e000000e78294918 Mar 24 ? 40:19 ora_vktm_ db1
3401 S oracle 23110 1 35 178 20 e000000b7dd5bc40 183991 e000000b60c30160 Mar 24 ? 1074:14 ora_lms0_ db1
3401 S oracle 23112 1 45 178 20 e000000b7dd5b340 184035 e000000b60c30170 Mar 24 ? 1072:48 ora_lms1_ db1
3401 S oracle 23114 1 39 178 20 e000000b7dd5b940 183928 e000000b60c30278 Mar 24 ? 1078:21 ora_lms2_ db1
3401 S oracle 23168 1 61 178 20 e000000e701cf3c0 185222 e000000eb3209d40 Mar 24 ? 1130:13 ora_dbw0_ db1
3401 S oracle 23171 1 62 178 20 e000000ebcf6cc80 185232 e000000e3d1d3800 Mar 24 ? 1139:31 ora_dbw1_ db1
3401 S oracle 23173 1 61 178 20 e000000e326846c0 185232 e000000e838b5940 Mar 24 ? 1117:04 ora_dbw2_ db1
3401 S oracle 23175 1 76 178 20 e000000db423a340 185232 e000000e3e745340 Mar 24 ? 1112:50 ora_dbw3_ db1
3401 S oracle 23177 1 65 178 20 e000000df8ac20c0 185232 e000001754912240 Mar 24 ? 1128:39 ora_dbw4_ db1
3401 S oracle 23179 1 116 178 20 e000000eb1347100 185236 e000000e39ed2c00 Mar 24 ? 836:32 ora_lgwr_ db1

参考 lms process does not honor highest priority setting for SCHED_RR (Doc ID 2995503.1)
检查用户环境,配置正常:
# id oracle
uid=109(oracle) gid=311(oinstall) groups=310(dba),313(asmdba)

#
 id grid
uid=110(grid) gid=311(oinstall) groups=310(dba),312(asmadmin),313(asmdba),314(asmoper)
#
#
# cat etc/privgroup
dba MLOCK RTSCHED RTPRIO
#
//oracle 用户
$ cd $ORACLE_HOME/bin
$ ls -al oradism
-rwsr-x--- 1 root oinstall 82240 Mar 21 2017 oradism <<<< 粘滞位
$ id
uid=109(oracle) gid=311(oinstall) groups=310(dba),313(asmdba)
$

grid 用户
$ hostname; id; pwd; ls -al oradism
  db01
uid=110(grid) gid=311(oinstall) groups=310(dba),312(asmadmin),313(asmdba),314(asmoper)
/u01/app/12.2.0/grid/bin
-rwsrwxrwx 1 root oinstall 82240 Mar 22 2017 oradism
$

数据库其他参数检查,数据库采用SGA 自动管理,自实例启动以来,SGA 各pool 之间没有发生过resize 动作。
SQL> show parameter sga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
allow_group_access_to_sga boolean FALSE
lock_sga boolean FALSE
pre_page_sga boolean TRUE
sga_max_size big integer 120G
sga_min_size big integer 0
sga_target big integer 0
unified_audit_sga_queue_size integer 1048576

SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_limit big integer 40G
pga_aggregate_target big integer 20G

SQL> show parameter memory

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
hi_shared_memory_address integer 0
inmemory_adg_enabled boolean TRUE
inmemory_clause_default string
inmemory_expressions_usage string ENABLE
inmemory_force string DEFAULT
inmemory_max_populate_servers integer 0
inmemory_query string ENABLE
inmemory_size big integer 0
inmemory_trickle_repopulate_servers_ integer 1
percent
inmemory_virtual_columns string MANUAL
memory_max_target big integer 0
memory_target big integer 0
optimizer_inmemory_aware boolean TRUE
shared_memory_address integer 0

SQL> select instance_name,startup_time,status from gv$instance order by 1;

INSTANCE_NAME STARTUP_TIME STATUS
---------------- ------------------- ------------
 db1 2024-03-24 00:43:57 OPEN
 db2 2024-03-24 00:50:58 OPEN

SQL> select INST_ID,COMPONENT,OPER_TYPE,STATUS,START_TIME,END_TIME from gv$sga_resize_ops order by inst_id,start_time;

   INST_ID COMPONENT OPER_TYPE STATUS START_TIME END_TIME
---------- ------------------------------ ------------- --------- ------------------- -------------------
         1 Data Transfer Cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 In-Memory Area STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 DEFAULT 16K buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 DEFAULT 8K buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 DEFAULT 4K buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 DEFAULT 2K buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 RECYCLE buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 KEEP buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 DEFAULT 32K buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 DEFAULT buffer cache INITIALIZING COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:41
         1 DEFAULT buffer cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 streams pool STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 java pool STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 large pool STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 shared pool STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 ASM Buffer Cache STATIC        COMPLETE 2024-03-24 00:45:08 2024-03-24 00:45:08
         1 DEFAULT buffer cache SHRINK COMPLETE 2024-03-24 00:48:48 2024-03-24 00:48:48
         2 ASM Buffer Cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 In-Memory Area STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 Data Transfer Cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 DEFAULT 32K buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 DEFAULT 16K buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 DEFAULT 8K buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 DEFAULT 4K buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 DEFAULT 2K buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 RECYCLE buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 KEEP buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 DEFAULT buffer cache INITIALIZING COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:46
         2 DEFAULT buffer cache STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 streams pool STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 java pool STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 shared pool STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 large pool STATIC        COMPLETE 2024-03-24 00:52:15 2024-03-24 00:52:15
         2 DEFAULT buffer cache SHRINK COMPLETE 2024-03-24 00:53:25 2024-03-24 00:53:26

34 rows selected.

SQL> show parameter gcs_server_process

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
gcs_server_processes integer 3

根据文档Doc ID 1619155.1描述,在12.2.0.1 JUL 2018 database RU之前的版本,lms进程的数量应该为2倍的默认值。当前数据库版本为 12.2.0.1.180417

Set gcs_server_processes to the twice the default number of lms processes that are allocated.    (this recommendation is valid only for databases that are12.2 and lower)
       
The fix is also included in the 12.2.0.1 JUL 2018 database RU, so this does apply to the database that is running on 12.2.0.1 JUL 2018 or higher.
        The default number of lms processes depends on the number of CPUs/cores that the server has,  so please refer to the gcs_server_processes init.ora parameter section in the Oracle Database Reference Guide  for the default number of lms processes for your server.  Please make sure that the total number of lms processes   of all databases on the server is less than the total number of CPUs/cores on the server.  Please refer to the
Document 558185.1    It's a static parameter and rolling restart is supported.


故障分析总结

2.1 数据库问题时段,主要等待事件为 gc current grant 2-way 和 log file sync
参考 Doc ID 2365429.1,gc current grant 2-way 等待高通常可能的原因如下:
  • Network bandwidth saturation
  • Private interconnect configuration
  • Run queue length and CPU utilization
  • Check if lms processes are running in real time or elevated priority
2.2 检查上述两个等待事件的阻塞链
log file sync 被 wait for scn ack阻塞,该等待涉及 commit SCN 广播和LMS 进程同步SCN。gc current grant 2-way 没有捕获到阻塞源。
2.3 问题时段没有出现 gc block lost 等待
2.4 trace 显示问题发生前,出现了 log write broadcast  的warning。出现该waring 的时间点多伴随着性能抖动问题
2.5 lms 进程持续输出含GSIPC:BSEND 的信息,这些信息是lms队列超过一定时间而产生的,每一条信息对应一个队列,问题时段lms 进程队列比较严重。这可能会导致lms 繁忙而无法及时处理gc 消息
2.6 OSW 数据显示问题时段,主机CPU,内存,网络以及网卡相关指标均正常,用户部署的ping 脚本日志显示,问题时段私网ping 延迟正常
2.7 对比awr, 问题时段的数据库负载各项指标低于同期正常时段
  • 问题时段的gc相关的等待基本在4~6ms左右,log file sync在4ms左右;正常时段的gc等待事件在200us左右,log file sync在800us左右。 问题时段的I/O性能,db file sequential read /db file parallel write为500us,log file parallel write 为378us,说明I/O性能为正常的。
  • 问题时段的Wait for scn ack 平均等待时间达到了215ms。
  • 问题时段和正常时段的集群私网流量没有明显差异。
2.8 归档切换频率正常,生成量正常
2.9 检查发现非默认参数 _lm_lms_priority_dynamic,参考 Doc ID 1392248.1
_lm_lms_priority_dynamic 是一个 undocument 的参数,它的作用是动态设定 LMS 优先级。通常我们不建议将该参数设定为非默认值。
2.10 数据库采用SGA 自动管理,自实例启动以来,SGA 各pool 之间没有发生过resize 动作
2.11 根据文档Doc ID 1619155.1, 建议gcs_server_processes 设置为6, 参数支持滚动重启实例生效
综 上:

问题发生时的主要等待 gc current grant 2-way 可能与lms 进程繁忙有关,而log file sync 被wait for scn ack 阻塞,或许只是 lms 繁忙的一个副作用。即问题原因与lms 进程繁忙有关。


建议方案

数据库参数 _lm_lms_priority_dynamic 设定为默认值,gcs_server_processes 设置为6。

命令参考(支持滚动重启生效):

  • alter system reset "_lm_lms_priority_dynamic" scope=spfile sid='*';

  • alter system set gcs_server_processe=6 scope=spfile sid='*';


END


本文作者:李亚明(上海新炬中北团队)

本文来源:“IT那活儿”公众号

文章转载自IT那活儿,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论