
Oracle Troubleshooting: ORA-00445: background process "J000" did not start after 120 seconds

数据与人 2020-12-15


Background:

A customer reported that the database had gone down and asked for help finding the cause.


1> Check the alert log:

    Mon Dec 30 08:56:01 2019
    WARNING: inbound connection timed out (ORA-3136)
    Mon Dec 30 08:56:04 2019
    Errors in file /u01/app/oracle/diag/rdbms/ecology/ecology/trace/ecology_cjq0_25270.trc (incident=300282):
    ORA-00445: background process "J001" did not start after 30 seconds
    Incident details in: /u01/app/oracle/diag/rdbms/ecology/ecology/incident/incdir_300282/ecology_cjq0_25270_i300282.trc
    Mon Dec 30 08:56:05 2019
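
When the alert log is large, it can help to pull out every ORA-00445 entry together with the timestamp line that precedes it. The following is a minimal sketch, not part of the original investigation: the function name `find_ora_445` and the sample text are illustrative, and it assumes the alert log uses the timestamp layout shown above with English day and month names.

```python
import re
from datetime import datetime

# Alert-log timestamp layout, e.g. "Mon Dec 30 08:56:04 2019".
# Assumption: English day/month names (C or en_US locale).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
ORA_445 = re.compile(
    r'ORA-00445: background process "(\w+)" did not start after (\d+) seconds'
)


def find_ora_445(lines):
    """Yield (timestamp, process_name, timeout_seconds) for each ORA-00445 line."""
    current_ts = None
    for raw in lines:
        line = raw.strip()
        try:
            # A line that parses as a timestamp starts a new alert-log entry.
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass
        match = ORA_445.search(line)
        if match:
            yield current_ts, match.group(1), int(match.group(2))


# Sample text copied from the alert-log excerpt above.
SAMPLE_ALERT_LOG = """\
Mon Dec 30 08:56:01 2019
WARNING: inbound connection timed out (ORA-3136)
Mon Dec 30 08:56:04 2019
ORA-00445: background process "J001" did not start after 30 seconds
Mon Dec 30 08:56:05 2019
"""

for ts, process, seconds in find_ora_445(SAMPLE_ALERT_LOG.splitlines()):
    print(ts, process, seconds)
```

For a real alert log, pass an open file object instead of the sample lines; the generator reads it one line at a time.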




Check the trace file:
    /u01/app/oracle/diag/rdbms/ecology/ecology/trace/ecology_cjq0_25270.trc


    *** 2019-12-31 08:49:26.444
    Process diagnostic dump for J000, OS id=23742
    -------------------------------------------------------------------------------
    os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 08:49:21 ]
    NOTE: scheduling delay has not been sampled for 5.062184 secs
    0.000000 secs from [ 08:49:21 - 08:49:26 ], 5 sec avg
    0.000000 secs from [ 08:49:21 - 08:49:26 ], 1 min avg

    *** 2019-12-31 08:49:28.330
    0.000000 secs from [ 08:45:08 - 08:49:28 ], 5 min avg

    *** 2019-12-31 08:49:43.789
    loadavg : 153.96 132.74 76.11
    Memory (Avail / Total) = 289.81M / 64411.24M
    Swap (Avail / Total) = 35820.70M / 64767.98M
    skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 23742' | /bin/grep -v grep timed out after 13.740 seconds

    *** 2019-12-31 08:49:56.451
    Stack:
    skgpgcmdout: read() for cmd /usr/bin/gdb --batch -quiet -x /tmp/stackTcHuSK /proc/23742/exe 23742 < /dev/null 2>&1 timed out after 12.660 seconds

    -------------------------------------------------------------------------------
    Process diagnostic dump actual duration=30.000000 sec
    (max dump time=30.000000 sec)

    *** 2019-12-31 08:49:56.451
    Waited for process J000 to initialize for 120 seconds

    *** 2019-12-31 08:49:56.451
    Process diagnostic dump for J000, OS id=23742
    -------------------------------------------------------------------------------
    os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 08:49:21 ]
    NOTE: scheduling delay has not been sampled for 35.069379 secs
    0.000000 secs from [ 08:49:21 - 08:49:56 ], 5 sec avg
    0.000000 secs from [ 08:49:21 - 08:49:56 ], 1 min avg
    0.000000 secs from [ 08:45:08 - 08:49:56 ], 5 min avg

    *** 2019-12-31 08:50:12.312
    loadavg : 154.88 134.93 78.63
    Memory (Avail / Total) = 288.15M / 64411.24M
    Swap (Avail / Total) = 35665.90M / 64767.98M
    skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 23742' | /bin/grep -v grep timed out after 15.000 seconds

    *** 2019-12-31 08:50:26.454
    Stack:
    skgpgcmdout: read() for cmd /usr/bin/gdb --batch -quiet -x /tmp/stackd1W3Ol /proc/23742/exe 23742 < /dev/null 2>&1 timed out after 14.140 seconds

    -------------------------------------------------------------------------------
    Process diagnostic dump actual duration=30.000000 sec
    (max dump time=30.000000 sec)

    *** 2019-12-31 08:50:26.454

    *** 2019-12-31 08:52:17.853
    Killing process (ospid 23742): (reason=KSOREQ_WAIT_CANCELLED error=0)
    ... and the process is still alive after kill!

    *** 2019-12-31 08:53:07.555
    Incident 713 created, dump file: /u01/app/database/diag/rdbms/feilioa/feilioa_1/incident/incdir_713/feilioa_1_cjq0_1370_i713.trc
    ORA-00445: background process "J000" did not start after 120 seconds
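
The `loadavg`, `Memory` and `Swap` lines in the process diagnostic dump are easy to skim past, so it may be worth turning them into percentages. The sketch below is only an illustration: the function name `summarize` and the regular expressions are assumptions based on the line layout seen in this trace, not an Oracle-provided format specification.

```python
import re

# Patterns modelled on the lines in the trace above; the "/" separators are
# optional so that copies of the trace that lost them still match.
LOADAVG = re.compile(r"loadavg\s*:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)")
AVAIL_TOTAL = re.compile(
    r"(Memory|Swap) \(Avail\s*/?\s*Total\) = ([\d.]+)M\s*/?\s*([\d.]+)M"
)


def summarize(trace_text):
    """Return the first load-average sample and the free percentage of memory and swap."""
    summary = {}
    load = LOADAVG.search(trace_text)
    if load:
        summary["loadavg"] = tuple(float(value) for value in load.groups())
    for kind, avail, total in AVAIL_TOTAL.findall(trace_text):
        key = kind.lower() + "_free_pct"
        # Keep only the first sample of each kind.
        summary.setdefault(key, round(100 * float(avail) / float(total), 2))
    return summary


# First resource sample from the trace above.
SAMPLE_TRACE = """\
loadavg : 153.96 132.74 76.11
Memory (Avail / Total) = 289.81M / 64411.24M
Swap (Avail / Total) = 35820.70M / 64767.98M
"""

print(summarize(SAMPLE_TRACE))
```

For the first sample in this trace, well under 1% of the 64 GB of physical memory was available, roughly 45% of swap was already in use, and the one-minute load average was above 150. Together with the `ps` and `gdb` commands timing out, this suggests the host itself was starved of resources when J000 failed to start.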


Description of this error in [ID 1379200.1]:

    What does this message mean?

    The message indicates that we failed to spawn a new process at the Operating System level to serve the request. There are various causes for this issue.

    This typically occurs when there is a shortage or misconfiguration in Operating System Resources, and thereby the problem should be investigated from an OS perspective. However there are a few causes related to the Oracle Database as well.
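
Since the note points to the operating system first, a quick check of load and memory on the database host is a reasonable starting point. The following is a rough sketch under stated assumptions: it targets Linux, the function names `parse_meminfo`, `pressure_report` and `live_report` are hypothetical, and it falls back to `MemFree` on older kernels whose `/proc/meminfo` has no `MemAvailable` field.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def pressure_report(meminfo_text, loadavg, cpu_count):
    """Summarize load per CPU, available memory and swap usage as numbers."""
    info = parse_meminfo(meminfo_text)
    # Assumption: MemAvailable may be missing on older kernels.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    swap_total = info.get("SwapTotal", 0)
    swap_used_pct = 0.0
    if swap_total:
        swap_used_pct = round(100 * (1 - info.get("SwapFree", 0) / swap_total), 2)
    return {
        "load_per_cpu": round(loadavg[0] / cpu_count, 2),
        "mem_available_pct": round(100 * available / info["MemTotal"], 2),
        "swap_used_pct": swap_used_pct,
    }


def live_report():
    """Build the report for the current host (Linux only)."""
    with open("/proc/meminfo") as handle:
        return pressure_report(handle.read(), os.getloadavg(), os.cpu_count() or 1)
```

Calling `live_report()` on the database server returns the three figures for that moment. A load per CPU far above 1 combined with a very low available-memory percentage would match the kind of OS resource shortage the note describes.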



