I logged in to node 2 and checked the alert log. Node 2 reported ORA-00600 at 08:58:37, shut down the instance at 08:58:45, and restarted the instance at 08:59:00:
2019-11-05T08:58:37.077926+08:00
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_lms0_29557854.trc (incident=4834413) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kjctr_pbmsg:badbmsg2], [0x123D69B90], [0x123D6A090], [14388], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb2/incident/incdir_4834413/cdb2_lms0_29557854_i4834413.trc
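Checking the NMON data on both servers shows that heartbeat (interconnect) network traffic on both nodes was abnormally high from 08:56 to 09:02.
In addition, the alert log on node 1 contains a large number of "IPC Send timeout" warnings: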
2019-11-05T08:58:41.826686+08:00
HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.
HISDB(3):Using deprecated SQLNET.ALLOWED_LOGON_VERSION parameter.
2019-11-05T08:58:42.556739+08:00
Reconfiguration started (old inc 20, new inc 22)
2019-11-05T08:58:42.559844+08:00
IPC Send timeout to 2.3 inc 20 for msg type 151 from opid 25
List of instances (total 1) :
2019-11-05T08:58:42.560764+08:00
IPC Send timeout to 2.4 inc 20 for msg type 151 from opid 26
2019-11-05T08:58:42.562915+08:00
IPC Send timeout to 2.3 inc 20 for msg type 151 from opid 25
2019-11-05T08:58:42.563060+08:00
IPC Send timeout to 2.4 inc 20 for msg type 151 from opid 26
2019-11-05T08:58:42.563381+08:00
Dead instances (total 1) :
2019-11-05T08:58:42.565113+08:00
IPC Send timeout to 2.3 inc 20 for msg type 151 from opid 25
2019-11-05T08:58:42.566124+08:00
IPC Send timeout to 2.4 inc 20 for msg type 151 from opid 26
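
To get a rough picture of how many of these timeouts were logged and which targets they point at, the alert log can be tallied with a small script. This is only a minimal sketch: the function name `count_ipc_timeouts` is made up for illustration, the regular expression is based solely on the message format visible in the excerpt above, and the sample lines are copied from that excerpt rather than read from the real alert log file.

```python
import re
from collections import Counter

# Pattern derived from the lines in the excerpt above, e.g.
# "IPC Send timeout to 2.3 inc 20 for msg type 151 from opid 25".
# Other Oracle versions may word this message differently (assumption).
IPC_TIMEOUT = re.compile(
    r"IPC Send timeout to (\S+) inc (\d+) for msg type (\d+) from opid (\d+)"
)


def count_ipc_timeouts(lines):
    """Count IPC Send timeout messages per (target, msg type, opid)."""
    counts = Counter()
    for line in lines:
        match = IPC_TIMEOUT.search(line)
        if match:
            target, _inc, msg_type, opid = match.groups()
            counts[(target, int(msg_type), int(opid))] += 1
    return counts


# Sample lines copied from the node 1 alert log excerpt.
sample = [
    "2019-11-05T08:58:42.559844+08:00",
    "IPC Send timeout to 2.3 inc 20 for msg type 151 from opid 25",
    "2019-11-05T08:58:42.560764+08:00",
    "IPC Send timeout to 2.4 inc 20 for msg type 151 from opid 26",
    "2019-11-05T08:58:42.562915+08:00",
    "IPC Send timeout to 2.3 inc 20 for msg type 151 from opid 25",
]

for key, n in sorted(count_ipc_timeouts(sample).items()):
    print(key, n)
```

To run it against the real log, the `sample` list could be replaced by the lines of the alert log file opened with `open(path)`, where `path` is whatever location the alert log has on node 1.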
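This basically points to a problem with the heartbeat network, but I have not been able to find what actually triggered the sudden surge in heartbeat traffic.
Has anyone run into a similar problem? I would appreciate it if you could take a look and give me some suggestions.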