Oracle RAC 10.2.0.4 hangs frequently
MR.CTHREE
2019-11-02

The hardware has been running for eight years, so the machines are quite old.

Recently, starting in October, the database has been hanging frequently; this happened before as well, but only once every few months. After a hang we usually recover by rebooting the operating system, almost always on both nodes at once. If only one node's OS is rebooted, the database on that node may fail to start, and ocssd.log reports a "takeover aborted due to ALIVE node on Disk" error.

Around the time the database is restarted, ORA-29702 errors are often reported. I don't know whether this is the cause of the hang or just a consequence of my rebooting the OS.

Around 13:50 on November 1 the database hung again, and we recovered by rebooting both nodes' operating systems. The logs show ORA-29702 errors starting at 13:56, but the database was already hung before that, so I suspect the ORA-29702 was caused by my OS reboot; I can't be certain, though. I've uploaded all the logs. Please help analyze them and pin down the exact root cause. Many thanks.

I suspect the hangs are caused by an Oracle bug, but the logs show nothing conclusive, so the plan is to try upgrading to Oracle RAC 10.2.0.5 to see whether that resolves the issue.

9 replies
章芋文

Have you disabled DRM?

MR.CTHREE
Attachment uploaded: 20191101.rar
MR.CTHREE

No, it's not disabled.
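For anyone who needs to verify this on their own system: hidden parameters such as _gc_affinity_time do not appear in v$parameter, so the usual way to confirm whether DRM has been tuned or disabled on 10.2 is to query the fixed tables as SYS. A minimal generic sketch (not taken from this thread's attachments):

-- Run as SYS; shows current values of the DRM-related hidden parameters.
SELECT i.ksppinm  AS parameter,
       v.ksppstvl AS current_value,
       v.ksppstdf AS is_default
  FROM x$ksppi i, x$ksppcv v
 WHERE i.indx = v.indx
   AND i.ksppinm IN ('_gc_affinity_time', '_gc_affinity_limit', '_gc_affinity_minimum');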

章芋文
  • Fri Nov 1 13:55:06 2019 lsyy1
    System State dumped lsyy1_ora_17433530.trc
  • Fri Nov 1 13:56:52 2019 asm1
    Trace dumping is performing id=[cdmp_20191101135652]
  • Fri Nov 1 13:56:51 2019 asm2
    GMON: terminating instance due to error 29702
  • 2019-11-01 13:57:06.598 grid1
    [cssd(2818714)]CRS-1612:node p550b (0) at 50% heartbeat fatal, eviction in 0.000 seconds
  • 2019-11-01 13:56:51.540 CSSD
    WARNING: clssnmeventhndlr: Receive failure with node 2 (p550b), state 3, con(111e0e230), probe(0), rc=11

alert_grid1

[crsd(2949782)]CRS-1205:Auto-start failed for the CRS resource . Details in p550a.
2019-10-24 04:00:54.152
[crsd(2949782)]CRS-1205:Auto-start failed for the CRS resource . Details in p550a.
[cssd(2818714)]CRS-1601:CSSD Reconfiguration complete. Active nodes are p550a p550b .
2019-11-01 13:57:06.598
[cssd(2818714)]CRS-1612:node p550b (0) at 50% heartbeat fatal, eviction in 0.000 seconds
2019-11-01 13:57:07.599
[cssd(2818714)]CRS-1612:node p550b (0) at 50% heartbeat fatal, eviction in 0.000 seconds
2019-11-01 13:57:14.612
[cssd(2818714)]CRS-1611:node p550b (0) at 75% heartbeat fatal, eviction in 0.000 seconds
2019-11-01 13:57:18.615
[cssd(2818714)]CRS-1610:node p550b (0) at 90% heartbeat fatal, eviction in 0.000 seconds
2019-11-01 13:57:19.616
[cssd(2818714)]CRS-1610:node p550b (0) at 90% heartbeat fatal, eviction in 0.000 seconds
2019-11-01 13:57:20.616
[cssd(2818714)]CRS-1610:node p550b (0) at 90% heartbeat fatal, eviction in 0.000 seconds
2019-11-01 13:57:21.126
[cssd(2818714)]CRS-1607:CSSD evicting node p550b. Details in /oracle/product/10.2.0/crs/log/p550a/cssd/ocssd.log.
[cssd(2818714)]CRS-1601:CSSD Reconfiguration complete. Active nodes are p550a .
2019-11-01 13:57:33.903
[crsd(2949782)]CRS-1204:Recovering CRS resources for node p550b.
2019-11-01 13:57:47.118
[cssd(25952658)]CRS-1605:CSSD voting file is online: /dev/rhdiskpower4. Details in /oracle/product/10.2.0/crs/log/p550a/cssd/ocssd.log.
2019-11-01 14:04:01.636

alert_grid2

2019-10-24 04:01:10.403
[crsd(2491016)]CRS-1201:CRSD started on node p550b.
2019-11-01 14:03:13.299
[cssd(1376764)]CRS-1605:CSSD voting file is online: /dev/rhdiskpower4. Details in /oracle/product/10.2.0/crs/log/p550b/cssd/ocssd.log.
[cssd(1376764)]CRS-1601:CSSD Reconfiguration complete. Active nodes are p550b .
2019-11-01 14:04:00.184
[evmd(2425748)]CRS-1401:EVMD started on node p550b.
2019-11-01 14:04:00.241

cssd_1

[ CSSD]2019-10-24 04:01:06.497 [3858] >TRACE: clssgmReconfigThread: completed for reconfig(2), with status(1)
[ CSSD]2019-11-01 13:56:51.540 [1801] >WARNING: clssnmeventhndlr: Receive failure with node 2 (p550b), state 3, con(111e0e230), probe(0), rc=11
[ CSSD]2019-11-01 13:56:51.540 [1801] >TRACE: clssnmDiscHelper: p550b, node(2) connection failed, con (111e0e230), probe(0)
[ CSSD]2019-11-01 13:56:51.540 [2829] >TRACE: clssgmPeerDeactivate: node 2 (p550b), death 0, state 0x1 connstate 0xf
[ CSSD]2019-11-01 13:57:06.598 [3086] >WARNING: clssnmPollingThread: node p550b (2) at 50% heartbeat fatal, eviction in 14.523 seconds seedhbimpd 0
[ CSSD]2019-11-01 13:57:06.598 [3086] >TRACE: clssnmPollingThread: node p550b (2) is impending reconfig, flag 1, misstime 15477
[ CSSD]2019-11-01 13:57:06.598 [3086] >TRACE: clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[ CSSD]2019-11-01 13:57:07.599 [3086] >WARNING: clssnmPollingThread: node p550b (2) at 50% heartbeat fatal, eviction in 13.522 seconds seedhbimpd 1
[ CSSD]2019-11-01 13:57:14.612 [3086] >WARNING: clssnmPollingThread: node p550b (2) at 75% heartbeat fatal, eviction in 6.508 seconds seedhbimpd 1
[ CSSD]2019-11-01 13:57:18.615 [3086] >WARNING: clssnmPollingThread: node p550b (2) at 90% heartbeat fatal, eviction in 2.505 seconds seedhbimpd 1
[ CSSD]2019-11-01 13:57:19.616 [3086] >WARNING: clssnmPollingThread: node p550b (2) at 90% heartbeat fatal, eviction in 1.505 seconds seedhbimpd 1
[ CSSD]2019-11-01 13:57:20.616 [3086] >WARNING: clssnmPollingThread: node p550b (2) at 90% heartbeat fatal, eviction in 0.504 seconds seedhbimpd 1
[ CSSD]2019-11-01 13:57:21.122 [3086] >TRACE: clssnmPollingThread: Eviction started for node p550b (2), flags 0x0001, state 3, wt4c 0 seedhbimpd 1
[ CSSD]2019-11-01 13:57:21.122 [3600] >TRACE: clssnmDoSyncUpdate: Initiating sync 3
[ CSSD]2019-11-01 13:57:21.122 [3600] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms
[ CSSD]2019-11-01 13:57:21.122 [3600] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2019-11-01 13:57:21.122 [3600] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
[ CSSD]2019-11-01 13:57:21.122 [3600] >TRACE: clssnmSendSync: syncSeqNo(3)
[ CSSD]2019-11-01 13:57:21.123 [3600] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2019-11-01 13:57:21.123 [1801] >TRACE: clssnmHandleSync: diskTimeout set to (27000)ms
[ CSSD]2019-11-01 13:57:21.123 [1801] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[p550a] seq[9] sync[3]
[ CSSD]2019-11-01 13:57:21.123 [1] >USER: NMEVENT_SUSPEND [00][00][00][06]
[ CSSD]2019-11-01 13:57:21.124 [3600] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2019-11-01 13:57:21.124 [3600] >TRACE: clssnmDoSyncUpdate: Terminating node 2, p550b, misstime(30001) state(5), seedhbimpd(1)
[ CSSD]2019-11-01 13:57:21.124 [3600] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2019-11-01 13:57:21.124 [3600] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2019-11-01 13:57:21.124 [3600] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2019-11-01 13:57:21.124 [1801] >TRACE: clssnmSendVoteInfo: node(1) syncSeqNo(3)
[ CSSD]2019-11-01 13:57:21.125 [3600] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2019-11-01 13:57:21.125 [3600] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2019-11-01 13:57:21.126 [3600] >TRACE: clssnmEvict: Start
[ CSSD]2019-11-01 13:57:21.126 [3600] >TRACE: clssnmEvict: Evicting node 2, p550b, birth 2, death 3, impendingrcfg 1, stateflags 0x1
[ CSSD]2019-11-01 13:57:21.126 [3600] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2019-11-01 13:57:21.126 [3600] >TRACE: clssnmWaitOnEvictions: node 2, p550b, is not dead, seedhbimpd 1
[ CSSD]2019-11-01 13:57:21.126 [3600] >TRACE: clssnmCheckKillStatus: Node 2, p550b, down, LATS(727368467),timeout(14527)
[ CSSD]2019-11-01 13:57:21.126 [3600] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2019-11-01 13:57:21.126 [3600] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2019-11-01 13:57:21.127 [3600] >TRACE: clssnmSendUpdate: syncSeqNo(3)
[ CSSD]2019-11-01 13:57:21.128 [3600] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2019-11-01 13:57:21.128 [1801] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2019-11-01 13:57:21.128 [1801] >TRACE: clssnmUpdateNodeState: node 1, state (3/3) unique (1571860840/1571860840) prevConuni(0) birth (1/1) (old/new)
[ CSSD]2019-11-01 13:57:21.128 [1801] >TRACE: clssnmUpdateNodeState: node 2, state (5/0) unique (1571860861/1571860861) prevConuni(1571860861) birth (2/2) (old/new)
[ CSSD]2019-11-01 13:57:21.129 [1801] >TRACE: clssnmDeactivateNode: node 2 (p550b) left cluster
[ CSSD]2019-11-01 13:57:21.129 [1801] >USER: clssnmHandleUpdate: SYNC(3) from node(1) completed
[ CSSD]2019-11-01 13:57:21.129 [1801] >USER: clssnmHandleUpdate: NODE 1 (p550a) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2019-11-01 13:57:21.129 [1801] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2019-11-01 13:57:21.130 [3600] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2019-11-01 13:57:21.130 [3600] >TRACE: clssnmDoSyncUpdate: Sync 3 complete!
[ CSSD]2019-11-01 13:57:21.135 [3859] >TRACE: clssgmReconfigThread: started for reconfig (3)
[ CSSD]2019-11-01 13:57:21.135 [3859] >USER: NMEVENT_RECONFIG [00][00][00][02]
[ CSSD]2019-11-01 13:57:21.135 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2019-11-01 13:57:21.136 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(crs_version) birth(2/0)
[ CSSD]2019-11-01 13:57:21.136 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_lsyy type 2
[ CSSD]2019-11-01 13:57:21.136 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_lsyy type 3
[ CSSD]2019-11-01 13:57:21.136 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_lsyy type 2
[ CSSD]2019-11-01 13:57:21.136 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_lsyy) birth(2/0)
[ CSSD]2019-11-01 13:57:21.136 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_lsyy type 3
[ CSSD]2019-11-01 13:57:21.137 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_lsyy) birth(2/0)
[ CSSD]2019-11-01 13:57:21.137 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock DB+ASM type 2
[ CSSD]2019-11-01 13:57:21.137 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DB+ASM) birth(2/0)
[ CSSD]2019-11-01 13:57:21.137 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock DG+ASM type 2
[ CSSD]2019-11-01 13:57:21.137 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DG+ASM) birth(2/0)
[ CSSD]2019-11-01 13:57:21.138 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock DBLSYY type 2
[ CSSD]2019-11-01 13:57:21.138 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DBLSYY) birth(2/0)
[ CSSD]2019-11-01 13:57:21.138 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock DGLSYY type 2
[ CSSD]2019-11-01 13:57:21.138 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DGLSYY) birth(2/0)
[ CSSD]2019-11-01 13:57:21.139 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock DG_DATA type 2
[ CSSD]2019-11-01 13:57:21.139 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(DG_DATA) birth(2/0)
[ CSSD]2019-11-01 13:57:21.139 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(DG_DATA) birth(2/0)
[ CSSD]2019-11-01 13:57:21.140 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock OSM_ALL type 2
[ CSSD]2019-11-01 13:57:21.140 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(OSM_ALL) birth(2/0)
[ CSSD]2019-11-01 13:57:21.140 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2
[ CSSD]2019-11-01 13:57:21.140 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(DAALL_DB) birth(2/0)
[ CSSD]2019-11-01 13:57:21.140 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(DAALL_DB) birth(2/0)
[ CSSD]2019-11-01 13:57:21.140 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[ CSSD]2019-11-01 13:57:21.140 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(CRSDMAIN) birth(2/0)
[ CSSD]2019-11-01 13:57:21.141 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[ CSSD]2019-11-01 13:57:21.141 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(EVMDMAIN) birth(2/0)
[ CSSD]2019-11-01 13:57:21.141 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_p550a type 3
[ CSSD]2019-11-01 13:57:21.141 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_p550b type 3
[ CSSD]2019-11-01 13:57:21.142 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_p550b) birth(2/0)
[ CSSD]2019-11-01 13:57:21.142 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock IGLSYYALL type 2
[ CSSD]2019-11-01 13:57:21.142 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(IGLSYYALL) birth(2/0)
[ CSSD]2019-11-01 13:57:21.142 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[ CSSD]2019-11-01 13:57:21.142 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(ocr_crs) birth(2/0)
[ CSSD]2019-11-01 13:57:21.143 [3859] >TRACE: clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[ CSSD]2019-11-01 13:57:21.143 [3859] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(#CSS_CLSSOMON) birth(2/0)
[ CSSD]2019-11-01 13:57:21.144 [3859] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 3
[ CSSD]2019-11-01 13:57:21.144 [2829] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2019-11-01 13:57:21.145 [3859] >TRACE: clssgmEstablishMasterNode: MASTER for 3 is node(1) birth(1)
[ CSSD]2019-11-01 13:57:21.145 [3859] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2019-11-01 13:57:21.146 [3859] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 3 with 1 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 1

alert_db2

Fri Nov 1 13:47:11 2019
Thread 2 advanced to log sequence 83093 (LGWR switch)
  Current log# 11 seq# 83093 mem# 0: +DATA/lsyy/onlinelog/group211
Fri Nov 1 13:56:51 2019
Error: KGXGN aborts the instance (6)
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_lmon_2490836.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_lms0_2818464.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_lms1_3539096.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_lms2_3276922.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_lms3_3080900.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_lmd0_2818208.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_pmon_3211936.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri Nov 1 13:56:52 2019
System state dump is made for local instance
System State dumped to trace file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy2_diag_1114194.trc
Fri Nov 1 13:56:57 2019
Instance terminated by LMON, pid = 2490836
Fri Nov 1 14:04:39 2019

alert_db1

Fri Nov 1 13:55:06 2019
System State dumped to trace file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_17433530.trc
Fri Nov 1 13:56:52 2019
Trace dumping is performing id=[cdmp_20191101135651]
Fri Nov 1 13:57:06 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_14614914.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:06 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_6620096.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:09 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_7274598.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:10 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_11403798.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:11 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_22937984.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:11 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_11076094.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:12 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_16384294.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:12 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_8258520.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:14 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_6554444.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:18 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_10551480.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:18 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_12910728.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:19 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_23003910.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:19 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_13763432.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:20 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_16974330.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:23 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_6620098.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:24 2019
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
 0
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri Nov 1 13:57:24 2019
 LMS 2: 5 GCS shadows cancelled, 2 closed
Fri Nov 1 13:57:24 2019
 LMS 3: 8 GCS shadows cancelled, 4 closed
Fri Nov 1 13:57:24 2019
 LMS 0: 4 GCS shadows cancelled, 0 closed
Fri Nov 1 13:57:24 2019
 LMS 1: 2 GCS shadows cancelled, 1 closed
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Fri Nov 1 13:57:26 2019
Instance recovery: looking for dead threads
Fri Nov 1 13:57:26 2019
Beginning instance recovery of 1 threads
Fri Nov 1 13:57:26 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_12058732.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:27 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_14614616.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:27 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_21627848.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:29 2019
 LMS 0: 710688 GCS shadows traversed, 0 replayed
Fri Nov 1 13:57:29 2019
 LMS 3: 722670 GCS shadows traversed, 0 replayed
Fri Nov 1 13:57:29 2019
 LMS 2: 723550 GCS shadows traversed, 0 replayed
Fri Nov 1 13:57:29 2019
 LMS 1: 723294 GCS shadows traversed, 0 replayed
Fri Nov 1 13:57:29 2019
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Fri Nov 1 13:57:32 2019
parallel recovery started with 15 processes
Fri Nov 1 13:57:33 2019
Started redo scan
Fri Nov 1 13:57:33 2019
Completed redo scan
 12508 redo blocks read, 1428 data blocks need recovery
Fri Nov 1 13:57:33 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/udump/lsyy1_ora_23462652.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-03135: connection lost contact
ORA-06512: at line 3
Fri Nov 1 13:57:33 2019
Started redo application at
 Thread 2: logseq 83093, block 16146
Fri Nov 1 13:57:33 2019
Recovery of Online Redo Log: Thread 2 Group 11 Seq 83093 Reading mem 0
  Mem# 0: +DATA/lsyy/onlinelog/group211
Fri Nov 1 13:57:33 2019
Completed redo application
Fri Nov 1 13:57:33 2019
Completed instance recovery at
 Thread 2: logseq 83093, block 28654, scn 22641184229
 1037 data blocks read, 1536 data blocks written, 12508 redo blocks read
Fri Nov 1 13:57:34 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy1_asmb_3146160.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Fri Nov 1 13:57:34 2019
ASMB: terminating instance due to error 15064
Fri Nov 1 13:57:34 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy1_lms1_3081164.trc:
ORA-15064: communication failure with ASM instance
Fri Nov 1 13:57:34 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy1_lmon_2621854.trc:
ORA-15064: communication failure with ASM instance
Fri Nov 1 13:57:34 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy1_lms0_2883754.trc:
ORA-15064: communication failure with ASM instance
Fri Nov 1 13:57:34 2019
Errors in file /oracle/product/10.2.0/db_1/admin/lsyy/bdump/lsyy1_lms2_3539612.trc:
ORA-15064: communication failure with ASM instance
Instance terminated by ASMB, pid = 3146160
Fri Nov 1 14:05:55 2019

alert_asm2

NOTE: ASMB process exiting due to lack of ASM file activity
Fri Nov 1 13:56:51 2019
Errors in file /oracle/product/10.2.0/db_1/admin/+ASM/bdump/+asm2_gmon_2819014.trc:
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation
Fri Nov 1 13:56:51 2019
GMON: terminating instance due to error 29702
Fri Nov 1 13:56:52 2019
System state dump is made for local instance
System State dumped to trace file /oracle/product/10.2.0/db_1/admin/+ASM/bdump/+asm2_diag_2949468.trc
Fri Nov 1 13:56:52 2019
Trace dumping is performing id=[cdmp_20191101135652]
Fri Nov 1 13:56:57 2019
Instance terminated by GMON, pid = 2819014
Fri Nov 1 14:04:10 2019
Starting ORACLE instance (normal)

alert_asm1

Starting background process ASMB
ASMB started with pid=21, OS id=6030044
Fri Nov 1 13:56:52 2019
Trace dumping is performing id=[cdmp_20191101135652]
Fri Nov 1 13:57:21 2019
Reconfiguration started (old inc 2, new inc 3)
List of nodes:
 0
 Global Resource Directory frozen
 * dead instance detected - domain 1 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri Nov 1 13:57:21 2019
 LMS 0: 0 GCS shadows cancelled, 0 closed
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Fri Nov 1 13:57:21 2019
NOTE: SMON starting instance recovery for group 1 (mounted)
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: starting recovery of thread=2 ckpt=580.4028 group=1
Fri Nov 1 13:57:21 2019
 LMS 0: 5486 GCS shadows traversed, 0 replayed
Fri Nov 1 13:57:21 2019
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Fri Nov 1 13:57:21 2019
NOTE: advancing ckpt for thread=2 ckpt=580.4028
NOTE: smon did instance recovery for domain 1
Fri Nov 1 13:57:21 2019
NOTE: recovering COD for group 1/0x7fc555b1 (DATA)
SUCCESS: completed COD recovery for group 1/0x7fc555b1 (DATA)
Fri Nov 1 13:57:34 2019
Error: KGXGN aborts the instance (6)
Fri Nov 1 13:57:34 2019
Errors in file /oracle/product/10.2.0/db_1/admin/+ASM/bdump/+asm1_lmon_2622036.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Fri Nov 1 13:57:34 2019
System state dump is made for local instance
System State dumped to trace file /oracle/product/10.2.0/db_1/admin/+ASM/bdump/+asm1_diag_2687330.trc
Fri Nov 1 13:57:34 2019
Trace dumping is performing id=[cdmp_20191101135734]
Fri Nov 1 13:57:39 2019
Instance terminated by LMON, pid = 2622036
Fri Nov 1 14:05:36 2019
Starting ORACLE instance (normal)

awk_dump

[root@pr7 ~]# awk -f ass109.awk lsyy1_ora_17433530.trc
Starting Systemstate 1
....................................................
Ass.Awk Version 1.0.9 - Processing lsyy1_ora_17433530.trc

System State 1
~~~~~~~~~~~~~~~~
1:
2:  waiting for 'pmon timer' wait
3:  waiting for 'DIAG idle wait' wait
4:  waiting for 'rdbms ipc message' wait
5:  waiting for 'rdbms ipc message' wait
6:  waiting for 'ges remote message' wait
7:  waiting for 'gcs remote message' wait
8:  waiting for 'gcs remote message' wait
9:  waiting for 'gcs remote message' wait
10: waiting for 'gcs remote message' wait
11: waiting for 'rdbms ipc message' wait
12: waiting for 'rdbms ipc message' wait
13: waiting for 'rdbms ipc message' wait
14: waiting for 'rdbms ipc message' wait
15: waiting for 'rdbms ipc message' wait
16: waiting for 'smon timer' wait
17: waiting for 'rdbms ipc message' wait
18: waiting for 'rdbms ipc message' wait
19: waiting for 'rdbms ipc message' wait
20: waiting for 'rdbms ipc message' wait
21:
22:
23: waiting for 'rdbms ipc message' wait
24: waiting for 'jobq slave wait' wait
25: waiting for 'ASM background timer' wait
26: waiting for 'rdbms ipc message' wait
27: waiting for 'SQL*Net message from client' wait
28: waiting for 'SQL*Net message from client' wait
29: waiting for 'SQL*Net message from client' wait
30: waiting for 'SQL*Net message from client' wait
31: waiting for 'SQL*Net message from client' wait Cmd: Select
32: waiting for 'SQL*Net message from client' wait
33: waiting for 'SQL*Net message from client' wait
34: waiting for 'SQL*Net message from client' wait Cmd: Select
35: waiting for 'SQL*Net message from client' wait Cmd: Select
36: waiting for 'SQL*Net message from client' wait Cmd: Select
37: waiting for 'SQL*Net message from client' wait Cmd: Select
38: waiting for 'SQL*Net message from client' wait Cmd: Select
39: waiting for 'SQL*Net message from client' wait Cmd: Select
40: waiting for 'SQL*Net message from client' wait
41: last wait for 'ksdxexeotherwait'
42: waiting for 'SQL*Net message from client' wait Cmd: Select
43: waiting for 'rdbms ipc message' wait
44: waiting for 'rdbms ipc message' wait
45: waiting for 'SQL*Net message from client' wait
46: waiting for 'Streams AQ: qmn coordinator idle wait' wait
47: waiting for 'jobq slave wait' wait
48: waiting for 'jobq slave wait' wait
49: waiting for 'jobq slave wait' wait
50: waiting for 'jobq slave wait' wait
51: for 'Streams AQ: waiting for messages in the queue' wait
52: for 'Streams AQ: waiting for time management or cleanup tasks' wait
53: waiting for 'SQL*Net message from client' wait
54: waiting for 'Streams AQ: delete acknowledged messages' wait
55: waiting for 'SQL*Net message from client' wait Cmd: Select
56: waiting for 'SQL*Net message from client' wait Cmd: PL/SQL Execute
57: waiting for 'SQL*Net message from client' wait Cmd: Select
(remaining entries omitted)

NO BLOCKING PROCESSES FOUND
3075273 Lines Processed.
章芋文

Are there any errors in the OS logs? Is there server monitoring in place, e.g. for CPU and memory? Also, could you provide an AWR report from before the restart?

gelyon

Disable DRM to fix it.

MR.CTHREE

1. Following 章芋文's hint, I adjusted the following two DRM parameters on the evening of November 1 (applied as shown in the first sketch after this list):
_gc_affinity_limit=250
_gc_affinity_minimum=10485760
I'm now watching to see whether the hang recurs. I haven't found solid evidence that DRM is the cause, so I can't say for certain that DRM is to blame.
2. Running awk -f ass109.awk lsyy2_diag_1114194.trc to analyze that trace file just hangs, and I don't know why. The most frequent wait in the file is 'gc current request' (see the second sketch after this list for a way to watch this wait live).

3. I tried to find DRM-related errors by going through the LMD process trace files. They do contain "begin drm ... end drm" entries, but none of them fall within the failure window. I don't know whether that's because the instance could no longer write trace files once the system hung.

4. errpt shows no obvious faults. At the OS level we have OSWatcher (oswbb) monitoring; the log files are in the 20191101.rar attachment:
p550a_iostat_19.10.30.1300.dat
p550a_prvtnet_19.10.30.1300.dat
p550a_top_19.10.30.1300.dat
p550b_iostat_19.11.01.1300.dat
p550b_prvtnet_19.11.01.1300.dat
I couldn't see anything wrong in these logs either. That said, the hardware is old, so system resource usage is often high: CPU utilization sometimes hits 90%, and I/O is frequently 100% busy.

5. I've uploaded another batch of logs, including the AWR report for 13:00-13:30 on November 1 (the failure occurred shortly after 13:50); because the OS was rebooted, no AWR covering the failure window was captured. Two AWR reports from normal periods are also included.
Business load ramps up after 8:00 in the morning and again after 13:50 in the afternoon.
6. Uploaded some LMD process traces, almost all from the failure window.
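For reference, here is how the two parameters from item 1 are typically applied on 10.2. This is a minimal sketch assuming an spfile and SYSDBA access; hidden parameters only take effect after an instance restart, and the values are simply the ones I set above, so verify them with Oracle Support before reuse:

-- Minimal sketch: set the two DRM parameters from item 1 (assumes spfile, SYSDBA).
-- Hidden ("_") parameters require an instance restart to take effect.
ALTER SYSTEM SET "_gc_affinity_limit"=250 SCOPE=SPFILE SID='*';
ALTER SYSTEM SET "_gc_affinity_minimum"=10485760 SCOPE=SPFILE SID='*';
-- Fully disabling DRM on 10.2 is usually described as "_gc_affinity_time"=0;
-- treat that as an assumption to confirm against MOS before use.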
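And for item 2, a way to watch 'gc current request' style waits live from a surviving instance instead of digging through the diag trace. This is a generic GV$ query, not taken from the attachments:

-- Count sessions per instance currently waiting on global cache requests.
SELECT inst_id, event, COUNT(*) AS sessions_waiting
  FROM gv$session_wait
 WHERE event LIKE 'gc%request'
 GROUP BY inst_id, event
 ORDER BY sessions_waiting DESC;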

MR.CTHREE

The AWR report upload here doesn't support versions below 11.2.0.3, so I've put the reports into the newly uploaded attachment.

MR.CTHREE
Attachment uploaded: 20191101_2.rar