暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle 12.2 Rac redo异常的恢复案例

原创 Roger 2019-12-05
4122
前同事问的一个问题;其一套12.2 rac由于存储问题导致无法启动;重建控制文件后报如下错误:
2019-12-05T10:12:55.112874+08:00
Sleep 5 seconds and then try to clear SRLs in 2 time(s)
Errors in file /u01/oracle/app/oracle/diag/rdbms/orapuabis/xxoo2/trace/xxoo2_ora_327541.trc (incident=5376496):
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/incident/incdir_5376496/xxoo2_ora_327541_i5376496.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-12-05T10:12:56.191277+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-12-05T10:12:56.485111+08:00
Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc:
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2019-12-05T10:12:56.485226+08:00
Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2019-12-05T10:12:56.485323+08:00
Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc (incident=5376497):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/incident/incdir_5376497/xxoo2_ora_327541_i5376497.trc
2019-12-05T10:12:56.602381+08:00
Dumping diagnostic data in directory=[cdmp_20191205101256], requested by (instance=2, osid=327541), summary=[incident=5376496].
2019-12-05T10:12:57.664168+08:00
opiodr aborting process unknown ospid (327541) as a result of ORA-603
复制

这个600错误我还是第一次遇见。最近2年很少做恢复,因此在华为云服务器针对该错误进行了一次测试;通过模拟事务丢失(dd破坏current redo等)很容易出现该错误;如下是我的模拟过程:
SQL> startup
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size 8797928 bytes
Variable Size 939524376 bytes
Database Buffers 2264924160 bytes
Redo Buffers 7979008 bytes
Database mounted.
Database opened.
SQL> show pdbs;

CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 ENMOTEST MOUNTED
4 PDBTEST1 MOUNTED

SQL> alter pluggable database ENMOTEST open;

Pluggable database altered.

SQL> alter pluggable database PDBTEST1 open;


alter
Pluggable database altered.

SQL> SQL> SQL> session set container=enmotest;

Session altered.

SQL> create table roger as select * from sys.dba_objects where rownum < 10001;

Table created.

SQL> alter system flush buffer_cache;

System altered.

SQL> delete from roger where rownum < 1001;

1000 rows deleted.

SQL> select * from v$Log;

GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME CON_ID
---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- ------------ ------------ ------------ ----------
1 1 37 209715200 512 1 NO INACTIVE 3232261 04-DEC-19 3335566 05-DEC-19 0
2 1 38 209715200 512 1 NO CURRENT 3335566 05-DEC-19 1.8447E+19 0
3 1 36 209715200 512 1 NO INACTIVE 3121301 28-NOV-19 3232261 04-DEC-19 0

SQL> select member from v$Logfile;

MEMBER
---------------------------------------------------------------------
/sas_data/12c/app/oracle/oradata/test12c/redo03.log
/sas_data/12c/app/oracle/oradata/test12c/redo02.log
/sas_data/12c/app/oracle/oradata/test12c/redo01.log


SQL> conn /as sysdba
Connected.
SQL> shutdown abort;
ORACLE instance shut down.
SQL> !
[oracle@enmodb3 ~]$ dd if=/dev/zero of=/sas_data/12c/app/oracle/oradata/test12c/redo02.log bs=1024k count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00221134 s, 474 MB/s
[oracle@enmodb3 ~]$ exit
exit

SQL> startup
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size 8797928 bytes
Variable Size 939524376 bytes
Database Buffers 2264924160 bytes
Redo Buffers 7979008 bytes
Database mounted.
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/sas_data/12c/app/oracle/oradata/test12c/redo02.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 2


SQL> startup pfile='/tmp/12_2.ora' mount;
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size 8797928 bytes
Variable Size 939524376 bytes
Database Buffers 2264924160 bytes
Redo Buffers 7979008 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> alter database backup controlfile to trace;

Database altered.

SQL> oradebug tracefile_name
/sas_data/12c/app/oracle/diag/rdbms/test12c/test12c/trace/test12c_ora_30127.trc


在重建控制文件后,分区进行强制open时分别遇到了如下错误:


SQL> alter database open resetlogs;
ERROR:
ORA-03114: not connected to ORACLE


alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Process ID: 13970
Session ID: 1 Serial number: 15896

SQL> alter pluggable database ENMOTEST open force;
alter pluggable database ENMOTEST open force
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [],
[], []

复制

我们言归正传;回到本文的案例中来;针对第一个600错误的处理方式是,首先通过10046 trace定位到访问的数据块;然后通过bbed修改相关block,提交事务后,再使用event="21307096 trace name context forever, level 1' 来 推进scn即可。

当然,有可能在你在恢复过程中还会遇到ora-00600 [4194]等错误,这是跟undo相关的问题,可以通过屏蔽回滚段或者重建undo来解决。

当处理完毕后,强烈建议使用hcheck脚本检查数据字典的一致性,如果检查没有任何问题,那么基本上数据库运行不会有太大问题;如下是本case的检测结果:
SQL> @/home/oracle/hcheck.sql
HCheck Version 07MAY18 on 05-DEC-2019 11:28:59
----------------------------------------------
Catalog Version 12.2.0.1.0 (1202000100)
db_name: xxxx
Is CDB?: NO

Catalog Fixed
Procedure Name Version Vs Release Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- MissingOIDOnObjCol ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- SourceNotInObj ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- OversizedFiles ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- PoorDefaultStorage ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- PoorStorage ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- TabPartCountMismatch ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- OrphanedTabComPart ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- MissingSum$ ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- MissingDir$ ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- DuplicateDataobj ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS
.- ObjSynMissing ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- ObjSeqMissing ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedUndo ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedIndex ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedIndexPartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedIndexSubPartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedTable ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedTablePartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedTableSubPartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- MissingPartCol ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedSeg$ ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- OrphanedIndPartObj# ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- DuplicateBlockUse ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- FetUet ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- Uet0Check ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- SeglessUET ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- BadInd$ ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- BadTab$ ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- BadIcolDepCnt ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- ObjIndDobj ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- TrgAfterUpgrade ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- ObjType0 ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- BadOwner ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- StmtAuditOnCommit ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- BadPublicObjects ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- BadSegFreelist ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- BadDepends ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS
.- CheckDual ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- ObjectNames ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- BadCboHiLo ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- ChkIotTs ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- NoSegmentIndex ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- BadNextObject ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- DroppedROTS ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- FilBlkZero ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- DbmsSchemaCopy ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- OrphanedIdnseqObj ... 1202000100 > 1201000000 12/05 11:29:01 PASS
.- OrphanedIdnseqSeq ... 1202000100 > 1201000000 12/05 11:29:01 PASS
.- OrphanedObjError ... 1202000100 > 1102000000 12/05 11:29:01 PASS
.- ObjNotLob ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- MaxControlfSeq ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- SegNotInDeferredStg ... 1202000100 > 1102000000 12/05 11:29:01 PASS
.- SystemNotRfile1 ... 1202000100 > 902000000 12/05 11:29:01 PASS
.- DictOwnNonDefaultSYSTEM ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- OrphanTrigger ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
.- ObjNotTrigger ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS
---------------------------------------
05-DEC-2019 11:29:01 Elapsed: 2 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.
复制

第一次遇到这个问题;很久没搞Oracle了。友情支持;就记录一下吧。

话说这里我自己在华为云12.2 版本测试所遇到的ora-00600 kcffo_online_pdb_check: fno_system 问题;还没找到很好的解决方法;放到下一篇文章吧。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论