Simulating and Troubleshooting ORA-01173

黄廷忠 2019-06-18


Problem Description

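The test below was prompted by a reader whose production database had failed. After a tablespace was dropped and the control file recreated, the database reported the following errors: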

Sat Jul 19 00:45:47 2014
SMON: enabling cache recovery
Sat Jul 19 00:45:47 2014
Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc:
ORA-01173: data dictionary indicates missing data file from system tablespace
Sat Jul 19 00:45:47 2014
Error 1173 happened during db open, shutting down database
USER: terminating instance due to error 1173
Instance terminated by USER, pid = 12464
ORA-1092 signalled during: alter database open resetlogs…

The simple test below reproduces the problem and presents two ways to resolve it.

Expert Answer

1. Database version

www.htz.pw > select * from v$version where rownum<3;
 
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
www.htz.pw > !lsb_release -a
LSB Version:    :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch
Distributor ID: RedHatEnterpriseAS
Description:    Red Hat Enterprise Linux AS release 4 (Nahant Update 8)
Release:        4
Codename:       NahantUpdate8

2. Query the undo segment names

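The test needs the names of the undo segments, so we query them in advance. In a production environment you can read the undo$ table directly with bbed, extract undo$ with tools such as ODU or DUL, or run strings against the SYSTEM data file and filter for the undo segment names.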

www.htz.pw > @undo_segment.sql                                                              
                                                                                            
                                                    SEGMENT_HEADER                          
TABLESPACE           SEGMENT_NAME                   FILE#.BLOCK    STATUS     SEGMENT_SIZE(M)
-------------------- ------------------------------ -------------- ---------- ---------------
SYSTEM.OLD           PRI.SYSTEM                     1.9            ONLINE                   0
UNDOTBS1.CURRENT     PUB._SYSSMU1$                  2.9            ONLINE                   1
UNDOTBS1.CURRENT     PUB._SYSSMU10$                 2.153          ONLINE                   1
UNDOTBS1.CURRENT     PUB._SYSSMU9$                  2.137          ONLINE                  13
UNDOTBS1.CURRENT     PUB._SYSSMU8$                  2.121          ONLINE                  18
UNDOTBS1.CURRENT     PUB._SYSSMU7$                  2.105          ONLINE                   0
UNDOTBS1.CURRENT     PUB._SYSSMU6$                  2.89           ONLINE                   6
UNDOTBS1.CURRENT     PUB._SYSSMU5$                  2.73           ONLINE                   2
UNDOTBS1.CURRENT     PUB._SYSSMU4$                  2.57           ONLINE                   0
UNDOTBS1.CURRENT     PUB._SYSSMU3$                  2.41           ONLINE                   1
UNDOTBS1.CURRENT     PUB._SYSSMU2$                  2.25           ONLINE

3. Generate the script that creates the control file

[oracle@www.htz.pw sql]$./create_controlfile_sql.sh
please input direcotry default /tmp:
please input file name default control.ctl:
Database altered.

By default the generated script is written to /tmp/control.ctl.

4. Recreate the control file

www.htz.pw > shutdown abort;
ORACLE instance shut down.
 
 
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "ORCL1024" NORESETLOGS  NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 '/oracle/app/oracle/oradata/orcl1024/redo01.log'  SIZE 50M,
GROUP 2 '/oracle/app/oracle/oradata/orcl1024/redo02.log'  SIZE 50M,
GROUP 3 '/oracle/app/oracle/oradata/orcl1024/redo03.log'  SIZE 50M
DATAFILE
'/oracle/app/oracle/oradata/orcl1024/system01.dbf',
'/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf', (remove this line so that the undo data file is left out of the control file)
'/oracle/app/oracle/oradata/orcl1024/sysaux01.dbf',
'/oracle/app/oracle/oradata/orcl1024/users01.dbf'
CHARACTER SET ZHS16GBK
;
RECOVER DATABASE
ALTER DATABASE OPEN;
ALTER TABLESPACE TEMP ADD TEMPFILE '/oracle/app/oracle/oradata/orcl1024/temp01.dbf'
 SIZE 1482M REUSE AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;

5. The failure appears

www.htz.pw > recover database using backup controlfile until cancel;
ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72
_%u_.arc
ORA-00280: change 2170641 for thread 1 is in sequence #72
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/oracle/app/oracle/oradata/orcl1024/redo02.log
ORA-00310: archived log contains sequence 71; sequence 72 required
ORA-00334: archived log: '/oracle/app/oracle/oradata/orcl1024/redo02.log'
 
 
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/app/oracle/oradata/orcl1024/system01.dbf'
 
 
www.htz.pw >  recover database using backup controlfile until cancel;
ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72
_%u_.arc
ORA-00280: change 2170641 for thread 1 is in sequence #72
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/oracle/app/oracle/oradata/orcl1024/redo03.log
Log applied.
Media recovery complete.
 
 
www.htz.pw > alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

The alert log shows the following errors:

Sat Jul 19 00:45:47 2014
SMON: enabling cache recovery
Sat Jul 19 00:45:47 2014
Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc:
ORA-01173: data dictionary indicates missing data file from system tablespace
Sat Jul 19 00:45:47 2014
Error 1173 happened during db open, shutting down database
USER: terminating instance due to error 1173
Instance terminated by USER, pid = 12464
ORA-1092 signalled during: alter database open resetlogs…

6. Recovery method 1

With some luck this approach works. It did for the reader's database, which opened normally afterwards.

6.1 Modify the parameter file

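Create a pfile by hand: editing a pfile directly is simple and leaves the original spfile untouched. Add the parameters that the original post highlighted in red (the highlighting did not survive here; judging from the listing they are undo_management='manual' and _corrupted_rollback_segments):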

www.htz.pw > !vi /tmp/123.ora
orcl1024.__db_cache_size=54525952
orcl1024.__java_pool_size=4194304
orcl1024.__large_pool_size=8388608
orcl1024.__shared_pool_size=88080384
orcl1024.__streams_pool_size=0
*._backup_ksfq_bufsz=1048576
*._log_parallelism=2
*._log_parallelism_max=4
*._pga_max_size=5368709120
*._smm_max_size=3145728
*.audit_file_dest='/oracle/app/oracle/admin/orcl1024/adump'
*.background_dump_dest='/oracle/app/oracle/admin/orcl1024/bdump'
*.compatible='10.2.0.3.0'
*.control_files='/oracle/app/oracle/oradata/orcl1024/control01.ctl','/oracle/app/oracle/oradata/orcl1024/control02.ctl','/oracle/app/oracle/oradata/orcl1024/control03.ctl'
*.core_dump_dest='/oracle/app/oracle/admin/orcl1024/cdump'
*.cpu_count=3
*.db_block_size=8192
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_name='orcl1024'
*.db_recovery_file_dest='/oracle/app/oracle/flash_recovery_area'
*.db_recovery_file_dest_size=4294967296
*.dbwr_io_slaves=4
*.disk_asynch_io=FALSE
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orcl1024XDB)'
*.event=''
*.job_queue_processes=10
*.open_cursors=300
*.pga_aggregate_target=1073741824
*.processes=150
*.recyclebin='OFF'
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=167772160
#*.undo_management='AUTO'
*.undo_management='manual'
*.undo_tablespace='UNDOTBS1'
*.user_dump_dest='/oracle/app/oracle/admin/orcl1024/udump'
_corrupted_rollback_segments=(_SYSSMU1$,_SYSSMU2$,_SYSSMU3$,_SYSSMU4$,_SYSSMU5$,_SYSSMU6$,_SYSSMU7$,_SYSSMU8$,_SYSSMU9$,_SYSSMU10$)
Usually the following two parameters also need to be added:
_allow_resetlogs_corruption=true
_allow_error_simulation=true
An event may be added as well, for example one that disables some SMON functions.
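
Typing ten segment names by hand is error-prone, so the `_corrupted_rollback_segments` line can be generated from the names collected in step 2. The helper below is a hypothetical sketch, not part of the original procedure. It only formats the line and strips an owner prefix such as `PUB.` when present.

```python
def corrupted_rollback_param(segment_names):
    """Build the pfile line that lists the undo segments to be skipped.

    Accepts names with or without an owner prefix ("PUB._SYSSMU1$").
    """
    bare = [name.split(".", 1)[-1] for name in segment_names]
    return "_corrupted_rollback_segments=(" + ",".join(bare) + ")"


if __name__ == "__main__":
    # The ten segments listed in step 2 of this article.
    names = [f"PUB._SYSSMU{i}$" for i in range(1, 11)]
    print(corrupted_rollback_param(names))
```

The printed line can be pasted at the end of the pfile as it is.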

6.2 Start the database

www.htz.pw > startup mount pfile='/tmp/123.ora';
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
Database mounted.
www.htz.pw > recover database using backup controlfile until cancel;
ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_
%u_.arc
ORA-00280: change 2171386 for thread 1 is in sequence #1
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/oracle/app/oracle/oradata/orcl1024/redo03.log
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/oracle/app/oracle/oradata/orcl1024/redo03.log'
 
 
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/app/oracle/oradata/orcl1024/system01.dbf'
 
 
www.htz.pw > recover database using backup controlfile until cancel;
ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_
%u_.arc
ORA-00280: change 2171386 for thread 1 is in sequence #1
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/oracle/app/oracle/oradata/orcl1024/redo01.log
Log applied.
Media recovery complete.
www.htz.pw > alter database open resetlogs;
 
Database altered.
The database has now opened normally. Still check the alert log for any abnormal errors.

6.3 Recreate the undo tablespace

www.htz.pw > !rm /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf
 
www.htz.pw >  create undo tablespace undotbs1 datafile '/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf' size 10m;

Tablespace created.

6.4 Start the database with the original parameter file

www.htz.pw > shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
www.htz.pw > startup
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
Database mounted.
Database opened.
www.htz.pw >

The database starts normally. Check the alert log for errors.
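
Checking the alert log can be done by eye, but a small filter helps when the log is long. The following is a minimal sketch with a hypothetical function name. It lists every `ORA-` code it finds, whether zero-padded (`ORA-01173`) or unpadded (`ORA-1092`), as both forms appear in the excerpts above.

```python
import re

# Matches ORA-01173 as well as ORA-1092; leading zeros are dropped.
ORA_CODE = re.compile(r"\bORA-0*(\d+)")


def ora_errors(alert_text: str) -> list:
    """Return (error_number, line) pairs for every ORA- code in the text."""
    hits = []
    for line in alert_text.splitlines():
        for match in ORA_CODE.finditer(line):
            hits.append((int(match.group(1)), line.strip()))
    return hits


if __name__ == "__main__":
    # Excerpt copied from the alert log shown earlier in this article.
    sample = """Sat Jul 19 00:45:47 2014
SMON: enabling cache recovery
ORA-01173: data dictionary indicates missing data file from system tablespace
Error 1173 happened during db open, shutting down database
ORA-1092 signalled during: alter database open resetlogs
"""
    for code, line in ora_errors(sample):
        print(code, "|", line)
```

In practice the text would come from the alert log file under background_dump_dest, whose exact file name is not shown in this article.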

7. Recovery method 2

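This method requires that the original undo data file still exists: it adds that file back into the control file. It is more complicated, because a number of other problems come up while opening the database.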

7.1 Reproduce the failure

www.htz.pw > select open_mode from v$database;
 
OPEN_MODE
----------
READ WRITE

The database is in a normal state.
www.htz.pw > select name from v$dbfile;
 
NAME
--------------------------------------------------------------------------------
/oracle/app/oracle/oradata/orcl1024/users01.dbf
/oracle/app/oracle/oradata/orcl1024/sysaux01.dbf
/oracle/app/oracle/oradata/orcl1024/system01.dbf
/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf
These are the existing data files.
www.htz.pw > shutdown abort;
ORACLE instance shut down.
Recreate the control file without the undo tablespace data file.
www.htz.pw > @/tmp/control.ctl
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
 
Control file created.
 
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
 
 
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
 
 
ALTER TABLESPACE TEMP ADD TEMPFILE '/oracle/app/oracle/oradata/orcl1024/temp01.dbf'
*
ERROR at line 1:
ORA-01109: database not open
 
 
www.htz.pw >
 
 
 
 
www.htz.pw > recover database using backup controlfile until cancel;
ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_
%u_.arc
ORA-00280: change 2171941 for thread 1 is in sequence #1
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/app/oracle/oradata/orcl1024/system01.dbf'
 
 
ORA-01112: media recovery not started
 
 
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=167772160
#*.undo_management='AUTO'
*.undo_management='manual'
*.undo_tablespace='UNDOTBS1'
*.user_dump_dest='/oracle/app/oracle/admin/orcl1024/udump'
_allow_resetlogs_corruption=true
_allow_error_simulation=true
Add the parameters above to the parameter file.
 
www.htz.pw > startup force mount pfile='/tmp/123.ora';
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
Database mounted.
www.htz.pw > recover database using backup controlfile until cancel;
ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_
%u_.arc
ORA-00280: change 2171941 for thread 1 is in sequence #1
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
/oracle/app/oracle/oradata/orcl1024/redo01.log
Log applied.
Media recovery complete.
www.htz.pw >
 
 
www.htz.pw > alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
 
 
The failure is reproduced:
Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_14960.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01173: data dictionary indicates missing data file from system tablespace
Sat Jul 19 01:01:40 2014
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 14960
ORA-1092 signalled during: alter database open resetlogs…

7.2 Recreate the control file

Recreate the control file, this time including the undo tablespace data file.

[oracle@www.htz.pw sql]$sqlplus / as sysdba
 
SQL*Plus: Release 10.2.0.4.0 - Production on Sat Jul 19 01:09:05 2014
 
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
 
Connected to an idle instance.
 
www.htz.pw > startup nomount pfile='/tmp/123.ora';
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
www.htz.pw > @/tmp/control.ctl
ORA-01081: cannot start already-running ORACLE - shut it down first
CREATE CONTROLFILE REUSE DATABASE "ORCL1024" RESETLOGS  NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01189: file is from a different RESETLOGS than previous files
ORA-01110: data file 2: '/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf'

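This fails with ORA-01189.

The cause of ORA-01189 is simple: the RESETLOGS information in the data file headers is inconsistent. The undo data file was not in the control file during the earlier OPEN RESETLOGS attempt, so its header still carries the old RESETLOGS values.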

7.3 Use bbed to modify the RESETLOGS and SCN information in the data file header

www.htz.pw > !cat /tmp/bbed.par
listfile=/tmp/bbed.datafile
 
www.htz.pw > !cat /tmp/bbed.datafile
1 /oracle/app/oracle/oradata/orcl1024/system01.dbf
2 /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf
3 /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf
4 /oracle/app/oracle/oradata/orcl1024/users01.dbf
[oracle@www.htz.pw ~]$bbed parfile=/tmp/bbed.par
Password:
 
BBED: Release 2.0.0.0.0 - Limited Production on Sat Jul 19 01:12:13 2014
 
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
 
************* !!! For Oracle Internal Use only !!! ***************
 
BBED> info
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /oracle/app/oracle/oradata/orcl1024/system01.dbf                     0
     2  /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf                    0
     3  /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf                     0
     4  /oracle/app/oracle/oradata/orcl1024/users01.dbf                      0
Only the last RESETLOGS values and the SCN values need to be changed:
BBED> assign file 2 block 1 offset 112 = file 1 block 1 offset 112;
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
ub4 kcvfhrlc                                @112      0x32dc2c73
 
BBED> assign file 2 block 1 offset 116 = file 1 block 1 offset 116;
ub4 kscnbas                                 @116      0x002125e0
 
BBED> assign file 2 block 1 offset 484 = file 1 block 1 offset 484;
ub1 pad                                     @484      0xe1
 
BBED> assign file 2 block 1 offset 492 = file 1 block 1 offset 492;
ub1 pad                                     @492      0x74
BBED> sum apply dba 2,1
Check value for File 2, Block 1:
current = 0x093b, required = 0x093b
 
www.htz.pw > startup force nomount pfile='/tmp/123.ora';
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
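
Before patching a header with bbed, it can be useful to see which fields actually differ between the SYSTEM file header and the undo file header. The sketch below is illustrative only. It compares the two ub4 fields that bbed reported at offsets 112 (kcvfhrlc) and 116 (kscnbas), and it assumes little-endian storage as on the Linux x86-64 host used in this test. The function names are hypothetical, and the demo works on synthetic blocks rather than real files.

```python
import struct

# Offsets inside block 1 of a data file, taken from the bbed session above.
# Assumption: little-endian ub4 fields (Linux x86-64).
HEADER_FIELDS = {"kcvfhrlc": 112, "kscnbas": 116}


def read_ub4(block: bytes, offset: int) -> int:
    """Read an unsigned 4-byte little-endian integer at the given offset."""
    return struct.unpack_from("<I", block, offset)[0]


def differing_fields(reference_block: bytes, target_block: bytes) -> dict:
    """Map field name -> (reference value, target value) where they differ."""
    diffs = {}
    for name, offset in HEADER_FIELDS.items():
        ref = read_ub4(reference_block, offset)
        tgt = read_ub4(target_block, offset)
        if ref != tgt:
            diffs[name] = (ref, tgt)
    return diffs


if __name__ == "__main__":
    # Synthetic 8 KB blocks. The reference values are the ones bbed printed;
    # the differing target value is made up for the demonstration.
    system_hdr = bytearray(8192)
    undo_hdr = bytearray(8192)
    struct.pack_into("<I", system_hdr, 112, 0x32DC2C73)
    struct.pack_into("<I", system_hdr, 116, 0x002125E0)
    struct.pack_into("<I", undo_hdr, 112, 0x32DC2C73)
    struct.pack_into("<I", undo_hdr, 116, 0x00212000)
    for name, (ref, tgt) in differing_fields(system_hdr, undo_hdr).items():
        print(f"{name}: system={ref:#010x} undo={tgt:#010x}")
```

The sketch only reads and never writes. The actual modification is still done with bbed as shown above.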

7.4 Recreate the control file

www.htz.pw > @/tmp/control.ctl
ORA-01081: cannot start already-running ORACLE - shut it down first
 
Control file created.
 
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

The control file is recreated successfully.

7.5 An ORA-00600 error appears

www.htz.pw > recover database using backup controlfile until cancel;
ORA-00279: change 2172129 generated at 07/19/2014 00:53:08 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_
%u_.arc
ORA-00280: change 2172129 for thread 1 is in sequence #1
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/app/oracle/oradata/orcl1024/system01.dbf'
 
 
ORA-01112: media recovery not started
 
 
www.htz.pw > alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [],[], []

This raises ORA-00600 [kcbgtcr_13]. The fix is to commit the affected transactions manually.

7.6 Commit the transactions manually

[09:50:33]www.htz.pw > startup mount pfile='/tmp/123.ora';
[09:50:34]ORACLE instance started.
[09:50:34]
[09:50:34]Total System Global Area  167772160 bytes
[09:50:34]Fixed Size                  2082432 bytes
[09:50:34]Variable Size             100665728 bytes
[09:50:34]Database Buffers           54525952 bytes
[09:50:34]Redo Buffers               10498048 bytes
[09:50:38]Database mounted.
[09:50:51]www.htz.pw > recover database using backup controlfile until cancel;
[09:50:51]ORA-00279: change 2172135 generated at 07/19/2014 01:41:17 needed for thread 1
[09:50:51]ORA-00289: suggestion :
[09:50:51]/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_
[09:50:51]%u_.arc
[09:50:51]ORA-00280: change 2172135 for thread 1 is in sequence #1
[09:50:51]
[09:50:51]
[09:50:51]Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
[09:50:53]cancel
[09:50:54]ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
[09:50:54]ORA-01194: file 1 needs more recovery to be consistent
[09:50:54]ORA-01110: data file 1: '/oracle/app/oracle/oradata/orcl1024/system01.dbf'
[09:50:54]
[09:50:54]
[09:50:54]ORA-01112: media recovery not started
[09:50:54]
[09:50:54]
[09:51:06]www.htz.pw > alter database open resetlogs;
[09:51:09]alter database open resetlogs
[09:51:09]*
[09:51:09]ERROR at line 1:
[09:51:09]ORA-01092: ORACLE instance terminated. Disconnection forced

The alert log then reports the following errors:

Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc:
ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], []
Sat Jul 19 01:43:01 2014
Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], []
Sat Jul 19 01:43:01 2014
Error 704 happened during db open, shutting down database

Check the trace file to see which blocks contain uncommitted transactions.

[oracle@www.htz.pw ~]$grep -E '^Block header dump|^0x0'  /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc
0x01 0x1f64     0x02 0x1ef8    
0x01 0x1f64     0x02 0x1ef8    
0x01 0x1f64     0x02 0x1ef8    
0x01 0x1f64     0x02 0x1ef8    
0x01 0x1f64     0x02 0x1ef8    
0x01 0x1f64     0x02 0x1ef8    
Block header dump:  0x0040007a
0x01   0x0003.001.00000191  0x0080002b.014c.03  ----    1  fsc 0x0000.00000000
Block header dump:  0x0040017c
0x01   0x0000.022.00000002  0x00400196.0004.37  --U-   12  fsc 0x0000.00000147
Block header dump:  0x004000da
0x01   0x0004.00c.0000011d  0x0080559d.00d3.02  C---    0  scn 0x0000.0008ab18
Block header dump:  0x004000db
0x01   0x0008.017.00000002  0x00800080.0000.01  CBU-    0  scn 0x0000.00002404
0x02   0x0004.01a.0000017a  0x0080003c.016b.32  --U-    1  fsc 0x000e.001e8de0
Block header dump:  0x0040007a
0x01   0x0003.001.00000191  0x0080002b.014c.03  ----    1  fsc 0x0000.00000000
Block header dump:  0x0040006a
0x01   0x0000.008.00000034  0x0040019e.003b.07  C---    0  scn 0x0000.0021245c
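
Each "Block header dump" line carries a data block address (DBA). To find the block in bbed, the DBA has to be split into a relative file number (upper 10 bits) and a block number (lower 22 bits). The sketch below does that for the lines printed by the grep above. The helper names are hypothetical, and the split matches what bbed itself shows later in this article for 0x0040006a (file 1, block 106).

```python
import re

HEADER = re.compile(r"^Block header dump:\s+0x([0-9a-fA-F]{8})")


def decode_dba(dba: int) -> tuple:
    """Split a 32-bit data block address into (relative file#, block#)."""
    return dba >> 22, dba & 0x3FFFFF


def blocks_in_trace(lines) -> list:
    """Return the distinct (file#, block#) pairs in order of appearance."""
    seen = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            location = decode_dba(int(match.group(1), 16))
            if location not in seen:
                seen.append(location)
    return seen


if __name__ == "__main__":
    trace_lines = [
        "Block header dump:  0x0040007a",
        "Block header dump:  0x0040017c",
        "Block header dump:  0x004000da",
        "Block header dump:  0x004000db",
        "Block header dump:  0x0040007a",
        "Block header dump:  0x0040006a",
    ]
    for file_no, block_no in blocks_in_trace(trace_lines):
        print(f"file {file_no}, block {block_no}")
```

Blocks 122 and 106 of file 1 are the ones that get patched in the steps that follow.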

Trace the failing SQL statement with event 10046:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
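----- Call Stack Trace -----
This shows the failing SQL statement. Search the trace for the statement text until you reach a cursor dump like the following: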
Cursor#5(0x2a97ca18b0) state=FETCH curiob=0x2a97cba468
 curflg=f fl2=0 par=0x2a97ca1710 ses=0x69f82a30
 sqltxt(0x69a944b0)=select ctime, mtime, stime from obj$ where obj# = :1
  hash=fa0bd3f60d6ee4f2495f9af8199b75b9
  parent=0x6677c4b8 maxchild=01 plk=0x66f56af0 ppn=n
cursor instantiation=0x2a97cba468 used=1405705379
 child#0(0x69a94288) pcs=0x6677c0c8
  clk=0x66f56dd0 ci=0x6677b7b0 pn=0x69ad37f0 ctx=0x6616fe90
 kgsccflg=0 llk[0x2a97cba470,0x2a97cba470] idx=0
 xscflg=e0141476 fl2=45000401 fl3=4022210c fl4=100
 Bind bytecodes
  Opcode = 1   Unoptimized
  Offsi = 48, Offsi = 0
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2a97cba020  bln=22  avl=02  flg=05
  value=20

The bind variable value is 20. Run the following in another database of the same version:

SQL> select rowid  from obj$ where obj# =20;
 
ROWID
------------------
AAAAASAABAAAAB6AAA
 
SQL> @rowid_to_info.sql
Enter value for rowid: AAAAASAABAAAAB6AAA
ROWID_TYPE:  1
OBJECT_NUMBER:  18
RELATIVE_FNO:  1
BLOCK_NUMBER:  122
ROW_NUMBER:  0
 
PL/SQL procedure successfully completed.
 
 
This is exactly the block that appears in the trace file:
Block header dump:  0x0040007a
0x01   0x0003.001.00000191  0x0080002b.014c.03  ----    1  fsc 0x0000.00000000

The same information can also be found in the 10046 trace file:

=====================
PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1372757926978214 hv=429618617 ad='69a944b0'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #5:c=0,e=234,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978212
BINDS #5:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2a97cba020  bln=22  avl=02  flg=05
  value=20
EXEC #5:c=0,e=330,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978586
WAIT #5: nam='db file sequential read' ela= 23 file#=1 block#=218 blocks=1 obj#=-1 tim=1372757926978753
WAIT #5: nam='db file sequential read' ela= 9 file#=1 block#=219 blocks=1 obj#=-1 tim=1372757926978804
WAIT #5: nam='db file sequential read' ela= 7 file#=1 block#=122 blocks=1 obj#=-1 tim=1372757926978841
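
The WAIT lines show the blocks the cursor read just before the error, and the last one is the block of interest. The following is a small sketch for pulling the file and block numbers out of such lines. The function name is hypothetical, and the pattern only relies on the `file#=` and `block#=` fields visible above.

```python
import re

# Only the cursor number and the file#/block# fields are needed.
WAIT_IO = re.compile(r"^WAIT #(\d+): .*?file#=(\d+) block#=(\d+) blocks=(\d+)")


def blocks_read(trace_lines, cursor: int) -> list:
    """Return (file#, block#) for each single-block read by the cursor."""
    result = []
    for line in trace_lines:
        match = WAIT_IO.match(line)
        if match and int(match.group(1)) == cursor:
            result.append((int(match.group(2)), int(match.group(3))))
    return result


if __name__ == "__main__":
    lines = [
        "EXEC #5:c=0,e=330,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978586",
        "WAIT #5: nam='db file sequential read' ela= 23 file#=1 block#=218 blocks=1 obj#=-1 tim=1372757926978753",
        "WAIT #5: nam='db file sequential read' ela= 9 file#=1 block#=219 blocks=1 obj#=-1 tim=1372757926978804",
        "WAIT #5: nam='db file sequential read' ela= 7 file#=1 block#=122 blocks=1 obj#=-1 tim=1372757926978841",
    ]
    reads = blocks_read(lines, cursor=5)
    print("blocks read:", reads)
    print("last block before the error:", reads[-1])
```

Blocks 218 and 219 are presumably the index blocks visited on the way to the table block 122. The trace does not say so explicitly.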

Note the value of the bind variable.

The trace file also contains the following row dump:
 
tab 0, row 26, @0x18f1
tl: 70 fb: --H-FL-- lb: 0x1  cc: 17
col  0: [ 2]  c1 02
col  1: [ 4]  c3 06 17 08
col  2: [ 1]  80
col  3: [12]  5f 4e 45 58 54 5f 4f 42 4a 45 43 54
col  4: [ 2]  c1 02
col  5: *NULL*
col  6: [ 1]  80
col  7: [ 7]  78 6c 03 0c 01 28 31
col  8: [ 7]  78 72 07 13 01 3b 01
col  9: [ 7]  78 6c 03 0c 01 28 31
col 10: [ 1]  80
col 11: *NULL*
col 12: *NULL*
col 13: [ 1]  80
col 14: *NULL*
col 15: [ 1]  80
col 16: [ 4]  c3 07 38 24
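
The column values in the row dump are raw bytes. Decoding a few of them confirms which row this is. The sketch below is a simplified decoder with hypothetical function names. It handles only non-negative integers in Oracle NUMBER format, 7-byte DATE values and plain ASCII strings, which is enough for the columns shown here.

```python
from datetime import datetime


def decode_number(hex_bytes: str) -> int:
    """Decode a non-negative integer stored in Oracle NUMBER format.

    Simplification: negative numbers and fractions are rejected.
    """
    raw = bytes.fromhex(hex_bytes)
    if raw == b"\x80":
        return 0
    exponent = raw[0] - 0xC1  # power of 100 of the first mantissa byte
    digits = raw[1:]
    if exponent < 0 or len(digits) > exponent + 1:
        raise ValueError("only non-negative integers are handled")
    value = 0
    for index, byte in enumerate(digits):
        value += (byte - 1) * 100 ** (exponent - index)
    return value


def decode_date(hex_bytes: str) -> datetime:
    """Decode a 7-byte Oracle DATE (century and year are stored +100,
    hour, minute and second are stored +1)."""
    c, y, month, day, hour, minute, second = bytes.fromhex(hex_bytes)
    year = (c - 100) * 100 + (y - 100)
    return datetime(year, month, day, hour - 1, minute - 1, second - 1)


def decode_text(hex_bytes: str) -> str:
    """Decode a column holding plain ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")


if __name__ == "__main__":
    print("col 0 :", decode_number("c1 02"))
    print("col 1 :", decode_number("c3 06 17 08"))
    print("col 3 :", decode_text("5f 4e 45 58 54 5f 4f 42 4a 45 43 54"))
    print("col 7 :", decode_date("78 6c 03 0c 01 28 31"))
    print("col 16:", decode_number("c3 07 38 24"))
```

Column 3 decodes to `_NEXT_OBJECT` and column 0 to 1. The numeric values for columns 0, 1 and 16 agree with what bbed prints for `kdbr[26]` in the next step.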

To commit the transaction manually with bbed, change the ITL flag and the lock byte (lck) of the row:

BBED> p ktbbh
struct ktbbh, 48 bytes                      @20     
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24     
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28     
      ub4 kscnbas                           @28       0x0021251b
      ub2 kscnwrp                           @32       0x0000
   b2 ktbbhict                              @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44     
      struct ktbitxid, 8 bytes              @44     
         ub2 kxidusn                        @44       0x0003
         ub2 kxidslt                        @46       0x0001
         ub4 kxidsqn                        @48       0x00000191
      struct ktbituba, 8 bytes              @52     
         ub4 kubadba                        @52       0x0080002b
         ub2 kubaseq                        @56       0x014c
         ub1 kubarec                        @58       0x03
      ub2 ktbitflg                          @60       0x0001 (NONE)
      union _ktbitun, 2 bytes               @62     
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0x00000000
BBED> modify /x 80 offset 61
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1)
 Block: 122              Offsets:   61 to   70           Dba:0x0040007a
------------------------------------------------------------------------
 80000000 00000000 016c
BBED> x /rn *kdbr[26]
rowdata[5278]                               @6453   
-------------
flag@6453: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@6454: 0x01
cols@6455:   17
 
col    0[2] @6456: 1
col    1[4] @6459: 52207
col    2[1] @6464: 0
col   3[12] @6466: -0
col    4[2] @6479: 1
col    5[0] @6482: *NULL*
col    6[1] @6483: 0
col    7[7] @6485: -0
col    8[7] @6493: -0
col    9[7] @6501: -0
col   10[1] @6509: 0
col   11[0] @6511: *NULL*
col   12[0] @6512: *NULL*
col   13[1] @6513: 0
col   14[0] @6515: *NULL*
col   15[1] @6516: 0
col   16[4] @6518: 65535
 
 
BBED> modify /x 00 offset 6454
 File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1)
 Block: 122              Offsets: 6454 to 6463           Dba:0x0040007a
------------------------------------------------------------------------
 001102c1 0204c306 1708
BBED> sum apply
Check value for File 1, Block 122:
current = 0x3d20, required = 0x3d20
 
BBED> verify
DBVERIFY - Verification starting
FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf
BLOCK = 122
 
 
DBVERIFY - Verification complete
 
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
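
The `modify /x 80 offset 61` command above changes the high byte of the 2-byte ktbitflg field at offset 60. The sketch below illustrates the effect under two assumptions that are not stated in this article. The first is that the field is stored little-endian. The second is that the top four bits hold the C, B, U and T flags (0x8000, 0x4000, 0x2000, 0x1000) while the low 12 bits hold the lock count, a layout commonly described for ITL entries. The function names are hypothetical.

```python
import struct

# Assumed bit layout of ktbitflg (not confirmed by this article):
# 0x8000 = C (committed), 0x4000 = B, 0x2000 = U, 0x1000 = T,
# low 12 bits = number of rows locked by the transaction (Lck).
FLAG_BITS = ((0x8000, "C"), (0x4000, "B"), (0x2000, "U"), (0x1000, "T"))


def describe_itl_flag(ktbitflg: int) -> tuple:
    """Return (flag string as in a block dump, lock count)."""
    flags = "".join(char if ktbitflg & bit else "-" for bit, char in FLAG_BITS)
    return flags, ktbitflg & 0x0FFF


def apply_commit_flag(field_bytes: bytes) -> bytes:
    """Mimic 'modify /x 80' on the second byte of the little-endian field."""
    patched = bytearray(field_bytes)
    patched[1] = 0x80
    return bytes(patched)


if __name__ == "__main__":
    before = struct.pack("<H", 0x0001)  # bytes at offsets 60 and 61
    after = apply_commit_flag(before)
    for label, raw in (("before", before), ("after", after)):
        value = struct.unpack("<H", raw)[0]
        print(label, f"{value:#06x}", describe_itl_flag(value))
```

Under these assumptions the ITL entry goes from `----` with one locked row to `C---`. The separate `modify /x 00 offset 6454` then clears the lock byte of the row itself.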

7.7 ORA-00600 block corruption error

www.htz.pw > alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
 
Sat Jul 19 01:56:37 2014
Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_1894.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [106], [6101], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 1894
ORA-1092 signalled during: alter database open resetlogs…
Here data file 1, block 106 fails with check code 6101. The error occurs because the ITL entries and the row lock bytes are inconsistent.
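
Reading the file, block and check code out of the ORA-00600 arguments can be automated when several blocks fail one after another. The following is a minimal sketch with a hypothetical function name. The meaning of the three arguments is taken from the explanation above.

```python
import re

BLKCHK = re.compile(
    r"ORA-00600: internal error code, arguments: "
    r"\[kddummy_blkchk\], \[(\d+)\], \[(\d+)\], \[(\d+)\]"
)


def parse_blkchk(line: str):
    """Return file#, block# and check code from a kddummy_blkchk error line,
    or None when the line is a different error."""
    match = BLKCHK.search(line)
    if match is None:
        return None
    file_no, block_no, check_code = (int(group) for group in match.groups())
    return {"file": file_no, "block": block_no, "check_code": check_code}


if __name__ == "__main__":
    line = (
        "ORA-00600: internal error code, arguments: "
        "[kddummy_blkchk], [1], [106], [6101], [], [], [], []"
    )
    print(parse_blkchk(line))
```

The file and block numbers are what `set dba 1,106` needs in the bbed session that follows.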

Use bbed to modify the lock byte of the row:

BBED> set dba 1,106
        DBA             0x0040006a (4194410 1,106)
 
BBED> verify
DBVERIFY - Verification starting
FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf
BLOCK = 106
 
Block Checking: DBA = 4194410, Block Type = KTB-managed data block
data header at 0x2a97696244
kdbchk: row locked by non-existent transaction
        table=0   slot=10
        lockid=1   ktbbhitc=1
Block 106 failed with check code 6101
 
DBVERIFY - Verification complete
 
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

For how to fix this error, see the notes on check code 6101 (row locked by non-existent transaction).

7.8 Start the database

After the steps above, start the database again:

www.htz.pw > startup mount pfile='/tmp/123.ora';
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
Database mounted.
www.htz.pw > recover database using backup controlfile unitl cancel;
ORA-00905: missing keyword
 
 
www.htz.pw > recover database using backup controlfile until cancel;
ORA-00279: change 2172139 generated at 07/19/2014 01:56:34 needed for thread 1
ORA-00289: suggestion :
/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_
%u_.arc
ORA-00280: change 2172139 for thread 1 is in sequence #1
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/app/oracle/oradata/orcl1024/system01.dbf'
 
 
ORA-01112: media recovery not started
 
 
www.htz.pw > alter database open resetlogs;
 
Database altered.
 
The database also starts normally with the original parameter file:
www.htz.pw > startup force;
ORACLE instance started.
 
Total System Global Area  167772160 bytes
Fixed Size                  2082432 bytes
Variable Size             100665728 bytes
Database Buffers           54525952 bytes
Redo Buffers               10498048 bytes
Database mounted.
Database opened.
www.htz.pw >
