问题描述
下面的测试来至于一位网友,他们生产数据库异常,在drop表空间,重建控制文件后,报下面的错误:
Sat Jul 19 00:45:47 2014 SMON: enabling cache recovery Sat Jul 19 00:45:47 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc: ORA-01173: data dictionary indicates missing data file from system tablespace Sat Jul 19 00:45:47 2014 Error 1173 happened during db open, shutting down database USER: terminating instance due to error 1173 Instance terminated by USER, pid = 12464 ORA-1092 signalled during: alter database open resetlogs…复制
下面是简单的测试一下,提供2种方法来解决此故障。
专家解答
1,数据库版本
www.htz.pw > select * from v$version where rownum<3; BANNER —————————————————————- Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bi PL/SQL Release 10.2.0.4.0 – Production www.htz.pw > !lsb_release -a LSB Version: :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch Distributor ID: RedHatEnterpriseAS Description: Red Hat Enterprise Linux AS release 4 (Nahant Update 8) Release: 4 Codename: NahantUpdate8复制
2,查询undo段的名字
因为在实验过程中,我们需要使用到undo段的名字,所以这里提前查询出来,如果在生产环境,我们可以直接使用bbed去查询undo$表,或者是使用odu,dul等工具去直接抽取undo$表,另外了可以使用strings system数据文件来过滤UNDO段。
www.htz.pw > @undo_segment.sql SEGMENT_HEADER TABLESPACE SEGMENT_NAME FILE#.BLOCK STATUS SEGMENT_SIZE(M) ——————– —————————— ————– ———- ————— SYSTEM.OLD PRI.SYSTEM 1.9 ONLINE 0 UNDOTBS1.CURRENT PUB._SYSSMU1$ 2.9 ONLINE 1 UNDOTBS1.CURRENT PUB._SYSSMU10$ 2.153 ONLINE 1 UNDOTBS1.CURRENT PUB._SYSSMU9$ 2.137 ONLINE 13 UNDOTBS1.CURRENT PUB._SYSSMU8$ 2.121 ONLINE 18 UNDOTBS1.CURRENT PUB._SYSSMU7$ 2.105 ONLINE 0 UNDOTBS1.CURRENT PUB._SYSSMU6$ 2.89 ONLINE 6 UNDOTBS1.CURRENT PUB._SYSSMU5$ 2.73 ONLINE 2 UNDOTBS1.CURRENT PUB._SYSSMU4$ 2.57 ONLINE 0 UNDOTBS1.CURRENT PUB._SYSSMU3$ 2.41 ONLINE 1 UNDOTBS1.CURRENT PUB._SYSSMU2$ 2.25 ONLINE复制
3,生成创建控制文件脚本
[oracle@www.htz.pw sql]$./create_controlfile_sql.sh please input direcotry default /tmp: please input file name default control.ctl: Database altered.复制
这里生成的默认文件位置在/tmp/control.ctl
4,重建控制文件
www.htz.pw > shutdown abort; ORACLE instance shut down. STARTUP NOMOUNT CREATE CONTROLFILE REUSE DATABASE "ORCL1024" NORESETLOGS NOARCHIVELOG MAXLOGFILES 16 MAXLOGMEMBERS 3 MAXDATAFILES 100 MAXINSTANCES 8 MAXLOGHISTORY 292 LOGFILE GROUP 1 ‘/oracle/app/oracle/oradata/orcl1024/redo01.log’ SIZE 50M, GROUP 2 ‘/oracle/app/oracle/oradata/orcl1024/redo02.log’ SIZE 50M, GROUP 3 ‘/oracle/app/oracle/oradata/orcl1024/redo03.log’ SIZE 50M DATAFILE ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’, ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’,(需要删除这行) ‘/oracle/app/oracle/oradata/orcl1024/sysaux01.dbf’, ‘/oracle/app/oracle/oradata/orcl1024/users01.dbf’ CHARACTER SET ZHS16GBK ; RECOVER DATABASE ALTER DATABASE OPEN; ALTER TABLESPACE TEMP ADD TEMPFILE ‘/oracle/app/oracle/oradata/orcl1024/temp01.dbf’ SIZE 1482M REUSE AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M;复制
5,故障现象出现
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72 _%u_.arc ORA-00280: change 2170641 for thread 1 is in sequence #72 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo02.log ORA-00310: archived log contains sequence 71; sequence 72 required ORA-00334: archived log: ‘/oracle/app/oracle/oradata/orcl1024/redo02.log’ ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72 _%u_.arc ORA-00280: change 2170641 for thread 1 is in sequence #72 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo03.log Log applied. Media recovery complete. www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced复制
alert中出现下面的报错
Sat Jul 19 00:45:47 2014 SMON: enabling cache recovery Sat Jul 19 00:45:47 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc: ORA-01173: data dictionary indicates missing data file from system tablespace Sat Jul 19 00:45:47 2014 Error 1173 happened during db open, shutting down database USER: terminating instance due to error 1173 Instance terminated by USER, pid = 12464 ORA-1092 signalled during: alter database open resetlogs…复制
6,故障处理方法1
在运气比较好的情况下使用此方案是可行的,朋友的数据库使用此方法,数据库能正常的OPEN。
6.1 修改参数文件
这里手动创建pfile文件,直接修改pfile文件比较简单,并且不影响原spfile文件,增加下面红色部分参数
www.htz.pw > !vi /tmp/123.ora orcl1024.__db_cache_size=54525952 orcl1024.__java_pool_size=4194304 orcl1024.__large_pool_size=8388608 orcl1024.__shared_pool_size=88080384 orcl1024.__streams_pool_size=0 *._backup_ksfq_bufsz=1048576 *._log_parallelism=2 *._log_parallelism_max=4 *._pga_max_size=5368709120 *._smm_max_size=3145728 *.audit_file_dest=’/oracle/app/oracle/admin/orcl1024/adump’ *.background_dump_dest=’/oracle/app/oracle/admin/orcl1024/bdump’ *.compatible=’10.2.0.3.0′ *.control_files=’/oracle/app/oracle/oradata/orcl1024/control01.ctl’,’/oracle/app/oracle/oradata/orcl1024/control02.ctl’,’/oracle/app/oracle/oradata/orcl1024/control03.ctl’ *.core_dump_dest=’/oracle/app/oracle/admin/orcl1024/cdump’ *.cpu_count=3 *.db_block_size=8192 *.db_domain=” *.db_file_multiblock_read_count=16 *.db_name=’orcl1024′ *.db_recovery_file_dest=’/oracle/app/oracle/flash_recovery_area’ *.db_recovery_file_dest_size=4294967296 *.dbwr_io_slaves=4 *.disk_asynch_io=FALSE *.dispatchers='(PROTOCOL=TCP) (SERVICE=orcl1024XDB)’ *.event=” *.job_queue_processes=10 *.open_cursors=300 *.pga_aggregate_target=1073741824 *.processes=150 *.recyclebin=’OFF’ *.remote_login_passwordfile=’EXCLUSIVE’ *.sga_target=167772160 #*.undo_management=’AUTO’ *.undo_management=’manual’ *.undo_tablespace=’UNDOTBS1′ *.user_dump_dest=’/oracle/app/oracle/admin/orcl1024/udump’ _corrupted_rollback_segments=(_SYSSMU1$,_SYSSMU2$,_SYSSMU3$,_SYSSMU4$,_SYSSMU5$,_SYSSMU6$,_SYSSMU7$,_SYSSMU8$,_SYSSMU9$,_SYSSMU10$) 这里通常还需要增加下面的2个参数 _allow_resetlogs_corruption=true _allow_error_simulation=true 另外还可以会增加一个event,如果smon一些功能的event。复制
6.2 启动数据库
www.htz.pw > startup mount pfile=’/tmp/123.ora’; ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171386 for thread 1 is in sequence #1 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo03.log ORA-00339: archived log does not contain any redo ORA-00334: archived log: ‘/oracle/app/oracle/oradata/orcl1024/redo03.log’ ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171386 for thread 1 is in sequence #1 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo01.log Log applied. Media recovery complete. www.htz.pw > alter database open resetlogs; Database altered. 这里看到数据库已经正常打开,这里还需要注意观察,alert日志文件是否有异常报错。复制
6.3 重建undo表空间
www.htz.pw > !rm /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf www.htz.pw > create undo tablespace undotbs1 datafile ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’ size 10m; Tablespace created.复制
6.4 使用源参数文件启动数据库
www.htz.pw > shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. www.htz.pw > startup ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. Database opened. www.htz.pw >复制
数据库启动正常,注意观察alert日志中是否有报错。
7 故障处理方法2
使用此方法,要求原来的UNDO数据文件存在,此方法就是将原来的undo数据文件再次增加到控制文件中去,此方法比较复制,因为在开启数据库的都会遇到其它很多的一些问题。
7.1 故障现象重现
www.htz.pw > select open_mode from v$database; OPEN_MODE ———- READ WRITE 数据库的状态是正常的 www.htz.pw > select name from v$dbfile; NAME ——————————————————————————– /oracle/app/oracle/oradata/orcl1024/users01.dbf /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf /oracle/app/oracle/oradata/orcl1024/system01.dbf /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf 存在的数据文件 www.htz.pw > shutdown abort; ORACLE instance shut down. 重建控制文件,控制文件中不包括undo表空间 www.htz.pw > @/tmp/control.ctl ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Control file created. ORA-00283: recovery session canceled due to errors ORA-01610: recovery using the BACKUP CONTROLFILE option must be done ALTER DATABASE OPEN * ERROR at line 1: ORA-01589: must use RESETLOGS or NORESETLOGS option for database open ALTER TABLESPACE TEMP ADD TEMPFILE ‘/oracle/app/oracle/oradata/orcl1024/temp01.dbf’ * ERROR at line 1: ORA-01109: database not open www.htz.pw > www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171941 for thread 1 is in sequence #1 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ ORA-01112: media recovery not started *.remote_login_passwordfile=’EXCLUSIVE’ *.sga_target=167772160 #*.undo_management=’AUTO’ *.undo_management=’manual’ *.undo_tablespace=’UNDOTBS1′ *.user_dump_dest=’/oracle/app/oracle/admin/orcl1024/udump’ _allow_resetlogs_corruption=true _allow_error_simulation=true 增加上面的参数文件 www.htz.pw > startup force mount pfile=’/tmp/123.ora’; ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2171941 for thread 1 is in sequence #1 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /oracle/app/oracle/oradata/orcl1024/redo01.log Log applied. Media recovery complete. www.htz.pw > www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced 重现故障现在 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_14960.trc: ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 2 ORA-01173: data dictionary indicates missing data file from system tablespace Sat Jul 19 01:01:40 2014 Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Instance terminated by USER, pid = 14960 ORA-1092 signalled during: alter database open resetlogs…复制
7.2 重建控制文件
重建控制文件,控制文件中包括undo表空间的数据文件
[oracle@www.htz.pw sql]$sqlplus / as sysdba SQL*Plus: Release 10.2.0.4.0 – Production on Sat Jul 19 01:09:05 2014 Copyright (c) 1982, 2007, Oracle. All Rights Reserved. Connected to an idle instance. www.htz.pw > startup nomount pfile=’/tmp/123.ora’; ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes www.htz.pw > @/tmp/control.ctl ORA-01081: cannot start already-running ORACLE – shut it down first CREATE CONTROLFILE REUSE DATABASE "ORCL1024" RESETLOGS NOARCHIVELOG * ERROR at line 1: ORA-01503: CREATE CONTROLFILE failed ORA-01189: file is from a different RESETLOGS than previous files ORA-01110: data file 2: ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’复制
这里提示ORA-01189的错误。
1189的错误很简单,因为数据文件头的resetlogs信息不一致导致的。
7.3 bbed修改数据文件头中RESETLOG与SCN信息
www.htz.pw > !cat /tmp/bbed.par listfile=/tmp/bbed.datafile www.htz.pw > !cat /tmp/bbed.datafile 1 /oracle/app/oracle/oradata/orcl1024/system01.dbf 2 /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf 3 /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf 4 /oracle/app/oracle/oradata/orcl1024/users01.dbf [oracle@www.htz.pw ~]$bbed parfile=/tmp/bbed.par Password: BBED: Release 2.0.0.0.0 – Limited Production on Sat Jul 19 01:12:13 2014 Copyright (c) 1982, 2007, Oracle. All rights reserved. ************* !!! For Oracle Internal Use only !!! *************** BBED> info File# Name Size(blks) —– —- ———- 1 /oracle/app/oracle/oradata/orcl1024/system01.dbf 0 2 /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf 0 3 /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf 0 4 /oracle/app/oracle/oradata/orcl1024/users01.dbf 0 这里只需要修改上次resetlogs与SCN的值就可以了 BBED> assign file 2 block 1 offset 112 = file 1 block 1 offset 112; Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y ub4 kcvfhrlc @112 0x32dc2c73 BBED> assign file 2 block 1 offset 116 = file 1 block 1 offset 116; ub4 kscnbas @116 0x002125e0 BBED> assign file 2 block 1 offset 484 = file 1 block 1 offset 484; ub1 pad @484 0xe1 BBED> assign file 2 block 1 offset 492 = file 1 block 1 offset 492; ub1 pad @492 0x74 BBED> sum apply dba 2,1 Check value for File 2, Block 1: current = 0x093b, required = 0x093b www.htz.pw > startup force nomount pfile=’/tmp/123.ora’; ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes复制
7.4 重建控制文件
www.htz.pw > @/tmp/control.ctl ORA-01081: cannot start already-running ORACLE – shut it down first Control file created. ORA-00283: recovery session canceled due to errors ORA-01610: recovery using the BACKUP CONTROLFILE option must be done复制
控制文件重建成功
7.5 遇到600错误
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2172129 generated at 07/19/2014 00:53:08 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2172129 for thread 1 is in sequence #1 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ ORA-01112: media recovery not started www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [],[], []复制
这里触发了ORA-00600 kcbgtcr_13错误,只需要手动提交事务就可以了。
7.6 手动提交事务信息
[09:50:33]www.htz.pw > startup mount pfile=’/tmp/123.ora’; [09:50:34]ORACLE instance started. [09:50:34] [09:50:34]Total System Global Area 167772160 bytes [09:50:34]Fixed Size 2082432 bytes [09:50:34]Variable Size 100665728 bytes [09:50:34]Database Buffers 54525952 bytes [09:50:34]Redo Buffers 10498048 bytes [09:50:38]Database mounted. [09:50:51]www.htz.pw > recover database using backup controlfile until cancel; [09:50:51]ORA-00279: change 2172135 generated at 07/19/2014 01:41:17 needed for thread 1 [09:50:51]ORA-00289: suggestion : [09:50:51]/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ [09:50:51]%u_.arc [09:50:51]ORA-00280: change 2172135 for thread 1 is in sequence #1 [09:50:51] [09:50:51] [09:50:51]Specify log: {<RET>=suggested | filename | AUTO | CANCEL} [09:50:53]cancel [09:50:54]ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below [09:50:54]ORA-01194: file 1 needs more recovery to be consistent [09:50:54]ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ [09:50:54] [09:50:54] [09:50:54]ORA-01112: media recovery not started [09:50:54] [09:50:54] [09:51:06]www.htz.pw > alter database open resetlogs; [09:51:09]alter database open resetlogs [09:51:09]* [09:51:09]ERROR at line 1: [09:51:09]ORA-01092: ORACLE instance terminated. Disconnection forced复制
后面alert报下面的错误
Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc: ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [] Sat Jul 19 01:43:01 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [] Sat Jul 19 01:43:01 2014 Error 704 happened during db open, shutting down database复制
在trace文件中查看有那些块没有提交。
[oracle@www.htz.pw ~]$grep -E ‘^Block header dump|^0x0’ /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 0x01 0x1f64 0x02 0x1ef8 Block header dump: 0x0040007a 0x01 0x0003.001.00000191 0x0080002b.014c.03 —- 1 fsc 0x0000.00000000 Block header dump: 0x0040017c 0x01 0x0000.022.00000002 0x00400196.0004.37 –U- 12 fsc 0x0000.00000147 Block header dump: 0x004000da 0x01 0x0004.00c.0000011d 0x0080559d.00d3.02 C— 0 scn 0x0000.0008ab18 Block header dump: 0x004000db 0x01 0x0008.017.00000002 0x00800080.0000.01 CBU- 0 scn 0x0000.00002404 0x02 0x0004.01a.0000017a 0x0080003c.016b.32 –U- 1 fsc 0x000e.001e8de0 Block header dump: 0x0040007a 0x01 0x0003.001.00000191 0x0080002b.014c.03 —- 1 fsc 0x0000.00000000 Block header dump: 0x0040006a 0x01 0x0000.008.00000034 0x0040019e.003b.07 C— 0 scn 0x0000.0021245c复制
通过10046跟踪报错的SQL语句
ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [] Current SQL statement for this session: select ctime, mtime, stime from obj$ where obj# = :1 —– Call Stack Trace —– 这里看到了报错的SQL语句,以SQL语句来搜索,直到搜索到如下的 Cursor#5(0x2a97ca18b0) state=FETCH curiob=0x2a97cba468 curflg=f fl2=0 par=0x2a97ca1710 ses=0x69f82a30 sqltxt(0x69a944b0)=select ctime, mtime, stime from obj$ where obj# = :1 hash=fa0bd3f60d6ee4f2495f9af8199b75b9 parent=0x6677c4b8 maxchild=01 plk=0x66f56af0 ppn=n cursor instantiation=0x2a97cba468 used=1405705379 child#0(0x69a94288) pcs=0x6677c0c8 clk=0x66f56dd0 ci=0x6677b7b0 pn=0x69ad37f0 ctx=0x6616fe90 kgsccflg=0 llk[0x2a97cba470,0x2a97cba470] idx=0 xscflg=e0141476 fl2=45000401 fl3=4022210c fl4=100 Bind bytecodes Opcode = 1 Unoptimized Offsi = 48, Offsi = 0 kkscoacd Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2a97cba020 bln=22 avl=02 flg=05 value=20复制
这里中可以看到绑定变量的值是20.在相同的版本其它的数据库中执行下面的操作
SQL> select rowid from obj$ where obj# =20; ROWID —————— AAAAASAABAAAAB6AAA SQL> @rowid_to_info.sql Enter value for rowid: AAAAASAABAAAAB6AAA ROWID_TYPE: 1 OBJECT_NUMBER: 18 RELATIVE_FNO: 1 BLOCK_NUMBER: 122 ROW_NUMBER: 0 PL/SQL procedure successfully completed. 正在好trace文件中的 Block header dump: 0x0040007a 0x01 0x0003.001.00000191 0x0080002b.014c.03 —- 1 fsc 0x0000.00000000复制
其实我们还可以从10046trace文件中找到此信息如下:
===================== PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1372757926978214 hv=429618617 ad=’69a944b0′ select ctime, mtime, stime from obj$ where obj# = :1 END OF STMT PARSE #5:c=0,e=234,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978212 BINDS #5: kkscoacd Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2a97cba020 bln=22 avl=02 flg=05 value=20 EXEC #5:c=0,e=330,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978586 WAIT #5: nam=’db file sequential read’ ela= 23 file#=1 block#=218 blocks=1 obj#=-1 tim=1372757926978753 WAIT #5: nam=’db file sequential read’ ela= 9 file#=1 block#=219 blocks=1 obj#=-1 tim=1372757926978804 WAIT #5: nam=’db file sequential read’ ela= 7 file#=1 block#=122 blocks=1 obj#=-1 tim=1372757926978841复制
这里需要注意的是绑定变量的值
在trace文件中可以发现下面的内容 tab 0, row 26, @0x18f1 tl: 70 fb: –H-FL– lb: 0x1 cc: 17 col 0: [ 2] c1 02 col 1: [ 4] c3 06 17 08 col 2: [ 1] 80 col 3: [12] 5f 4e 45 58 54 5f 4f 42 4a 45 43 54 col 4: [ 2] c1 02 col 5: *NULL* col 6: [ 1] 80 col 7: [ 7] 78 6c 03 0c 01 28 31 col 8: [ 7] 78 72 07 13 01 3b 01 col 9: [ 7] 78 6c 03 0c 01 28 31 col 10: [ 1] 80 col 11: *NULL* col 12: *NULL* col 13: [ 1] 80 col 14: *NULL* col 15: [ 1] 80 col 16: [ 4] c3 07 38 24复制
bbed手动提交事务,需要更改itl与行中的lck值
BBED> p ktbbh struct ktbbh, 48 bytes @20 ub1 ktbbhtyp @20 0x01 (KDDBTDATA) union ktbbhsid, 4 bytes @24 ub4 ktbbhsg1 @24 0x00000012 ub4 ktbbhod1 @24 0x00000012 struct ktbbhcsc, 8 bytes @28 ub4 kscnbas @28 0x0021251b ub2 kscnwrp @32 0x0000 b2 ktbbhict @36 1 ub1 ktbbhflg @38 0x02 (NONE) ub1 ktbbhfsl @39 0x00 ub4 ktbbhfnx @40 0x00000000 struct ktbbhitl[0], 24 bytes @44 struct ktbitxid, 8 bytes @44 ub2 kxidusn @44 0x0003 ub2 kxidslt @46 0x0001 ub4 kxidsqn @48 0x00000191 struct ktbituba, 8 bytes @52 ub4 kubadba @52 0x0080002b ub2 kubaseq @56 0x014c ub1 kubarec @58 0x03 ub2 ktbitflg @60 0x0001 (NONE) union _ktbitun, 2 bytes @62 b2 _ktbitfsc @62 0 ub2 _ktbitwrp @62 0x0000 ub4 ktbitbas @64 0x00000000 BBED> modify /x 80 offset 61 Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1) Block: 122 Offsets: 61 to 70 Dba:0x0040007a ———————————————————————— 80000000 00000000 016c BBED> x /rn *kdbr[26] rowdata[5278] @6453 ————- flag@6453: 0x2c (KDRHFL, KDRHFF, KDRHFH) lock@6454: 0x01 cols@6455: 17 col 0[2] @6456: 1 col 1[4] @6459: 52207 col 2[1] @6464: 0 col 3[12] @6466: -0 col 4[2] @6479: 1 col 5[0] @6482: *NULL* col 6[1] @6483: 0 col 7[7] @6485: -0 col 8[7] @6493: -0 col 9[7] @6501: -0 col 10[1] @6509: 0 col 11[0] @6511: *NULL* col 12[0] @6512: *NULL* col 13[1] @6513: 0 col 14[0] @6515: *NULL* col 15[1] @6516: 0 col 16[4] @6518: 65535 BBED> modify /x 00 offset 6454 File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1) Block: 122 Offsets: 6454 to 6463 Dba:0x0040007a ———————————————————————— 001102c1 0204c306 1708 BBED> sum apply Check value for File 1, Block 122: current = 0x3d20, required = 0x3d20 BBED> verify DBVERIFY – Verification starting FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf BLOCK = 122 DBVERIFY – Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0复制
7.7 报00600坏块的错误
www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced Sat Jul 19 01:56:37 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_1894.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [106], [6101], [], [], [], [] Error 604 happened during db open, shutting down database USER: terminating instance due to error 604 Instance terminated by USER, pid = 1894 ORA-1092 signalled during: alter database open resetlogs… 这里可以看到数据文件1,块106,出现了6101的错误。此错误由于是ITL中的值与LOCK不一致导致的。复制
bbed修改行的lock值
BBED> set dba 1,106 DBA 0x0040006a (4194410 1,106) BBED> verify DBVERIFY – Verification starting FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf BLOCK = 106 Block Checking: DBA = 4194410, Block Type = KTB-managed data block data header at 0x2a97696244 kdbchk: row locked by non-existent transaction table=0 slot=10 lockid=1 ktbbhitc=1 Block 106 failed with check code 6101 DBVERIFY – Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 1 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0复制
此报错的修改见6101(row locked by non-existent transaction)
7.8 启动数据库
通过上面几步操作,再次启动数据库
www.htz.pw > startup mount pfile=’/tmp/123.ora’; ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. www.htz.pw > recover database using backup controlfile unitl cancel; ORA-00905: missing keyword www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2172139 generated at 07/19/2014 01:56:34 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_ %u_.arc ORA-00280: change 2172139 for thread 1 is in sequence #1 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ ORA-01112: media recovery not started www.htz.pw > alter database open resetlogs; Database altered. 使用原参数能正常启动数据库。 www.htz.pw > startup force; ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Database mounted. Database opened. www.htz.pw >复制