环境说明:RedHat Linux 5.10 16G 8C,Oracle 11.2.0.4,在一个P to v的迁移过程中,没有停机迁移,出现了数据库内部不一致现象
故障现象:数据库无法正常open,报错ORA-00704 ORA-01555
SQL>alter database open;
alter database open
*
ERROR at line 1:
ORA-01092:ORACLE instance terminated. Disconnection forced
ORA-00704:bootstrap process failure
ORA-00704:bootstrap process failure
ORA-00604:error occurred at recursive SQL level 1
ORA-01555:snapshot too old: rollback segment number 4 with name
"_SYSSMU4_3204654454$"too small
Process ID: 17876
Session ID: 292 Serial number: 7
在碰到该问题时,同事已使用各种方法,比如:
设置undo_management=manual,
oradebug手工推进SCN,
设置隐含参数_allow_resetlogs_corruption,_offline_rollback_segments,_corrupted_rollback_segments屏蔽回滚段
10513 event,均无法启动。
首先检查数据库的环境:
SQL>select current_scn,to_char(current_scn,'XXXXXXXXXXXXXXXXXX')cs_16,checkpoint_change#,to_char(checkpoint_change#,'XXXXXXXXXXXXXXXXXX') cc_16 from v$database;
CURRENT_SCN CS_16 CHECKPOINT_CHANGE# CC_16
------------------- ------------------- ------------------- ---------
0 0 10396391515119 974993923EF
SQL>select a.file#,a.name,a.checkpoint_change#,a.last_change#,status from v$datafile a;
FILE# NAME CHECKPOINT_CHANGE# LAST_CHANGE# STATUS
----- ------------------------------------------------ --------------
1 oradata/training/system01.dbf 10396391515119 SYSTEM
2 oradata/training/sysaux01.dbf 10396391515119 ONLINE
3 oradata/training/undotbs01.dbf 10396391515119 ONLINE
4 oradata/training/users01.dbf 10396391515119 ONLINE
5 oradata/training/xxxxxxx01.dbf 10396391515119 ONLINE
......
34 oradata/training/xxxxxxx30.dbf 10396391515119 ONLINE
35 oradata/training/tcsdata01.dbf 10396391515119 ONLINE
36 oradata/training/gdciq01.dbf 10396391515119 ONLINE
36 rows selected.
SQL>select a.file#,a.name,a.recover,a.checkpoint_change#,status fromv$datafile_header a;
FILE# NAME REC CHECKPOINT_CHANGE# STATUS
---------------------------------- --- ------------------- -------
1 oradata/training/system01.dbf NO 10396391515119 ONLINE 2 oradata/training/sysaux01.dbf NO 10396391515119 ONLINE 3 oradata/training/undotbs01.dbf NO 10396391515119 ONLINE 4 oradata/training/users01.dbf NO 10396391515119 ONLINE 5 oradata/training/xxxxxxx01.dbf NO 10396391515119 ONLINE ......
34 oradata/training/xxxxxxx30.dbf NO 10396391515119 ONLINE 35 oradata/training/tcsdata01.dbf NO 10396391515119 ONLINE 36 oradata/training/gdciq01.dbf NO 10396391515119 ONLINE36 rows selected.
可以看到数据文件头都是处于一致的SCN,推进SCN是没有必要且无效的,那么什么原因导致无法open,做10046 event可以看到,
=====================
PARSINGIN CURSOR #47646723591832 len=142 dep=3 uid=0 oct=3 lid=0 tim=1470896704109860 hv=361892850 ad='25ee1a9b0' sqlid='7bd391hat42zk'
select *+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
ENDOF STMT
PARSE #47646723591832:c=1000,e=603,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=0,tim=1470896704109859
BINDS #47646723591832:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2b559e08ca50 bln=22 avl=02 flg=05 value=4
EXEC #47646723591832:c=1000,e=884,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=3,plh=906473769,tim=1470896704110869
WAIT #47646723591832: nam='db file sequential read' ela= 12 file#=1 block#=321 blocks=1 obj#=34 tim=1470896704110933
WAIT #47646723591832: nam='db file sequential read' ela= 8 file#=1 block#=225 blocks=1 obj#=15 tim=1470896704110999
FETCH #47646723591832:c=0,e=146,p=2,cr=2,cu=0,mis=0,r=1,dep=3,og=3,plh=906473769,tim=1470896704111050
STAT #47646723591832 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWIDUNDO$ (cr=2 pr=2 pw=0 time=128 us)'
STAT #47646723591832 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=70 us)'
CLOSE #47646723591832:c=0,e=5,dep=3,type=0,tim=1470896704111158
WAIT #47646723178048: nam='db file sequential read' ela= 13 file#=3 block#=176 blocks=1 obj#=0 tim=1470896704111207
WAIT #47646723178048: nam='db file sequential read' ela= 13 file#=3 block#=38674 blocks=1 obj#=0 tim=1470896704111366
FETCH #47646723178048:c=3000,e=2718,p=7,cr=7,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1470896704111762
STAT #47646723178048 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWIDOBJ$ (cr=0 pr=0 pw=0 time=11 us)'
STAT #47646723178048 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=0pr=0 pw=0 time=9 us)'
=====================
PARSEERROR #47646723179568:len=60 dep=1 uid=0 oct=3 lid=0 tim=1470896704111888 err=604
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'
ORA-00704:bootstrap process failure
ORA-00604:error occurred at recursive SQL level 2
ORA-01555:snapshot too old: rollback segment number 4 with name
"_SYSSMU4_3204654454$" too small
ORA-00704:bootstrap process failure
ORA-00604:error occurred at recursive SQL level 2
ORA-01555:snapshot too old: rollback segment number 4 with name
"_SYSSMU4_3204654454$" too small
尝试使用非常规手段gdb方法启动数据库无效
[oracle@training]$ ps -ef|grep LOCAL
oracle 19562 19561 0 15:24 ? 00:00:00 oracletraining (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 19564 19080 0 15:24 pts/2 00:00:00 grep LOCAL
[oracle@training]$ gdb $ORACLE_HOME/bin/oracle 19562
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5)
......
(gdb) break kokiasg
Breakpoint 1 at 0x15036a4
(gdb) continue
Continuing.Program received signal SIGSEGV, Segmentation fault. 0x0000000009805715 in slaac_int ()
Thu Aug 11 15:26:21 2016 SMON:enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0d17.ebce9d23):select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /oracle/app/oracle/diag/rdbms/training/training/trace/training_ora_19562.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 4 with name
"_SYSSMU4_3204654454$" too small
Errors in file /oracle/app/oracle/diag/rdbms/training/training/trace/training_ora_19562.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 4 with name
"_SYSSMU4_3204654454$" too small Error 704 happened during db open, shutting down database
USER (ospid: 19562): terminating the instance due to error 704
此时只能使用终极武器,使用BBED上古神器,将rollback segment置为offline状态,首先将10g Linux平台bbed文件拷贝到相应的目录,并且编译
[oracle@training lib]$ ./bbed
Password:
BBED: Release 2.0.0.0.0 - Limited Production on Thu Aug 11 15:35:56 2016 Copyright (c) 1982, 2011, Oracle and/or its affiliates.
All rights reserved. *************
!!! For Oracle Internal Use only !!! ***************
BBED>set filename "/oradata/training/system01.dbf"
FILENAME /oradata/training/system01.dbf
BBED>set mode edit
MODE Edit
BBED>set block 225
BLOCK# 225
BBED>map
File: /oradata/training/system01.dbf (0)
Block: 225 Dba:0x00000000
------------------------------------------------------------
KTB Data Block (Table/Cluster)
struct kcbh, 20 bytes @0
struct ktbbh, 48 bytes @20
struct kdbh, 14 bytes @68
struct kdbt[1], 4 bytes @82
sb2 kdbr[24] @86
ub1 freespace[3690] @134
ub1 rowdata[4364] @3824
ub4 tailchk @8188
BBED>p *kdbr[4]
rowdata[1872]
-------------
ub1 rowdata[1872] @5696 0x2c
BBED>x /rncnnnnn
rowdata[1872] @5696
-------------
flag@5696:0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@5697:0x00
cols@5698: 17
col 0[2] @5699: 4
col 1[20] @5702: _SYSSMU4_3204654454$
col 2[2] @5723: 1
col 3[2] @5726: 3
col 4[3] @5729: 176
col 5[6] @5733: 2527430713
col 6[3] @5740: 2420
col 7[4] @5744: 156153
col 8[3] @5749: 2475
col 9[1] @5753: 0
col 10[2] @5755: 3
col 11[2] @5758: 2
col 12[0] @5761: *NULL*
col 13[0] @5762: *NULL*
col 14[0] @5763: *NULL*
col 15[0] @5764: *NULL*
col 16[2] @5765: 2
BBED>dump /v offset 5757 count 64
File: /oradata/training/system01.dbf (0)
Block: 225 Offsets: 5757 to 5820 Dba:0x00000000
-------------------------------------------------------
0402c103 ffffffff 02c1032c 001102c1 l...........,....
06135f53 5953534d 55355f33 38373635 l.._SYSSMU5_38765
32343033 2402c102 02c10403 c2025d06 l2403$.........].
c51a1c2c 081e03c2 191504c3 103e3803 l...,.........>8.
<16 bytes per line>
BBED>m /x 02 offset 5757
Warning:contents of previous BIFILE will be lost. Proceed? (Y/N) y
File: /oradata/training/system01.dbf (0)
Block: 225 Offsets: 5757 to 5820 Dba:0x00000000
---------------------------------------------------------------------
0202c103 ffffffff 02c1032c 001102c1 06135f53 5953534d 55355f33 38373635 32343033 2402c102 02c10403 c2025d06 c51a1c2c 081e03c2 191504c3 103e3803
<32 bytes per line>
BBED>sum apply
Check value for File 0, Block 225:
current = 0x9d05, required = 0x9d05
修改完成后,open时又报出_SYSSMU8 ORA-01555,以同样的方式处理
Thu Aug 11 15:38:53 2016 SMON:enabling cache recovery
......
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 8 with name
"_SYSSMU8_1016684853$" too small
Errors in file /oracle/app/oracle/diag/rdbms/training/training/trace/training_ora_19648.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 8 with name
"_SYSSMU8_1016684853$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 19648): terminating the instance due to error 704
Instance terminated by USER, pid = 19648
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (19648) as a result of ORA-1092
Thu Aug 11 15:38:56 2016 ORA-1092 : opitsk aborting process
BBED> p *kdbr[8]
rowdata[2159]
-------------
ub1 rowdata[2159]
@5983 0x2c
BBED>x /rncnnnnn
rowdata[2159] @5983
-------------
flag@5983:0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@5984:0x00
cols@5985: 17
col 0[2] @5986: 8
col 1[20] @5989: _SYSSMU8_1016684853$
col 2[2] @6010: 1
col 3[2] @6013: 3
col 4[3] @6016: 240
col 5[6] @6020: 2527430702
col 6[3] @6027: 2420
col 7[4] @6031: 156228
col 8[3] @6036: 2562
col 9[1] @6040: 0
col 10[2] @6042: 3
col 11[2] @6045: 2
col 12[0] @6048: *NULL*
col 13[0] @6049: *NULL*
col 14[0] @6050: *NULL*
col 15[0] @6051: *NULL*
col 16[2] @6052: 2
BBED>dump /v offset 6044 count 64
File: /oradata/training/system01.dbf (0)
Block: 225 Offsets: 6044 to 6107
Dba:0x00000000
-------------------------------------------------------
0402c103 ffffffff 02c1032c 001102c1 l...........,....
0a145f53 5953534d 55395f31 37323439 l.._SYSSMU9_17249
31363134 342402c1 0202c104 03c20339 l16144$.........9
06c51a1c 2c081903 c2191504 c3104902 l....,.........I.
<16 bytes per line>
BBED> m /x 02 offset 6044
File: /oradata/training/system01.dbf (0)
Block: 225 Offsets: 6044 to 6107 Dba:0x00000000
---------------------------------------------------------------------
0202c103 ffffffff 02c1032c 001102c1 0a145f53 5953534d 55395f31 37323439 31363134 342402c1 0202c104 03c20339 06c51a1c 2c081903 c2191504 c3104902
<32 bytes per line>
BBED>sum apply
Check value for File 0, Block 225:
current = 0x9d03, required = 0x9d03
SQL>conn / as sysdba
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area 8551575552 bytes
Fixed Size 2270360 bytes
Variable Size 1342180200 bytes
Database Buffers 7197425664 bytes
Redo Buffers 9699328 bytes
Database mounted.
SQL> alter database open;
Database altered.
SQL> select status from v$instance;
STATUS
------------
OPEN
后续处理,重建undo表空间:
Thu Aug 11 16:20:06 2016 Errors in file /oracle/app/oracle/diag/rdbms/training/training/trace/training_j000_20028.trc:
ORA-12012:error on auto execute of job 101
ORA-12008:error in materialized view refresh path
ORA-01552:cannot use system rollback segment for non-system tablespace 'SINOAIR'
SQL>drop tablespace UNDOTBS1 including contents and datafiles;
drop tablespace UNDOTBS1 including contents and datafiles
*
ERROR at line 1:
ORA-01561:failed to remove all objects in the tablespace specified
SQL>select * from v$tablespace;
TS# NAME INC BIG FLA ENC
---------- ------------------------------ --- --- --- ---
0 SYSTEM YES NO YES
1 SYSAUX YES NO YES
2 UNDOTBS1 YES NO YES
3 TEMP NO NO YES
4 USERS YES NO YES
5 SINOAIR YES NO YES
6 TCSTEST_DATA YES NO YES
7 GDCIQ YES NO YES
8 UNDOTBS2 YES NO YES
9 rows selected.
SQL>update seg$ set type# = 3 where ts#=2;
3 rows updated.
SQL>commit;
Commit complete.
SQL>drop tablespace UNDOTBS1 including contents and datafiles;
Tablespace dropped.