ora-600 [kdsgrp1] 发生fetch 操作查找预期的行失败时, 常有ora-600 [qertbFetchByRowID] 伴随,这次遇到的一个aix 10.2.0.4 rac案例有 ora-7445 [kdr9ir2F0rst4srp0+0068] 和 ora-600 [25027], [6], [825242163] 陪伴, 而且 好几个ora-600 [25027] 的rdba 莫名指向了一些不相干数据块(这点是通过truncate 原表后25027没再出现推断)
The ORA-600 [25027]
ERROR:
Format: ORA-600 [25027] [a] [b]
VERSIONS:
versions 9.2 and above
ARGUMENTS:
Arg [a] Tablespace Number (TSN)
Arg [b] Decimal Relative Data Block Address (RDBA)
The ora-600 [qertbFetchByRowID]
cause Index corruption.
The ora-600 [kdsgrp1]
The ora-600 [kdsgrp1] error is thrown when a fetch operation fails to find the expected row. The error is hit in memory and so may be a memory only error or an error that results from corruption on disk.
This error may indicate (but is not restricted to) any of the following conditions:
The following is ora-600 [kdsgrp1] trace content.
注意以下信息,从rdba 0183c4b8 或 file#+block#,都指向了这个快的上的第5行数据没有找到(slot start with 0)。
做了以下尝试(部分数据是在本地测试演示)
1, alter system flush buffer_cache;
Re-execute the sql , the error still
2, Disabling rowCR setting _row_cr=FALSE
This is a static system parameter need to restart perform effect , not to consider .
3, Try to Recreate index
根据问题SQL,确定了执行计划中访问访表时的索引,drop and recreate the index(es) .( 生产环境需要时间窗口,避开业务期),结果index drop 后,重建报ora-3113 异常中断了。
4, Analyze index xxx validate structure online;
try all index report ORA-1499 or ora-600,
5, Analyze table <table_name> validate structure online;
report ORA-1499 error.
6, Try to Exp and CTAS with set 10231 event, Fails ora-3113.
If analyze and exporting the table (in the presence of chained rows) both report no errors then this should be considered a consistent read issue.
7, Try to skip corrupted block using rowid
# 下面有使用Tanel Poder大师的脚本
Note"
到这里我们就可以用以上的方法来跳过corrupted block,甚至可以跳过块上的某行,把损失降到最低。
TIP:
该列上如果有pk or unique index or 其它index ,可以尝试利用索引来找出corrupted block上的行记录部分信息, 找以前的备份或手工补录。其它方法也可以使用bbed 标注block 状态。
附上一些相关的sql
Quick method to identify table/index mismatch when analyze validate structure cascade takes significant time (文档 ID 1554054.1)
Script to identify TABLE/INDEX inconsistency
Use the next procedure to identify TABLE/INDEX inconsistency, which is based on the above query, for all indexes in a table:
References:
Causes and Solutions for ora-600 [kdsgrp1] (文档 ID 1332252.1)
The ORA-600 [25027]
ERROR:
Format: ORA-600 [25027] [a] [b]
VERSIONS:
versions 9.2 and above
ARGUMENTS:
Arg [a] Tablespace Number (TSN)
Arg [b] Decimal Relative Data Block Address (RDBA)
The ora-600 [qertbFetchByRowID]
cause Index corruption.
The ora-600 [kdsgrp1]
The ora-600 [kdsgrp1] error is thrown when a fetch operation fails to find the expected row. The error is hit in memory and so may be a memory only error or an error that results from corruption on disk.
This error may indicate (but is not restricted to) any of the following conditions:
1. Lost writes
2. Parallel DML issues
3. Index corruption
4. Data block corruption
5. Consistent read [CR] issues
6. Buffer cache corruption
7. Bugs (A full list of known issues is given in Note 285586.1 - ORA-600 [kdsgrp1] )
8. Storage device issue(ie. EMC SRDF/A DR switchover )
etc..
The following is ora-600 [kdsgrp1] trace content.
oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/product
System name: AIX
Node name: par3a
Release: 3
Version: 5
Machine: 00C45CC44C00
Instance name: zyk1
Redo thread mounted by this instance: 1
Oracle process number: 547
Unix process pid: 2470174, image: oracle@par3a
*** ACTION NAME:() 2014-08-04 10:30:04.677
*** MODULE NAME:(xxxx.exe) 2014-08-04 10:30:04.677
*** SERVICE NAME:(xxx) 2014-08-04 10:30:04.677
*** SESSION ID:(1057.655) 2014-08-04 10:30:04.676
row 0183c4b8.3 continuation at
file# 6 block# 246968 slot 4 not found
**************************************************
KDSTABN_GET: 1 ..... ntab: 76
curSlot: 4 ..... nrows: 8194
**************************************************
*** 2014-08-04 10:30:04.685
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
select BJZDRYBH,CJSJ,...
from a_view
where ...
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000101 ? 054435000 ?
ksedmp+0290 bl ksedst 104A2C690 ?
ksfdmp+0018 bl 03F26C3C
kgerinv+00dc bl _ptrgl
kgeasnmierr+004c bl kgerinv 000000000 ? 000000001 ? 2E700000000 ? 000000000 ? 11048AF40 ?
kdsgrp+0460 bl 01F94108
kdsfbr+0268 bl kdsgrp 1051B12D8 ? 4330000000000094 ? 1105111C8 ?
qertbFetchByRowID+0 bl 03F276FC9d0
kpofrws+019c bl _ptrgl
opifch2+13dc bl 03F26538
opifch+003c bl opifch2 700000203CAB6EC ? 000000000 ? FFFFFFFFFFF9DD0 ?
opiodr+0ae0 bl _ptrgl
ttcpip+1020 bl _ptrgl
opitsk+1124 bl 01F96AC8
opiino+0990 bl opitsk 0FFFFD8C0 ? 000000000 ?
opiodr+0ae0 bl _ptrgl
opidrv+0484 bl 01F95914
sou2o+0090 bl opidrv 3C02D99B7C ? 4A0145928 ? FFFFFFFFFFFF7C0 ?
opimai_real+01bc bl 01F93294
main+0098 bl opimai_real 000000000 ? 000000000 ?
__start+0098 bl main 000000000 ? 000000000 ?
注意以下信息,从rdba 0183c4b8 或 file#+block#,都指向了这个快的上的第5行数据没有找到(slot start with 0)。
row 0183c4b8.3 continuation at
file# 6 block# 246968 slot 4 not found
做了以下尝试(部分数据是在本地测试演示)
1, alter system flush buffer_cache;
Re-execute the sql , the error still
2, Disabling rowCR setting _row_cr=FALSE
This is a static system parameter need to restart perform effect , not to consider .
3, Try to Recreate index
根据问题SQL,确定了执行计划中访问访表时的索引,drop and recreate the index(es) .( 生产环境需要时间窗口,避开业务期),结果index drop 后,重建报ora-3113 异常中断了。
4, Analyze index xxx validate structure online;
try all index report ORA-1499 or ora-600,
5, Analyze table <table_name> validate structure online;
report ORA-1499 error.
6, Try to Exp and CTAS with set 10231 event, Fails ora-3113.
If analyze and exporting the table (in the presence of chained rows) both report no errors then this should be considered a consistent read issue.
7, Try to skip corrupted block using rowid
# 下面有使用Tanel Poder大师的脚本
sys@ANBOB>@dba 183c4b8
RFILE# BLOCK# BIGFILE_BLOCK#
-------------------- -------------------- --------------------
DUMP_CMD
----------------------------------------------------------------------------------------
6 246968 25412792
-- alter system dump datafile 6 block 246968
7.1 Try to dump block report ora-3113 error.
sys@ANBOB>@dba_mk 6 246968
RDBA
--------------------
0x183C4B8
sys@ANBOB>@dba2 6 246968
OWNER SEGMENT_NAME PARTITION_NAME TABLESPACE_NAME EXTENT_ID
------------- ------------------ ------------------ ------------------ -----------
anbob tt USERS 0
sys@ANBOB>@o2 anbob.tt
owner object_name object_typ status OID D_OID CREATED LAST_DDL_TIME
--------------- ------------------ ---------- --------- ----------- --------- ------------------- -------------------
ANBOB TT TABLE VALID 108933 109805 2014-07-01 14:22:08 2014-07-11 11:38:15
sys@ANBOB>select rowid r,id from anbob.tt;
R ID
------------------ --------------------
AAAaztAAEAAAKbHAAA 1
AAAaztAAEAAAKbHAAB 2
AAAaztAAEAAAKbHAAC 3
sys@ANBOB>@rowid AAAaztAAEAAAKbHAAA
Show file, block, row numbers from rowid AAAaztAAEAAAKbHAAA...
RFILE# BLOCK# ROW# ROWID_DBA
-------------------- -------------------- -------------------- --------------------
DUMP_COMMAND
----------------------------------------------------------------------------------
4 42695 0 0x100A6C7
alter system dump datafile 4 block 42695; -- @dump 4 42695 .
sys@ANBOB>@rowid_mk 109805 4,42695,0
get rowid data object id 109805
ROWID_STR
------------------
AAAaztAAEAAAKbHAAA
sys@ANBOB>select * from anbob.tt where rowid='AAAaztAAEAAAKbHAAA';
ID T
-------------------- -------------------
1 1990-01-02 00:00:00
sys@ANBOB>@rowid_mk 109805 4,42695,2
get rowid data object id 109805
ROWID_STR
------------------
AAAaztAAEAAAKbHAAC
sys@ANBOB>select * from anbob.tt where rowid='AAAaztAAEAAAKbHAAC';
ID T
-------------------- -------------------
3 1990-01-04 00:00:00
Note"
到这里我们就可以用以上的方法来跳过corrupted block,甚至可以跳过块上的某行,把损失降到最低。
# if in archivelog mode
create table t_bak nologging as select * from t where rowid<'';
insert /*+append*/ into t_bak select * from t where rowid>'';
truncate table t;
insert into t select* from t_bak;
TIP:
该列上如果有pk or unique index or 其它index ,可以尝试利用索引来找出corrupted block上的行记录部分信息, 找以前的备份或手工补录。其它方法也可以使用bbed 标注block 状态。
select /*+index(t xxx)*/ COLUMN from t where rowid='';
附上一些相关的sql
[oracle@db231 ~]$ cat dba.sql
set verify off
col rfile# new_value v_dba_rfile
col block# new_value v_dba_block
col bigfile_block# new_value v_bigfile_block
col dba_object head object for a40 truncate
col dba_DBA head DBA for a20
select
dbms_utility.data_block_address_file(to_number('&1','XXXXXXXXXX')) RFILE#,
dbms_utility.data_block_address_block(to_number('&1','XXXXXXXXXX')) BLOCK#,
TO_NUMBER('&1','XXXXXXXXXX') bigfile_block#,
'-- alter system dump datafile '||dbms_utility.data_block_address_file(to_number('&1','XXXXXXXXXX'))
||' block '||dbms_utility.data_block_address_block(to_number('&1','XXXXXXXXXX')) dump_cmd
from dual;
[oracle@db231 ~]$ cat dba2.sql
col SEGMENT_NAME for a30
col owner for a30
col partition_name for a30
col tablespace_name for a30
select owner, segment_name, partition_name, tablespace_name, extent_id
from dba_extents
where file_id = &1
and &2 between block_id and block_id + blocks - 1;
[oracle@db231 ~]$ cat dba_mk.sql
set verify off
def fno=&1
def blkno=&2
select lpad('0x'||trim(to_char(dbms_utility.MAKE_DATA_BLOCK_ADDRESS(&fno, &blkno), 'XXXXXXXX')), 10) rdba from dual;
[oracle@db231 ~]$ cat rowid.sql
prompt Show file, block, row numbers from rowid &1....
def rowid=&1
select
dbms_rowid.ROWID_RELATIVE_FNO('&rowid') rfile#
, dbms_rowid.ROWID_BLOCK_NUMBER('&rowid') block#
, dbms_rowid.ROWID_ROW_NUMBER('&rowid') row#
, lpad('0x'||trim(to_char(dbms_utility.MAKE_DATA_BLOCK_ADDRESS(dbms_rowid.ROWID_RELATIVE_FNO('&rowid')
, dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')), 'XXXXXXXX')), 10) rowid_dba
, 'alter system dump datafile '||dbms_rowid.ROWID_RELATIVE_FNO('&rowid')||' block '||
dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')||'; -- @dump '||
dbms_rowid.ROWID_RELATIVE_FNO('&rowid')||' '||
dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')||' .' dump_command
from
dual
/
[oracle@db231 ~]$ cat rowid_mk.sql
Prompt get rowid data object id &1
set verify off
def doid=&1
def blk=&2
select DBMS_ROWID.ROWID_CREATE(1, &doid, &blk) rowid_str from dual;
Quick method to identify table/index mismatch when analyze validate structure cascade takes significant time (文档 ID 1554054.1)
Script to identify TABLE/INDEX inconsistency
Use the next procedure to identify TABLE/INDEX inconsistency, which is based on the above query, for all indexes in a table:
create or replace procedure analyze_quick(owner_table varchar2, name_table varchar2, name_index varchar2 default null)
is
s varchar2(30000);
num_indexes number := 0;
sum_hash number;
begin
for i in (select a.owner, a.index_name, b.column_name
from dba_indexes a, dba_ind_columns b
where a.table_owner = upper(owner_table)
and a.table_name = upper(name_table)
and (a.index_name = upper(name_index) or name_index is null)
and a.index_type not in ('IOT - TOP'
,'LOB'
,'FUNCTION-BASED NORMAL'
,'FUNCTION-BASED DOMAIN'
,'CLUSTER')
and a.owner = b.index_owner
and a.index_name = b.index_name
and a.table_name = b.table_name
and b.column_position = 1) loop
num_indexes := num_indexes+1;
s := 'select /*+ full(t1) parallel */ sum(ora_hash(rowid)) from ';
s := s || owner_table || '.' || name_table || ' t1 where ' || i.column_name ||' is not null MINUS ';
s := s || 'select /*+ index_ffs(t '|| i.index_name||') */ sum(ora_hash(rowid)) from ';
s := s || owner_table || '.' || name_table || ' t where ' || i.column_name ||' is not null';
begin
execute immediate s into sum_hash;
if sum_hash > 0 then
raise_application_error(-20220,'TABLE/INDEX MISMATCH detected!! Table: '
||upper(owner_table)||'.'||upper(name_table)
||' Index: '
||upper(i.index_name));
end if;
exception
when no_data_found then null; -- no_data_found means that there is not inconsistency
end;
end loop;
if num_indexes = 0 and name_index is not null then
raise_application_error(-20221,'Check was not executed. Index '||upper(name_index)||' does not exist for table '||upper(name_table)|| ' or table does not exist');
elsif num_indexes = 0 then
raise_application_error(-20222,'Check was not executed. No INDEXES with index_type=NORMAL found for table '||upper(name_table)|| ' or table does not exist');
end if;
end;
/
SYNTAX
Identify TABLE_NAME inconsistency with all its indexes:
SQL> execute analyze_quick('&TABLE_OWNER','&TABLE_NAME')
If an inconsistency is detected, the procedure will stop and will not check the rest of indexes for TABLE_NAME.
Identify TABLE_NAME inconsistency with index INDEX_NAME:
SQL> execute analyze_quick('&TABLE_OWNER','&TABLE_NAME', '&INDEX_NAME')
Execution example when TABLE/INDEX inconsistency is identified:
SQL> execute analyze_quick('SCOTT','EMP')
BEGIN analyze_faster('SYS','OBJ$'); END;
*
ERROR at line 1:
ORA-20220: TABLE/INDEX MISMATCH detected!! Table: SCOTT.EMP Index: I_EMP_EMPNO
ORA-06512: at "SYS.ANALYZE_QUICK", line 34
ORA-06512: at line 1
References:
Causes and Solutions for ora-600 [kdsgrp1] (文档 ID 1332252.1)
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。