故障描述
最近迁移中遇到一个free block 问题。模拟记录。
xtts迁移备份报错(测试环境数据):
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/24/2020 09:45:47
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/oracle/oradata/orcl10/t1.dbf
原因分析
之前确实有坏块,因为没有rman备份,无法直接修复。使用event 10231和dbms.repair跳过,迁移前也已经drop。这个网上教程很多,处理方式也有很多,但都没有后续。使用上述办法数据层次上已经正常,其实数据修复后这个块只是不在具体对象中,是一个free block,在下一次format前仍是坏块。
xtts调用rman,rman对空块不备份并不是直接跳过,也会检测。
!!!再次说明rman备份的重要性。有一个rman就不用这么大费周章。
处理思路:
1、 bbed制造一个空块替换 (慎用)
2、 allocate extent datafile方式覆盖 (推荐)
3、 resize或10g环境中的truncate或许有用,但都是有限制的,本来也不是针对这个问题的处理方式。成功率并不高
故障模拟
说的有点绕,模拟一下就完事了。
模拟坏块及没有rman备份的修复
1、新建一个表插入6行数据。通过 rowid查看测试数据的文件号(rel_fno)、块号( blockno ,)和行号(rowno)
select rowid,
dbms_rowid.rowid_relative_fno(rowid) rel_fno,
dbms_rowid.rowid_block_number(rowid) blockno,
dbms_rowid.rowid_row_number(rowid) rowno
from t11;
ROWID REL_FNO BLOCKNO ROWNO
------------------ ---------- ---------- ----------
AAAMpCAAFAAAAAOAAA 5 14 0
AAAMpCAAFAAAAAOAAB 5 14 1
AAAMpCAAFAAAAAOAAC 5 14 2
AAAMpCAAFAAAAAOAAD 5 14 3
AAAMpCAAFAAAAAOAAE 5 14 4
AAAMpCAAFAAAAAOAAF 5 14 5
2、使用bbed对file#=5,block#=14进行破坏
bbed网上教程也很多,简单记录下:
set dba 5,14
dump /v dba 5,14 offset 0
modify /x 12345678 dba 5,14 offset 0 --修改
sum dba 5,14 apply 提交
3、测试数据是否正常
alter system flush buffer_cache; -数据已经缓存到内存中,手工刷新,不然不会报错
查询提示坏块:
09:42:48 SQL> select * from t11;
select * from t11
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 14)
ORA-01110: data file 5: '/u01/app/oracle/oradata/orcl10/t1.dbf'
4、修复坏块
修复坏块方法很多:
- event 10231
- dbms.repair
- truncate 重新导入
- drop 重建
- rebulid 索引重建
09:42:57 SQL> truncate table t11;
Table truncated.
09:43:04 SQL> select * from t11;
no rows selected
确实可以了查询了,之前查行号显示6行数据,这里0行,没有备份的修复代价就是这部分数据丢失。再次提醒备份的重要性。
网上大部分的教程也都是这一部分,基本到这就结束了。这个工作真的彻底完成了吗?
5、dbv与rman检测
- DBA检测
[oracle@rhel6 ~]$ dbv file='/u01/app/oracle/oradata/orcl10/t1.dbf'
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Nov 24 09:47:02 2020
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl10/t1.dbf
Page 14 is marked corrupt
Corrupt block relative dba: 0x0140000e (file 5, block 14)
Bad header found during dbv:
Data in bad block:
type: 18 format: 4 rdba: 0x0140000e
last change scn: 0x0000.000959ff seq: 0x8 flg: 0x06
spare1: 0x56 spare2: 0x78 spare3: 0x0
consistency value in tail: 0x59ff0608
check value in block header: 0x588c
computed block checksum: 0x0
DBVERIFY - Verification complete
Total Pages Examined : 1280
Total Pages Processed (Data) : 4
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 11
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 1264
**Total Pages Marked Corrupt : 1**
Total Pages Influx : 0
Highest block SCN : 613140 (0.613140)
- RMAN检测
backup validate check logical datafile 5;
--校验后查询
09:50:36 SQL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
5 14 1 0 CORRUPT
--还不死心,直接备份:
RMAN> backup datafile 5;
Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: starting piece 1 at 24-NOV-20
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/24/2020 09:45:47
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/oracle/oradata/orcl10/t1.dbf
基本验证了上述理论。重建方式或手工标记只是解决了数据上的。坏块还是存在的,被置为free block。如果确认以后不再使用rman的情况下,也可以这么做。
模拟故障处理
1、根据以上dbv或rman信息确定坏块信息
- 文件号: FILE#=5
- 块 号: BLOCK#=14
- 文件名: /u01/app/oracle/oradata/orcl10/t1.dbf --T1表空间
2、检查块是否是某个对象的一部分
这个查询在大库中会很慢,尝试从v$bh中抓取失败,等一会。
select segment_name, segment_type, owner
from dba_extents
where file_id = 5
and 14 between block_id
and block_id + blocks -1;
no rows selected --dba_extents已经查不到了,不属于任何对象
3、如果块不属于任何对象,查询dba_free_space确认块是否属于数据文件的可用空间
Select * from dba_free_space where file_id= 5
and 14 between block_id and block_id + blocks -1;
TABLESPACE_NAME FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
T1 5 9 65536 8 5
运行以下SQL,确认块是位于可用空间中还是已占用空间中
这sql也很慢,模拟的时候可以执行找到原因,方便理解。生产修复可以略过。
set lines 200 pages 10000
col segment_name format a30
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
, null description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
, header_block corr_start_block#
, header_block corr_end_block#
, 1 blocks_corrupted
, 'Segment Header' description
FROM dba_segments s, v$database_block_corruption c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
, greatest(f.block_id, c.block#) corr_start_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
- greatest(f.block_id, c.block#) + 1 blocks_corrupted
, 'Free Block' description
FROM dba_free_space f, v$database_block_corruption c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
ORDER BY file#, corr_start_block#;
--最后列,free block
OWNER SEGMENT_TYPE SEGMENT_NAME PARTITION_NAME FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED DESCRIPTION
------------------------------ ------------------ ------------------------------ ------------------------------ ---------- ----------------- --------------- ---------------- --------------
5 14 14 1 Free Block
4、创建中间表
创建一个表 ,该表位于出现坏块的表空间T1中,使用nologging选项,避免生成redo
conn t1/t1
create table s (n number,c varchar2(4000)) nologging;
--验证表是否创建在了正确的表空间
select segment_name,tablespace_name from user_segments where segment_name='S' ;
SEGMENT_NAME TABLESPACE_NAME
------------------------------ ------------------------------
S T1
5、在表上创建触发器
根据dbv的结果输入变量,块号14 ,数据文件5
conn / as sysdba
CREATE OR REPLACE TRIGGER corrupt_trigger
AFTER INSERT ON t1.s
REFERENCING OLD AS p_old NEW AS new_p
FOR EACH ROW
DECLARE
corrupt EXCEPTION;
BEGIN
IF (dbms_rowid.rowid_block_number(:new_p.rowid)=&blocknumber)
and (dbms_rowid.rowid_relative_fno(:new_p.rowid)=&filenumber) THEN
RAISE corrupt;
END IF;
EXCEPTION
WHEN corrupt THEN
RAISE_APPLICATION_ERROR(-20000, 'Corrupt block has been formatted');
END;
/
6、计算分配空间
Select BYTES/1024 from dba_free_space where file_id=5 and 1280 between block_id and block_id + blocks -1;
BYTES/1024
----------
64
为受影响的数据文件中的表分配空间,这里计算下来是64K。
7、 关闭datafile 自动扩展
先查询下,坏块修复在改回原值保持一致。
select FILE_NAME,AUTOEXTENSIBLE from dba_data_files where FILE_NAME='/u01/app/oracle/oradata/orcl10/t1.dbf';
alter database datafile '/u01/app/oracle/oradata/orcl10/t1.dbf' autoextend off;
8、分配空间
alter table t1.s allocate extent (DATAFILE '/u01/app/oracle/oradata/orcl10/t1.dbf' SIZE 64K);
如果在这个数据文件中有多个空闲extent,或者计算的分配空间很大,也可以使用这个循环:
BEGIN
for i in 1..1000000 loop
EXECUTE IMMEDIATE 'alter table t1.s allocate extent (DATAFILE '||'''/u01/app/oracle/oradata/orcl10/t1.dbf''' ||'SIZE 64K) ';
end loop;
end ;
/
使用这个循环也有一个好处,第9步中的查询会很慢(测试环境可能几秒钟跑完了,实际环境中跑了接近2小时),使用for 循环可以避免多次查询,由于datafile自动扩展关了,受限于datafile 32G并不会造成太多数据。
9、确认坏块成为新建表的一部分
select segment_name, segment_type, owner
from dba_extents
where file_id = 5
and 1280 between block_id
and block_id + blocks -1 ;
SEGMENT_NAME SEGMENT_TYPE OWNER
------------------------------ ------------------ ------------------------------
S TABLE T1
10、向表中插入数据格式化块
BEGIN
FOR i IN 1..1000000000 LOOP
INSERT /*+ APPEND */ INTO t1.s select i, lpad('REFORMAT',3092, 'R') from dual;
commit ;
END LOOP;
END;
每向表中插入一行就会触发触发器,当向坏块中插入第一行数据的时候,会产生ORA-2000异常 。
ERROR at line 1:
ORA-20000: Corrupt block has been formatted
ORA-06512: at “SYS.CORRUPT_TRIGGER”, line 10
ORA-04088: error during execution of trigger ‘SYS.CORRUPT_TRIGGER’
ORA-06512: at line 4
11、 重新验证
直接dbv不行,要先使用rman在用dbv就正常了
- RMAN
RMAN> backup validate check logical datafile 5;
Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=159 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 24-NOV-20
SQL> select * from v$database_block_corruption;
no rows selected
RMAN> backup datafile 5;
Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: starting piece 1 at 24-NOV-20
channel ORA_DISK_1: finished piece 1 at 24-NOV-20
piece handle=/u01/app/oracle/product/10.2.0/db_1/dbs/05vganpf_1_1 tag=TAG20201124T110415 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 24-NOV-20
- DBV
[oracle@rhel6 ~]$ dbv file='/u01/app/oracle/oradata/orcl10/t1.dbf'
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Nov 24 11:03:53 2020
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl10/t1.dbf
DBVERIFY - Verification complete
Total Pages Examined : 7680
Total Pages Processed (Data) : 1147
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 134
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 6399
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 631762 (0.631762)
12、清理
drop table,切换日志和checkpoint后删除触发器
drop table scott.s;
alter system switch logfile;
alter system switch logfile;
alter system checkpoint;
DROP trigger corrupt_trigger ;