暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

allocate处理free block报错RMAN-03009,ORA-1956

1547

故障描述

最近迁移中遇到一个free block 问题。模拟记录。
xtts迁移备份报错(测试环境数据):
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/24/2020 09:45:47
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/oracle/oradata/orcl10/t1.dbf

原因分析

之前确实有坏块,因为没有rman备份,无法直接修复。使用event 10231和dbms.repair跳过,迁移前也已经drop。这个网上教程很多,处理方式也有很多,但都没有后续。使用上述办法数据层次上已经正常,其实数据修复后这个块只是不在具体对象中,是一个free block,在下一次format前仍是坏块。
xtts调用rman,rman对空块不备份并不是直接跳过,也会检测。
!!!再次说明rman备份的重要性。有一个rman就不用这么大费周章。

处理思路:

1、 bbed制造一个空块替换 (慎用)
2、 allocate extent datafile方式覆盖 (推荐)
3、 resize或10g环境中的truncate或许有用,但都是有限制的,本来也不是针对这个问题的处理方式。成功率并不高

故障模拟

说的有点绕,模拟一下就完事了。

模拟坏块及没有rman备份的修复

1、新建一个表插入6行数据。通过 rowid查看测试数据的文件号(rel_fno)、块号( blockno ,)和行号(rowno)

select rowid, 
dbms_rowid.rowid_relative_fno(rowid) rel_fno, 
dbms_rowid.rowid_block_number(rowid) blockno, 
dbms_rowid.rowid_row_number(rowid) rowno 
from t11;

ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAAMpCAAFAAAAAOAAA          5         14          0
AAAMpCAAFAAAAAOAAB          5         14          1
AAAMpCAAFAAAAAOAAC          5         14          2
AAAMpCAAFAAAAAOAAD          5         14          3
AAAMpCAAFAAAAAOAAE          5         14          4
AAAMpCAAFAAAAAOAAF          5         14          5

2、使用bbed对file#=5,block#=14进行破坏
bbed网上教程也很多,简单记录下:

set dba 5,14
dump /v dba 5,14 offset 0
modify /x 12345678 dba 5,14 offset 0 --修改
sum dba 5,14 apply 提交

3、测试数据是否正常

alter system flush buffer_cache;  -数据已经缓存到内存中,手工刷新,不然不会报错

查询提示坏块:

09:42:48 SQL> select * from t11;
select * from t11
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 14)
ORA-01110: data file 5: '/u01/app/oracle/oradata/orcl10/t1.dbf'

4、修复坏块
修复坏块方法很多:

  • event 10231
  • dbms.repair
  • truncate 重新导入
  • drop 重建
  • rebulid 索引重建
09:42:57 SQL> truncate table t11;
Table truncated.

09:43:04 SQL> select * from t11;
no rows selected

确实可以了查询了,之前查行号显示6行数据,这里0行,没有备份的修复代价就是这部分数据丢失。再次提醒备份的重要性。
网上大部分的教程也都是这一部分,基本到这就结束了。这个工作真的彻底完成了吗?
5、dbv与rman检测

  • DBA检测
[oracle@rhel6 ~]$ dbv file='/u01/app/oracle/oradata/orcl10/t1.dbf'

DBVERIFY: Release 10.2.0.4.0 - Production on Tue Nov 24 09:47:02 2020

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl10/t1.dbf
Page 14 is marked corrupt
Corrupt block relative dba: 0x0140000e (file 5, block 14)
Bad header found during dbv: 
Data in bad block:
 type: 18 format: 4 rdba: 0x0140000e
 last change scn: 0x0000.000959ff seq: 0x8 flg: 0x06
 spare1: 0x56 spare2: 0x78 spare3: 0x0
 consistency value in tail: 0x59ff0608
 check value in block header: 0x588c
 computed block checksum: 0x0
DBVERIFY - Verification complete

Total Pages Examined         : 1280
Total Pages Processed (Data) : 4
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 11
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1264
**Total Pages Marked Corrupt   : 1**
Total Pages Influx           : 0
Highest block SCN            : 613140 (0.613140)
  • RMAN检测
backup validate check logical datafile 5;  

--校验后查询
09:50:36 SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         5         14          1                  0 CORRUPT
--还不死心,直接备份:	 
RMAN> backup datafile 5;

Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: starting piece 1 at 24-NOV-20
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/24/2020 09:45:47
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/oracle/oradata/orcl10/t1.dbf

基本验证了上述理论。重建方式或手工标记只是解决了数据上的。坏块还是存在的,被置为free block。如果确认以后不再使用rman的情况下,也可以这么做。

模拟故障处理

1、根据以上dbv或rman信息确定坏块信息

  • 文件号: FILE#=5
  • 块 号: BLOCK#=14
  • 文件名: /u01/app/oracle/oradata/orcl10/t1.dbf --T1表空间

2、检查块是否是某个对象的一部分

这个查询在大库中会很慢,尝试从v$bh中抓取失败,等一会。

select segment_name, segment_type, owner
       from dba_extents
      where file_id = 5
        and 14 between block_id
            and block_id + blocks -1;
            
 no rows selected                  --dba_extents已经查不到了,不属于任何对象

3、如果块不属于任何对象,查询dba_free_space确认块是否属于数据文件的可用空间

Select * from dba_free_space where file_id= 5
     and 14 between block_id and block_id + blocks -1;

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
T1                                      5          9      65536          8            5

运行以下SQL,确认块是位于可用空间中还是已占用空间中
这sql也很慢,模拟的时候可以执行找到原因,方便理解。生产修复可以略过。

set lines 200 pages 10000
col segment_name format a30
 
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
, null description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
, header_block corr_start_block#
, header_block corr_end_block#
, 1 blocks_corrupted
, 'Segment Header' description
FROM dba_segments s, v$database_block_corruption c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
, greatest(f.block_id, c.block#) corr_start_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
- greatest(f.block_id, c.block#) + 1 blocks_corrupted
, 'Free Block' description
FROM dba_free_space f, v$database_block_corruption c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
ORDER BY file#, corr_start_block#;
 
--最后列,free block
OWNER                          SEGMENT_TYPE       SEGMENT_NAME                   PARTITION_NAME                      FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED DESCRIPTION
------------------------------ ------------------ ------------------------------ ------------------------------ ---------- ----------------- --------------- ---------------- --------------
                                                                                                                        5                14              14           1 Free Block

4、创建中间表

创建一个表 ,该表位于出现坏块的表空间T1中,使用nologging选项,避免生成redo

conn t1/t1
create table s (n number,c varchar2(4000)) nologging; 
--验证表是否创建在了正确的表空间
select segment_name,tablespace_name from user_segments where segment_name='S' ;
SEGMENT_NAME                   TABLESPACE_NAME
------------------------------ ------------------------------
S                              T1

5、在表上创建触发器

根据dbv的结果输入变量,块号14 ,数据文件5

conn / as sysdba 
CREATE OR REPLACE TRIGGER corrupt_trigger 
  AFTER INSERT ON t1.s 
  REFERENCING OLD AS p_old NEW AS new_p 
  FOR EACH ROW 
DECLARE 
  corrupt EXCEPTION; 
BEGIN 
  IF (dbms_rowid.rowid_block_number(:new_p.rowid)=&blocknumber)
 and (dbms_rowid.rowid_relative_fno(:new_p.rowid)=&filenumber) THEN 
     RAISE corrupt; 
  END IF; 
EXCEPTION 
  WHEN corrupt THEN 
     RAISE_APPLICATION_ERROR(-20000, 'Corrupt block has been formatted'); 
END; 
/

6、计算分配空间

Select BYTES/1024 from dba_free_space where file_id=5 and 1280 between block_id and block_id + blocks -1;

BYTES/1024
----------
      64

为受影响的数据文件中的表分配空间,这里计算下来是64K。

7、 关闭datafile 自动扩展

先查询下,坏块修复在改回原值保持一致。

select FILE_NAME,AUTOEXTENSIBLE from dba_data_files where FILE_NAME='/u01/app/oracle/oradata/orcl10/t1.dbf';  
alter database datafile '/u01/app/oracle/oradata/orcl10/t1.dbf' autoextend off;

8、分配空间

alter table t1.s allocate extent (DATAFILE '/u01/app/oracle/oradata/orcl10/t1.dbf' SIZE 64K);
如果在这个数据文件中有多个空闲extent,或者计算的分配空间很大,也可以使用这个循环:
BEGIN
for i in 1..1000000 loop
EXECUTE IMMEDIATE 'alter table t1.s allocate extent (DATAFILE '||'''/u01/app/oracle/oradata/orcl10/t1.dbf''' ||'SIZE 64K) ';
end loop;
end ;
/

使用这个循环也有一个好处,第9步中的查询会很慢(测试环境可能几秒钟跑完了,实际环境中跑了接近2小时),使用for 循环可以避免多次查询,由于datafile自动扩展关了,受限于datafile 32G并不会造成太多数据。

9、确认坏块成为新建表的一部分

select segment_name, segment_type, owner
       from dba_extents
      where file_id = 5
        and 1280 between block_id
            and block_id + blocks -1 ;

SEGMENT_NAME                   SEGMENT_TYPE       OWNER
------------------------------ ------------------ ------------------------------
S                              TABLE              T1

10、向表中插入数据格式化块

BEGIN
FOR i IN 1..1000000000 LOOP
INSERT /*+ APPEND */ INTO t1.s select i, lpad('REFORMAT',3092, 'R') from dual;
commit ;
END LOOP;
END;

每向表中插入一行就会触发触发器,当向坏块中插入第一行数据的时候,会产生ORA-2000异常 。
ERROR at line 1:
ORA-20000: Corrupt block has been formatted
ORA-06512: at “SYS.CORRUPT_TRIGGER”, line 10
ORA-04088: error during execution of trigger ‘SYS.CORRUPT_TRIGGER’
ORA-06512: at line 4

11、 重新验证

直接dbv不行,要先使用rman在用dbv就正常了

  • RMAN
RMAN> backup validate check logical datafile 5;

Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=159 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 24-NOV-20

SQL> select * from v$database_block_corruption;
no rows selected


RMAN> backup datafile 5;

Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: starting piece 1 at 24-NOV-20
channel ORA_DISK_1: finished piece 1 at 24-NOV-20
piece handle=/u01/app/oracle/product/10.2.0/db_1/dbs/05vganpf_1_1 tag=TAG20201124T110415 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 24-NOV-20
  • DBV
[oracle@rhel6 ~]$ dbv file='/u01/app/oracle/oradata/orcl10/t1.dbf'

DBVERIFY: Release 10.2.0.4.0 - Production on Tue Nov 24 11:03:53 2020
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl10/t1.dbf
DBVERIFY - Verification complete
Total Pages Examined         : 7680
Total Pages Processed (Data) : 1147
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 134
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 6399
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 631762 (0.631762)

12、清理

drop table,切换日志和checkpoint后删除触发器

drop table scott.s;
alter system switch logfile;
alter system switch logfile;
alter system checkpoint;
DROP trigger corrupt_trigger ;
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论