一、实验环境
序号 | 主机名 | IP地址 | 系统版本 |
---|---|---|---|
1 | ora11g | 192.168.116.3 | Centos7.4 |
二、一键安装ORACLE
为了快速实验,省去安装的繁琐步骤。本次实验直接一键安装ORACLE(含安装数据库软件、创建监听、创建实例)
[root@oracle11g soft]# pwd /soft [root@oracle11g soft]# ll total 1900 -rwxrwxr-x 1 oracle oinstall 651780 Nov 5 2021 bison-2.4.1-5.el6.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 19112 Jul 11 16:15 compat-libcap1-1.10-7.el7.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 228841 Nov 5 2021 compat-libstdc++-33-3.2.3-47.3.x86_64.rpm -rw-r--r-- 1 oracle oinstall 502 Aug 1 11:16 db.rsp -rwxrwxr-x 1 oracle oinstall 54070 Jun 1 2022 elfutils-libelf-0.137-3.el5.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 24980 Jun 1 2022 elfutils-libelf-devel-0.137-3.el5.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 64118 Jun 1 2022 elfutils-libelf-devel-static-0.137-3.el5.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 291380 Nov 5 2021 flex-2.5.35-9.el6.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 17327 Aug 1 11:22 oracleAllSilent_20230801111436.log -rwxrwxr-x 1 oracle oinstall 166252 Nov 5 2021 ORACLE-INSTALL.sh -rwxrwxr-x 1 oracle oinstall 210877 Nov 5 2021 pdksh-5.2.14-37.el5_8.1.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 59332 Nov 5 2021 telnet-0.17-48.el6.x86_64.rpm -rwxrwxr-x 1 oracle oinstall 124812 Nov 5 2021 xinetd-2.3.14-40.el6.x86_64.rpm [root@oracle11g soft]# ./ORACLE-INSTALL.sh -i 192.168.116.3 --------------------------------------------------------------------------------------------- 安装过程略 --------------------------------------------------------------------------------------------- Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options Congratulations, Install Successful! Please Reboot Later. [oracle@orcl:/home/oracle]$ sqlplus / as sysdba SQL> shutdown immediate SQL> startup mount; SQL> alter database archivelog; SQL> alter database open; SQL> alter system switch logfile; SQL> archive log list; Database log mode Archive Mode Automatic archival Enabled Archive destination /archivelog Oldest online log sequence 3 Next log sequence to archive 5 Current log sequence 5
复制
三、数据库坏块实验
3.1、坏块介绍
出现坏块现象是指:在Oracle数据库的一个或多个数据块(一个数据块的容量在创建数据库时由db_block_size参数指定,缺省为8K)内出现内容混乱的现象。由于正常的数据块都有固定的合法内容格式,坏块的出现,导致数据库进程无法正常解析数据块的内容,进而使数据库进程报错乃至挂起,并级联导致整个数据库实例出现异常。
对于 Oracle 数据块物理损坏的情形,通常可以分为两种情况:
-
有备份,通过 RMAN 恢复
有备份的情况下,这是很理想的情形,可以直接通过 RMAN 块介质恢复(BLOCK MEDIA RECOVERY)功能来完成受损块的恢复。
-
无备份,通过 ORACLE自带的DBMS_REPAIR 包来修复
📢 注意: 使用 DBMS_REPAIR 包来修复,并非完全恢复,而是标记坏块,然后不对起进行访问,这部分被标记的数据也就丢失了,这是无法避免的。
1,坏块产生的原因
坏块产生的原因大致有以下几种:
-
硬件问题
Oracle进程在处理一个数据块时,首先将其读入物理内存空间,在处理完成后,再由特定进程将其写回磁盘;如果在这个过程中,出现内存故障,CPU计算失误,都会导致内存数据块的内容混乱,最后反映到写回磁盘的数据块内容有误。同样,如果存储子系统出现异常,数据块损坏也就随之出现了。
-
操作系统BUG
由于Oracle进程对数据块的读写,都是以操作系统内核调用(system call)的方式完成的,如果操作系统在内核调用存在问题,必然导致Oracle进程写入非法的内容。
-
操作系统的I/O错误或缓冲问题
-
内存或paging问题
-
OracleBUG
Oracle软件特定版本上,可能出现导致数据块的内容出现异常BUG。
-
非Oracle进程扰乱Oracle共享内存区域
如上文所述,在当数据块的内容被读入主机的物理内存时,如果其他非Oracle进程,对Oracle使用的共享内存区域形成了扰乱,最终导致写回磁盘的数据块内容混乱。
-
异常关机,掉电,终止服务
异常关机,掉电,终止服务使进程异常终止,而破坏数据块的完整性,导致坏块产生。注:这也是为什么突然断电会导致数据库无法启动
由上可见,坏块的形成原因复杂。当出现坏块时,为了找到确切的原因,需要大量的分析时间和排查操作,甚至需要多次重现才能找出根本原因。但当故障发生在生产系统上,我们为了减少停机时间,会尽快实施应急权变措施以保证系统的可用性,这样就破坏了故障现场,对根本原因的分析因而也更加困难了。
2,坏块的预防
坏块问题破坏性大,但并非不可预防。
-
在Metalink.oracle.com网站,Oracle定期发布基于特定软件版本的“已知问题(known issues)说明”。对于可能导致坏块的Oracle软件BUG,在Oracle公司内部,是作为高严重级别的问题进行处理,在“已知问题(known issues)说明”中,这些BUG以严重(Noticable)问题标出(标记为*或+),部分问题,Oracle还会发布警告(Alert)通告。在文档中,Oracle会提供相应的补丁或应对措施。
-
Oracle提供恢复工具-Recovery Manager,提供了扫描文件检查坏块的功能。
在Recovery Manager界面中,使用:
RMAN> BACKUP CHECK LOGICAL VALIDATE DATAFILE n ;
可以检查数据文件是否包含坏块,同时并不产生实际的备份输出。
-
Dbv工具检查
因为dbv要求file后面跟的必须是一个文件扩展名,所以如果用裸设备存储的,就必须使用ln链接裸设备到一个文件,然后再用dbv对这个链接文件进行检查。ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE它执行坏块的检查,但是不会标记坏块为corrupt,检测的结果保存在USER_DUMP_DEST目录下的用户trace文件中。
-
利用exp工具导出整个数据库可以检测坏块
对以下情况的坏块是检测不出来的:
HWM以上的坏块是不会发现的
索引中存在的坏块是不会发现的
数据字典中的坏块是不会发现的
结合数据库性能综合考虑db_block_checksum和db_blockchecking参数。
当我们使用Recovery Manager进行实际的数据库备份时,同时也就进行了坏块检查。但要注意的是,在线使用Recovery Manager扫描坏块和备份时,需要数
据库运行在归档模式(archive log),否则只能在数据库未打开的情况下进行。对于操作系统问题和硬件故障,则需要相应厂商的配合支持。同时,避免在数
据库主机运行其他用户进程,避免异常停机,也会减少坏块发生的几率。
3,坏块故障的识别
遇到坏块问题时,数据库的异常表现通常有:
报告ORA-01578错误。
报告Ora-1110错误。
报告ORA-00600错误,其中,第一个参数为2000-8000,Cache layer 2000 – 4000,Transaction layer 4000 – 6000, layer 6000 - 8000。
Trace文件中出现Corrupt block dba: 0x160c5958 . found。
分析对象失败。
后台进程,如DBWR,LGWR出现长时间异常等待,如“LGWR wait for redo copy”。
3.2、模拟标记坏块方法的恢复
1,准备数据
--创建用于演示的data file SQL> create tablespace tbs_tmp datafile '/oradata/orcl/tbs_tmp.dbf' size 10m autoextend on; SQL> create user test identified by test; SQL> grant dba to test; SQL> conn test/test --基于新的数据文件创建对象tb_tmp SQL> create table tb_tmp tablespace tbs_tmp as select * from dba_objects; SQL> select count(*) from TB_TMP; COUNT(*) ---------- 86953 SQL> col file_name format a60 SQL> select file_id,file_name from dba_data_files where tablespace_name='TBS_TMP'; FILE_ID FILE_NAME ---------- ------------------------------------------------------------ 6 /oradata/orcl/tbs_tmp.dbf --表对象tb_tmp上的信息,包含对应的文件信息,头部块,总块数 SQL> col SEGMENT_NAME for a20 SQL> select segment_name , header_file , header_block,blocks from dba_segments where segment_name = 'TB_TMP' and owner='TEST'; SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS -------------------- ----------- ------------ ---------- TB_TMP 6 130 1280
复制
2,模拟坏块
此处模拟物理坏块,注意,在没有备份的情况下,物理坏块基本无救,但是我们需要对坏块进行跳过处理。 破坏137 数据块的内容seek=n从输出文件开头跳过 n个blocks 个块后再开始复制。 conv=notrunc不截短输出文件 [oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/tbs_tmp.dbf bs=8192 conv=notrunc seek=137 count=1 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.000214285 s, 38.2 MB/s --清空buffer cache SQL> alter system flush buffer_cache; --查询表对相 tb_tmp,收到ORA-01578的报错 SQL> select count(*) from tb_tmp; ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 6, block # 137) ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf' 同时,alert文件也收到对应的报错信息 Hex dump of (file 6, block 137) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_44345.trc Corrupt block relative dba: 0x01800089 (file 6, block 137) Completely zero block found during multiblock buffer read Reading datafile '/oradata/orcl/tbs_tmp.dbf' for corruption at rdba: 0x01800089 (file 6, block 137) Reread (file 6, block 137) found same corrupt data (no logical check) Tue Aug 01 14:30:34 2023 Corrupt Block Found TSN = 7, TSNAME = TBS_TMP RFN = 6, BLK = 137, RDBA = 25165961 OBJN = 88605, OBJD = 88605, OBJECT = TB_TMP, SUBOBJECT = SEGMENT OWNER = TEST, SEGMENT TYPE = Table Segment Corrupt Block Found TSN = 7, TSNAME = TBS_TMP RFN = 6, BLK = 137, RDBA = 25165961 OBJN = 88605, OBJD = 88605, OBJECT = TB_TMP, SUBOBJECT = SEGMENT OWNER = TEST, SEGMENT TYPE = Table Segment Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_44345.trc (incident=5025): ORA-01578: ORACLE data block corrupted (file # 6, block # 137) ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf' Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5025/orcl_ora_44345_i5025.trc Tue Aug 01 14:30:35 2023 Sweep [inc][5025]: completed Hex dump of (file 6, block 137) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5025/orcl_m000_44572_i5025_a.trc Corrupt block relative dba: 0x01800089 (file 6, block 137) Completely zero block found during validation Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data Tue Aug 01 14:30:35 2023 Dumping diagnostic data in directory=[cdmp_20230801143035], requested by (instance=1, osid=44345), summary=[incident=5025]. Checker run found 1 new persistent data failures --此时dbv命令已经可以检查出坏块了: [oracle@orcl:/home/oracle]$ dbv file=/oradata/orcl/tbs_tmp.dbf DBVERIFY: Release 11.2.0.4.0 - Production on Tue Aug 1 14:32:15 2023 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /oradata/orcl/tbs_tmp.dbf Page 137 is marked corrupt Corrupt block relative dba: 0x01800089 (file 6, block 137) Completely zero block found during dbv: DBVERIFY - Verification complete Total Pages Examined : 1408 Total Pages Processed (Data) : 1241 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 155 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 11 Total Pages Marked Corrupt : 1 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 997436 (0.997436) --通过backup validate检查数据文件,看是否还有其它block损坏 RMAN> backup validate check logical datafile 6; Starting backup at 01-AUG-23 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=39 device type=DISK channel ORA_DISK_1: starting full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00006 name=/oradata/orcl/tbs_tmp.dbf channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01 List of Datafiles ================= File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 6 FAILED 0 139 1536 997839 File Name: /oradata/orcl/tbs_tmp.dbf Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 1241 Index 0 0 Other 1 156 validate found one or more corrupt blocks See trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_44745.trc for details Finished backup at 01-AUG-23 --查询视图v$database_block_corruption,提示有坏块,可以看到该文件只有一个数据块损坏。注意该视图可能不会返回任何数据,如无返回,先在RMAN执行backup validate。 SQL> select * from V$DATABASE_BLOCK_CORRUPTION; FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO ---------- ---------- ---------- ------------------ --------- 6 137 1 0 ALL ZERO --查看坏块的对象,这里很明显是个表对象,如果是索引可以重建,如果是表,则有可能会丢数据 SQL> set linesize 200 SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME FROM DBA_EXTENTS A WHERE FILE_ID = 6 AND 137 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1 / OWNER SEGMENT_NAME SEGMENT_TYPE TABLESPACE_NAME ------------------------------ -------------------- ------------------ ------------------------------ TEST TB_TMP TABLE TBS_TMP
复制
3,跳过坏块
--产生坏块的时候,exp导出的时候会发现坏块同时报错无法导出 [oracle@orcl:/home/oracle]$ exp test/test tables=tb_tmp file=tb_tmp.dump EXP-00056: ORACLE error 1578 encountered ORA-01578: ORACLE data block corrupted (file # 6, block # 137) ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf' Export terminated successfully with warnings. --我们可以参考MOS中提供的几种方法,任选一种来处理: SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='TB_TMP'; OWNER TABLE_NAME SKIP_COR ------------------------------ --------------- -------- TEST TB_TMP DISABLED --这里先采用DBMS_REPAIR.SKIP_CORRUPT_BLOCKS程序来将表上进行SKIP CORRUPT标记跳过。 [oracle@orcl:/archivelog]$ sqlplus / as sysdba BEGIN DBMS_REPAIR.SKIP_CORRUPT_BLOCKS ( SCHEMA_NAME => 'TEST', OBJECT_NAME => 'TB_TMP', OBJECT_TYPE => dbms_repair.table_object, FLAGS => dbms_repair.SKIP_FLAG ); END; / PL/SQL procedure successfully completed. SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='TB_TMP'; OWNER TABLE_NAME SKIP_COR ------------------------------ ------------------------------ -------- TEST TB_TMP ENABLED
复制
4,验证
上面将TB_TMP表标记坏块后,查询之前报错的SQL句就正常了。 SQL> select count(*) from tb_tmp; COUNT(*) ---------- 86873 再次导出,没有报错 [oracle@orcl:/home/oracle]$ exp test/test tables=tb_tmp file=tb_tmp.dump About to export specified tables via Conventional Path ... . . exporting table TB_TMP 86873 rows exported Export terminated successfully without warnings. --取消表上的SKIP CORRUPT标记 [oracle@orcl:/archivelog]$ sqlplus / as sysdba BEGIN DBMS_REPAIR.SKIP_CORRUPT_BLOCKS ( SCHEMA_NAME => 'TEST', OBJECT_NAME => 'TB_TMP', FLAGS => dbms_repair.NOSKIP_FLAG ); END; / PL/SQL procedure successfully completed. SQL> select count(*) from tb_tmp; select count(*) from tb_tmp * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 6, block # 137) ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf' 后台alert继续报错 --删除表 SQL> conn test/test SQL> drop table TB_TMP; --导入表 [oracle@orcl:/home/oracle]$ imp test/test tables=tb_tmp file=tb_tmp.dump Export file created by EXPORT:V11.02.00 via conventional path import done in AL32UTF8 character set and AL16UTF16 NCHAR character set . importing TEST's objects into TEST . importing TEST's objects into TEST . . importing table "TB_TMP" 86873 rows imported Import terminated successfully without warnings. SQL> select count(*) from tb_tmp; COUNT(*) ---------- 86873
复制
3.3、模拟blockrecover方法的恢复
1,准备数据
--创建用于演示的data file [oracle@orcl:/home/oracle]$ sqlplus / as sysdba SQL> create tablespace rman_tmp datafile '/oradata/orcl/rman_tmp.dbf' size 100m autoextend on; SQL> create user test01 identified by test01; SQL> grant dba to test01; SQL> conn test01/test01; --基于新的数据文件创建表对象tb_rman SQL> create table tb_rman tablespace rman_tmp as select * from dba_objects; SQL> select count (*) from tb_rman; COUNT(*) ---------- 86953 SQL> col file_name format a60 SQL> select file_id,file_name from dba_data_files where tablespace_name='RMAN_TMP'; FILE_ID FILE_NAME ---------- ------------------------------------------------------------ 6 /oradata/orcl/rman_tmp.dbf --表对象tb_rman上的信息,包含对应的文件信息,头部块,总块数 SQL> col SEGMENT_NAME for a20 SQL> select segment_name , header_file , header_block,blocks from dba_segments where segment_name = 'TB_RMAN' and owner='TEST01'; SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS -------------------- ----------- ------------ ---------- TB_RMAN 6 130 1280
复制
2,RMAN备份数据库
[oracle@orcl:/home/oracle]$ rman target / RMAN> backup full database; piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T160510_ldkh5q28_.bkp tag=TAG20230801T160510 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15 piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_ncsnf_TAG20230801T160510_ldkh674r_.bkp tag=TAG20230801T160510 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01 Finished backup at 01-AUG-23
复制
3,模拟单个坏块
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=130 count=1 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.000152197 s, 53.8 MB/s --清空buffer cache SQL> alter system flush buffer_cache; --查询表tb_rman,收到ORA-01578报错 SQL> select count (*) from tb_rman; select count (*) from tb_rman * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 6, block # 130) ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf' --同时alert也有报错日志输出 Hex dump of (file 6, block 130) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_46480.trc Corrupt block relative dba: 0x01800082 (file 6, block 130) Completely zero block found during buffer read Reading datafile '/oradata/orcl/rman_tmp.dbf' for corruption at rdba: 0x01800082 (file 6, block 130) Reread (file 6, block 130) found same corrupt data (no logical check) Tue Aug 01 16:35:04 2023 Corrupt Block Found TSN = 8, TSNAME = RMAN_TMP RFN = 6, BLK = 130, RDBA = 25165954 OBJN = 88611, OBJD = 88611, OBJECT = TB_RMAN, SUBOBJECT = SEGMENT OWNER = TEST01, SEGMENT TYPE = Table Segment Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_46480.trc (incident=5027): ORA-01578: ORACLE data block corrupted (file # 6, block # 130) ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf' Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5027/orcl_ora_46480_i5027.trc Tue Aug 01 16:35:05 2023 Sweep [inc][5027]: completed Hex dump of (file 6, block 130) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5027/orcl_m000_46980_i5027_a.trc Corrupt block relative dba: 0x01800082 (file 6, block 130) Completely zero block found during validation Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data Hex dump of (file 6, block 130) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5027/orcl_m000_46980_i5027_a.trc Corrupt block relative dba: 0x01800082 (file 6, block 130) Completely zero block found during buffer read Reading datafile '/oradata/orcl/rman_tmp.dbf' for corruption at rdba: 0x01800082 (file 6, block 130) Reread (file 6, block 130) found same corrupt data (no logical check) Corrupt Block Found TSN = 8, TSNAME = RMAN_TMP RFN = 6, BLK = 130, RDBA = 25165954 OBJN = -1, OBJD = 88611, OBJECT = RMAN_TMP, SUBOBJECT = SEGMENT OWNER = , SEGMENT TYPE = Temporary Segment Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_46980.trc (incident=5049): ORA-01578: ORACLE data block corrupted (file # 6, block # 130) ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf' Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5049/orcl_m000_46980_i5049.trc Tue Aug 01 16:35:06 2023 Dumping diagnostic data in directory=[cdmp_20230801163506], requested by (instance=1, osid=46480), summary=[incident=5027]. Checker run found 1 new persistent data failures Dumping diagnostic data in directory=[cdmp_20230801163507], requested by (instance=1, osid=46980 (M000)), summary=[incident=5049]. ----查询视图v$database_block_corruption,提示有坏块,注意该视图可能不会返回任何数据,如无返回,先执行backup validate SQL> select * from v$database_block_corruption; FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO ---------- ---------- ---------- ------------------ --------- 6 130 1 0 ALL ZERO
复制
4,blockrecover恢复单个坏块
[oracle@orcl:/home/oracle]$ rman target / RMAN> blockrecover datafile 6 block 130; Starting recover at 01-AUG-23 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=47 device type=DISK channel ORA_DISK_1: restoring block(s) channel ORA_DISK_1: specifying block(s) to restore from backup set restoring blocks of datafile 00006 channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp tag=TAG20230801T162916 channel ORA_DISK_1: restored block(s) from backup piece 1 channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01 starting media recovery media recovery complete, elapsed time: 00:00:01 Finished recover at 01-AUG-23 --再次查询表tb_rman正常无报错 SQL> select count (*) from tb_rman; COUNT(*) ---------- 86953
复制
5,模拟多个数据坏块
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=133 count=1 [oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=143 count=1 [oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=153 count=1 SQL> alter system flush buffer_cache; --下面提示块133被损坏,注意我们损坏了多块数据块,但查询时,从块号最小的开始提示,如133被修复后还有坏块则继续提示133之后的坏块 SQL> select count (*) from tb_rman; select count (*) from tb_rman * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 6, block # 133) ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf' 后台alert也只报了133块的错误 ORA-01578: ORACLE data block corrupted (file # 6, block # 133) ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf' --查询视图v$database_block_corruption无任何记录 SQL> select * from v$database_block_corruption; no rows selected --下面使用backup validate来校验数据文件 List of Datafiles ================= File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 6 FAILED 0 11403 12800 1002692 File Name: /oradata/orcl/rman_tmp.dbf Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 1239 Index 0 0 Other 3 158 --再次查询v$database_block_corruption,表明有3个损坏的块 SQL> select * from v$database_block_corruption; FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO ---------- ---------- ---------- ------------------ --------- 6 133 1 0 ALL ZERO 6 143 1 0 ALL ZERO 6 153 1 0 ALL ZERO --使用dbv工具来校验数据文件,查出3个坏块 [oracle@orcl:/home/oracle]$ dbv file=/oradata/orcl/rman_tmp.dbf DBVERIFY: Release 11.2.0.4.0 - Production on Tue Aug 1 16:48:21 2023 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /oradata/orcl/rman_tmp.dbf Page 133 is marked corrupt Corrupt block relative dba: 0x01800085 (file 6, block 133) Completely zero block found during dbv: Page 143 is marked corrupt Corrupt block relative dba: 0x0180008f (file 6, block 143) Completely zero block found during dbv: Page 153 is marked corrupt Corrupt block relative dba: 0x01800099 (file 6, block 153) Completely zero block found during dbv: DBVERIFY - Verification complete Total Pages Examined : 12800 Total Pages Processed (Data) : 1239 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 155 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 11403 Total Pages Marked Corrupt : 3 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 1002692 (0.1002692)
复制
6,blockrecover恢复多个坏块
--下面直接使用blockrecover corruption list来恢复,刚刚被校验的坏块都会被恢复 [oracle@orcl:/home/oracle]$ rman target / RMAN> blockrecover corruption list; Starting recover at 01-AUG-23 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=42 device type=DISK channel ORA_DISK_1: restoring block(s) channel ORA_DISK_1: specifying block(s) to restore from backup set restoring blocks of datafile 00006 channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp tag=TAG20230801T162916 channel ORA_DISK_1: restored block(s) from backup piece 1 channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01 starting media recovery media recovery complete, elapsed time: 00:00:01 Finished recover at 01-AUG-23 --校验结果,查询表tb_rman无报错 SQL> select count (*) from tb_rman; COUNT(*) ---------- 86953 SQL> col OWNER for a20 SQL> col OBJECT_NAME for a20 SQL> set linesize 200 SQL> select dbms_rowid.rowid_object(rowid) object_id,dbms_rowid.rowid_relative_fno(rowid) file_id, dbms_rowid.rowid_block_number(rowid) block_id,owner,object_name,object_id from TEST01.tb_rman where dbms_rowid.rowid_block_number(rowid)=163 and rownum<=2; OBJECT_ID FILE_ID BLOCK_ID OWNER OBJECT_NAME OBJECT_ID ---------- ---------- ---------- -------------------- -------------------- ---------- 88611 6 163 PUBLIC GV$BACKUP_SET 2364 88611 6 163 SYS GV_$BACKUP_PIECE 2365
复制
3.4、坏块的对象定位与影响
--下面我们查询块号为163上的对象 SQL> set linesize 200 SQL> col OBJECT_NAME for a20 SQL> select dbms_rowid.rowid_object(rowid) object_id,dbms_rowid.rowid_relative_fno(rowid) file_id, dbms_rowid.rowid_block_number(rowid) block_id,owner,object_name,object_id from test01.tb_rman where dbms_rowid.rowid_block_number(rowid)=163 and rownum<=2; OBJECT_ID FILE_ID BLOCK_ID OWNER OBJECT_NAME OBJECT_ID ---------- ---------- ---------- ------------------------------ -------------------- ---------- 88611 6 163 PUBLIC GV$BACKUP_SET 2364 88611 6 163 SYS GV_$BACKUP_PIECE 2365 --使用上面的方法,我们损块块163,173 [oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=163 count=1 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.000175403 s, 46.7 MB/s [oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=173 count=1 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.000140643 s, 58.2 MB/s --对于坏块对象无法进行聚合汇总等操作 SQL> alter system flush buffer_cache; SQL> select count(*) from test01.tb_rman; select count(*) from test01.tb_rman * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 6, block # 163) ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf' --对于坏块上的记录无法被查询,我们使用基于之前查询到的OBJECT_ID来查询 SQL> select owner,object_name,object_id from test01.tb_rman where object_id in(2364,2365); select owner,object_name,object_id from test01.tb_rman where object_id in(2364,2365) * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 6, block # 163) ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf' --如下面的查询,位于损坏块上的数据无法被查询到,但对于未损坏的依旧可以查询。下面的查询时块161上的对象 SQL> select owner,object_name,object_id from test01.tb_rman 2 where dbms_rowid.rowid_block_number(rowid)=161 and rownum<3; OWNER OBJECT_NAME OBJECT_ID ------------------------------ -------------------- ---------- SYS GV_$LATCHNAME 2203 PUBLIC GV$LATCHNAME 2204 --定位受损块所对应的对象 SQL> col SEGMENT_NAME for a20 SQL> run get_obj_name_from_corrupt_block 1 SELECT tablespace_name, 2 segment_type, 3 owner, 4 segment_name, 5 partition_name 6 FROM dba_extents 7* WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1 Enter value for file_id: 6 Enter value for block_id: 163 old 7: WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1 new 7: WHERE file_id = 6 AND 163 BETWEEN block_id AND block_id + blocks - 1 TABLESPACE_NAME SEGMENT_TYPE OWNER SEGMENT_NAME PARTITION_NAME ------------------------------ ------------------ ------------------------------ -------------------- ------------------------------ RMAN_TMP TABLE TEST01 TB_RMAN SQL> run get_obj_name_from_corrupt_block 1 SELECT tablespace_name, 2 segment_type, 3 owner, 4 segment_name, 5 partition_name 6 FROM dba_extents 7* WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1 Enter value for file_id: 6 Enter value for block_id: 173 old 7: WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1 new 7: WHERE file_id = 6 AND 173 BETWEEN block_id AND block_id + blocks - 1 TABLESPACE_NAME SEGMENT_TYPE OWNER SEGMENT_NAME PARTITION_NAME ------------------------------ ------------------ ------------------------------ -------------------- -------------------------- RMAN_TMP TABLE TEST01 TB_RMAN --对于损坏的数据文件,缺省情况下,不能对其进行备份,如下 RMAN> backup datafile 6 tag='corruption'; Starting backup at 02-AUG-23 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=40 device type=DISK channel ORA_DISK_1: starting full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00006 name=/oradata/orcl/rman_tmp.dbf channel ORA_DISK_1: starting piece 1 at 02-AUG-23 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/02/2023 09:51:08 ORA-19566: exceeded limit of 0 corrupt blocks for file /oradata/orcl/rman_tmp.dbf --需要设定允许损坏块的数量之后才能进行备份 RMAN> run{ 2> set maxcorrupt for datafile 6 to 2; 3> backup datafile 6 tag='corruption'; 4> } executing command: SET MAX CORRUPT Starting backup at 02-AUG-23 using channel ORA_DISK_1 channel ORA_DISK_1: starting full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00006 name=/oradata/orcl/rman_tmp.dbf channel ORA_DISK_1: starting piece 1 at 02-AUG-23 channel ORA_DISK_1: finished piece 1 at 02-AUG-23 piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_02/o1_mf_nnndf_CORRUPTION_ldmfpw3v_.bkp tag=CORRUPTION comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01 Finished backup at 02-AUG-23 --查看备份信息如下,应在修复坏块后重新备份以避免由于保留策略导致先前可用的备份被aged out RMAN> list backup summary; List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1 B F A DISK 01-AUG-23 1 1 NO TAG20230801T160510 2 B F A DISK 01-AUG-23 1 1 NO TAG20230801T160510 3 B F A DISK 01-AUG-23 1 1 NO TAG20230801T162916 4 B F A DISK 01-AUG-23 1 1 NO TAG20230801T162916 5 B F A DISK 02-AUG-23 1 1 NO CORRUPTION
复制
3.5、总结
- 对于受损的数据块,仅仅坏块上的数据无法被查询或读取,其余正常块的数据依旧可以使用。
- 对于受损的表对象进行聚合等相关运算时收到错误提示,因为坏块上的数据无法被统计。如果你聚合的是索引列,索引未损坏的情形则可正常返回。
- 可以基于RMAN可用的备份文件实现块介质恢复,其数据文件无需offline,开销最小,影响最小。
- 对于多个数据块的损坏,先执行backup validate校验数据库或相应的数据文件以便标记受损的坏块后,填充v$database_block_corruption以及后续恢复。
- 对于使用backup validate 校验后的情形,坏块恢复时可以直接使用blockrecover corruption list一次性恢复所有的坏块。
- 缺省情况下,存在坏块的数据文件无法成功备份,也会导致自动备份脚本失败。