暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

oracle DG备库磁盘坏块问题处理

IT那活儿 2022-01-19
1194

点击上方“IT那活儿”,关注后了解更多精彩内容!!


01

问题背景


2021年12月15日,17:33分,收到某数据库表空间告警,业务人员对告警表空间WEBOA添加数据文件195-198号数据文件,20:21收到该备库同步中断告警,检查日志发现报错ORA-01119、ORA-27040,备库磁盘已满,无法同步主库数据。通知业务扩容时,业务侧反馈备库hdisk5存在坏块,需要进行磁盘替换才能完成扩容。

02

坏块修复


2.1 确认损坏的LV

lspv -l hdisk5hdisk5:LV NAME LPs PPs DISTRIBUTION MOUNT POINTrac_data28_32g 32 32 00..00..00..32..00 N/Arac_data27_32g 32 32 00..00..00..32..00 N/Arac_data26_32g 32 32 29..00..00..03..00 N/Arac_data25_32g 32 32 32..00..00..00..00 N/Arac_data32_32g 32 32 00..00..00..32..00 N/Arac_data31_32g 32 32 00..00..00..32..00 N/Arac_da40_32g 32 32 00..00..00..32..00 N/Arac_data29_32g 32 32 00..00..00..32..00 N/Arac_data34_32g 32 32 00..00..00..32..00 N/Arac_data33_32g 32 32 00..00..00..32..00 N/Arac_data4_32g 32 32 00..32..00..00..00 N/Arac_data37_32g 32 32 00..00..00..00..32 N/Arac_data2_32g 32 32 00..32..00..00..00 N/Arac_data38_32g 32 32 00..00..00..00..32 N/Arac_data3_32g 32 32 00..32..00..00..00 N/Arac_data35_32g 32 32 00..00..00..32..00 N/Arac_redo2_3 1 1 00..01..00..00..00 N/Adbv file=/dev/rac_da40_32g blocksize=8192DBVERIFY: Release 10.2.0.3.0 - Production on Sun Dec 19 13:44:35 2021Copyright (c) 1982, 2005, Oracle. All rights reserved.DBVERIFY - Verification starting : FILE = /dev/rac_da40_32gDBVERIFY - Verification completeTotal Pages Examined : 4194303Total Pages Processed (Data) : 295443Total Pages Failing (Data) : 0Total Pages Processed (Index): 119458Total Pages Failing (Index): 0Total Pages Processed (Other): 3777343Total Pages Processed (Seg) : 0Total Pages Failing (Seg) : 0Total Pages Empty            : 1932Total Pages Marked Corrupt : 127Total Pages Influx : 0Highest block SCN            : 2559490068 (2.2559490068)
存储侧确认rac_da40_32g这个lv有坏块,虽然在hdisk只占有2个PPS(128M),但需要把对应的整个数据文件删除才可以把hdsik5替换掉,然后扩容。

2.2 备份rac_da40_32g

2.2.1 裸设备数据备份到文件

dd if=/dev/rac_da40_32g of=/archive/rac_da40_32g.dbf bs=8k skip=8 count=4194296
2.2.2 备库用rman备份到文件系统
RMAN> copy datafile '/dev/rrac_data40_32g' to '/migrate/recover_backup/rrac_data40_32g.dbf';Starting backup at 19-DEC-21using target database control file instead of recovery catalogallocated channel: ORA_DISK_1channel ORA_DISK_1: sid=5085 instance=oadb1 devtype=DISKchannel ORA_DISK_1: starting datafile copyinput datafile fno=00163 name=/dev/rrac_da62_32goutput filename=/migrate/recover_backup/rrac_data40_32g.dbf tag=TAG20211219T132823 recid=232 stamp=1091712654channel ORA_DISK_1: datafile copy complete, elapsed time: 00:02:35Finished backup at 19-DEC-21


2.2.3 主库用rman备份到文件系统作为备用,防止备库数据文件不可用


RMAN> copy datafile '/dev/rrac_data40_32g' to '/migrate/recover_backup/backup_rrac_da40_32g.dbf';

2.3 检查换盘后数据文件状态

待存储侧更换完新盘后,检查rac_da40_32g状态:
lslv rac_da40_32gLOGICAL VOLUME: rac_da40_32g VOLUME GROUP: datavg2LV IDENTIFIER: 00f7275700004c000000013b08bcea61.40 PERMISSION: read/writeVG STATE: active/complete        LV STATE: closed/syncdTYPE: raw WRITE VERIFY: offMAX LPs:            512                    PP SIZE: 64 megabyte(s)COPIES: 1                      SCHED POLICY: stripedLPs:                512                    PPs:            512STALE PPs:          0                      BB POLICY: relocatableINTER-POLICY: maximum RELOCATABLE: noINTRA-POLICY: middle UPPER BOUND: 16MOUNT POINT: N/A LABEL: NoneMIRROR WRITE CONSISTENCY: on/ACTIVEEACH LP COPY ON A SEPARATE PV ?: yes (superstrict)Serialize IO ?: NOSTRIPE WIDTH: 16STRIPE SIZE: 512kDEVICESUBTYPE : DS_LVZ


2.4 恢复数据文件到新盘

2.4.1 使用dd从文件系统到裸设备
dd if=/archive/rac_da40_32g.dbf of=/dev/rrac_da40_32g bs=8k seek=84194296+0 records in.4194296+0 records out.
恢复完成后,启动数据库到mount状态,开启实时应用,后台日志报错:
Thu Dec 23 14:50:53 2021Managed Standby Recovery not using Real Time ApplyThu Dec 23 14:50:53 2021Errors in file /opt/app/oracle/admin/oadb/bdump/oadb1_mrp0_3449322.trc:ORA-01110: 数据文件 141: '/dev/rrac_da40_32g'ORA-01122: 数据库文件 141 验证失败
使用备库RMAN备份的文件恢复同样报错,判断备份的文件存在坏块,无法进行恢复,尝试通过主库备份的数据文件进行恢复。
2.4.2 使用rman从文件系统迁移文件至裸设备(从生产备份)
RMAN> copy datafile '/archive/backup_rrac_da40_32g.dbf' to '/dev/rrac_da40_32g';Starting backup at 23-DEC-21using target database control file instead of recovery catalogallocated channel: ORA_DISK_1channel ORA_DISK_1: sid=5472 instance=oadb1 devtype=DISKRMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03002: failure of backup command at 12/23/2021 16:03:35RMAN-20201: datafile not found in the recovery catalogRMAN-06010: error while looking up datafile: /archive/backup_rrac_da40_32g.dbf
这是由于控制文件记录的文件名为/dev/rrac_da40_32g,需要将备份的数据文件rename成记录的文件,然后进行恢复:
SQL> alter database datafile 141 offline;alter database datafile 141 offlineERROR at line 1:ORA-01668: standby database requires DROP option for offline of data fileSQL> alter database datafile 141 offline drop;Database altered.SQL>  alter database rename file '/dev/rrac_da40_32g' to '/archive/backup_rrac_da40_32g.dbf';Database altered.


再次恢复:


RMAN> copy datafile '/archive/backup_rrac_da40_32g.dbf' to '/dev/rrac_da40_32g';Starting backup at 23-DEC-21using target database control file instead of recovery catalogallocated channel: ORA_DISK_1channel ORA_DISK_1: sid=5475 instance=oadb1 devtype=DISKchannel ORA_DISK_1: starting datafile copyinput datafile fno=00045 name=/archive/backup_rrac_da40_32g.dbf

2.5 开启实时应用

打开实时应用:
alter database recover managed standby database using current logfile disconnect;
检查同步状态:
set lines 200SELECT PROCESS ,STATUS , THREAD#,SEQUENCE#,BLOCKS,BLOCK# FROM gV$MANAGED_STANDBY;PROCESS STATUS THREAD# SEQUENCE# BLOCKS BLOCK#--------- ------------ ---------- ---------- ---------- ----------ARCH CLOSING               2      24914        171          1ARCH CLOSING               1      32563        330     641025ARCH CLOSING               2      24912        689    1019905ARCH CLOSING               1      32565       1119          1RFS IDLE 0          0          0          0RFS IDLE 0          0          0          0RFS IDLE 0          0          0          0MRP0 APPLYING_LOG 2      24912    1020593     185859RFS IDLE 0          0          0          0RFS IDLE 0          0          0          0RFS IDLE 0          0          0          0
可以看出,此时已经开始应用日志。
观察alert日志报错:
ORA-01110: 数据文件 297: '/opt/app/oracle/product/10.2.0/db_1/dbs/UNNAMED00297'
打开手动管理:ALTER SYSTEM SET standby_file_management='MANUAL' SCOPE=BOTH;创建到裸设备:alter database create datafile '/opt/app/oracle/product/10.2.0/db_1/dbs/UNNAMED00298' as '/dev/rrac_da197_30g' size   30718m autoextend off
开启实时应用:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
打开自动管理:
ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=BOTH;
多个文件没有同步,需要按照上面的步骤重复,每次都需要重新打开日志应用 。
同步完成后检查同步状态:
select name,value,unit,time_computed from v$dataguard_stats where name in ('transport lag','apply lag');NAME VALUE UNIT TIME_COMPUTED------------- -------------------- ------------------------------ ------------------------------apply lag +00 00:00:00 day(2) to second(0) interval 28-DEC-2021 15:22:23transport lag +00 00:00:00 day(2) to second(0) interval 28-DEC-2021 15:22:23
可以看出,同步已追平。

本 文 原 创 来 源:IT那活儿微信公众号(上海新炬王翦团队)

文章转载自IT那活儿,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论