oracle adg出现长时间gap,增量备份恢复报错 RMAN-06571
背景
故障现象
昨天,某客户反馈,adg同步状态有问题,出现告警,登录环境查看,已经GAP了15天之久了:
登录数据库查看:
确实,此时日志才应用到thread 1 46760,thread 2 32241.
查看alert日志:
2021-11-22T14:29:47.838021+08:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='qhpfdb1';
PR00 (PID:135782): Media Recovery Waiting for T-1.S-46762
PR00 (PID:135782): Fetching gap from T-1.S-46762 to T-1.S-46861
2021-11-22T14:31:18.222791+08:00
rfs (PID:39377): Selected LNO:61 for T-1.S-48684 dbid 1447493774 branch 1046117134
2021-11-22T14:31:25.866540+08:00
ARC2 (PID:8663): Archived Log entry 81223 added for T-1.S-48683 ID 0x56f3dcd9 LAD:1
2021-11-22T14:31:39.991843+08:00
PR00 (PID:135782): FAL: Failed to request gap sequence
PR00 (PID:135782): GAP - thread 1 sequence 46762-46861
PR00 (PID:135782): DBID 1447493774 branch 1046117134
PR00 (PID:135782): FAL: All defined FAL servers have been attempted
PR00 (PID:135782): -------------------------------------------------------------------------
PR00 (PID:135782): Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
PR00 (PID:135782): parameter is defined to a value that's sufficiently large
PR00 (PID:135782): enough to maintain adequate log switch information to resolve
PR00 (PID:135782): archived redo log gaps.
PR00 (PID:135782): -------------------------------------------------------------------------
2021-11-22T14:32:11.573957+08:00
PR00 (PID:135782): MRP0: Background Media Recovery cancelled with status 16037
2021-11-22T14:32:11.574133+08:00
Errors in file /oraclelog/diag/rdbms/qhpfdbstd/qhpfdb1/trace/qhpfdb1_pr00_135782.trc:
ORA-16037: user requested cancel of managed recovery operation
2021-11-22T14:32:11.575939+08:00
.... (PID:161407): Managed Standby Recovery not using Real Time Apply
2021-11-22T14:32:11.781987+08:00
Recovery interrupted!
alert日志中也有GAP信息:GAP - thread 1 sequence 46762-46861
数据库缺少日志SEQUENCE46762开始的日志,但是没显示thread 2缺失的日志。他应该是应用到thread 1 46762的时候,发现缺失了,recover不了,才把1节点的GAP显示在这里。
这套库之前没有纳入到监控系统中,每天巡检做的也不是很仔细,所以GAP这么长时间。有GAP不可怕,处理起来也是比较简单。先看下基本环境,
环境基本信息:
主库:
[root@qhpfdb1 ~]# su - oracle
Last login: Tue Nov 23 18:20:24 CST 2021
qhpfdb1:/home/oracle(qhpfdb1)$opatch lspatches
31771877;Database Release Update : 19.9.0.0.201020 (31771877)
30894985;OCW RELEASE UPDATE 19.7.0.0.0 (30894985)
OPatch succeeded.
qhpfdb1:/home/oracle(qhpfdb1)$logout
[root@qhpfdb1 ~]# su - grid
Last login: Tue Nov 23 18:21:26 CST 2021
qhpfdb1:/home/grid(+ASM1)$opatch lspatches
30898856;TOMCAT RELEASE UPDATE 19.0.0.0.0 (30898856)
30894985;OCW RELEASE UPDATE 19.7.0.0.0 (30894985)
30869304;ACFS RELEASE UPDATE 19.7.0.0.0 (30869304)
30869156;Database Release Update : 19.7.0.0.200414 (30869156)
OPatch succeeded.
qhpfdb1:/home/grid(+ASM1)$logout
[root@qhpfdb1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
[root@qhpfdb1 ~]# uname -a
Linux qhpfdb1 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
备库:
qhpfdb1:/oraclelog/diag/rdbms/qhpfdbstd/qhpfdb1/trace(qhpfdb1)$opatch lspatches
33208123;OCW RELEASE UPDATE 19.13.0.0.0 (33208123)
33192793;Database Release Update : 19.13.0.0.211019 (33192793)
OPatch succeeded.
qhpfdb1:/oraclelog/diag/rdbms/qhpfdbstd/qhpfdb1/trace(qhpfdb1)$logout
[root@qhpfdb1 ~]# su - grid
Last login: Tue Nov 23 18:10:47 CST 2021 on pts/2
qhpfdb1:/home/grid(+ASM1)$opatch lspatches
33239955;TOMCAT RELEASE UPDATE 19.0.0.0.0 (33239955)
33208123;OCW RELEASE UPDATE 19.13.0.0.0 (33208123)
33208107;ACFS RELEASE UPDATE 19.13.0.0.0 (33208107)
33192793;Database Release Update : 19.13.0.0.211019 (33192793)
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
OPatch succeeded.
[root@qhpfdb1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
[root@qhpfdb1 ~]# uname -a
Linux qhpfdb1 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
主备库都为2节点rac:
可以看出,主备除了打的补丁版本不一致以外,主库19.9.0.0,备库 19.13.0.0。剩余的其他环境因素都相同。
一般处理思路
解决方法
一般都不需要手工处理确实的日志,FAL自动会帮我们处理这些问题。 但是,并非我们就完全不用手工处理了,比如,你的磁盘空间爆满,归档日志在传到备库前被转移到其他地方,这种情况下FAL是不能解决问题的,需要手工处理一下。
解决办法:
一、gap较少,或者归档日志还存在,或者可以从备份集中恢复出来可以直接将缺少的归档scp到standby,在standby手工注册下即可。
二、gap较多,在primary 做基于scn的backup,同时创建一个新的standbycontrolfile,将备份好的backupset ,standby controlfile 拷贝到备库的相应目录下,进行restore、recover的操作即可因为这个案例中,standby丢失的归档太多,推荐用第二种方法
本次GAP已经超过15天,比较大了,可以采用第二种方法了,但是如果归档日志不是很多,也可以采用第一种。
可以看出,每天的归档日志平均有200个,每个4G。卧槽,还是很大的,这个库业务还是比较忙的,所以采用第一种的就没啥意义了。
扩展:归档日志是否被删除及是否被备份
此处也可以验证下第一种方法,查看这个归档日志到底还在本地吗?有没有被人为或者自动删除?方便以后别的地方进行恢复
方法1:rman
语法:RMAN> list archivelog <from> sequence xxxx until sequence xxxx <thread x>;
RMAN> list archivelog sequence 46762 thread 1;
specification does not match any archived log in the repository
RMAN> list archivelog sequence 48838 thread 1;
List of Archived Log Copies for database with db_unique_name QHPFDB
=====================================================================
Key Thrd Seq S Low Time
------- ---- ------- - -------------------
163688 1 48838 A 2021-11-23 18:06:43
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_1_seq_48838.625.1089396407
RMAN> list archivelog from sequence 48820 until sequence 48827 thread 1;
List of Archived Log Copies for database with db_unique_name QHPFDB
=====================================================================
Key Thrd Seq S Low Time
------- ---- ------- - -------------------
163641 1 48824 A 2021-11-23 14:25:22
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_1_seq_48824.650.1089384201
163643 1 48825 A 2021-11-23 14:43:20
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_1_seq_48825.364.1089385259
163647 1 48826 A 2021-11-23 15:00:59
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_1_seq_48826.772.1089386307
163651 1 48827 A 2021-11-23 15:18:27
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_1_seq_48827.617.1089387431
RMAN> list archivelog from sequence 33584 until sequence 33589 thread 2;
List of Archived Log Copies for database with db_unique_name QHPFDB
=====================================================================
Key Thrd Seq S Low Time
------- ---- ------- - -------------------
163645 2 33587 A 2021-11-23 14:36:34
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_2_seq_33587.898.1089385597
163649 2 33588 A 2021-11-23 15:06:35
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_2_seq_33588.628.1089387395
163655 2 33589 A 2021-11-23 15:36:34
Name: +ARCHIVEDG/QHPFDB/ARCHIVELOG/2021_11_23/thread_2_seq_33589.704.1089389195
方法2: v$archived_log视图。显示的是控制文件中归档日志的信息。只要归档成功,就会在视图里插入一条信息。
关键字段解析
BACKUP_COUNT
Indicates the number of times this file has been backed up. Values range from 0-15. If the file has been backed up more than 15 times, the value remains 15.判断归档日志到底有没有被备份
COMPLETION_TIME
Time when the archiving completed
APPLIED
只对ADG备库有意义,主库没意义。
STATUS
Status of the archived log:
A - Available
D - Deleted
U - Unavailable
X - Expired RMAN crosscheck以后,就会被标记为过期
DELETED
表示rman是否已经删除了该日志从磁盘
CREATOR
Creator of the archivelog:
■ ARCH - ARCH process
■ FGRD - Foreground process
■ RMAN
■ SRMN - RMAN at standby
■ LGWR - LGWR process
REGISTRAR
Registrar of the entry:
■ RFS
■ ARCH - ARCH process
■ FGRD - Foreground process
■ RMAN
■ SRMN - RMAN at standby
■ LGWR - LGWR process
BLOCKS
Size of the archived log (in blocks)
验证
数据库刚打开归档,产生归档日志
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> col name for a70
SQL> select NAME,THREAD#,SEQUENCE#, FIRST_CHANGE#,DELETED,STATUS,BACKUP_COUNT from v$archived_log;
NAME THREAD# SEQUENCE# FIRST_CHANGE# DEL S BACKUP_COUNT
---------------------------------------------------------------------- ---------- ---------- ------------- --- - ------------
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf 1 60 912941 NO A 0
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_61_1024919054.dbf 1 61 913190 NO A 0
os层面删除归档日志:
[root@oracle11g ~]# rm -rf /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf
SQL> /
NAME THREAD# SEQUENCE# FIRST_CHANGE# DEL S BACKUP_COUNT
---------------------------------------------------------------------- ---------- ---------- ------------- --- - ------------
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf 1 60 912941 NO A 0
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_61_1024919054.dbf 1 61 913190 NO A 0
os层面删除,由于控制文件中的信息没有更新,所以视图内容不会变。
rman crosscheck校验
RMAN> crosscheck archivelog all;
released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=38 device type=DISK
validation failed for archived log
archived log file name=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf RECID=3 STAMP=1089139858
validation succeeded for archived log
archived log file name=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_61_1024919054.dbf RECID=4 STAMP=1089139860
Crosschecked 2 objects
SQL> /
NAME THREAD# SEQUENCE# FIRST_CHANGE# DEL S BACKUP_COUNT
---------------------------------------------------------------------- ---------- ---------- ------------- --- - ------------
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf 1 60 912941 NO X 0
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_61_1024919054.dbf 1 61 913190 NO A 0
SEQUENCE#=60 的归档日志状态status现在已经变为X,Expired。
RMAN> backup archivelog all;
Starting backup at 20-NOV-21
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=61 RECID=4 STAMP=1089139860
input archived log thread=1 sequence=62 RECID=5 STAMP=1089140440
channel ORA_DISK_1: starting piece 1 at 20-NOV-21
channel ORA_DISK_1: finished piece 1 at 20-NOV-21
piece handle=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/010eltmo_1_1 tag=TAG20211120T190040 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 20-NOV-21
NAME THREAD# SEQUENCE# FIRST_CHANGE# DEL S BACKUP_COUNT
---------------------------------------------------------------------- ---------- ---------- ------------- --- - ------------
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf 1 60 912941 NO X 0
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_61_1024919054.dbf 1 61 913190 NO A 1
/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_62_1024919054.dbf 1 62 913193 NO A 1
日志种显示:current log archived。备份归档日志的时候,会先切换一次归档日志,在备份。所以此次备份了SEQUENCE#=60,61两个。
此时backup_count=1。表示归档日志文件被备份了1次。
删除归档日志:
RMAN> delete archivelog all;
released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=38 device type=DISK
List of Archived Log Copies for database with db_unique_name ZHUO
=====================================================================
Key Thrd Seq S Low Time
------- ---- ------- - ---------
3 1 60 X 20-NOV-21
Name: /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf
4 1 61 A 20-NOV-21
Name: /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_61_1024919054.dbf
5 1 62 A 20-NOV-21
Name: /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_62_1024919054.dbf
Do you really want to delete the above objects (enter YES or NO)? yes
deleted archived log
archived log file name=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_60_1024919054.dbf RECID=3 STAMP=1089139858
deleted archived log
archived log file name=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_61_1024919054.dbf RECID=4 STAMP=1089139860
deleted archived log
archived log file name=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_62_1024919054.dbf RECID=5 STAMP=1089140440
Deleted 3 objects
SQL> /
NAME THREAD# SEQUENCE# FIRST_CHANGE# DEL S BACKUP_COUNT
---------------------------------------------------------------------- ---------- ---------- ------------- --- - ----------
1 60 912941 YES D 0
1 61 913190 YES D 1
1 62 913193 YES D 1
此时,3个归档日志文件全被删除,标志是name为null,DELETE为YES,STATUS为D。
所以可以根据这三个字段的任意一个字段的值,来判断这个归档日志文件是否被删除了。
综上,可以查询本次缺失的归档日志情况:
rman:
RMAN> list archivelog from sequence 46762 until sequence 46861;
specification does not match any archived log in the repository
rman显示缺失的归档日志已经被删除.
视图:
SQL> col THREAD# for 999
SQL> col SEQUENCE# for 999999
SQL> select NAME,THREAD#,SEQUENCE#, FIRST_CHANGE#,DELETED,STATUS,BACKUP_COUNT from v$archived_log where SEQUENCE# between 46762 and 46861;
NAME THREAD# SEQUENCE# FIRST_CHANGE# DEL S BACKUP_COUNT
------------------------------ ------- --------- ------------------------------------------------- --- - -------------------------------------------------
1 46762 15331714535658 YES D 3
1 46763 15331715478541 YES D 3
。。。。。。。。。。。。。。。。。。
1 46859 15331953453148 YES D 3
1 46860 15331956911363 YES D 3
1 46861 15331958473086 YES D 3
100 rows selected.
由name为null,DELETE为YES,STATUS为D,可以判断,seq#=46762到seq#=46861的日志已经被rman删除,但是BACKUP_COUNT=3,即rman备份了此日志,该日志由备份集,可以从备份集中恢复出来。
查看归档日志备份集信息:
RMAN> list backup of archivelog from sequence 46762 until sequence 46861;
List of Backup Sets
===================
BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
23942 36.99G SBT_TAPE 00:01:29 2021-11-07 04:08:09
List of Archived Logs in backup set 23942
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- ------------------- ---------- ---------
1 46762 15331714535658 2021-11-07 03:50:49 15331715478541 2021-11-07 03:58:38
1 46763 15331715478541 2021-11-07 03:58:38 15331716466625 2021-11-07 04:06:32
1 46764 15331716466625 2021-11-07 04:06:32 15331716482306 2021-11-07 04:06:38
Backup Set Copy #1 of backup set 23942
Device Type Elapsed Time Completion Time Compressed Tag
----------- ------------ ------------------- ---------- ---
SBT_TAPE 00:01:29 2021-11-07 04:08:09 NO 2021_11_07_04_06_24
List of Backup Pieces for backup set 23942 Copy #1
BP Key Pc# Status Media Piece Name
------- --- ----------- ----------------------- ----------
53703 1 AVAILABLE ARCH_QHPFDB_24000_1_1087963600
53704 2 AVAILABLE ARCH_QHPFDB_24000_2_1087963600
。。。。。。。
BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ -------------------
24030 36.73G SBT_TAPE 00:01:22 2021-11-08 10:08:00
List of Archived Logs in backup set 24030
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- ------------------- ---------- ---------
1 46855 15331937712325 2021-11-08 04:10:55 15331938398224 2021-11-08 04:17:19
1 46856 15331938398224 2021-11-08 04:17:19 15331939128429 2021-11-08 04:23:25
1 46857 15331939128429 2021-11-08 04:23:25 15331942867940 2021-11-08 04:53:20
1 46858 15331942867940 2021-11-08 04:53:20 15331953453148 2021-11-08 05:23:12
Backup Set Copy #1 of backup set 24030
Device Type Elapsed Time Completion Time Compressed Tag
----------- ------------ ------------------- ---------- ---
SBT_TAPE 00:01:22 2021-11-08 10:08:00 NO 2021_11_08_10_06_22
List of Backup Pieces for backup set 24030 Copy #1
BP Key Pc# Status Media Piece Name
------- --- ----------- ----------------------- ----------
53813 1 AVAILABLE ARCH_QHPFDB_24086_1_1088071598
53814 2 AVAILABLE ARCH_QHPFDB_24086_2_1088071598
可以看出,缺失的归档日志是有备份集的,但是一方面,由于各种原因,不好恢复出来,另一方面,缺失的太多了,好几千个。所以决定采用方法2,增量备份恢复的方法,来重新同步。
问题分析
初步尝试
对于做增量备份恢复adg GAP,其实也很简单。参考mos:
Steps to perform for Rolling Forward a Physical Standby Database using RMAN Incremental Backup. (Doc ID 836986.1)
主要步骤如下:
1)备库查询最小scn,备份的起点
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
--------------
15335472404958
SQL> select min(checkpoint_change#) from v$datafile_header;
MIN(F.FHSCN)
----------------
15335440758300
2)主库进行基于scn的增量备份
run{
allocate channel ch1 device type disk;
allocate channel ch2 device type disk;
allocate channel ch3 device type disk;
allocate channel ch4 device type disk;
allocate channel ch5 device type disk;
BACKUP INCREMENTAL FROM SCN 15335440758300 DATABASE FORMAT '/OGG/ForStandby_%U' tag 'FORSTANDBY';
release channel ch1;
release channel ch2;
release channel ch3;
release channel ch4;
release channel ch5;
}
3)备份集scp传输到备库
4)备库备份集注册到控制文件
RMAN> CATALOG START WITH '/OGG/pf/';
5)recover应用增量备份(提升数据文件头的scn)
RMAN> RECOVER DATABASE NOREDO;
6)主库备份standby controlfile
RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY FORMAT '/OGG/ForStandbyCTRL.bck';
7)备份备库控制文件信息,或者查看备库原来数据文件信息(很重要)
spool standby_datafile_names.txt
set pagesize 1000;
set lines 200
col name format a60
select file#, name from v$datafile order by file# ;
spool off
或者备份下
二进制方式
SQL>alter database backup controlfile to '/home/oracle/ctl.bak';
Database altered.
文本格式SQL
SQL>alter database backup controlfile to trace as '/home/oracle/ctl.txt';
Database altered.
8)备库重新恢复控制文件,
RMAN> SHUTDOWN IMMEDIATE ;
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE STANDBY CONTROLFILE FROM '/OGG/pf/ForStandbyCTRL.bck';
9)启动到mount状态:
RMAN> ALTER DATABASE MOUNT;
10)修改数据文件路径
由于现在备库的控制文件是主库恢复过来的,主备数据文件路径不一致,所以需要修改数据文件路径,采用switch 。。copy的方式。
由前面备份的控制文件中可查,库的数据文件路径有两个地方,
RMAN> catalog start with '+DATADG2/xxxFDB/DATAFILE';
RMAN> catalog start with '+DATADG1/xxxFDB/DATAFILE/';
RMAN> switch database to copy;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 05/09/2020 16:20:06
RMAN-06571: datafile 933 does not have recoverable copy
至此,修改数据文件路径的时候,报错了RMAN-06571。报错意思也很,明确,没有可用的镜像copy。那么为什么没有可用的镜像副本哪?
极大可能就是数据文件在路径下不存在。
原因分析
1)先查,是否真的没有?
RMAN> list copy of datafile 933;
specification does not match any datafile copy in the repository
还真没有。
2)为啥没有,是不是路径找错了,导致catalog的时候没注册进去?
检查控制文件备份(此刻显示了备份的重要性),里面的路径就2个:
‘+DATADG1/QHPFDBSTD/DATAFILE/’,
‘+DATADG2/QHPFDBSTD/DATAFILE/’,omf管理。所以路径肯定是全的。
3)到真实路径下查看,是否存在?
查看933文件是什么?
此刻备库已经是恢复后的控制文件,所以主备库都可以查看。我们在主库查询:
SQL> select tablespace_name,file_name from dba_Data_Files where file_id=933;
TABLESPACE_NAME FILE_NAME
------------------------------ ----------------------------------------------------------------------
TBS_SFCINS1_HINDEX +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_hindex.534.1088071291
SQL> select tablespace_name,count(*) from dba_Data_Files where tablespace_name='TBS_SFCINS1_HINDEX' group by tablespace_name;
TABLESPACE_NAME COUNT(*)
------------------------------ -------------------------------------------------
TBS_SFCINS1_HINDEX 100
是表空间TBS_SFCINS1_HINDEX的一个数据文件,这个表空间共有100个数据文件。那么在备库下是否存在933号文件对应的数据文件。100个数据文件太多了,而且在asm中,文件的命名也会发生变化,也就是主备库中,933号文件,在asm中的名字肯定不相同。
补充:ASM 标准格式的文件名是由 tag_name file_number incarnation 三部分组成,例如 SYSTEM.256.769030243 代表着是 system 表空间的一个数据文件,在 ASM 磁盘中是第 256号文件, incarnation 为 769030243 incarnation 是根据时间戳生 成的数字,用来唯一的区分一个文件。 file_number 本身是可以重用的,一个文件被删除后,它空出来的文件号就可以被其他文件使用,因此 file_number 不能唯一的代表一个文件,只有 incarnation 才可以。
插曲一:
刚开始有个失误,用asm中的file_id和数据库中的file_id去对应,结果发现根本对不上。
ASMCMD> find --type datafile +DATADG1 *933*
+DATADG1/xxxDBSTD/DATAFILE/TBS_NEA1_DATA.933.1075972797
+DATADG1/xxxDBSTD/DATAFILE/TBS_SFCINS1_HDATA.616.1055016933
ASMCMD> find --type datafile +DATADG2 *933*
ASMCMD> ls +DATADG1/xxxDBSTD/DATAFILE/TBS_NEA1_DATA.933.1075972797
TBS_NEA1_DATA.933.1075972797
ASMCMD> ls -ls +DATADG1/xxxDBSTD/DATAFILE/TBS_NEA1_DATA.933.1075972797 ##ls -s在asmcmd中可以查看文件大小
Type Redund Striped Time Sys Block_Size Blocks Bytes Space Name
DATAFILE UNPROT COARSE NOV 23 01:00:00 Y 8192 3932161 32212262912 32220643328 TBS_NEA1_DATA.933.1075972797
匹配了一个文件,而且实际大小也合适:
错误的以为这个933文件就是数据库中的933号文件,给我们了误导,以为数据库的933号文件已经在备库中存在,但是没有被正确识别。去重新收到catalog的时候,正常,但是还是会报错没有镜像copy,switch修改路径的时候还是会报错。(其实当时也想到了rename,但是文件名字太多了,对应不上)
所以这就很奇怪了,你明明手动catalog了,可是后面查询确没有。唯一的解释就是你虽然catalog成功了,但是文件不对。
所以,asm中的file_id和数据库中的file_id其实不是想回对应的,在这里,你用表空间名字都可以判断出来。
所以,在asm实际路径中,并不存在933号数据文件。
4)为什么asm中不存在?那么备库以前没有这个文件吗?
查看备库以前的控制文件备份,根据表空间名称匹配控制文件中到底有几个数据文件
[root@qhpfdb1 oracle]# vi ctl.bak
:%s/tbs_sfcins1_hindex/&/gn
匹配了198个数据文件,熟悉控制文件内容的人都知道,里面有resetlog和noresetlog两个选项,所以除以2,共有99个数据文件。所以相比主库的100个数据文件,备库在不同步的时候,少一个数据文件。
另外一种方法是,asm中可以验证,用到asmcmd find 命令,
ASMCMD> help find
find
Displays the absolute paths of all occurrences of the specified name
pattern (with wildcards) in a specified directory and its
subdirectories.
Synopsis
find [--type <type>] <dir> <pattern>
Description
The options for the find command are described below.
--type <type> - Type of target that you want to find.
<dir> - Directory name where you want to start searching.
pattern - Name of a target or a wildcard pattern.
This command searches the specified directory and all subdirectories
under it in the directory tree for the supplied pattern. The value
that you use for pattern can be a directory name or a filename, and
can include wildcard characters.
The ASMCMD find command is case insensitive.
In the output of the command, directory names are suffixed with the
slash character (/) to distinguish them from filenames.
You use the --type flag to find all the files of a particular type
(specified as type). For example, you can search for control files by
specifying type as CONTROLFILE. The type values are listed in the
type column of the V$ASM_FILE view.
Examples
The following are examples of the find command.
The first example searches the data disk group and subdirectories
for files that begin with UNDO. The second example returns the
absolute path of all the control files (--type CONTROLFILE) in the
+DATA/orcl directory.
ASMCMD [+] > find +data undo*
+data/ORCL/DATAFILE/UNDOTBS1.258.691577151
ASMCMD [+] > find --type CONTROLFILE +data/orcl *
+data/orcl/CONTROLFILE/Current.260.691577263
ASMCMD> find --type datafile +DATADG2 *tbs_sfcins1_hindex*
ASMCMD> find --type datafile +DATADG1 *tbs_sfcins1_hindex*
可以把输出结果粘贴到txt中,统计总数,总共也是99个,少1个。和前面的结果相对应。
5)那么备库为什么会少一个数据文件?咨询了现场人员,该库业务重,有可能是主库添加数据文件了,而备库没同步上,已经断档了超过15天,完全有这个可能啊。断档后,主库新添加了数据文件,而备库由于断档,没有把新添加的数据文件同步过来,导致数据文件和asm中都缺少933号文件。 主库查询933号文件创建时间。
SQL> select file#,name,CREATION_TIME from v$datafile where file#>=933
FILE# NAME CREATION_TIME
---------- ------------------------------------------------------------ -------------------
933 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_hindex.534.1088071291 2021-11-08 10:02:43
934 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_index.535.1088071365 2021-11-08 10:03:46
935 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.536.1088071427 2021-11-08 10:04:52
936 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.537.1088072045 2021-11-08 10:15:10
937 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.538.1088072457 2021-11-08 10:22:03
938 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.539.1088072525 2021-11-08 10:23:18
939 +DATADG1/xxxFDB/DATAFILE/tbs_uif1_data.1010.1088262503 2021-11-10 15:09:29
940 +DATADG1/xxxFDB/DATAFILE/tbs_sfcins1_data.1011.1088262569 2021-11-10 15:10:32
941 +DATADG2/xxxFDB/DATAFILE/ts_emos.540.1088264197 2021-11-10 15:37:54
942 +DATADG1/xxxFDB/DATAFILE/ts_emos.1012.1088264371 2021-11-10 15:40:35
943 +DATADG1/xxxFDB/DATAFILE/ts_emos.1013.1088265211 2021-11-10 15:54:34
944 +DATADG1/xxxFDB/DATAFILE/ts_emos.1014.1088265275 2021-11-10 15:55:39
945 +DATADG1/xxxFDB/DATAFILE/ts_emos.1015.1088265339 2021-11-10 15:56:44
946 +DATADG1/xxxFDB/DATAFILE/ts_emos.1016.1088265405 2021-11-10 15:58:00
947 +DATADG1/xxxFDB/DATAFILE/ts_emos.1017.1088265481 2021-11-10 15:59:04
948 +DATADG1/xxxFDB/DATAFILE/ts_emos.1018.1088265545 2021-11-10 16:00:10
949 +DATADG1/xxxFDB/DATAFILE/ts_emos.1019.1088265611 2021-11-10 16:01:15
950 +DATADG1/xxxFDB/DATAFILE/ts_emos.1020.1088265675 2021-11-10 16:02:33
951 +DATADG1/xxxFDB/DATAFILE/ts_emos.1021.1088265753 2021-11-10 16:03:36
952 +DATADG1/xxxFDB/DATAFILE/ts_emos.1022.1088265817 2021-11-10 16:04:41
953 +DATADG1/xxxFDB/DATAFILE/ts_emos.1023.1088265881 2021-11-10 16:05:45
954 +DATADG1/xxxFDB/DATAFILE/ts_emos.1024.1088265945 2021-11-10 16:06:52
955 +DATADG1/xxxFDB/DATAFILE/ts_emos.1025.1088266013 2021-11-10 16:08:19
956 +DATADG1/xxxFDB/DATAFILE/ts_emos.1026.1088266099 2021-11-10 16:09:26
957 +DATADG1/xxxFDB/DATAFILE/ts_emos.1027.1088266167 2021-11-10 16:10:31
958 +DATADG1/xxxFDB/DATAFILE/ts_emos.1028.1088266231 2021-11-10 16:11:36
959 +DATADG1/xxxFDB/DATAFILE/ts_emos.1029.1088266297 2021-11-10 16:12:55
960 +DATADG1/xxxFDB/DATAFILE/ts_emos.1030.1088266375 2021-11-10 16:14:01
961 +DATADG1/xxxFDB/DATAFILE/ts_emos.1031.1088266441 2021-11-10 16:15:09
962 +DATADG1/xxxFDB/DATAFILE/ts_emos.1032.1088266509 2021-11-10 16:16:16
963 +DATADG1/xxxFDB/DATAFILE/ts_emos.1033.1088266577 2021-11-10 16:17:32
964 +DATADG1/xxxFDB/DATAFILE/ts_emos.1034.1088266653 2021-11-10 16:18:38
965 +DATADG1/xxxFDB/DATAFILE/ts_emos.1035.1088266719 2021-11-10 16:19:40
966 +DATADG1/xxxFDB/DATAFILE/ts_emos.1036.1088266781 2021-11-10 16:20:44
967 +DATADG1/xxxFDB/DATAFILE/ts_emos.1037.1088266845 2021-11-10 16:21:49
968 +DATADG1/xxxFDB/DATAFILE/ts_emos.1038.1088266911 2021-11-10 16:23:06
969 +DATADG1/xxxFDB/DATAFILE/ts_emos.1039.1088266987 2021-11-10 16:24:11
970 +DATADG1/xxxFDB/DATAFILE/ts_emos.1040.1088267051 2021-11-10 16:25:14
971 +DATADG1/xxxFDB/DATAFILE/ts_emos.1041.1088267115 2021-11-10 16:26:18
972 +DATADG1/xxxFDB/DATAFILE/ts_emos.1042.1088267179 2021-11-10 16:27:34
973 +DATADG1/xxxFDB/DATAFILE/ts_emos.1043.1088267255 2021-11-10 16:28:40
974 +DATADG1/xxxFDB/DATAFILE/ts_emos.1044.1088267321 2021-11-10 16:29:45
975 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_index.541.1088676671 2021-11-15 10:12:24
976 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.542.1088676745 2021-11-15 10:13:34
977 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.543.1088676815 2021-11-15 10:14:39
978 +DATADG2/xxxFDB/DATAFILE/tbs_res1_data.544.1088676879 2021-11-15 10:15:44
979 +DATADG2/xxxFDB/DATAFILE/tbs_res1_data.545.1088676945 2021-11-15 10:16:48
980 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.546.1088679957 2021-11-15 11:07:02
981 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.547.1088681113 2021-11-15 11:26:17
982 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.548.1088681279 2021-11-15 11:29:02
983 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.549.1088759473 2021-11-16 09:12:27
984 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.550.1088759605 2021-11-16 09:14:27
985 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_hdata.551.1088867537 2021-11-17 15:13:29
986 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_hdata.552.1088867611 2021-11-17 15:14:33
987 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.553.1088934109 2021-11-18 09:43:05
988 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.554.1088934185 2021-11-18 09:44:11
989 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.555.1088934251 2021-11-18 09:45:16
990 +DATADG2/xxxFDB/DATAFILE/tbs_uif1_data.556.1089281919 2021-11-22 10:19:43
991 +DATADG2/xxxFDB/DATAFILE/tbs_nea1_data.557.1089281983 2021-11-22 10:20:50
992 +DATADG2/xxxFDB/DATAFILE/tbs_olcom1_hdata.558.1089282051 2021-11-22 10:21:55
993 +DATADG2/xxxFDB/DATAFILE/tbs_sfcins1_data.559.1089282115 2021-11-22 10:23:14
61 rows selected.
这61个数据文件的创建时间,都在断档之后,所以没有同步过来。
6)其实我们还有个误区,认为基于scn增量备份的时候,会把这个scn之后的所有文件备份出来,然后在恢复的时候,恢复出来。通过后面测试,观察rman日志,却是缺失的数据文件会被增量备份出来,但是recover database noredo的时候,并不会自动创建出来在备库中不存在的数据文件,所以switch修改路径的时候会报数据文件(镜像copy)不存在。此处在另一篇中详述。
7)其实也可以从rman恢复日志中查看,是否对933号文件之后的文件recover了没,但是刚开始认为比较简单,所有的rman都没有记录日志,导致被冲掉了(不管任何时候,保留日志十分重要)。也可以从数据库alert中过滤关键字。
如果recover的时候,会被恢复,就会出现如下信息,
2021-11-22T17:43:34.473090+08:00 Incremental restore complete of datafile 126 +DATADG1/QHPFDBSTD/DATAFILE/tbs_sfcins1_hdata.452.1052049577 checkpoint is 15335440758300 last deallocation scn is 15268544235671 Incremental restore complete of datafile 9 +DATADG1/QHPFDBSTD/DATAFILE/tbs_sfcins1_data.332.1048521227 checkpoint is 15335440758300 last deallocation scn is 15293747016860 Incremental restore complete of datafile 42 +DATADG1/QHPFDBSTD/DATAFILE/tbs_sfcins1_data.365.1048535489 checkpoint is 15335440758300 last deallocation scn is 15293747016860
根据关键字:Incremental restore complete of datafile 进行过滤
$cat alert_qhpfdb1.log |grep -i5n "Incremental restore complete of datafile 933"
没有任何输出,说明933没有被recover,也可以说明933号文件不存在。
8)再判断不清楚,可以用以前备份的控制文件还原,查看恢复前备库数据文件的情况。
到此处,我们就很清楚了,在主备库不同步期间,主库有新添加的数据文件,导致备库没有同步过来。而在增量备份恢复的过程中,没有retore datafile的过程,recover database不会自动创建包含在备份集中,实际不存在的数据文件,导致实际就不存在这部分新创建的数据文件,所以switch 修改文件路径的时候会报错RMAN-06571。
那么现在的问题就是restore 这部分缺失的数据文件,重新recover即可。
问题解决
其实参考的mos里面有明确的说明,但是基于以前经验,没有认真查看:
参考mos: Steps to perform for Rolling forward a standby database using RMAN incremental backup when datafile is added to primary (Doc ID 1531031.1)
其实里面唯一的不同,就是增加了手动备份恢复缺失的数据文件过程。
1)备份缺失的数据文件
RMAN> run{
allocate channel ch1 device type disk;
allocate channel ch2 device type disk;
allocate channel ch3 device type disk;
allocate channel ch4 device type disk;
allocate channel ch5 device type disk;
backup datafile 933,934,935,936,937,938,939,940,941,942,943,944,945,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,976,977,978,979,980,981,982,983,984,985,986,987,988,989,990,991,992,993 format '/OGG/dataForStandby_%U' tag 'dataFORSTANDBY';
release channel ch1;
release channel ch2;
release channel ch3;
release channel ch4;
release channel ch5;
}
备注:其实不用备份缺失的文件,因为前面的增量备份集中包含了这些缺失的数据文件,虽然没有从里面还原出来。我们可以 list backup;查看,手动set newname for datafile 933 to ‘+DATADG2’;restore datafile 933;。具体见另一篇。
2)备库restore缺失数据文件
RMAN> CATALOG START WITH '/OGG/';
RMAN> run{
allocate channel ch1 device type disk;
allocate channel ch2 device type disk;
allocate channel ch3 device type disk;
allocate channel ch4 device type disk;
allocate channel ch5 device type disk;
set newname for datafile 933 to '+DATADG2';
set newname for datafile 934 to '+DATADG2';
set newname for datafile 935 to '+DATADG2';
set newname for datafile 936 to '+DATADG2';
set newname for datafile 937 to '+DATADG2';
set newname for datafile 938 to '+DATADG2';
set newname for datafile 939 to '+DATADG1';
set newname for datafile 940 to '+DATADG1';
set newname for datafile 941 to '+DATADG2';
set newname for datafile 942 to '+DATADG1';
set newname for datafile 943 to '+DATADG1';
set newname for datafile 944 to '+DATADG1';
set newname for datafile 945 to '+DATADG1';
set newname for datafile 946 to '+DATADG1';
set newname for datafile 947 to '+DATADG1';
set newname for datafile 948 to '+DATADG1';
set newname for datafile 949 to '+DATADG1';
set newname for datafile 950 to '+DATADG1';
set newname for datafile 951 to '+DATADG1';
set newname for datafile 952 to '+DATADG1';
set newname for datafile 953 to '+DATADG1';
set newname for datafile 954 to '+DATADG1';
set newname for datafile 955 to '+DATADG1';
set newname for datafile 956 to '+DATADG1';
set newname for datafile 957 to '+DATADG1';
set newname for datafile 958 to '+DATADG1';
set newname for datafile 959 to '+DATADG1';
set newname for datafile 960 to '+DATADG1';
set newname for datafile 961 to '+DATADG1';
set newname for datafile 962 to '+DATADG1';
set newname for datafile 963 to '+DATADG1';
set newname for datafile 964 to '+DATADG1';
set newname for datafile 965 to '+DATADG1';
set newname for datafile 966 to '+DATADG1';
set newname for datafile 967 to '+DATADG1';
set newname for datafile 968 to '+DATADG1';
set newname for datafile 969 to '+DATADG1';
set newname for datafile 970 to '+DATADG1';
set newname for datafile 971 to '+DATADG1';
set newname for datafile 972 to '+DATADG1';
set newname for datafile 973 to '+DATADG1';
set newname for datafile 974 to '+DATADG1';
set newname for datafile 975 to '+DATADG2';
set newname for datafile 976 to '+DATADG2';
set newname for datafile 977 to '+DATADG2';
set newname for datafile 978 to '+DATADG2';
set newname for datafile 979 to '+DATADG2';
set newname for datafile 980 to '+DATADG2';
set newname for datafile 981 to '+DATADG2';
set newname for datafile 982 to '+DATADG2';
set newname for datafile 983 to '+DATADG2';
set newname for datafile 984 to '+DATADG2';
set newname for datafile 985 to '+DATADG2';
set newname for datafile 986 to '+DATADG2';
set newname for datafile 987 to '+DATADG2';
set newname for datafile 988 to '+DATADG2';
set newname for datafile 989 to '+DATADG2';
set newname for datafile 990 to '+DATADG2';
set newname for datafile 991 to '+DATADG2';
set newname for datafile 992 to '+DATADG2';
set newname for datafile 993 to '+DATADG2';
restore datafile 933,934,935,936,937,938,939,940,941,942,943,944,945,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,976,977,978,979,980,981,982,983,984,985,986,987,988,989,990,991,992,993;}
restore完成后,控制文件中的数据文件路径应该是主库的路径,但是此处却是“MUST_RENAME_THIS_DATAFILE_”,可能是19c的新特性。
RMAN> report schema;
RMAN-06139: warning: control file is not current for REPORT SCHEMA
Report of database schema for database with db_unique_name QHPFDBSTD
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 0 SYSTEM *** +DATADG1/MUST_RENAME_THIS_DATAFILE_1.4294967295.4294967295
2 0 SYSAUX *** +DATADG1/MUST_RENAME_THIS_DATAFILE_2.4294967295.4294967295
3 0 UNDOTBS1 *** +DATADG1/MUST_RENAME_THIS_DATAFILE_3.4294967295.4294967295
4 0 UNDOTBS2 *** +DATADG1/MUST_RENAME_THIS_DATAFILE_4.4294967295.4294967295
5 0 USERS *** +DATADG1/MUST_RENAME_THIS_DATAFILE_5.4294967295.4294967295
6 0 SYSTEM *** +DATADG1/MUST_RENAME_THIS_DATAFILE_6.4294967295.4294967295
7 0 SYSAUX *** +DATADG1/MUST_RENAME_THIS_DATAFILE_7.4294967295.4294967295
8 0 USERS *** +DATADG1/MUST_RENAME_THIS_DATAFILE_8.4294967295.4294967295
9 0 TBS_SFCINS1_DATA *** +DATADG1/MUST_RENAME_THIS_DATAFILE_9.4294967295.4294967295
.........
933 0 TBS_SFCINS1_HINDEX *** +DATADG2/MUST_RENAME_THIS_DATAFILE_933.4294967295.4294967295
934 0 TBS_SFCINS1_INDEX *** +DATADG2/MUST_RENAME_THIS_DATAFILE_934.4294967295.4294967295
935 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_935.4294967295.4294967295
936 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_936.4294967295.4294967295
937 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_937.4294967295.4294967295
938 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_938.4294967295.4294967295
..........
987 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_987.4294967295.4294967295
988 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_988.4294967295.4294967295
989 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_989.4294967295.4294967295
990 0 TBS_UIF1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_990.4294967295.4294967295
991 0 TBS_NEA1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_991.4294967295.4294967295
992 0 TBS_OLCOM1_HDATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_992.4294967295.4294967295
993 0 TBS_SFCINS1_DATA *** +DATADG2/MUST_RENAME_THIS_DATAFILE_993.4294967295.4294967295
3)switch 修改路径
RMAN> catalog start with '+DATADG2/QHPFDBSTD/DATAFILE/';
RMAN> catalog start with '+DATADG1/QHPFDBSTD/DATAFILE/';
RMAN> switch database to copy;
.........
datafile 1 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/system.284.1048243311"
datafile 2 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/sysaux.283.1048243317"
datafile 3 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/undotbs1.263.1048243317"
datafile 4 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/undotbs2.281.1048243325"
datafile 5 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/users.280.1048243325"
datafile 6 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/system.268.1048243311"
datafile 7 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/sysaux.257.1048243311"
datafile 8 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/users.285.1048243311"
datafile 9 switched to datafile copy "+DATADG1/xxxDBSTD/DATAFILE/tbs_sfcins1_data.332.1048521227"
.............
datafile 990 switched to datafile copy "+DATADG2/xxxDBSTD/DATAFILE/tbs_uif1_data.592.1089326867"
datafile 991 switched to datafile copy "+DATADG2/xxxDBSTD/DATAFILE/tbs_nea1_data.596.1089327153"
datafile 992 switched to datafile copy "+DATADG2/xxxDBSTD/DATAFILE/tbs_olcom1_hdata.600.1089327427"
datafile 993 switched to datafile copy "+DATADG2/xxxDBSTD/DATAFILE/tbs_sfcins1_data.585.1089325971"
4)重新recover
RMAN> recover database noredo; Starting recover at 2021-11-22 23:25:32 using channel ORA_DISK_1 Finished recover at 2021-11-22 23:25:58
5)启动mrp
SQL> recover managed standby database using current logfile disconnect from session;
6)查看同步状态
INST_ID THREAD# PROCESS PID STATUS CLIENT_P CLIENT_PID SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
---------- ---------- --------- ------------------------ ------------ -------- ---------- ---------- ---------- ------------- ------------
1 0 RFS 116162 IDLE UNKNOWN 13487 0 0 0 0
1 0 RFS 3521 IDLE UNKNOWN 13466 0 0 0 0
1 0 DGRD 59223 ALLOCATED N/A N/A 0 0 0 0
1 0 DGRD 59225 ALLOCATED N/A N/A 0 0 0 0
1 0 RFS 60322 IDLE UNKNOWN 15806 0 0 0 0
2 0 ARCH 76048 CONNECTED ARCH 76048 0 0 0 0
2 0 DGRD 76076 ALLOCATED N/A N/A 0 0 0 0
2 0 DGRD 76103 ALLOCATED N/A N/A 0 0 0 0
2 0 ARCH 76152 CONNECTED ARCH 76152 0 0 0 0
2 0 ARCH 76179 CONNECTED ARCH 76179 0 0 0 0
2 0 ARCH 76200 CONNECTED ARCH 76200 0 0 0 0
1 1 MRP0 123593 APPLYING_LOG N/A N/A 48949 4493727 128 128
1 1 RFS 163806 IDLE Archival 2335 0 0 0 0
1 1 RFS 40928 IDLE Archival 62114 0 0 0 0
1 1 RFS 46564 RECEIVING LGWR 13606 48949 4493795 0 0
1 1 ARCH 59208 CLOSING ARCH 59208 48948 7276544 0 0
1 1 ARCH 59231 CLOSING ARCH 59231 48946 7264256 0 0
1 1 ARCH 59235 CLOSING ARCH 59235 48947 7274496 0 0
1 2 RFS 138255 RECEIVING LGWR 16492 33664 558960 0 0
1 2 ARCH 59233 CLOSING ARCH 59233 33662 847872 0 0
SOURCE_DBID SOURCE_DB_ NAME VALUE UNIT TIME_COMPUTED DATUM_TIME CON_ID
----------- ---------- --------------- --------------- ------------------------------ -------------------- ------------------------------ ----------
0 transport lag +00 00:00:00 day(2) to second(0) interval 11/24/2021 15:22:37 11/24/2021 15:22:36 0
0 apply lag +00 00:00:00 day(2) to second(0) interval 11/24/2021 15:22:37 11/24/2021 15:22:36 0
主备同步状态以恢复正常。
至此 ,故障解决完成
启示
在做增量备份恢复之前,一定要看不同步状态的时间长短,如果很长了,那么期间可能就会发生数据文件的增减,在备份前,一定要对比主备库的数据文件数量,是否一致。如果不一致,就要先restore缺失的数据文件。如果一致,按普通的增量备份恢复来即可。切记,做增量备份恢复前一定要检查仔细,控制文件最好手动备份下,以防出现问题,都没法参照。
参考
Steps to perform for Rolling Forward a Physical Standby Database using RMAN Incremental Backup. (Doc ID 836986.1)
Steps to perform for Rolling forward a standby database using RMAN incremental backup when datafile is added to primary (Doc ID 1531031.1)