ADG基于SCN恢复备库（修复GAP）

原创 virvle 2024-05-16

1336

环境：主备基于11.2.0.4版本

一、备库遇到未同步问题：

前几天查看备库的数据，发现数据不是最新的，最新的还停留在4月中，而查询同步延时为0分钟，难怪一直没有同步异常告警
进一步排查，发现是归档日志未从主库传输到备库，发现LOG_ARCHIVE_DEST_STATE_2=defer 。。。。我的天啊

于是尝试利用主库备份的归档日志，传输到备库进行应用，而这过程中，发现缺失了一部分归档日志，触发GAP

Mon May 13 14:02:06 2024
Fetching gap sequence in thread 1, gap sequence 452894-452993
Mon May 13 14:04:00 2024
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 452894-452993
 DBID 2198540012 branch 909675799
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that's sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
------------------------------------------------------------

至此，使用归档日志应用恢复，已经无望了。
那么接下来，通常做法是要么备库铲掉重做，要么是基于SCN增量恢复，各有优缺点：
（1）铲掉重做：若是数据量特别大，历时较长
（2）基于SCN恢复：速度快，但增备可能对主库造成一定的性能影响

二、综合考虑，使用基于SCN 恢复备库

1. 关闭备库应用，并查询当前备库SCN


SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.

SQL> select  to_char(current_scn) from v$database;
TO_CHAR(CURRENT_SCN)
----------------------------------------
17626998155431

2. 重启备库到nomount【一定要确认IP，谨慎！再谨慎！】

sqlplus / as sysdba
shutdown immediate;
startup nomount;

3. 在主库查询备库的SCN到当前时间，是否有新增数据文件（使用第一步获取到的SCN）

select file#,to_char(creation_time,'yyyy-mm-dd hh24:mi:ss' ) creation_time,status,last_time,name 
from v$datafile 
where creation_change#>17626998155431;

4. 备份控制文件

alter database create standby controlfile as '/databak/scnbak/standby211_20240515.ctl';

5. 主库基于scn增量备份（需考虑备份对服务器的影响）

5.1 备份脚本：

#!/bin/bash
. $HOME/.bash_profile

DATE=`date +%y%m%d%H%M`;export DATE
RMAN_LOG_FILE=/databak/scnbak/dbbackup_$DATE.log

rman target / nocatalog  msglog $RMAN_LOG_FILE  <<EOF
run
{
allocate channel ch1 type disk;
allocate channel ch2 type disk;
allocate channel ch3 type disk;
allocate channel ch4 type disk;
backup as backupset incremental from scn 17626998155431 database filesperset 5 tag='incr_scn_bak' format '/databak/scnbak/incre_%U_%T';
release channel ch1;
release channel ch2;
release channel ch3;
release channel ch4;
}
EOF
DATE=`date +%y%m%d`;export DATE

5.2 为了让其在业务低峰期执行，定了个计划任务执行shell脚本

 crontab -l
01 04 * * * /bin/sh /mnt/incre_dbbackup.sh

5.3 查看备份日志情况 (备份耗时1小时15分钟）

$ cat   dbbackup_2405160928.log|grep "comp"
channel ch4: backup set complete, elapsed time: 00:08:35
channel ch2: backup set complete, elapsed time: 00:12:12
channel ch3: backup set complete, elapsed time: 00:13:27
channel ch1: backup set complete, elapsed time: 00:15:13
channel ch4: backup set complete, elapsed time: 00:07:22
channel ch2: backup set complete, elapsed time: 00:08:32
channel ch1: backup set complete, elapsed time: 00:07:16
channel ch4: backup set complete, elapsed time: 00:06:56
channel ch3: backup set complete, elapsed time: 00:10:23
channel ch4: backup set complete, elapsed time: 00:05:52
channel ch2: backup set complete, elapsed time: 00:08:18
channel ch1: backup set complete, elapsed time: 00:08:38
channel ch3: backup set complete, elapsed time: 00:07:31
channel ch2: backup set complete, elapsed time: 00:07:15
channel ch4: backup set complete, elapsed time: 00:07:39
channel ch3: backup set complete, elapsed time: 00:05:48
channel ch1: backup set complete, elapsed time: 00:08:18
channel ch4: backup set complete, elapsed time: 00:08:55
channel ch2: backup set complete, elapsed time: 00:09:47
channel ch3: backup set complete, elapsed time: 00:09:51
channel ch1: backup set complete, elapsed time: 00:11:12
channel ch4: backup set complete, elapsed time: 00:06:42
channel ch3: backup set complete, elapsed time: 00:05:58
channel ch2: backup set complete, elapsed time: 00:07:50
channel ch1: backup set complete, elapsed time: 00:06:04
channel ch3: backup set complete, elapsed time: 00:06:55
channel ch2: backup set complete, elapsed time: 00:08:06
channel ch4: backup set complete, elapsed time: 00:10:44
channel ch1: backup set complete, elapsed time: 00:08:43
channel ch2: backup set complete, elapsed time: 00:06:57
channel ch4: backup set complete, elapsed time: 00:08:47
channel ch3: backup set complete, elapsed time: 00:11:43
channel ch4: backup set complete, elapsed time: 00:00:03
channel ch2: backup set complete, elapsed time: 00:05:32
channel ch1: backup set complete, elapsed time: 00:10:07
Recovery Manager complete.

6. 传输主库备份的控制文件及增量备份文件到备库

scp -rp -P22 /databak/scnbak/*  oracle@10.10.10.2:/databak/scnbak/

7. 备库恢复控制文件

rman target /

RMAN> restore controlfile from '/databak/scnbak/standby211_20240515.ctl';

Starting restore at 2024:05:16 11:08:14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1135 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=/mnt/oradata/control01.ctl
output file name=/mnt/oradata/control02.ctl
Finished restore at 2024:05:16 11:08:21

8. 启动备库到mount状态，并注册备份文件

rman target /
alter database mount;

catalog start with '/databak/scnbak';

9备库恢复新增文件,若第3部位查询无新增文件，跳过此步(本次恢复中未涉及，跳过此步骤)

run{
allocate channel ch1 type disk;
allocate channel ch2 type disk;
backup datafile 200,201,202  format '/databak/scnbak/incre_dbfile_%U';
release channel ch1;
release channel ch2;
}

10. 查看备份集

report schema;

RMAN> list incarnation;


List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       ORCL      2198540012       PARENT  1          2009:08:15 00:16:43
2       2       ORCL      2198540012       CURRENT 945184     2016:04:20 15:43:19

11. 恢复备库,鉴于资源充足，直接开8通道，即时运行（耗时13分钟）

11.1 恢复脚本 rman_recover_incre_scn.sh

#!/bin/bash
. $HOME/.bash_profile

DATE=`date +%y%m%d%H`;export DATE
RMAN_RECOVER_LOG_FILE=/mnt/script/recover_script/recover_incre_scn_$DATE.log

rman target / nocatalog  msglog $RMAN_RECOVER_LOG_FILE  <<EOF
run
{
allocate channel ch1 type disk;
allocate channel ch2 type disk;
allocate channel ch3 type disk;
allocate channel ch4 type disk;
allocate channel ch5 type disk;
allocate channel ch6 type disk;
allocate channel ch7 type disk;
allocate channel ch8 type disk;
recover database noredo;
release channel ch1;
release channel ch2;
release channel ch3;
release channel ch5;
release channel ch6;
release channel ch7;
release channel ch8;
}
EOF
DATE=`date +%y%m%d`;export DATE

11.2 执行恢复脚本

nohup sh rman_recover_incre_scn.sh &

PS：到此，若直接open数据库，会提示报错

12. 启动备库 mgr 应用日志

12.1 启动日志实时应用

alter database recover managed standby database using current logfile disconnect from session;

12.2 查看redo日志，观察日志应用情况

Thu May 16 12:33:58 2024
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 8 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Waiting for thread 1 sequence 453056
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION
Thu May 16 12:37:01 2024
Primary database is in MAXIMUM PERFORMANCE mode
SRL log 20 needs clearing because log has not been created
SRL log 21 needs clearing because log has not been created
SRL log 22 needs clearing because log has not been created
RFS[1]: Assigned to RFS process 19706
RFS[1]: Selected log 23 for thread 1 sequence 453058 dbid -2108412354branch 909675799
Thu May 16 12:37:01 2024
SRL log 20 needs clearing because log has not been created
SRL log 21 needs clearing because log has not been created
SRL log 22 needs clearing because log has not been created
RFS[2]: Assigned to RFS process 19712
RFS[2]: Selected log 24 for thread 1 sequence 453057 dbid -2108412354 branch 909675799
Thu May 16 12:37:03 2024
Fetching gap sequence in thread 1, gap sequence 453056-453056
Thu May 16 12:37:04 2024
RFS[3]: Assigned to RFS process 19714
RFS[3]: Opened log for thread 1 sequence 453056 dbid -2108412354branch 909675799
Thu May 16 12:37:06 2024
Archived Log entry 1 added for thread 1 sequence 453057 ID 0x825fe694 dest 1:
Thu May 16 12:37:14 2024
SRL log 20 needs clearing because log has not been created
SRL log 21 needs clearing because log has not been created
SRL log 22 needs clearing because log has not been created
RFS[1]: Selected log 24 for thread 1 sequence 453059 dbid -2108412354 branch 909675799
Thu May 16 12:37:14 2024
Archived Log entry 2 added for thread 1 sequence 453058 ID 0x825fe694 dest 1:
SRL log 20 needs clearing because log has not been created
SRL log 21 needs clearing because log has not been created
SRL log 22 needs clearing because log has not been created
RFS[1]: Selected log 23 for thread 1 sequence 453060 dbid -2108412354 branch 909675799
Thu May 16 12:37:19 2024
Archived Log entry 3 added for thread 1 sequence 453059 ID 0x825fe694 dest 1:
Thu May 16 12:37:49 2024
Archived Log entry 4 added for thread 1 sequence 453056 rlc 909675799 ID 0x825fe694 dest 2:
Thu May 16 12:37:50 2024
Media Recovery Log /mnt/archivelog/ARC0000453056_0909675799.0001.arc
Thu May 16 12:38:20 2024
Media Recovery Log /mnt/archivelog/ARC0000453057_0909675799.0001.arc
Media Recovery Log /mnt/archivelog/ARC0000453058_0909675799.0001.arc
Media Recovery Log /mnt/archivelog/ARC0000453059_0909675799.0001.arc
Media Recovery Waiting for thread 1 sequence 453060 (in transit)
Recovery of Online Redo Log: Thread 1 Group 23 Seq 453060 Reading mem 0
  Mem# 0: /mnt/oradata/standby_redo23.log

12.3 当再行日志到达当前最新时，取消日志应用，打开数据库

SQL> alter database recover managed standby database cancel;
Database altered.

SQL> alter database open;
Database altered.

SQL> alter database recover managed standby database using current logfile disconnect;
Database altered.


SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ ONLY WITH APPLY

13. 主库重新打开归档同步日志同步

sqlplus / as sysdba
alter system set log_archive_dest_state_2=defer;
alter system set log_archive_dest_state_2=enable;

14. 查看主备同步情况

select * from v$dataguard_stats;

三、常用SQL

1. 查询备库的进程状态

select process, status,sequence#,thread# from v$managed_standby;

select message_num,error_code,timestamp,message from v$dataguard_status;

2. 查询备库的角色

set linesize 450;
column DBNAME format a10;
column DBUNAME format a10;
column cftype format a10;
column OPEN_MODE format a25;
column DATABASE_ROLE format a20;
select name dbname,db_unique_name dbuname,controlfile_type cftype,database_role,open_mode from v$database;

3. 查询备库的日志应用模式

SQL> select recovery_mode from v$archive_dest_status where dest_id=1;

RECOVERY_MODE
-----------------------
MANAGED REAL TIME APPLY

4. 查看恢复进程

set lines 450
set pages 900
col type for a15
col item for a20
col units for a15
select to_char(start_time, 'DD-MON-RR HH24:MI:SS') start_time, type,
       item, units, sofar, total, timestamp 
from v$recovery_progress;

5. 查看当前归档日志应用情况

select a.thread#,
       b.max_available,
       a.max_applied,
       b.max_available - a.max_applied
  from (select thread#, max(sequence#) max_applied
          from gv$archived_log
         where applied = 'YES'
         group by thread#) a,
       (select thread#, max(sequence#) max_available
          from gv$archived_log
         group by thread#) b
 where a.thread# = b.thread#;

四. 常见报错

1. 主库生成用于备库的控制文件报错，提示目录不存在

SQL> alter database create standby controlfile as '/scnbak/standby211_20240515.ctl';
alter database create standby controlfile as '/scnbak/standby211_20240515.ctl'
*
ERROR at line 1:
ORA-01580: error creating control backup file /scnbak/standby211_20240515.ctl
ORA-27040: file create error, unable to create file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 1

处理措施：新建目录，或选择正确的目录存放

2. 注册备份目录遇到报错

File Name: /databak/scnbak/standby211_20240515.ctl

List of Files Which Where Not Cataloged
=======================================
File Name: /databak/scnbak/dbbackup_24051609.log
  RMAN-07517: Reason: The file header is corrupted
File Name: /databak/scnbak/dbbackup_2405160928.log
  RMAN-07517: Reason: The file header is corrupted

处理措施：查看非备份文件，忽略即可

3. recover完成后，打开报错（先应用归档日志，再开启即可）

SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/mnt/oradata/system01.dbf'

处理措施：先应用日志，开启mgr，待同步后，再行打开
– 查看归档日志

Errors in file /mnt/oracle/diag/rdbms/vir_st/vir/trace/vir_ora_17866.trc:
ORA-16016: archived log for thread 1 sequence# 453056 unavailable
Recovery interrupted!
Completed Standby Crash Recovery.
Signalling error 1152 for datafile 1!
Errors in file /mnt/oracle/diag/rdbms/vir_st/vir/trace/vir_ora_17866.trc:
ORA-10458: standby database requires recovery
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/mnt/oradata/system01.dbf'
ORA-10458 signalled during: alter database open ...

到此，已恢复备库的实时。定期检查很重要

墨力计划故障处理 oracle11g

最后修改时间：2024-05-17 10:37:51

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者