暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Restore Database with ORA-01861

原创 Roger 2016-12-02
1841
今天一个客户问我,有个异机恢复的问题,原库是生产,还在运行,需要将NBU中的备份恢复到另外一个主机上,数据文件均为raw。首先我们来看一下错误:
RMAN> run{
2>
3> allocate channel t1 type 'sbt_tape';
4> allocate channel t2 type 'sbt_tape';
5> send 'NB_ORA_CLIENT=sth05v01';
6> set until time "to_date('2016-11-25 06:50:00','yyyy-mm-dd hh24:mi:ss')";
7> restore database;
8> release channel t1;
9> release channel t2;}

allocated channel: t1
channel t1: sid=540 devtype=SBT_TAPE
channel t1: Veritas NetBackup for Oracle - Release 7.5 (2012020801)

allocated channel: t2
channel t2: sid=539 devtype=SBT_TAPE
channel t2: Veritas NetBackup for Oracle - Release 7.5 (2012020801)

sent command to channel: t1
sent command to channel: t2

executing command: SET until clause

Starting restore at 01-DEC-16

released channel: t1
released channel: t2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 12/01/2016 14:46:30
ORA-01861: literal does not match format string
复制

大家不难看出,rman执行报错了。从上述错误来看,感觉像是nls_date_format设置的问题。的确,如果我们在rman命令行中使用了until time操作,那么建议设置nls_date_Format参数,可以设置os环境变量也可以在rman中执行sql执行。但是这里其实并不是这种情况。实际上我无论是在os设置还是rman命令行中设置,均不会起作用。

针对该问题,Oracle mos有一篇文档RMAN Recovery Session Fails with ORA-1861 (文档 ID 852723.1) 记录十分详细,里面提到了一些解决方案,其中也提到了一些检测判断是否为文章中提到的bug的方式。

不幸的是,我参考文章的的方式,执行了如下脚本发现根本不返回记录,也就说明不是文档中描述的这个问题。

于是我进一步使用rman target / debug trace=/tmp/rman.trc进行跟踪,检查发现rman在执行脚本的时候,在运行如下的SQL时报错:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 12/01/2016 17:38:21
RMAN-06003: ORACLE error from target database:
ORA-01861: literal does not match format string
RMAN-06097: text of failing SQL statement: select fhscn ,to_date(fhtim,'MM/DD/RR HH24:MI:SS','NLS_CALENDAR=Gregorian') ,fhcrs ,fhrls ,to_date(fhrlc,'MM/DD/RR HH24:MI:SS','NLS_CALENDAR=Gregorian') ,fhafs ,fhrfs ,fhrft ,hxerr ,fhfsz ,fhsta into :b1,:b2:b3,:b4,:b5,:b6:b7,:b8:b9,:b10:b11,:b12:b13,:b14,:b15,:b16 from x$kcvfhall where hxfil=:b17
RMAN-06099: error occurred in source file: krmk.pc, line: 5655
DBGMISC: ENTERED krmkursr [17:38:21.493]

DBGSQL: EXEC SQL AT TARGET select decode(open_mode,'MOUNTED',0,'READ WRITE',1,'READ ONLY',1,0) into :b1 from v$database [17:38:21.493]
DBGSQL: sqlcode=0 [17:38:21.494]
DBGSQL: :b1 = 0
复制

从上面的内容来看,上述的select fhscn语句是报错的关键。将该SQL挑出来,单独执行,发现果然有问题。如下是我的测试结果:
SQL> select fhscn,
2 to_date(fhtim, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
3 fhcrs,
4 fhrls,
5 to_date(fhrlc, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
6 fhafs,
7 fhrfs,
8 fhrft,
9 hxerr,
10 fhfsz,
11 fhsta
12 from x$kcvfhall
13 where hxfil =252;
to_date(fhtim, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
*
ERROR at line 2:
ORA-01861: literal does not match format string

SQL> l
1 select fhscn,
2 to_date(fhtim, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
3 fhcrs,
4 fhrls,
5 to_date(fhrlc, 'MM/DD/RR HH24:MI:SS', 'NLS_CALENDAR=Gregorian'),
6 fhafs,
7 fhrfs,
8 fhrft,
9 hxerr,
10 fhfsz,
11 fhsta
12 from x$kcvfhall
13* where hxfil <252 and hxfil > 252
SQL> /

FHSCN TO_DATE(FHTIM,'MM FHCRS FHRLS TO_DATE(FHRLC,'MM FHAFS FHRFS FHRFT HXERR FHFSZ FHSTA
---------------- ----------------- ---------------- ---------------- ----------------- ---------------- ---------------- -------------------- ---------- ---------- ----------
408027070532 02/23/88 02:19:50 0 0 313537986655 8 0 0
339307855959 02/21/88 13:55:10 0 0 266292363311 8 0 0
231931576374 02/07/88 04:01:52 0 0 206161641523 8 0 0
214751510576 02/09/88 10:38:52 0 0 206161772597 8 0 0
287767330911 03/02/88 17:11:37 0 0 313537593439 8 0 0
287769034817 03/11/88 01:26:40 0 0 335013675081 8 0 0
356487069774 03/11/88 01:26:33 0 0 335013675081 8 0 0
257701183536 02/27/88 15:33:21 0 0 300653936719 8 0 0
137441050656 01/25/88 06:33:04 0 0 137441050656 8 0 0
0 0 0
......
复制

从测试结果来看,我们可以断定第252号文件在控制文件中的信息可能有点问题。这个问题我也是第一次遇到。不过我将controlfile 的内容dump出来之后,过滤定位到file=252的内容,发现似乎并没有异常的地方:
sth14n01$(/oracle/app/admin/SETTLE/bdump)cat /oracle/app/product/10.2.0/db_1/rdbms/log/settle1_ora_10832.trc|grep 'File=252'
RECID #39423 Recno 1071 Record timestamp 11/25/16 00:51:28 File=252 Incremental backup level=0
RECID #38917 Recno 565 Record timestamp 11/17/16 20:53:38 File=252 Incremental backup level=0

RECID #39423 Recno 1071 Record timestamp 11/25/16 00:51:28 File=252 Incremental backup level=0
File is part of the incremental strategy
Backup set key: stamp=928792972, count=17980
Creation checkpointed at scn: 0x0caf.169d9ef4 10/28/14 12:19:55
File header checkpointed at scn: 0x0dc6.4f902943 11/24/16 22:02:53
Resetlogs scn and time scn: 0x0000.0007c754 12/31/13 10:28:23
Incremental Change scn: 0x0000.00000000
Absolute Fuzzy scn: 0x0000.00000000
Newly-marked media corrupt blocks 0 Total media corrupt blocks 0
Total logically corrupt blocks 0 Block images written to backup 1048456
File size at backup time 1048575 Block size 8192
Low Offline Range Recid 0
Number of blocks read during backup 1048575

RECID #38917 Recno 565 Record timestamp 11/17/16 20:53:38 File=252 Incremental backup level=0
File is part of the incremental strategy
Backup set key: stamp=928173622, count=17846
Creation checkpointed at scn: 0x0caf.169d9ef4 10/28/14 12:19:55
File header checkpointed at scn: 0x0dc4.fb8c934d 11/17/16 18:00:23
Resetlogs scn and time scn: 0x0000.0007c754 12/31/13 10:28:23
Incremental Change scn: 0x0000.00000000
Absolute Fuzzy scn: 0x0000.00000000
Newly-marked media corrupt blocks 0 Total media corrupt blocks 0
Total logically corrupt blocks 0 Block images written to backup 1048456
File size at backup time 1048575 Block size 8192
Low Offline Range Recid 0
Number of blocks read during backup 1048575
复制

不难看出,上述的内容似乎没有什么问题。从整体现象来看与Oracle mos文档中提到的bug确实很像,几乎完全一样。但是通过mos文档中提到的脚本检查缺没有问题,这让我感觉很奇怪。

不过这里有点不确认的是,我是在生产库进行检测的,同时执行select语句也没有任何问题。说明原生产库的controlfile本身就是ok的。按理说应该在新环境中进行检测。

简而言之,这个问题确实有些怪异。不过还好的是,要解决或者绕过这个问题并不难,比如可以restore tablespace  xx,xxx  skip datafile 252即可。最后单独备份这个252文件进行restore,然后即可。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论