问题描述
listener不能访问,重启lsrnctl restart 无效,最后操作系统 重启后正常。
RPHDB时间在:2019.01.30 01:02:02~ 03:35:38
1. 2019.01.30 02:41接到电话,反映Rerport不能使用,erp有画面报警;
我发现db不能连接,lsnr 不能服务了;
2. 查询日志:
发现:
Wed Jan 30 01:02:02 China Standard Time 2019 ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4688' waited for 'direct path read', seq_num: 10340 for 'rdbms ipc message' count=1 wait_time=3.009785 sec复制
DB: direct path read 这个值超时。
发现大量锁信息:
channel: (0000000B761D5F68) scumnt mount lock
后来查询日志发现:
2019-01-30 00:50:24时,有锁出现 :
sql::DELETE FROM XXX WHERE XXX<=TO_CHAR(SYSDATE-30,'YYYYMMDD')||' 0000000' AND ROWNUM<1001复制
有大量锁表:XXX
接着有XXXX表,用户FTRPT/sqlplus.exe_程序
XXXXX,XXXXX,一些job等进程锁,越来越多!造成连锁反映!
详细日志如下:
Wed Jan 30 01:02:02 China Standard Time 2019 Errors in file d:\oracle\product\10.2.0\admin\\bdump\_mmon_4704.trc: ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 4688' Wed Jan 30 01:02:02 China Standard Time 2019 System State dumped to trace file d:\oracle\product\10.2.0\admin\\bdump\_mmon_4704.trc Killing enqueue blocker (pid=4688) on resource CF-00000000-00000000 by killing session 162.1 Killing enqueue blocker (pid=4688) on resource CF-00000000-00000000 by terminating the process MMON: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 USER: terminating instance due to error 1092 Wed Jan 30 01:12:05 China Standard Time 2019 LNS1: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 DBW2: terminating instance due to error 2103 Wed Jan 30 01:12:05 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:06 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:06 China Standard Time 2019 MMAN: terminating instance due to error 2103 Wed Jan 30 01:12:06 China Standard Time 2019 MMNL: terminating instance due to error 2103 Wed Jan 30 01:12:06 China Standard Time 2019 ARC1: terminating instance due to error 2103 Wed Jan 30 01:12:06 China Standard Time 2019 PMON: terminating instance due to error 2103 Wed Jan 30 01:12:07 China Standard Time 2019 ARC0: terminating instance due to error 2103 Wed Jan 30 01:12:07 China Standard Time 2019 PSP0: terminating instance due to error 2103 Wed Jan 30 01:12:07 China Standard Time 2019 RVWR: terminating instance due to error 2103 Wed Jan 30 01:12:08 China Standard Time 2019 LGWR: terminating instance due to error 2103 Wed Jan 30 01:12:08 China Standard Time 2019 DBW1: terminating instance due to error 2103 Wed Jan 30 01:12:09 China Standard Time 2019 CJQ0: terminating instance due to error 2103 Wed Jan 30 01:12:10 China Standard Time 2019 QMNC: terminating instance due to error 2103 Wed Jan 30 01:12:15 2019 Warning: skgmdetach - Unable to register unmap, error 4210 Wed Jan 30 01:12:28 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:12:42 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:13:02 China Standard Time 2019 USER: terminating instance due to error 2103 Wed Jan 30 01:14:09 China Standard Time 2019 SMON: terminating instance due to error 2103 Wed Jan 30 01:17:05 China Standard Time 2019复制
专家解答
根源还是CF的enq,请先排查下闪回区的使用情况,之前一直报db_recovery_file_dest_size使用空间超过96%,就算没到100%,Oracle在自动清理闪回日志的时候也会出现问题。
4688为CKPT进程,也是直接执行CF enq的进程,等待时间939秒,且最后在DPR 89号数据文件时
O/S info: user: Oracle, term:, ospid: 4688 OSD pid info: Windows thread id: 4688, image: ORACLE.EXE (CKPT) waiting for 'direct path read' wait_time=0, seconds since wait started=939 file number=89, first dba=1, block cnt=1 blocking sess=0x0000000000000000 seq=10340 Dumping Session Wait History for 'control file sequential read' count=1 wait_time=0.000075 sec file#=0, block#=1d, blocks=1 for 'control file sequential read' count=1 wait_time=0.000040 sec file#=0, block#=13, blocks=1 for 'control file sequential read' count=1 wait_time=0.000043 sec file#=0, block#=12, blocks=1 for 'control file sequential read' count=1 wait_time=0.000066 sec file#=0, block#=10, blocks=1 for 'control file sequential read' count=1 wait_time=0.000043 sec file#=2, block#=1, blocks=1 for 'control file sequential read' count=1 wait_time=0.000065 sec file#=1, block#=1, blocks=1 for 'control file sequential read' count=1 wait_time=0.000091 sec file#=0, block#=1, blocks=1 for 'rdbms ipc message' count=1 wait_time=3.009785 sec timeout=12c, =0, =0 for 'control file sequential read' count=1 wait_time=0.000053 sec file#=0, block#=3, blocks=1 for 'control file parallel write' count=1 wait_time=0.000158 sec files=3, block#=3, requests=3复制
rman进程,等待时间为945秒
O/S info: user: Oracle, term:, ospid: 5064:15840, machine:\ program: rman.exe client info: rman channel=D2 application name: backup full datafile, hash value=2287030075 action name: 0000038 STARTED16, hash value=745096320 waiting for 'RMAN backup & recovery I/O' wait_time=0, seconds since wait started=945复制
同时还有闪回区清理进程
OSD pid info: Windows thread id: 4760, image: ORACLE.EXE (RVWR) ---------------------------------------- SO: 0000000B5B0CE710, type: 6, owner: 0000000B7610A658, flag: INIT/-/-/0x00 (FIB) flags=512 reference cnt=1 incno=31385 seqno=1 fname=D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\\FLASHBACK\O1_MF_DPYFMDDD_.FLB fno=126 lblksz=8192 fsiz=16105 (FOB) flags=10 fib=0000000B5B0CE710 incno=31385 pending i/o cnt=0 fname=D:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\\FLASHBACK\O1_MF_DPYFMDDD_.FLB fno=126 lblksz=8192 fsiz=16105 (FOB) flags=2 fib=0000000B5B0CE360 incno=0 pending i/o cnt=0 fname=F:\ORADATA\\CONTROL03.CTL fno=2 lblksz=16384 fsiz=1812 (FOB) flags=2 fib=0000000B5B0CDFB0 incno=0 pending i/o cnt=0 fname=E:\ORADATA\\CONTROL02.CTL fno=1 lblksz=16384 fsiz=1812 (FOB) flags=2 fib=0000000B5B0CDC00 incno=0 pending i/o cnt=0 fname=F:\ORADATA\\CONTROL01.CTL fno=0 lblksz=16384 fsiz=1812复制
这个问题一般出现在空间不足和资源不足的情况:
1、首先清理闪回空间,或者增大闪回区,避免自动清理的BUG,还可以考虑关闭闪回,在ADG端开启,减少主库I/O压力
Bug 6976005 - ORA-494 CF enqueue held for longer than needed with flashback enabled (Doc ID 6976005.8)
2、查看当时I/O资源使用情况,当时启动了RMAN备份,是否出现了I/O延迟情况
评论
