某银行的某系统rac数据库版本19.6,二节点的mmon slave进程一直在报ORA-01000,导致awr、ash等等很多MMON的功能收到了影响。
2021-11-05T14:38:50.244256+08:00 Errors in file /u01/app/oracle/diag/rdbms/rcg02/rcg022/trace/rcg022_m002_65032.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01000: maximum open cursors exceeded ORA-00604: error occurred at recursive SQL level 1 ORA-01000: maximum open cursors exceeded 2021-11-05T14:53:51.739041+08:00 Errors in file /u01/app/oracle/diag/rdbms/rcg02/rcg022/trace/rcg022_m000_64999.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01000: maximum open cursors exceeded ORA-00604: error occurred at recursive SQL level 1
复制
查询v$open_cursor,发现每个m00n的session都open了很多同一个递归sql,cursor type为OPEN-RECURSIVE:
select ts# from ts$ where bitmapped <> 0 and contents$ <> 1 and online$=1 and ts# >= :1 order by ts# asc
复制
并且发现rac的所有节点,就只有2节点的异常
SQL> select a.inst_id,program,ADDRESS,CHILD_ADDRESS,a.sql_id,count(*) from gv$open_cursor a,gv$session b 2 where a.inst_id=b.inst_id and a.sid=b.sid and b.program like '%M00%' 3 group by a.inst_id,program,ADDRESS,CHILD_ADDRESS,a.sql_id having count(*)>10; INST_ID PROGRAM ADDRESS CHILD_ADDRESS SQL_ID COUNT(*) ---------- ------------------------------------------------ ---------------- ---------------- --------------- ---------- 2 oracle@mgmrcgarac02 (M004) 000000063F2A95B0 000000063F2A8188 38hpmnhjywkz4 995 2 oracle@mgmrcgarac02 (M005) 000000063F2A95B0 000000063F2A8188 38hpmnhjywkz4 995 2 oracle@mgmrcgarac02 (M002) 000000063F2A95B0 000000063F2A8188 38hpmnhjywkz4 995 2 oracle@mgmrcgarac02 (M000) 000000063F2A95B0 000000063F2A8188 38hpmnhjywkz4 994 2 oracle@mgmrcgarac02 (M003) 000000063F2A95B0 000000063F2A8188 38hpmnhjywkz4 998 2 oracle@mgmrcgarac02 (M001) 000000063F2A95B0 000000063F2A8188 38hpmnhjywkz4 995
复制
这非常不正常,通常每个session不会一直open同一个递归cursor,怀疑是这些递归sql在运行结束都没有关闭游标。为了更好的分析问题,对M000进程做了一个processstate dump
SQL> oradebug setospid 64999; Oracle pid: 74, Unix process pid: 64999, image: oracle@mgmrcgarac02 (M000) SQL> oradebug dump processstate 266; Statement processed. SQL> oradebug tracefile_name /u01/app/oracle/diag/rdbms/rcg02/rcg022/trace/rcg022_m000_64999.trc
复制
从dump中发现确实同时打开了多个该cursor,单对某个LIBRARY CACHE LOCK进行分析,发现session以NULL模式持有该sql父游标handle的library cache lock,说明已经打开了游标。
SO: 0x61a4b63a8, type: LIBRARY OBJECT LOCK (118), map: 0x448394090 state: LIVE (0x4532), flags: 0x1 owner: 0x86ffdc2f8, proc: 0x86fef8078 link: 0x61a4b63c8[0x61e0f7ae8, 0x61e0ef1e8] child list count: 0, link: 0x61a4b6418[0x61a4b6418, 0x61a4b6418] conid: 1, conuid: 1, SGA version=(1,0), pg: 0 SOC: 0x448394090, type: LIBRARY OBJECT LOCK (118), map: 0x61a4b63a8 state: LIVE (0x99fc), flags: INIT (0x1) LibraryObjectLock: Address=0x448394090 Handle=0x63f2a95b0 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=1 User=0x863b17220 Session=0x863b19a08 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=c21c50 Time=09/02/2021 03:04:59 LibraryHandle: Address=0x63f2a95b0 Hash=23ee4be4 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name=select ts# from ts$ where bitmapped <> 0 and contents$ <> 1 and online$=1 and ts# >= :1 order by ts# asc
复制
游标打开时间09/02/2021 03:04:59,两个月前。。。。,并且对应的子游标handle的状态已经失效了
SO: 0x61e0ef1c8, type: LIBRARY OBJECT LOCK (118), map: 0x44b1ca9b0 state: LIVE (0x4532), flags: 0x1 owner: 0x86ffdc2f8, proc: 0x86fef8078 link: 0x61e0ef1e8[0x61a4b63c8, 0x86ffdc368] child list count: 0, link: 0x61e0ef238[0x61e0ef238, 0x61e0ef238] conid: 1, conuid: 1, SGA version=(1,0), pg: 0 SOC: 0x44b1ca9b0, type: LIBRARY OBJECT LOCK (118), map: 0x61e0ef1c8 state: LIVE (0x99fc), flags: INIT (0x1) LibraryObjectLock: Address=0x44b1ca9b0 Handle=0x63f2a8188 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=33565358 Context=0x7f44760c4f80 User=0x863b17220 Session=0x863b19a08 ReferenceCount=1 Flags=BRO/CBK/[0020] SavepointNum=0 Time=09/02/2021 03:04:59 LibraryHandle: Address=0x63f2a8188 Hash=0 LockMode=N PinMode=S LoadLockMode=0 Status=INVL
复制
说明sql早已执行完成,但是游标并未关闭。
解决该问题的方法很简单,就是关闭session,由于m00n进程是mmon的slave进程,可以随便kill,kill之后问题解决。但是并没有找到该问题的原因,MOS上也没有查到相关的案例,个人怀疑是在关闭递归游标的时候出现了异常,导致反复的打开该游标。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
相关阅读
Oracle RAC 一键安装翻车?手把手教你如何排错!
Lucifer三思而后行
563次阅读
2025-04-15 17:24:06
【纯干货】Oracle 19C RU 19.27 发布,如何快速升级和安装?
Lucifer三思而后行
493次阅读
2025-04-18 14:18:38
Oracle SQL 执行计划分析与优化指南
Digital Observer
464次阅读
2025-04-01 11:08:44
XTTS跨版本迁移升级方案(11g to 19c RAC for Linux)
zwtian
455次阅读
2025-04-08 09:12:48
墨天轮个人数说知识点合集
JiekeXu
455次阅读
2025-04-01 15:56:03
【ORACLE】记录一些ORACLE的merge into语句的BUG
DarkAthena
442次阅读
2025-04-22 00:20:37
Oracle数据库一键巡检并生成HTML结果,免费脚本速来下载!
陈举超
430次阅读
2025-04-20 10:07:02
【ORACLE】你以为的真的是你以为的么?--ORA-38104: Columns referenced in the ON Clause cannot be updated
DarkAthena
417次阅读
2025-04-22 00:13:51
Oracle 19c RAC更换IP实战,运维必看!
szrsu
403次阅读
2025-04-08 23:57:08
【活动】分享你的压箱底干货文档,三篇解锁进阶奖励!
墨天轮编辑部
374次阅读
2025-04-17 17:02:24