大家好,这里是公众号 DBA学习之路,分享一些学习国产数据库路上的知识和经验。
目录
前言
今天检查一套 Oracle RAC 12.2.0.1 数据库,检查集群状态时,发现集群命令一直夯着没反应:
经过一顿分析,终于解决问题,比较简单,这里分享一下处理过程。
问题分析
首先怀疑是集群资源挂了,查看集群资源,发现 ora.crsd
挂了:
## 节点一
[grid@lucifer1 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE mesdb0 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE mesdb0 STABLE
ora.crf
1 ONLINE ONLINE mesdb0 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE ONLINE mesdb0 STABLE
ora.cssdmonitor
1 ONLINE ONLINE mesdb0 STABLE
ora.ctssd
1 ONLINE ONLINE mesdb0 OBSERVER,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.evmd
1 ONLINE ONLINE mesdb0 STABLE
ora.gipcd
1 ONLINE ONLINE mesdb0 STABLE
ora.gpnpd
1 ONLINE ONLINE mesdb0 STABLE
ora.mdnsd
1 ONLINE ONLINE mesdb0 STABLE
ora.storage
1 ONLINE ONLINE mesdb0 STABLE
--------------------------------------------------------------------------------
## 节点二
[grid@lucifer2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
复制
检查 crs alert.log 日志:
2025-02-24 06:11:42.105 [ORAROOTAGENT(29459)]CRS-8500: Oracle Clusterware ORAROOTAGENT 进程以操作系统进程 ID 29459 开头
2025-02-24 06:12:42.142 [ORAROOTAGENT(29459)]CRS-5818: 已中止命令 'check' (对于资源 'ora.crsd')。详细资料见 (:CRSAGF00113:) {0:15:2} (位于 /oracle/app/grid/diag/crs/mesdb0/crs/trace/ohasd_orarootagent_root.trc)。
2025-02-24 06:13:20.260 [CRSD(30357)]CRS-8500: Oracle Clusterware CRSD 进程以操作系统进程 ID 30357 开头
2025-02-24 06:13:22.541 [CRSD(30357)]CRS-1019: 主机 mesdb0 上的 OCR 服务已退出。详细资料见 /oracle/app/grid/diag/crs/mesdb0/crs/trace/crsd.trc
2025-02-24T06:13:22.563713+08:00
Errors in file /oracle/app/grid/diag/crs/mesdb0/crs/trace/crsd.trc (incident=41):
CRS-1019 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /oracle/app/grid/diag/crs/mesdb0/crs/incident/incdir_41/crsd_i41.trc
2025-02-24 06:13:22.584 [CRSD(30357)]CRS-8505: Oracle Clusterware CRSD 进程 (具有操作系统进程 ID 30357) 遇到内部错误 CRS-01019
复制
检查 crsd 日志:
2025-02-24 06:13:22.514 : OCRMSG:3187623680: prom_listen: Port str [a0f4-81a3-c06c-03aa]
2025-02-24 06:13:22.514 : OCRSRV:3187623680: proath_listen: listening to remote requests at portstr [a0f4-81a3-c06c-03aa]
2025-02-24 06:13:22.518 : OCRMSG:3168728832: prom_listen: Port str [ab1d-0688-2d30-7387]
2025-02-24 06:13:22.518 : OCRSRV:3168728832: th_invalidate_cache: listening to cache_invalidation requests at portstr [ab1d-0688-2d30-7387]
2025-02-24 06:13:22.522 : OCRMSG:3166627584: prom_listen: Port str [c71c-c1a3-dc88-994f]
2025-02-24 06:13:22.522 : OCRSRV:3166627584: proath_listen: listening to remote rim requests at portstr [c71c-c1a3-dc88-994f]
2025-02-24 06:13:22.533 : OCRMAS:3164526336: th_calc_av: Configured Active Patch Level [0]
2025-02-24 06:13:22.533 : OCRMAS:3164526336: th_calc_av:5'': Return persisted APL [0]
OCRMAS:3164526336: th_calc_av:5': Return persisted AV [203424000] [12.2.0.1.0]
2025-02-24 06:13:22.535 : OCRMAS:3164526336: th_master_prereg: Persistent upgrade state retrieved from OCR is [0].
2025-02-24 06:13:22.537 : OCRMAS:3164526336: th_master_prereg: Persistent upgrade toversion buffer retrieved from OCR is [12.2.0.1.0]. Setting toversion to [203424000].
2025-02-24 06:13:22.541 : CSSCLNT:3164526336: clssgsGroupJoin: member in use group(1/ocrlocal)
2025-02-24 06:13:22.541 : default:3164526336: procr_reg_localgrp: Error [14] from clssgsreglocalgrp(). Return [23].
2025-02-24 06:13:22.541 : default:3164526336: SLOS : [clsuSlosFormatDiag called with non-error slos.]
2025-02-24 06:13:22.541 : OCRMAS:3164526336: th_master_register: Failed to register in OCRLOCAL group. Retval:[23]
2025-02-24 06:13:22.541 : OCRAPI:3164526336: procr_ctx_set_invalid: ctx is in state [6].
2025-02-24 06:13:22.541 : OCRAPI:3164526336: procr_ctx_set_invalid: ctx set to invalid
Trace file /oracle/app/grid/diag/crs/mesdb0/crs/trace/crsd.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.
DDE: Flood control is not active
2025-02-24T06:13:22.564565+08:00
Incident 41 created, dump file: /oracle/app/grid/diag/crs/mesdb0/crs/incident/incdir_41/crsd_i41.trc
CRS-1019 [] [] [] [] [] [] [] [] [] [] [] []
2025-02-24 06:13:22.706 : OCRAPI:3164526336: procr_ctx_set_invalid: Aborting...
Trace file /oracle/app/grid/diag/crs/mesdb0/crs/trace/crsd.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.
default:2552033344: 1: clskec:has:CLSU:910 4 args[CLSD00302][mod=clsdadr.c][loc=(:CLSD00302:)][msg=clsdAdrInit: Trace file size and number of segments fetched from environemnt variable: ORA_DAEMON_TRACE_FILE_OPTIONS filesize=26214400,numsegments=10]
CLSB:2552033344: Argument count (argc) for this daemon is 2
CLSB:2552033344: Argument 0 is: /oracle/app/12.2.0/grid/bin/crsd.bin
CLSB:2552033344: Argument 1 is: reboot
2025-02-24 06:13:22.829 : CSSCLNT:2552033344: clsssinit: initialized context: (0x4edf930) flags 0x207
2025-02-24 06:13:22.829 : CRSMAIN:2552033344: First attempt: init CSS context succeeded.
2025-02-24 06:13:22.829 : CRSMAIN:2552033344: Start mode: normal
2025-02-24 06:13:22.831 : CLSDMT:2343307008: PID for the Process [30402], connkey CRSD
2025-02-24 06:13:23.745 : CRSMAIN:2552033344: CRS Daemon Starting
2025-02-24 06:13:23.745 : CRSMAIN:2343307008: Process environment is not initialized yet!
2025-02-24 06:13:23.746 : CRSD:2552033344: Logging level for Module: clsdadr 0
2025-02-24 06:13:23.746 : CRSD:2552033344: Logging level for Module: clsdnreg 0
2025-02-24 06:13:23.746 : CRSD:2552033344: Logging level for Module: clsdynam 0
复制
查看 trace dump 日志:
----- Invocation Context Dump -----
Address: 0x7f1a9c024340
Phase: 3
flags: 0x10E0000
Incident ID: 41
Error Descriptor: CRS-1019 [] [] [] [] [] [] [] [] [] [] [] []
Error class: 0
Problem Key # of args: 0
Number of actions: 10
----- Incident Context Dump -----
Address: 0x7f1abc9d99d0
Incident ID: 41
Problem Key: CRS 1019
Error: CRS-1019 [] [] [] [] [] [] [] [] [] [] [] []
[00]: dbgePostErrorDirectVaList_int [diag_dde]
[01]: dbgePostErrorDirect [diag_dde]
[02]: clsdAdrPostError []
[03]: clsdadrpr_CreateIncidentCheck []
[04]: clsdadrprAlert []
[05]: clsd_alertprintft []
[06]: proath_master_exit_helper []<-- Signaling
[07]: proath_master_register []
[08]: proath_master []
[09]: start_thread []
MD [00]: 'Client ProcId'='crsd.bin@mesdb0.30357_139752810403584' (0x0)
Impact 0:
Impact 1:
Impact 2:
Impact 3:
Derived Impact:
----- END Incident Context Dump -----
复制
看着很像是 BUG,在 MOS 搜索后发现有一个文章很匹配:
MOS 截图如下:
MOS 内容与问题日志完全一致,确认是 BUG,需要进行补丁修复:
解决问题
下载 BUG 所需补丁 Patch 24396050: LNX64-12.2-CRS: CRSD.BIN FAILED SEVERAL TIMES WITH ERROR CRS-1019
更新 OPatch
查看补丁 README:You must use the OPatch utility version 12.2.0.1.5 or later to apply this patch.
检查当前 OPatch 补丁是否符合要求:
[grid@mesdb0 ~]$ cd $ORACLE_HOME/OPatch/
[grid@mesdb0 OPatch]$ ./opatch version
OPatch Version: 12.2.0.1.6
OPatch succeeded.
复制
符合要求,不需要更新 OPatch。
解压补丁
## root 执行
unzip -q /soft/p24396050_122010_Linux-x86-64.zip -d /soft/
chown -R oracle:oinstall /soft/24396050
复制
安装补丁
## root 执行
export GI_HOME=/oracle/app/12.2.0/grid
## 安装前检查
$GI_HOME/OPatch/opatchauto apply /soft/24396050 -analyze
## 安装补丁
$GI_HOME/OPatch/opatchauto apply /soft/24396050 -oh $GI_HOME
复制
安装补丁后重启系统验证集群已经恢复正常。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
评论
语言是有能量的。我们所说的每一句正能量的话,都像一阵无形的春风,滋养着我们的内心,让人变得更加积极向上。
1天前

评论
Oracle RAC 触发 BUG,crsd 无限重启!
3天前

评论
通常打GI补丁,也是先停用数据库服务和集群的吧。
4天前

评论
三哥,你这个GI补丁,是所有节点都要打,还是只打了故障节点?
4天前

2
相关阅读
【专家有话说第五期】在不同年龄段,DBA应该怎样规划自己的职业发展?
墨天轮编辑部
1216次阅读
2025-03-13 11:40:53
Oracle RAC ASM 磁盘组满了,无法扩容怎么在线处理?
Lucifer三思而后行
739次阅读
2025-03-17 11:33:53
Oracle+Deepseek+Dify 实现数据库数据实时分析
bicewow
647次阅读
2025-03-06 09:41:49
【ORACLE】ORACLE19C在19.13版本前的一个严重BUG-24761824
DarkAthena
541次阅读
2025-03-04 14:33:31
Oracle避坑指南|同名表导出难题:如何精准排除指定用户下的表?
szrsu
502次阅读
2025-03-05 00:42:34
Ogg23ai高手必看-MySQL Innodb Cluster跟oracle的亲密接触
曹海峰
437次阅读
2025-03-04 21:56:13
2月“墨力原创作者计划”获奖名单公布
墨天轮编辑部
436次阅读
2025-03-13 14:38:19
【ORACLE】char类型和sql优化器发生的“错误”反应
DarkAthena
398次阅读
2025-03-04 23:05:01
什么,oracle 主机用户被删了?原来是虚惊一场!
Lucifer三思而后行
393次阅读
2025-03-03 21:12:09
Oracle 如何修改 db_unique_name?强迫症福音!
Lucifer三思而后行
322次阅读
2025-03-12 21:27:56