Problem description
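On a Data Guard (DG) standby, the trace file of the GMON process in the ASM instance was being written to constantly and had grown to more than 200 GB.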
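A runaway trace like this can be spotted before it fills the file system with a small scan of the ASM trace directory. The sketch below is a minimal example and not part of the original troubleshooting. It assumes the trace directory path (taken from the alert log excerpt later in this article; adjust it for your own ADR home), a 1 GiB threshold, and that GMON trace file names contain `_gmon_`. `find_large_traces` is a hypothetical helper.

```python
import os
from typing import List, Tuple


def find_large_traces(trace_dir: str, min_bytes: int,
                      name_filter: str = "") -> List[Tuple[str, int]]:
    """Return (path, size) for .trc files under trace_dir that are at least
    min_bytes large and whose name contains name_filter, largest first."""
    hits = []
    for root, _dirs, files in os.walk(trace_dir):
        for name in files:
            # Only trace files (.trc), optionally restricted to one process.
            if not name.endswith(".trc") or name_filter not in name:
                continue
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may be rotated or purged while we scan; skip it.
                continue
            if size >= min_bytes:
                hits.append((path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Assumed location and threshold; a nonexistent directory simply yields nothing.
    for path, size in find_large_traces("/grid_base/diag/asm/+asm/+ASM2/trace",
                                        min_bytes=1 << 30, name_filter="_gmon_"):
        print(f"{size / (1 << 30):8.1f} GiB  {path}")
```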
Problem analysis
What the GMON process does
- GMON - ASM Disk Group Monitor Process
GMON monitors all the disk groups mounted in an ASM instance.
It is responsible for maintaining consistent disk membership and status information. It updates the partnership and status table.
Membership changes result from adding and dropping disks, whereas disk status changes result from taking disks offline or bringing them online.
In other words, GMON maintains the disk group's partnership and status table (PST); the PST sections in the GMON trace shown below are dumps of that table.
Error log contents
GMON trace contents
=============== PST ====================
grpNum: 1 state: 1 callCnt: 73
(lockvalue) valid=1 ver=1.1 ndisks=3 flags=0x0 from inst=1 (I am 1) last=1832628233
(lockvalue) dsks: 0 1 2
--------------- HDR --------------------
next: 1832628233
last: 1832628233
pst count: 3
pst locations: 0 1 2
incarn: 1
dta size: 3
version: 1
ASM version: 186646528 = 11.2.0.0.0
contenttype: 1
partnering pattern: [ ]
--------------- LOC MAP ----------------
0: dirty 0 cur_loc: 0 stable_loc: 0
1: dirty 0 cur_loc: 1 stable_loc: 1
--------------- DTA --------------------
0: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 2501843421 parts: 1 (amp) 2 (amp)
1: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2501843421 parts: 0 (amp) 2 (amp)
2: sts v v(rw) p(rw) a(x) d(x) fg# = 3 addTs = 2501843421 parts: 0 (amp) 1 (amp)
--------------- HBEAT ------------------
kfdpHbeat_dump: state=2, inst=0, ts=0.0, rnd=0.0.0.0.
kfk io-queue: (nil)
InvalLck (group 1) force released
InvalLck (group 1) re-acquired in S
NOTE: rewrite PST set in kfdp_read: hasReadErrs 1 hasSupport 1 SupportSet Sz 3 HdrQurmSet Sz 2
NOTE: Require PST rewrite for grp 1, retry in X.
NOTE: rewrite PST set in kfdp_read: hasReadErrs 1 hasSupport 1 SupportSet Sz 3 HdrQurmSet Sz 2
InvalLck (group 1) upgraded to X
InvalLck (group 1) downgraded to S
POST res = 1
=============== PST ====================
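The DTA section of the PST dump lists, for each disk, its failure group (`fg#`) and its partner disks (`parts`). The minimal sketch below assumes the dump layout shown above and parses those entries, then checks that every partner lies in a different failure group than its disk, which is how ASM keeps mirror copies apart. The status flags and the `(amp)` partner annotation are kept as raw strings because their exact meaning is not documented here; `parse_pst_dta` and `cross_failgroup` are hypothetical helpers.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# One DTA entry of a PST dump, e.g.
# "0: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 2501843421 parts: 1 (amp) 2 (amp)"
# The pattern does not rely on line breaks, so it also works on a flattened dump.
DTA_ENTRY = re.compile(
    r"(\d+):\s*sts\s+(.*?)\s+fg#\s*=\s*(\d+)\s+addTs\s*=\s*(\d+)"
    r"\s+parts:((?:\s*\d+\s*\(\w*\))*)"
)
# A partner is a disk number followed by a parenthesised annotation such as "(amp)".
PARTNER = re.compile(r"(\d+)\s*\(\w*\)")


@dataclass
class PstDisk:
    disk: int
    status: str          # raw status flags, e.g. "v v(rw) p(rw) a(x) d(x)"
    failgroup: int
    partners: List[int]


def parse_pst_dta(text: str) -> Dict[int, PstDisk]:
    """Extract disk -> (failgroup, partners) from the DTA section of a PST dump.
    If the text holds several dumps, the last entry for each disk wins."""
    disks: Dict[int, PstDisk] = {}
    for m in DTA_ENTRY.finditer(text):
        disk = int(m.group(1))
        disks[disk] = PstDisk(
            disk=disk,
            status=m.group(2),
            failgroup=int(m.group(3)),
            partners=[int(p) for p in PARTNER.findall(m.group(5))],
        )
    return disks


def cross_failgroup(disks: Dict[int, PstDisk]) -> bool:
    """True if every listed partner is a known disk in a different failure group."""
    return all(
        p in disks and disks[p].failgroup != d.failgroup
        for d in disks.values()
        for p in d.partners
    )
```

For the dump above this yields three disks in failure groups 1, 2 and 3, each partnered with the other two, so `cross_failgroup` returns `True`: the partnership layout itself looks consistent, and the problem has to be looked for elsewhere.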
ASM alert log contents
Mon Jul 17 16:09:02 2023
SQL> alter diskgroup CRSDG check /* proxy */
NOTE: starting check of diskgroup CRSDG
Mon Jul 17 16:09:02 2023
GMON checking disk 0 for group 1 at 587742 for pid 27, osid 44173
GMON checking disk 1 for group 1 at 587743 for pid 27, osid 44173
GMON checking disk 2 for group 1 at 587744 for pid 27, osid 44173
SUCCESS: check of diskgroup CRSDG found no errors
SUCCESS: alter diskgroup CRSDG check /* proxy */
Mon Jul 17 16:09:14 2023
SQL> alter diskgroup CRSDG check /* proxy */
NOTE: starting check of diskgroup CRSDG
Mon Jul 17 16:09:15 2023
GMON checking disk 0 for group 1 at 587745 for pid 27, osid 45830
GMON checking disk 1 for group 1 at 587746 for pid 27, osid 45830
GMON checking disk 2 for group 1 at 587747 for pid 27, osid 45830
SUCCESS: check of diskgroup CRSDG found no errors
SUCCESS: alter diskgroup CRSDG check /* proxy */
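GMON keeps checking disk group 1 (CRSDG): in the alert log excerpt above, two checks start only 12 seconds apart (16:09:02 and 16:09:14), and every check writes a large amount of output to the GMON trace. The check itself reports no errors, yet the GMON trace reports hasReadErrs 1 and requests a PST rewrite. This means CRSDG does have read errors, although they have not affected cluster operation.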
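The two symptoms can be put side by side with a short script: how often the alert log records `alter diskgroup CRSDG check`, and how many PST-rewrite notes with `hasReadErrs 1` the GMON trace contains. This is a minimal sketch that assumes the 11.2-style timestamp lines shown above (for example `Mon Jul 17 16:09:02 2023`; later releases can use a different timestamp format) and the exact trace message text quoted above. The function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List

# Assumed 11.2-style alert log timestamp, e.g. "Mon Jul 17 16:09:02 2023".
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def check_times(alert_lines: Iterable[str], diskgroup: str) -> List[datetime]:
    """Timestamps of each 'SQL> alter diskgroup <dg> check' entry in an ASM alert log."""
    marker = f"alter diskgroup {diskgroup} check".lower()
    last_ts = None
    times = []
    for line in alert_lines:
        line = line.strip()
        try:
            # A timestamp line applies to the messages that follow it.
            last_ts = datetime.strptime(line, ALERT_TS_FORMAT)
            continue
        except ValueError:
            pass
        # Only the issued statement counts, not the "SUCCESS: alter ..." echo.
        if line.startswith("SQL>") and marker in line.lower() and last_ts is not None:
            times.append(last_ts)
    return times


def intervals_seconds(times: List[datetime]) -> List[float]:
    """Seconds between consecutive checks."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def read_error_rewrites(trace_lines: Iterable[str]) -> int:
    """Count PST-rewrite notes in a GMON trace that report hasReadErrs 1."""
    return sum(1 for line in trace_lines
               if "rewrite PST" in line and "hasReadErrs 1" in line)
```

All three functions take any iterable of lines, so an open file object can be passed directly and a 200 GB trace is streamed line by line instead of being loaded into memory.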
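Finding when the error first occurred
The frequent disk group checks first appeared on November 8, 2022, at the same time as an ORA-15196 error was raised for the disk group.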
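Finding that first occurrence in a large alert log is a linear scan that remembers the most recent timestamp line. The sketch below is a minimal example that assumes the same 11.2-style timestamp lines as the alert log excerpts in this article; `first_occurrence` is a hypothetical helper, not an Oracle tool.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Assumed 11.2-style alert log timestamp, e.g. "Tue Nov 08 20:49:26 2022".
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def first_occurrence(alert_lines: Iterable[str], pattern: str
                     ) -> Optional[Tuple[Optional[datetime], str]]:
    """Return (timestamp, line) for the first alert log line containing pattern.

    The timestamp is the last timestamp line seen before the match (None if
    the match precedes every timestamp). Returns None if pattern never appears.
    """
    last_ts = None
    for line in alert_lines:
        text = line.strip()
        try:
            last_ts = datetime.strptime(text, ALERT_TS_FORMAT)
            continue
        except ValueError:
            pass
        if pattern in text:
            # Stop at the first hit; the rest of the file is never read.
            return last_ts, text
    return None
```

Called as `first_occurrence(open(path, errors="replace"), "ORA-15196")`, it stops reading as soon as the first match is found, which matters when the log has been growing for months.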
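ORA-15196 indicates an invalid ASM block header on disk CRS2 (disk 2) of CRSDG. ASM automatically read a good copy of the block from disk CRS3 (disk 0) and reported the repair as successful, but for reasons that were not determined the repair did not actually take effect, so the error messages kept being written to the log ever since.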
Tue Nov 08 20:49:26 2022
SQL> alter diskgroup CRSDG check /* proxy */
NOTE: starting check of diskgroup CRSDG
WARNING: cache read a corrupt block: group=1(CRSDG) fn=255 indblk=0 disk=2 (CRS2) incarn=3916021488 au=42 blk=0 count=1
Errors in file /grid_base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_34397.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [255] [2147483648] [0 != 1]
NOTE: a corrupted block from group CRSDG was dumped to /home/app/grid/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_34397.trc
WARNING: cache read (retry) a corrupt block: group=1(CRSDG) fn=255 indblk=0 disk=2 (CRS2) incarn=3916021488 au=42 blk=0 count=1
Errors in file /grid_base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_34397.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [255] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [255] [2147483648] [0 != 1]
NOTE: cache repaired a corrupt block: group=1(CRSDG) fn=255 indblk=0 on disk 2 from disk=0 (CRS3) incarn=3916021489 au=41 blk=0 count=1
Tue Nov 08 20:49:26 2022
GMON checking disk 0 for group 1 at 15 for pid 28, osid 34397
GMON checking disk 1 for group 1 at 16 for pid 28, osid 34397
GMON checking disk 2 for group 1 at 17 for pid 28, osid 34397
SUCCESS: check of diskgroup CRSDG found no errors
SUCCESS: alter diskgroup CRSDG check /* proxy */
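The corrupt-block and repair messages in this alert log carry the details needed to see which copy was bad and which copy was used for the repair. The sketch below extracts the `key=value` fields and disk names from those two message types. It assumes the exact wording shown in the excerpt, and reading `au` in the repair message as the AU of the source copy on CRS3 is an interpretation of the log, not documented behavior. `parse_cache_message` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

KV = re.compile(r"(\w+)=(\d+)")               # numeric fields such as au=42, fn=255
DISK_NAME = re.compile(r"disk=\d+ \((\w+)\)")  # "disk=2 (CRS2)" -> "CRS2"
TARGET_DISK = re.compile(r"on disk (\d+)")     # repaired disk in the repair message


def parse_cache_message(line: str) -> Optional[Dict[str, object]]:
    """Parse an ASM 'cache read ... a corrupt block' or 'cache repaired a corrupt
    block' alert log line into a dict; return None for any other line."""
    if "cache read" in line and "corrupt block" in line:
        kind = "corrupt"          # includes the "cache read (retry)" variant
    elif "cache repaired a corrupt block" in line:
        kind = "repaired"
    else:
        return None
    fields: Dict[str, object] = {k: int(v) for k, v in KV.findall(line)}
    fields["kind"] = kind
    name = DISK_NAME.search(line)
    if name:
        fields["disk_name"] = name.group(1)
    target = TARGET_DISK.search(line)
    if target:
        fields["target_disk"] = int(target.group(1))
    return fields
```

Applied to the two messages above, it reports a bad copy on disk 2 (CRS2) at AU 42 and a repair from disk 0 (CRS3), AU 41, which matches the reading of the log given earlier.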
Solution
Drop the faulty disk (CRS2) from the disk group and add it back.
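alter diskgroup CRSDG drop disk 'CRS2';
-- Wait until the rebalance triggered by the drop has finished (no row for CRSDG in V$ASM_OPERATION) before re-adding the disk.
alter diskgroup CRSDG add disk 'ORCL:CRS2' force;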
References
ASM Background Processes in 11.2 (Doc ID 1641678.1)
WeChat official account: DongDB手记
Modb (墨天轮): https://www.modb.pro/u/231198
