Recovering from an OCR/VF (voting file) disk group failure:
#### Backup preparation (manual export)

    [root@rac1 ~]# /oracle/grid/crs_1/bin/ocrconfig -export ocr_export
    [root@rac1 ~]# ll ocr_export
    -rw------- 1 root root 123903 Jul 22 17:43 ocr_export

With the cluster running normally, check the OCR and voting file (VF) information:

    [grid@rac1 ~]$ ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          3
             Total space (kbytes)     :     262120
             Used space (kbytes)      :       3240
             Available space (kbytes) :     258880
             ID                       : 1617222518
             Device/File Name         :       +OCR
    [grid@rac1 ~]$ crsctl query css votedisk;
    ##  STATE    File Universal Id                File Name Disk group
     1. ONLINE   17e2d627b8c84fabbffc1f533d533d0d (/dev/asm-disk2) [OCR]
     2. ONLINE   a8ef977544d64f0dbffab877ec75e1dd (/dev/asm-disk3) [OCR]
     3. ONLINE   2fb3c679b7b64f8abf43887b3fbff81d (/dev/asm-disk4) [OCR]

#### Failure simulation

Corrupt asm-disk2, asm-disk3 and asm-disk4 by zeroing the first 1 MB of each disk:

    [root@rac1 ~]# dd if=/dev/zero of=/dev/asm-disk2 bs=1024 count=1024;
    [root@rac1 ~]# dd if=/dev/zero of=/dev/asm-disk3 bs=1024 count=1024;
    [root@rac1 ~]# dd if=/dev/zero of=/dev/asm-disk4 bs=1024 count=1024;
    [grid@rac1 ~]$ crsctl check crs
    CRS-4638: Oracle High Availability Services is online
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    [grid@rac1 ~]$ ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          3
             Total space (kbytes)     :     262120
             Used space (kbytes)      :       3240
             Available space (kbytes) :     258880
             ID                       : 1617222518
             Device/File Name         :       +OCR

Nothing looks wrong yet, so we restart the cluster:

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl stop crs
    [root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl stop crs

Start it again (this time it fails to come up):

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl start crs
    [root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl start crs

The cluster status is now as follows (only the OHASD process has started):

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl check crs
    CRS-4638: Oracle High Availability Services is online
    CRS-4535: Cannot communicate with Cluster Ready Services
    CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
    CRS-4534: Cannot communicate with Event Manager
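The healthy and broken `crsctl check crs` outputs above differ only in how many components report "is online", so the check is easy to script. The following is a minimal sketch and not part of the original procedure: the function name `count_online_services` is hypothetical, and it assumes only the message wording shown in the transcripts above.

```shell
#!/bin/sh
# count_online_services: hypothetical helper, not an Oracle tool.
# Reads `crsctl check crs` output on stdin and prints how many stack
# components report "is online". Assumes the message wording shown in
# the transcripts above.
count_online_services() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" keeps a zero count from being reported as a failure.
    grep -c 'is online' || true
}

# Usage on a cluster node (assumes crsctl is on PATH):
#   crsctl check crs | count_online_services
```

With the outputs in this article, a fully started stack yields 4, while the failed restart yields 1 because only OHASD is online.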
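Log information (cluster alert log): $ORACLE_HOME/log/hostname/alert
CSS log output (the CRSD log shows nothing abnormal)
The logs show that the voting files cannot be discovered, so the cause of the problem is clear.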
Try checking the OCR status with ocrcheck:

    [grid@rac1 ~]$ ocrcheck
    PROT-602: Failed to retrieve data from the cluster registry
    PROC-26: Error while accessing the physical storage
    ORA-15077: could not locate ASM instance serving a required diskgroup
    ORA-29701: unable to connect to Cluster Synchronization Service
    [grid@rac1 ~]$ crsctl query css votedisk

The second command returns nothing.

**Starting the recovery:**

Recovery approach:

If we try to restore directly with an import:

    [root@rac1 ~]# /oracle/grid/crs_1/bin/ocrconfig -import ocr_export
    PROT-1: Failed to initialize ocrconfig
    PROC-26: Error while accessing the physical storage
    ORA-29701: unable to connect to Cluster Synchronization Service

As this shows, ocrconfig -import cannot restore the OCR directly at this point, because the OCR disk group no longer exists: dd wiped the disks, so the disk group metadata is gone as well. We therefore have to create a new OCR disk group by hand and then use ocrconfig -import to load the OCR into it. Creating the disk group requires a running ASM instance, but ASM cannot start because the cluster is down. So we start the cluster in exclusive mode: CRS does not start, but the ASM instance does.

Shut down the cluster and start it in exclusive mode. Because OHASD is already running, it has to be stopped forcibly:

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl stop has -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
    [root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl stop has -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'

Both nodes hang, and OHASD cannot be stopped forcibly.

The remaining option is to reboot the operating system. Disable cluster autostart before rebooting, otherwise OHASD is started automatically once the operating system comes back up:

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl disable has
    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl disable crs

After the reboot, check the cluster status:

    [grid@rac1 ~]$ crsctl check crs
    CRS-4639: Could not contact Oracle High Availability Services

**Start the cluster in exclusive mode:**

Note that Oracle's documented restore procedure starts exclusive mode on a single node only.

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl start crs -excl -nocrs
    [root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl start crs -excl -nocrs

The output is as follows:

    CRS-4123: Oracle High Availability Services has been started.
    CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
    CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
    CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
    CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
    CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
    CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
    CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
    CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
    CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
    CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
    CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
    CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
    CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
    CRS-2679: Attempting to clean 'ora.asm' on 'rac1'
    CRS-2681: Clean of 'ora.asm' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 'rac1'
    CRS-2676: Start of 'ora.asm' on 'rac1' succeeded

Connect to the ASM instance:

    [grid@rac1 ~]$ sqlplus / as sysasm

Check the current disk groups:

    SQL> select name,state from v$asm_diskgroup;

    NAME                           STATE
    ------------------------------ -----------
    DATA                           MOUNTED

**Create the disk group:**

    SQL> create diskgroup ocr normal redundancy
         DISK '/dev/asm-disk2', '/dev/asm-disk3', '/dev/asm-disk4'
         ATTRIBUTE 'compatible.asm'='11.2.0.0.0';

    Diskgroup created.

**Be sure to include the 'compatible.asm'='11.2.0.0.0' attribute. The OCR and voting files can only be stored in a disk group whose compatible.asm is 11.2 or higher, and a disk group created without the attribute defaults to 10.1, so it would have to be changed afterwards.**

Check again:

    SQL> select name,state from v$asm_diskgroup;

    NAME                           STATE
    ------------------------------ -----------
    DATA                           MOUNTED
    OCR                            MOUNTED

**Restore with ocrconfig:**

    [root@rac1 ~]# /oracle/grid/crs_1/bin/ocrconfig -import ocr_export

After the restore completes, run ocrcheck:

    [root@rac1 ~]# /oracle/grid/crs_1/bin/ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          3
             Total space (kbytes)     :     262120
             Used space (kbytes)      :       3240
             Available space (kbytes) :     258880
             ID                       :  785902757
             Device/File Name         :       +OCR

The OCR has been restored successfully.

**Restore the voting files:**

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl replace votedisk +OCR
    Successful addition of voting disk 5d0d201e0ab24f66bf24bfd4a88f2f30.
    Successful addition of voting disk 9520d9d3ab8d4fefbfe5d05b62dac9cf.
    Successful addition of voting disk 361f26ddd0b34feabfeba6a1123533d7.
    Successfully replaced voting disk group with +OCR.
    CRS-4266: Voting file(s) successfully replaced

Check the voting file information:

    [grid@rac1 ~]$ crsctl query css votedisk
    ##  STATE    File Universal Id                File Name Disk group
     1. ONLINE   5d0d201e0ab24f66bf24bfd4a88f2f30 (/dev/asm-disk2) [OCR]
     2. ONLINE   9520d9d3ab8d4fefbfe5d05b62dac9cf (/dev/asm-disk3) [OCR]
     3. ONLINE   361f26ddd0b34feabfeba6a1123533d7 (/dev/asm-disk4) [OCR]

Stop the clusterware that is running in exclusive mode:

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl stop crs -f
    [root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl stop crs -f

Start CRS normally on all nodes:

    [root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl start crs
    [root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl start crs

The resource status shows nothing abnormal. Because autostart was disabled before the reboot, re-enable it with crsctl enable crs if the stack should start at boot again.
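The final voting file check can also be scripted. This is only a sketch under stated assumptions: `count_online_votedisks` is a hypothetical helper, and it relies on the `crsctl query css votedisk` listing format shown above, where each entry starts with an index such as `1.` followed by the state.

```shell
#!/bin/sh
# count_online_votedisks: hypothetical helper, not an Oracle tool.
# Reads `crsctl query css votedisk` output on stdin and prints the
# number of voting files whose STATE column is ONLINE.
count_online_votedisks() {
    # Entry lines look like: " 1. ONLINE <id> (<disk>) [<group>]".
    # The header line starts with "##" and is skipped by the index test.
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Usage on a cluster node (assumes crsctl is on PATH):
#   crsctl query css votedisk | count_online_votedisks
```

For the three-disk normal-redundancy disk group used in this article, the expected count after `crsctl replace votedisk +OCR` is 3. A count of 0 corresponds to the empty result seen right after the disks were wiped.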
Last modified: 2020-07-23 00:46:54