
Oracle OCR/VF Disk Group Failure Recovery

Original article by DBhanG, 2020-07-23

Recovering from a failure of the OCR/VF (voting file) disk group:


#### Backup preparation (manual export)
[root@rac1 ~]# /oracle/grid/crs_1/bin/ocrconfig -export ocr_export

[root@rac1 ~]# ll ocr_export 
-rw------- 1 root root 123903 Jul 22 17:43 ocr_export
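The export above can be wrapped in a small script so that each export gets a timestamped name instead of overwriting ocr_export. This is a minimal sketch, assuming bash, the Grid home /oracle/grid/crs_1 used in this article and a hypothetical backup directory /backup/ocr; the function names and the DRY_RUN switch are illustrative, not part of any Oracle tool. ocrconfig -export itself must be run as root.

```shell
#!/usr/bin/env bash
# Sketch: timestamped logical OCR export (ocrconfig -export must run as root).
# ASSUMPTIONS: GRID_HOME as in this article; BACKUP_DIR is hypothetical.
GRID_HOME=${GRID_HOME:-/oracle/grid/crs_1}
BACKUP_DIR=${BACKUP_DIR:-/backup/ocr}

# Print the export file name, e.g. /backup/ocr/ocr_export_20200722_1743.
# An explicit stamp may be passed; otherwise the current time is used.
ocr_export_path() {
  local stamp=${1:-$(date +%Y%m%d_%H%M)}
  printf '%s/ocr_export_%s\n' "$BACKUP_DIR" "$stamp"
}

# Run the export, or only print the command when DRY_RUN=1.
ocr_export() {
  local target
  target=$(ocr_export_path "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$GRID_HOME/bin/ocrconfig -export $target"
  else
    mkdir -p "$BACKUP_DIR" && "$GRID_HOME/bin/ocrconfig" -export "$target"
  fi
}
```

For example, DRY_RUN=1 ocr_export prints the command it would run. Alongside such manual exports, ocrconfig -showbackup lists the automatic physical OCR backups that Clusterware keeps.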



Information from the cluster while it is running normally:

[grid@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
	 Version                  :          3
	 Total space (kbytes)     :     262120
	 Used space (kbytes)      :       3240
	 Available space (kbytes) :     258880
	 ID                       : 1617222518
	 Device/File Name         :       +OCR



[grid@rac1 ~]$ crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group
 1. ONLINE   17e2d627b8c84fabbffc1f533d533d0d (/dev/asm-disk2) [OCR]
 2. ONLINE   a8ef977544d64f0dbffab877ec75e1dd (/dev/asm-disk3) [OCR]
 3. ONLINE   2fb3c679b7b64f8abf43887b3fbff81d (/dev/asm-disk4) [OCR]
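Before breaking anything it helps to have a quick way to tell how many voting files are ONLINE. A minimal sketch, assuming the output layout shown above (one numbered line per voting file); count_online_votedisks is a hypothetical helper that reads the command output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: count voting files reported ONLINE by `crsctl query css votedisk`.
# Expected line shape (from this article):
#   1. ONLINE   17e2d627b8c84fabbffc1f533d533d0d (/dev/asm-disk2) [OCR]
count_online_votedisks() {
  # grep -c prints 0 (and exits 1) when nothing matches; keep the "0".
  grep -cE '^[[:space:]]*[0-9]+\.[[:space:]]+ONLINE[[:space:]]' || true
}
```

crsctl query css votedisk | count_online_votedisks should print 3 on this cluster while it is healthy, and 0 once the voting files are gone, as later in this article where the command returns nothing.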





#### Simulating the failure

Destroy asm-disk2, asm-disk3 and asm-disk4 (the three disks of the OCR disk group) by overwriting the start of each disk with zeros:

[root@rac1 ~]# dd if=/dev/zero of=/dev/asm-disk2 bs=1024 count=1024;
[root@rac1 ~]# dd if=/dev/zero of=/dev/asm-disk3 bs=1024 count=1024;
[root@rac1 ~]# dd if=/dev/zero of=/dev/asm-disk4 bs=1024 count=1024;
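Each dd command writes bs × count = 1024 × 1024 = 1,048,576 bytes (1 MiB) of zeros at the start of the device, which is enough to overwrite the ASM disk header. A minimal sketch for confirming that a device (or, for testing, an ordinary file) really starts with that many zero bytes, assuming GNU cmp; is_zeroed is a hypothetical helper name, and it only reads the device.

```shell
#!/usr/bin/env bash
# Sketch: check that the first N bytes of a device or file are all zero.
# Default N = 1024 blocks x 1024 bytes, matching the dd commands above.
WIPE_BYTES=$((1024 * 1024))

is_zeroed() {
  local dev=$1 bytes=${2:-$WIPE_BYTES}
  # -s: silent, result only in the exit status; -n: compare at most $bytes.
  # A file shorter than $bytes hits EOF first and counts as "not zeroed".
  cmp -s -n "$bytes" /dev/zero "$dev"
}
```

Usage: is_zeroed /dev/asm-disk2 && echo "header wiped".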



[grid@rac1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online



[grid@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
	 Version                  :          3
	 Total space (kbytes)     :     262120
	 Used space (kbytes)      :       3240
	 Available space (kbytes) :     258880
	 ID                       : 1617222518
	 Device/File Name         :       +OCR

No errors show up yet, so we restart the cluster.

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl stop crs

[root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl stop crs

Start it again (this time it fails to start):

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl start crs

[root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl start crs



Cluster status (only OHASD has started):

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
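The CRS message codes make the two states easy to tell apart in a script. A minimal sketch, assuming bash and using only the codes that appear in this article (CRS-4638/4537/4529/4533 for online services, CRS-4639 when OHAS cannot be reached); crs_stack_state is a hypothetical helper that classifies crsctl check crs output read from stdin.

```shell
#!/usr/bin/env bash
# Sketch: classify `crsctl check crs` output by CRS message codes.
#   healthy  = OHAS, CRS, CSS and EVM all report online
#   degraded = OHAS online but at least one of the others is not
#   down     = OHAS cannot be contacted (CRS-4639)
crs_stack_state() {
  local out code up=0
  out=$(cat)
  if grep -q 'CRS-4639' <<<"$out"; then
    echo down
    return
  fi
  for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
    grep -q "$code" <<<"$out" && up=$((up + 1))
  done
  if [ "$up" -eq 4 ]; then
    echo healthy
  elif grep -q 'CRS-4638' <<<"$out"; then
    echo degraded
  else
    echo unknown
  fi
}
```

For the output above it prints degraded; for the earlier output taken while the cluster was healthy it prints healthy.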



Log information (Clusterware alert log: $ORACLE_HOME/log/<hostname>/alert<hostname>.log, where ORACLE_HOME is the Grid home):
[Screenshot: Clusterware alert log]

CSS log output (the CRSD log shows nothing abnormal):
[Screenshot: CSS log]

The logs make the problem obvious: the voting files (VF) cannot be discovered.
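The screenshots are not available, so the exact messages cannot be quoted here. In 11.2 the CSS daemon typically reports this condition as CRS-1714 ("Unable to discover any voting files"); treat that code, and the 11.2 log location $GRID_HOME/log/<hostname>/alert<hostname>.log, as assumptions to check against your own logs. A minimal sketch that searches a given log for it; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find voting-file discovery failures in a Clusterware log.
# ASSUMPTIONS: 11.2 log layout and message code CRS-1714.
GRID_HOME=${GRID_HOME:-/oracle/grid/crs_1}

# Default alert log path for a host, e.g. .../log/rac1/alertrac1.log
default_alert_log() {
  local host=${1:-$(hostname -s)}
  printf '%s/log/%s/alert%s.log\n' "$GRID_HOME" "$host" "$host"
}

# Print the last N matching lines (default 5); return 1 if none are found.
vf_discovery_errors() {
  local log=$1 n=${2:-5} hits
  hits=$(grep 'CRS-1714' "$log") || return 1
  printf '%s\n' "$hits" | tail -n "$n"
}
```

Usage: vf_discovery_errors "$(default_alert_log rac1)"; the CSS daemon's own log (cssd/ocssd.log under the same per-host directory in 11.2) can be passed the same way.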



Try checking the OCR status with ocrcheck:

[grid@rac1 ~]$ ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
ORA-29701: unable to connect to Cluster Synchronization Service



[grid@rac1 ~]$ crsctl query css votedisk

(no output)
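Both failures can be spotted from ocrcheck output by its PROT-/PROC-/ORA- codes. A minimal sketch, assuming that a healthy run prints a Device/File Name line and no error codes, as in the outputs in this article; ocrcheck_errors and ocr_healthy are hypothetical helpers reading ocrcheck output on stdin.

```shell
#!/usr/bash/env bash
# Sketch: extract error codes from ocrcheck output, space-separated.
ocrcheck_errors() {
  grep -oE '(PROT|PROC|ORA)-[0-9]+' | paste -sd ' ' -
}

# Succeeds only if there are no error codes and an OCR location is shown.
ocr_healthy() {
  local out
  out=$(cat)
  [ -z "$(ocrcheck_errors <<<"$out")" ] && grep -q 'Device/File Name' <<<"$out"
}
```

Usage: ocrcheck | ocr_healthy || echo "OCR not accessible". For the failure above, ocrcheck_errors prints PROT-602 PROC-26 ORA-15077 ORA-29701.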



**Starting the recovery:**

Recovery approach:

If we try to restore directly with ocrconfig -import:

[root@rac1 ~]# /oracle/grid/crs_1/bin/ocrconfig -import ocr_export 
PROT-1: Failed to initialize ocrconfig
PROC-26: Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service

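So the OCR cannot be restored directly with ocrconfig -import: the OCR disk group no longer exists, because zeroing the disks with dd also wiped the disk group metadata. We first have to create a new OCR disk group by hand and then import the OCR into it with ocrconfig -import. Creating a disk group requires a running ASM instance, but in the current state of the cluster ASM cannot start. We therefore start the cluster in exclusive mode with -nocrs: CRSD does not start, but the ASM instance does.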
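The reasoning above fixes the order of the recovery: exclusive mode gives us ASM, ASM lets us recreate +OCR, and only then can the OCR be imported and the voting files replaced. A minimal sketch that prints that order as a checklist, assuming the Grid home and disk names used in this article; it executes nothing, and recovery_plan is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the recovery order as a checklist (nothing is executed).
# Paths and disk names are the ones used in this article.
GRID_HOME=${GRID_HOME:-/oracle/grid/crs_1}

recovery_plan() {
  cat <<EOF
1. root, all nodes : $GRID_HOME/bin/crsctl stop has -f (reboot with autostart disabled if it hangs)
2. root, ONE node  : $GRID_HOME/bin/crsctl start crs -excl -nocrs (CSS + ASM, no CRSD)
3. grid, SQL*Plus  : create diskgroup ocr normal redundancy disk '/dev/asm-disk2','/dev/asm-disk3','/dev/asm-disk4' attribute 'compatible.asm'='11.2.0.0.0'
4. root            : $GRID_HOME/bin/ocrconfig -import ocr_export (needs the new +OCR)
5. root            : $GRID_HOME/bin/crsctl replace votedisk +OCR
6. root, that node : $GRID_HOME/bin/crsctl stop crs -f
7. root, all nodes : $GRID_HOME/bin/crsctl start crs
EOF
}
```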



Stop the cluster and restart it in exclusive mode:

Because OHASD is already running, it has to be stopped forcibly:

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'

[root@rac2 ~]#  /oracle/grid/crs_1/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'

Both nodes hang; OHASD cannot be stopped even with -f.



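We therefore reboot the operating system. Before rebooting, disable Clusterware autostart; otherwise OHASD starts again automatically when the OS comes back up.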

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl disable has

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl disable crs    # disable autostart at boot
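Disabling and later re-enabling autostart is easy to forget on one node. A minimal sketch of a helper for the node it runs on, assuming it is run as root with the Grid home used in this article; it wraps crsctl disable crs from above and its counterpart crsctl enable crs. crs_autostart and DRY_RUN are hypothetical names; run it on every node you are about to reboot.

```shell
#!/usr/bin/env bash
# Sketch: enable/disable Clusterware autostart on this node (run as root).
# DRY_RUN=1 prints the crsctl command instead of running it.
GRID_HOME=${GRID_HOME:-/oracle/grid/crs_1}

crs_autostart() {
  local action=$1
  case $action in
    enable|disable) ;;
    *) echo "usage: crs_autostart enable|disable" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$GRID_HOME/bin/crsctl $action crs"
  else
    "$GRID_HOME/bin/crsctl" "$action" crs
  fi
}
```

Usage: crs_autostart disable before the reboot, and crs_autostart enable once the cluster is back to normal.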

After the OS reboot, check the cluster status:

[grid@rac1 ~]$ crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services



**Start the cluster in exclusive mode:**

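[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl start crs -excl -nocrs

Exclusive mode is meant for a single node: run this on rac1 only and keep rac2 down until the final normal restart, because a second node cannot join a CSS that is running in exclusive mode.

Output on rac1: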

CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'rac1'
CRS-2681: Clean of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
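Before connecting to ASM it is worth confirming from this output that CSS and ASM came up and that CRSD did not (that is what -nocrs is for). A minimal sketch, assuming the CRS-2676 message format shown above; excl_start_ok is a hypothetical helper reading the startup output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: check `crsctl start crs -excl -nocrs` output for node $1:
# ora.cssd and ora.asm started, and ora.crsd never mentioned.
excl_start_ok() {
  local node=$1 out
  out=$(cat)
  grep -qF "Start of 'ora.cssd' on '$node' succeeded" <<<"$out" &&
    grep -qF "Start of 'ora.asm' on '$node' succeeded" <<<"$out" &&
    ! grep -qF "'ora.crsd'" <<<"$out"
}
```

Alternatively, crsctl stat res -t -init lists the lower-stack resources such as ora.asm and ora.cssd with their current state.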



Connect to the ASM instance:

[grid@rac1 ~]$ sqlplus / as sysasm 

Check the current disk groups:

SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED



**Create the disk group:**

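SQL> create diskgroup ocr normal redundancy DISK '/dev/asm-disk2', '/dev/asm-disk3', '/dev/asm-disk4' ATTRIBUTE 'compatible.asm'='11.2.0.0.0';

Diskgroup created.

**Note: always specify 'compatible.asm'='11.2.0.0.0'. OCR and voting files can only be stored in a disk group whose compatible.asm is 11.2 or higher, and CREATE DISKGROUP defaults to 10.1, so leaving the attribute out means having to change it afterwards.**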
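Because the disk list and the compatible.asm attribute are easy to get wrong under pressure, the statement can be generated. A minimal sketch, assuming NORMAL redundancy, where 3 voting files need at least 3 disks (failure groups); make_ocr_dg_sql is a hypothetical helper that only prints SQL.

```shell
#!/usr/bin/env bash
# Sketch: build the CREATE DISKGROUP statement used above.
# Usage: make_ocr_dg_sql NAME DISK...   (at least 3 disks)
make_ocr_dg_sql() {
  local name=$1
  shift
  if [ "$#" -lt 3 ]; then
    echo "need at least 3 disks for 3 voting files in NORMAL redundancy" >&2
    return 1
  fi
  local disks="" d
  for d in "$@"; do
    disks+="${disks:+, }'$d'"   # comma-separate, quote each path
  done
  printf "CREATE DISKGROUP %s NORMAL REDUNDANCY DISK %s ATTRIBUTE 'compatible.asm'='11.2.0.0.0';\n" "$name" "$disks"
}
```

The output can be piped into sqlplus -S / as sysasm as the grid user. Afterwards, select name, compatibility from v$asm_diskgroup; shows the setting, and if the attribute was forgotten it can still be raised with alter diskgroup ocr set attribute 'compatible.asm'='11.2';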

Check again:

SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
OCR                            MOUNTED



**Restore the OCR with ocrconfig:**

[root@rac1 ~]# /oracle/grid/crs_1/bin/ocrconfig -import ocr_export

After the import completes, run ocrcheck:

[root@rac1 ~]# /oracle/grid/crs_1/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
	 Version                  :          3
	 Total space (kbytes)     :     262120
	 Used space (kbytes)      :       3240
	 Available space (kbytes) :     258880
	 ID                       :  785902757
	 Device/File Name         :       +OCR



The OCR has been restored successfully.
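Comparing fields of ocrcheck output before and after the import is a simple sanity check: in this article the location stays +OCR while the ID changes from 1617222518 to 785902757. A minimal sketch, assuming the "Name : value" layout shown above; ocr_field is a hypothetical helper reading ocrcheck output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one "Name : value" field of ocrcheck output,
# e.g. ocr_field ID  or  ocr_field 'Device/File Name'.
ocr_field() {
  awk -F':' -v f="$1" '
    {
      key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key)
      if (key == f) { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); print val; exit }
    }'
}
```

Usage: ocrcheck | ocr_field 'Device/File Name' should print +OCR after the restore.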

**Restore the voting files:**

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl replace votedisk +OCR

Successful addition of voting disk 5d0d201e0ab24f66bf24bfd4a88f2f30.
Successful addition of voting disk 9520d9d3ab8d4fefbfe5d05b62dac9cf.
Successful addition of voting disk 361f26ddd0b34feabfeba6a1123533d7.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced
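The replace step can be checked by its CRS-4266 message and the number of "Successful addition" lines. A minimal sketch, assuming the message text shown above; votedisk_replace_summary is a hypothetical helper reading the command output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: summarize `crsctl replace votedisk` output.
# Prints "ok added=N" when CRS-4266 is present, else "failed added=N" (rc 1).
votedisk_replace_summary() {
  local out added
  out=$(cat)
  added=$(grep -c '^Successful addition of voting disk' <<<"$out")
  if grep -q 'CRS-4266' <<<"$out"; then
    echo "ok added=$added"
  else
    echo "failed added=$added"
    return 1
  fi
}
```

For the output above it prints ok added=3.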



Check the voting files:

[grid@rac1 ~]$ crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group
 1. ONLINE   5d0d201e0ab24f66bf24bfd4a88f2f30 (/dev/asm-disk2) [OCR]
 2. ONLINE   9520d9d3ab8d4fefbfe5d05b62dac9cf (/dev/asm-disk3) [OCR]
 3. ONLINE   361f26ddd0b34feabfeba6a1123533d7 (/dev/asm-disk4) [OCR]



Stop the Clusterware stack running in exclusive mode:

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl stop crs -f

[root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl stop crs -f

Start CRS normally on all nodes:

[root@rac1 ~]# /oracle/grid/crs_1/bin/crsctl start crs

[root@rac2 ~]# /oracle/grid/crs_1/bin/crsctl start crs



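Resource status shows no anomalies. Autostart was disabled before the reboot, so re-enable it on each node with /oracle/grid/crs_1/bin/crsctl enable crs; otherwise the stack will not come up by itself after the next OS restart.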
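The final steps can be scripted across nodes. A minimal sketch, assuming passwordless root ssh to a hypothetical NODES list "rac1 rac2", with rac1 (EXCL_NODE) being the node in exclusive mode; it stops the stack only there, starts CRS normally on every node and re-enables autostart. DRY_RUN defaults to 1 here, so it only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: leave exclusive mode, start CRS everywhere, re-enable autostart.
# ASSUMPTIONS: NODES/EXCL_NODE values and root ssh access are hypothetical.
GRID_HOME=${GRID_HOME:-/oracle/grid/crs_1}
NODES=${NODES:-"rac1 rac2"}
EXCL_NODE=${EXCL_NODE:-rac1}

# Print "[node] command" in dry-run mode, otherwise run it over ssh as root.
run_on() {
  local node=$1
  shift
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "[$node] $*"
  else
    ssh "root@$node" "$@"
  fi
}

finish_recovery() {
  local n
  run_on "$EXCL_NODE" "$GRID_HOME/bin/crsctl stop crs -f"
  for n in $NODES; do
    run_on "$n" "$GRID_HOME/bin/crsctl start crs"
  done
  for n in $NODES; do
    run_on "$n" "$GRID_HOME/bin/crsctl enable crs"
  done
}
```

After a real run, crsctl stat res -t (as grid) shows the state of all cluster resources.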

Last modified: 2020-07-23 00:46:54
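[Copyright notice] This article is original content by a Motianlun (墨天轮) user. When reposting, you must credit the source (Motianlun), the article link and the author; otherwise the author and Motianlun reserve the right to pursue liability. If you find content on Motianlun that you suspect is plagiarized or infringing, please report it by email to contact@modb.pro with supporting evidence; once verified, Motianlun will remove it immediately.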
