暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

【备份恢复】数据很重要,ASM磁盘组损坏,使用AMDU来抢救

1676

欢迎关注“数据库运维之道”公众号,一起学习数据库技术!

本期将为大家分享“数据很重要,ASM磁盘组损坏,使用AMDU来抢救”的处置案例。这个案例对个人来说比较经典,下面我将把自己的处理思路进行整理与总结。

环境信息:数据库版本为12.1.0.2.0,操作系统为rhel 6.5。

关键词:ORA-15032、ORA-15335、ORA-15130、ORA-15066、ORA-15196、AMDU-00209

Oracle RAC集群数据库异常关闭,检查数据库告警日志看到实例被GEN0进程终止,接着手动拉起数据库提示DATA磁盘组找不到,然后手动挂载DATA磁盘组提示ASM元数据损坏。接着经过一番排查确认部分ASM数据块损坏,最后使用AMDU把数据抢救回来。

    Instance terminated by GEN0-->实例被GEN0进程终止
    ORA-15001: diskgroup "DATA" does not exist or is not mounted-->DATA磁盘组找不到
    ORA-15335: ASM metadata corruption detected in disk group 'DATA' --> ASM元数据损坏
    AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [138] type [0] --> 数据块损坏
    amdu -diskstring dev/ASMDISK2P3 -extract DATA.258 --> amdu提取数据文件
    复制

    1、数据库集群由两个节点组成,并且部署在vmware虚拟化上面。Oracle RAC集群数据库异常关闭,第一步先查看alert告警日志,以进一步分析报错信息。从日志可以看出实例ASMB相关进程卡死,最后被GEN0进程终止。

      --告警日志 vi u01/app/oracle/diag/rdbms/ywzd/ywzd1/trace/alert_ywzd1.log
      Fri Apr 05 14:46:02 2024
      WARNING: ASMB has not responded for 8016 seconds
      NOTE: ASM umbilicus running slower than expected, ASMB diagnostic requested after 8016 seconds
      Fri Apr 05 14:46:02 2024
      NOTE: ASMB registering with ASM instance as Flex client 0x10004 (reg:3052304389) (reconnect)
      Fri Apr 05 14:46:02 2024
      NOTE: ASMB process state dumped to trace file u01/app/oracle/diag/rdbms/ywzd/ywzd1/trace/ywzd1_gen0_12556.trc
      ERROR: terminating instance because ASMB is stuck for 8016 seconds
      GEN0 (ospid: 12556): terminating the instance due to error 15082
      Fri Apr 05 14:46:02 2024
      System state dump requested by (instance=1, osid=12556 (GEN0)), summary=[abnormal instance termination].
      System State dumped to trace file u01/app/oracle/diag/rdbms/ywzd/ywzd1/trace/ywzd1_diag_12562_20240405144602.trc
      Fri Apr 05 14:46:03 2024
      Dumping diagnostic data in directory=[cdmp_20240405144602], requested by (instance=1, osid=12556 (GEN0)), summary=[abnormal instance termination].
      Fri Apr 05 14:46:03 2024
      Instance terminated by GEN0, pid = 12556


      --跟踪日志 vi u01/app/oracle/diag/rdbms/ywzd/ywzd1/trace/ywzd1_gen0_12556.trc
      WARNING: ASMB has not responded for 8016 seconds
      NOTE: ASM umbilicus running slower than expected, ASMB diagnostic requested after 8016 seconds
      ----- Abridged Call Stack Trace -----
      ksedsts()+244<-kfnDiagASMB()+761<-kfnbHealthCheck()+664<-kfnTimeout()+41<-ksbcti()+524<-ksbabs()+2357<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+253
      ----- End of Abridged Call Stack Trace -----
      复制

      2、为了尽快恢复数据库,尝试手动拉起数据库。但是数据库无法打开DATA磁盘组的pfile文件ORA-17503,并提示磁盘组不存在ORA-15001。

        [root@host03 ~]# su - oracle
        [oracle@host03 ~]$ export ORACLE_SID=ywzd2
        [oracle@host03 ~]$ sqlplus / as sysdba
        SQL*Plus: Release 12.1.0.2.0 Production on Mon Apr 8 17:29:46 2024
        Copyright (c) 19822014, Oracle.  All rights reserved.
        Connected to an idle instance.
        SQL> startup
        ORA-01078: failure in processing system parameters
        ORA-01565: error in identifying file '+DATA/ywzd/spfileywzd.ora'
        ORA-17503: ksfdopn:2 Failed to open file +DATA/ywzd/spfileywzd.ora <-----
        ORA-15056: additional error message
        ORA-17503: ksfdopn:2 Failed to open file +DATA/ywzd/spfileywzd.ora
        ORA-15001: diskgroup "DATA" does not exist or is not mounted <-----
        ORA-06512: at line 4
        SQL> exit
        Disconnected
        复制

        3、接着,尝试手动挂载DATA磁盘组。但是DATA磁盘组挂载失败,提示ASM元数据损坏以及ASM块头数据错误ORA-15335和ORA-15196。这时感觉到后背发凉,这个问题要修复估计不简单。

          [oracle@host04 ~]$ exit
          logout
          [root@host03 ~]# su - grid
          [grid@host03 ~]$ sqlplus as sysasm
          SQL*Plus: Release 12.1.0.2.0 Production on Mon Apr 8 17:30:21 2024
          Copyright (c) 19822014, Oracle.  All rights reserved.
          Connected to:
          Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
          With the Real Application Clusters and Automatic Storage Management options
          SQL> alter diskgroup data mount;
          alter diskgroup data mount
          *
          ERROR at line 1:
          ORA-15032: not all alterations performed
          ORA-15335: ASM metadata corruption detected in disk group 'DATA' <-----
          ORA-15130: diskgroup "DATA" is being dismounted
          ORA-15066: offlining disk "DATA_0000" in group "DATA" may result in a data loss
          ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3] <-----
          [2147483648] [32 != 1]
          ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3]
          [2147483648] [32 != 1]


          +ASM1实例日志信息
          Mon Apr 08 09:47:58 2024
          SQL> alter diskgroup data mount
          Mon Apr 08 09:47:58 2024
          NOTE: cache registered group DATA 1/0x89289E84
          NOTE: cache began mount (first) of group DATA 1/0x89289E84
          NOTE: Assigning number (1,0) to disk (/dev/ASMDISK2P3)
          Mon Apr 08 09:48:04 2024
          NOTE: GMON heartbeating for grp 1 (DATA)
          GMON querying group 1 at 25 for pid 40, osid 12051
          Mon Apr 08 09:48:04 2024
          NOTE: cache is mounting group DATA created on 2024/04/04 21:38:57
          NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/ASMDISK2P3
          NOTE: 04/08/24 09:48:04 DATA.F1X0 found on disk 0 au 10 fcn 0.0 datfmt 1
          Mon Apr 08 09:48:04 2024
          NOTE: cache mounting (first) external redundancy group 1/0x89289E84 (DATA)
          Mon Apr 08 09:48:05 2024
          * allocate domain 1, invalid = TRUE
          Mon Apr 08 09:48:05 2024
          NOTE: attached to recovery domain 1
          Mon Apr 08 09:48:05 2024
          NOTE: crash recovery of group DATA will recover thread=1 ckpt=5.14 domain=1 inc#=2 instnum=2
          WARNING: cache read a corrupt block: group=1(DATA) fn=3 indblk=0 disk=0 (DATA_0000) incarn=3915935396 au=71 blk=0 count=1
          Mon Apr 08 09:48:05 2024
          Errors in file u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_12051.trc:
          ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3] [2147483648] [32 != 1]
          NOTE: a corrupted block from group DATA was dumped to u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_12051.trc
          WARNING: cache read (retry) a corrupt block: group=1(DATA) fn=3 indblk=0 disk=0 (DATA_0000) incarn=3915935396 au=71 blk=0 count=1
          Mon Apr 08 09:48:05 2024
          Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_12051.trc:
          ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3] [2147483648] [32 != 1]
          ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3] [2147483648] [32 != 1]
          ERROR: cache failed to read group=1(DATA) fn=3 indblk=0 from disk(s): 0(DATA_0000)
          ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3] [2147483648] [32 != 1]
          ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3] [2147483648] [32 != 1]


          NOTE: cache initiating offline of disk 0 group DATA
          NOTE: process _user12051_+asm1 (12051) initiating offline of disk 0.3915935396 (DATA_0000) with mask 0x7e in group 1 (DATA) with client assisting
          NOTE: initiating PST update: grp 1 (DATA), dsk = 0/0xe9686ea4, mask = 0x6a, op = clear
          Mon Apr 08 09:48:05 2024
          GMON updating disk modes for group 1 at 26 for pid 40, osid 12051
          ERROR: disk 0(DATA_0000) in group 1(DATA) cannot be offlined because the disk group has external redundancy.
          Mon Apr 08 09:48:05 2024
          ERROR: too many offline disks in PST (grp 1)
          Mon Apr 08 09:48:05 2024
          NOTE: halting all I/Os to diskgroup 1 (DATA)
          复制

          4、查阅ORA-15196相关解释及解决方案。一般当ASM元数据块(metadata block)被检验存在严重问题时才会触发ORA-15196错误。

            $ oerr ora 15196
            15196, 00000, "invalid ASM block header [%s:%s] [%s] [%s] [%s] [%s != %s]"
            // *Cause: ASM encountered an invalid metadata block.
            // *Action: Contact Oracle Support Services.


            该ORA-15196报错的一般格式是:
            ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]
            相关的变量的含义为:
            1st,触发该意外报错的Oracle内核函数的名字以及其在代码中的行数
            2nd,验证发现问题的区域名字
            3rd,存放在该块中的ASM对象号
            4th,存放在该块中的ASM的块号
            5th,2nd指定的区域中实际存放的值
            6th, 2nd指定的区域中应当存放的值


            错误理解
            ORA-15196: invalid ASM block header [kfc.c:29297] [endian_kfbh] [3] [2147483648] [32 != 1]
            kfc.c:29297:说明触发本次ORA-15196错误的函数是kfc.c内核源代码的29297行的代码
            endian_kfbh:验证发现问题的区域名字, 这个endian_kfbh是用来描述endian属性的
            存放在该块中的ASM对象号:3
            存放在该块中的ASM的块号:2147483648
            2nd指定的区域中实际存放的值:32
            2nd指定的区域中应当存放的值:1
            复制

            5、检查ASM磁盘的udev配置信息,都可以正常识别到物理磁盘。

              [root@host03 ~]# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c29ff0363b8ebda7309f6f26c8c3', NAME='ASMDISK1P1',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c29811ad6d1353dd6df390513d3f', NAME='ASMDISK1P2',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c299356df1e7ab559e6bb0786514', NAME='ASMDISK1P3',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c2961eaa1e1771da07de9176da77', NAME='ASMDISK1P4',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c298d69a917dded0a30db5c4a6c5', NAME='ASMDISK1P5',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c2964b69f61d560c183c1494e4e8', NAME='ASMDISK1P6',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c294d1dd24f128dc3c06397ef7da', NAME='ASMDISK1P7',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c29084f5fd479b297e4ad4a8ef16', NAME='ASMDISK1P8',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c2979ce2d0d852d62298ed0d1b43', NAME='ASMDISK2P1',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c29f64125d722d2aa28cb17a7e4f', NAME='ASMDISK2P2',OWNER='grid', GROUP='asmadmin', MODE='0660'
              KERNEL=='sd*', BUS=='scsi', PROGRAM=='/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name', RESULT=='36000c29587ebe58f3a35c50073857b54', NAME='ASMDISK2P3',OWNER='grid', GROUP='asmadmin', MODE='0660'
              复制

              6、检查DATA磁盘组对应的ASM磁盘(/dev/ASMDISK2P3)头是否损坏,未看到KFBTYP_INVALID信息。

                [grid@host04 ~]$ let n=256
                [grid@host04 ~]$ for (( i=2; i<$n; i++ )); do kfed read /dev/ASMDISK2P3 blkn=$i | grep kfbh.type; done
                kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
                kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
                kfbh.type: 3 ; 0x002: KFBTYP_ALLOCTBL
                复制

                7、使用amdu命令将ASM磁盘信息dump出来,发现大量的AU坏块AMDU-00209: Corrupt block found。

                  amdu -diskstring '/dev/ASMDISK2P3' -dump 'DATA'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [84] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [85] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [86] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [87] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [88] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [89] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [90] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [91] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [92] type [0]
                  AMDU-00201: Disk N0001: '/dev/ASMDISK2P3'
                  AMDU-00209: Corrupt block found: Disk N0001 AU [96] block [93] type [0]
                  复制

                  1、基于上述的排查与处理,最终定位到ASM磁盘有大量的坏块。尝试修复ASM磁盘,并且通过设置参数event 15195 level 604去跳过cod recover,未能把DATA磁盘组挂载起来。

                    SQL> alter system set events '15195 trace name context forever, level 604';
                    SQL> alter system set asm_power_limit = 0;
                    $ kfed repair /dev/ASMDISK2P3
                    复制

                    2、这个时候只能用amdu抽出磁盘组中的控制文件、数据文件和REDO文件。基于alert日志文件的实例启动信息,创建pfile参数文件。

                      参数文件内容:
                      *.processes=300
                      #*.control_files='+DATA/YWZD/CONTROLFILE/current.262.1165492493'
                      *.control_files='/u01/app/oracle/oradata/ywzd/DATA_262.f'
                      *.db_block_size= 8192
                      *.compatible='12.1.0.2.0'
                      *.db_create_file_dest='/u01/app/oracle/oradata/ywzd'
                      *.db_recovery_file_dest='/u01/app/oracle/oradata/ywzd'
                      *.db_recovery_file_dest_size= 4785M
                      *.undo_tablespace='UNDOTBS1'
                      *.instance_number= 1
                      *.remote_login_passwordfile='EXCLUSIVE'
                      *.db_domain=''
                      *.dispatchers='(PROTOCOL=TCP) (SERVICE=ywzdXDB)'
                      *.audit_file_dest='/u01/app/oracle/admin/ywzd/adump'
                      *.audit_trail='DB'
                      *.db_name='ywzd'
                      *.db_unique_name='ywzd'
                      *.open_cursors= 300
                      *._diag_hm_rc_enabled= FALSE
                      *.diagnostic_dest='/u01/app/oracle'
                      启动实例
                      startup nomount pfile='/home/oracle/pfile.ora'
                      复制

                      3、在grid用户下抽取出控制文件,根据控制文件名称来编写抽取脚本。

                        --告警日志记录控制文件的路径和名称
                        *.control_files='+DATA/YWZD/CONTROLFILE/current.262.1165492493'
                        --抽出控制文件参考$ amdu -diskstring <asm_diskstring> -extract <DGNAME>.256
                        amdu -diskstring /dev/ASMDISK2P3 -extract DATA.262
                        复制

                        4、将数据库启动到mount状态

                          --在Oracle用户下创建相关目录
                          mkdir -p /u01/app/oracle/oradata/ywzd
                          --在root用户下将DATA_262.f文件拷贝到指定目录
                          cp DATA_262.f /u01/app/oracle/oradata/ywzd
                          chown -R oracle:oinstall /u01/app/oracle/oradata/ywzd/DATA_262.f
                          --将数据库启动到mount状态
                          sqlplus / as sysdba
                          alter database mount;
                          复制

                          5、在grid用户下抽取出数据文件,根据数据文件名称来编写抽取脚本。

                            --查看控制文件中记录的数据文件信息
                            select name from v$datafile
                            +DATA/YWZD/DATAFILE/system.258.1165492357
                            +DATA/YWZD/DATAFILE/sysaux.257.1165492281
                            +DATA/YWZD/DATAFILE/undotbs1.260.1165492443
                            +DATA/YWZD/DATAFILE/undotbs2.268.1165492525
                            +DATA/YWZD/DATAFILE/users.259.1165492443
                            。。。。。
                            --编写抽取脚本
                            amdu -diskstring /dev/ASMDISK2P3 -extract DATA.258
                            amdu -diskstring /dev/ASMDISK2P3 -extract DATA.257
                            amdu -diskstring /dev/ASMDISK2P3 -extract DATA.260
                            amdu -diskstring /dev/ASMDISK2P3 -extract DATA.268
                            amdu -diskstring /dev/ASMDISK2P3 -extract DATA.259
                            。。。。
                            --将文件拷贝到/u01/app/oracle/oradata/ywzd,并授权
                            --修改控制文件中数据文件的路径与名称
                            alter database rename file '+DATA/YWZD/DATAFILE/system.258.1165492357' to '/u01/app/oracle/oradata/ywzd/DATA_258.f';
                            alter database rename file '+DATA/YWZD/DATAFILE/sysaux.257.1165492281' to '/u01/app/oracle/oradata/ywzd/DATA_257.f';
                            alter database rename file '+DATA/YWZD/DATAFILE/undotbs1.260.1165492443' to '/u01/app/oracle/oradata/ywzd/DATA_260.f';
                            alter database rename file '+DATA/YWZD/DATAFILE/undotbs2.268.1165492525' to '/u01/app/oracle/oradata/ywzd/DATA_268.f';
                            alter database rename file '+DATA/YWZD/DATAFILE/users.259.1165492443' to '/u01/app/oracle/oradata/ywzd/DATA_259.f';
                            。。。。
                            复制

                            6、在grid用户下抽取出redo重做日志文件,根据redo重做日志名称来编写抽取脚本。

                              ----查看控制文件中记录的重做日志文件信息
                              select member from v$logfile;
                              GROUP# STATUS TYPE MEMBER
                              ---------- ------- ------- -------------------------------------
                              2 ONLINE +DATA/YWZD/ONLINELOG/group_2.265.1165492497
                              2 ONLINE +DATA/YWZD/ONLINELOG/group_2.266.1165492497
                              1 ONLINE +DATA/YWZD/ONLINELOG/group_1.263.1165492495
                              1 ONLINE +DATA/YWZD/ONLINELOG/group_1.264.1165492497
                              3 ONLINE +DATA/YWZD/ONLINELOG/group_3.269.1165492559
                              3 ONLINE +DATA/YWZD/ONLINELOG/group_3.270.1165492559
                              4 ONLINE +DATA/YWZD/ONLINELOG/group_4.271.1165492559
                              4 ONLINE +DATA/YWZD/ONLINELOG/group_4.272.1165492559


                              --编写抽取脚本
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.265
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.266
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.263
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.264
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.269
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.270
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.271
                              amdu -diskstring /dev/ASMDISK2P3 -extract DATA.272


                              --将文件拷贝到/u01/app/oracle/oradata/ywzd,并授权
                              --修改控制文件中重做日志文件的路径与名称
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_2.265.1165492497' to '/u01/app/oracle/oradata/ywzd/DATA_265.f';
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_2.266.1165492497' to '/u01/app/oracle/oradata/ywzd/DATA_266.f';
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_1.263.1165492495' to '/u01/app/oracle/oradata/ywzd/DATA_263.f';
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_1.264.1165492497' to '/u01/app/oracle/oradata/ywzd/DATA_264.f';
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_3.269.1165492559' to '/u01/app/oracle/oradata/ywzd/DATA_269.f';
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_3.270.1165492559' to '/u01/app/oracle/oradata/ywzd/DATA_270.f';
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_4.271.1165492559' to '/u01/app/oracle/oradata/ywzd/DATA_271.f';
                              alter database rename file '+DATA/YWZD/ONLINELOG/group_4.272.1165492559' to '/u01/app/oracle/oradata/ywzd/DATA_272.f';
                              复制

                              7、最后,成功打开数据库,并验证数据完整性。

                                SQL> alter database open;
                                复制

                                8、最后重建DATA磁盘组,并将数据文件在线迁移至磁盘组。

                                9、至此本案例恢复完成。

                                在处理本案例的过程中,还有遇到一些特殊情况,涉及相关知识点可以查阅“推荐阅读”部分和“参考资料”部分。

                                推荐阅读一:Linux系统CentOS进入单用户模式和救援模式

                                Linux 6 系统进入单用户模式步骤:

                                  1、重启服务器,系统加载至Booting界面时,按“E”键,进入grub引导菜单的内核界面;
                                  2、在内核界面选择使用上下箭头移动并选择内核;
                                  3、内核选定之后再次按下“E”键进行编辑;
                                  4、在最后一行后面加上数字1(是单用户模式的运行级别)或者single然后回车(补充:好像加数字1比较有效);
                                  5、然后根据提示按下b键(boot)进行重新引导;
                                  6、重启后,可以进入系统,现在就是单用户模式。
                                  详见步骤可以看“参考资料1”
                                  复制

                                  Linux 6 系统进入救援模式步骤:

                                    1、首先开机后,选择进入快速引导模式下,一般服务器都会选择F12,选择CD-ROM为第一引导分区。
                                    2、重启后进入安装菜单,选中Rescue install system(救援模式) 后回车;
                                    3、选择语言,保持默认English
                                    4、选择键盘类型,保持默认us
                                    5、是否启动网络,需要根据你实际情况进行选择,如果需要通过联网拷贝数据,选择YES,在这里我们选择NO
                                    6、进入到Rescue界面,选择Continue
                                    7、系统挂载在/mnt/sysimage下 如果要到root环境下,运行 chroot /mnt/sysimage 命令
                                    8、进入该模式下,可以选择进行相关操作,如修改root密码,使用fsck命令进行修复磁盘和文件等。
                                    9、在sh-4.1#模式下需要先exit退出,回到bash-4.1#才可以reboot重启系统;
                                    复制

                                    推荐阅读二:Vmware中虚拟机获取磁盘UUID无返回结果

                                      检查Vmware虚拟机配置文件xxx.vmx,确认EnableUUID是否为TRUE。
                                      disk.EnableUUID = "TRUE"


                                      执行lsblk命令查看挂载的硬盘,确认是否识别到UUID号
                                      # for i in `cat /proc/partitions | awk '{print $4}' | grep sd | grep [b-z]$`; do echo "### $i: `/sbin/scsi_id -g -u -d /dev/$i`"; done
                                      ### sdb: 36000c29ff0363b8ebda7309f6f26c8c3
                                      ### sdc: 36000c29811ad6d1353dd6df390513d3f
                                      ### sdd: 36000c299356df1e7ab559e6bb0786514
                                      ### sdf: 36000c298d69a917dded0a30db5c4a6c5
                                      ### sde: 36000c2961eaa1e1771da07de9176da77
                                      ### sdg: 36000c2964b69f61d560c183c1494e4e8
                                      ### sdi: 36000c29084f5fd479b297e4ad4a8ef16
                                      ### sdh: 36000c294d1dd24f128dc3c06397ef7da
                                      ### sdj: 36000c2979ce2d0d852d62298ed0d1b43
                                      ### sdk: 36000c29587ebe58f3a35c50073857b54


                                      如果udev配置有调整,可以重新加载配置
                                      --#/sbin/udevadm trigger --type=devices --action=change
                                      --#/sbin/udevadm control --reload-rules


                                      --#start_udev
                                      --#echo'/sbin/start_udev' >> /etc/rc.local
                                      复制

                                      推荐阅读三:磁盘组挂坏后,使用amdu快速恢复数据库

                                        1. Create pfile of lost database , from database alert.log startup messages.
                                        2. Startup database in nomount
                                        3. Get the controlfile number from db alert log , it will show while starting the database here in this eg. its 256
                                        eg., control_files='+<DGNAME>/<DB_NAME>/controlfile/current.256.709676643'
                                        If you have controlfile backup already in non-asm location edit the pfile to point to location of non-asm and then mount the database
                                        If you dont have controlfile backup then go to step 4 after determining the file# from step 3
                                        4. $ amdu -diskstring <asm_diskstring> -extract <DGNAME>.256
                                        5. shutdown the database and change the control_file location to point to the extracted file location
                                        6. startup mount the database
                                        7. once mounted, get the datafile file numbers using "select name from v$datafile"
                                        And get online redo logfile from "select * from v$logfile".
                                        8. Extract all datafiles and redolog files in similar manner
                                        9. alter database rename <datafile 1> to < newly extracted location>
                                        10. open the database
                                        复制

                                        • https://www.cnblogs.com/xuefy/p/13527898.html

                                        • https://www.jianshu.com/p/87cd034bcedb

                                        • https://www.parnassusdata.com/en/node/223

                                        • https://blog.csdn.net/chinahuanghuajun/article/details/78365559

                                        • http://www.minniebaby.tech/2021/10/21/asm-virtually-addressed-metadata-continuing-operations-directory/

                                        • https://blog.csdn.net/jasonliu1/article/details/38494833

                                        • https://blog.csdn.net/woqutechteam/article/details/61914611

                                        • https://www.modb.pro/db/21415

                                        • How to Restore the Database Using AMDU after Diskgroup Corruption (Doc ID 1597581.1)

                                        以上就是本期关于“数据很重要,ASM磁盘组损坏,使用AMDU来抢救”的处置案例。希望能给大家带来帮助。

                                        欢迎关注“数据库运维之道”公众号,一起学习数据库技术!

                                        欢迎扫码进“数据库运维之道”,此群用于数据库技术交流,禁止发广告!

                                        可以加我的微信,交个朋友或讨论数据库解决方案,请备注”姓名单位“,谢谢!

                                        文章转载自数据库运维之道,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                        评论