暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle ASM磁盘组被误删除!他跑,她追,他插翅难飞!

IT小Chen 2024-04-09
182

经常删库的小伙伴们,经常听说一句话:"删库跑路,三年起步!"

说的就是变更不规范,亲人两行泪的故事。

那么究竟是什么导致故事变成了事故呢,这一切的背后究竟是人性的扭曲、良心的泯灭还是道德的沦丧呢?让我们回顾一下案发现场:

注意:

本故事纯属虚构,如有雷同,纯属巧合,请勿模仿。

又快到下班时间了,突然,DBA小C接到了客户的电话,反馈数据库服务器A系统磁盘空间快不足了,之前给ASM加的两块盘sdh,sdi还没开始用,先踢出ASM磁盘组,临时扩容下系统盘。

接到任务,粗心大意的小C只想快速完成工作,避免加班,所以他匆忙检查了sdh,sdi盘已经被加到新建的+CJC磁盘组里,既然客户说没在用,那就不检查了,直接删除+CJC磁盘组。

说干就干,直接执行drop diskgroup:

发现active diskgroup磁盘删除不了:

    [grid@cjc-db-02 ~]$ sqlplus  as sysasm
    SQL> drop diskgroup CJC including contents;
    ORA-15039: diskgroup not dropped
    ORA-15027: active use of diskgroup "CJC" precludes its dismount
    复制

    一不做二不休,小C想到,别耽误我下班,直接停库操作:

      [oracle@cjc-db-02 ~]$ srvctl status database -d cjc
      Database is running.
      [oracle@cjc-db-02 ~]$ srvctl stop database -d cjc
      [oracle@cjc-db-02 ~]$ srvctl status database -d cjc
      Database is not running.
      复制

      再次执行,成功删除掉了+CJC磁盘组:

        [grid@cjc-db-02 ~]$ sqlplus  as sysasm
        SQL> drop diskgroup CJC including contents;
        Diskgroup dropped.
        复制

        启动数据库,有报错:

          [oracle@cjc-db-02 ~]$ srvctl start database -d cjc
          PRCR-1079 : Failed to start resource ora.cjc.db
          CRS-2640: Required resource 'ora.CJC.dg' is missing.
          SQL> startup
          ORA-39511: Start of CRS resource for instance '222' failed with error:[CRS-2640: Required resource 'ora.CJC.dg' is missing.
          CRS-0222: Resource 'ora.cjc.db' has dependency error.
          clsr_start_resource:260 status:222
          clsrapi_start_db:start_asmdbs status:222
          复制

          尝试重新启动has,发现问题仍然没有解决:

            [root@cjc-db-02 bin]# ./crsctl stop has
            [root@cjc-db-02 bin]# ./crsctl start has
            复制

            原来是因为踢盘后,CRS信息还没有更新,需要通过 stvctl modify ...命令更新后才能启动数据库。

            正当小C准备更新CRS信息时,客户电话又打来了,先别踢盘了,业务同事反馈,sdh,sdi昨天晚上已经开始写业务数据了,不能删除!

            啊,这,,,新上线的库还没有来得及备份啊!

            小C抽了根烟冷静了一下,默默打开了BOSS直聘,过了一会又打开了12306,最后又抱着试一试的心态在浏览器上搜索了ASM磁盘组误操作后如何恢复?

            终于看到了希望,原来drop diskgroup只是逻辑删除,数据被覆盖之前还可以挽救回来,最终小C通过kfed工具成功的挽回了数据...

            本次案例小C深有感触,明白了对生产环境要时刻抱有敬畏之心,并更新了自己的微信签名,从原来的"乾坤未定,你我皆是黑马!"改成了"乾坤已定,你我皆是牛马!"

            下面,让我们看下如何使用kfed工具恢复误删除的磁盘组:

            环境说明:

              DB:Oracle 19.22单机+ASM
              OS:Oracle Linux 7.6
              复制

              启动has:

                [root@cjc-db-02 bin]#  mount -o remount,size=5G dev/shm
                [root@cjc-db-02 bin]# ./crsctl start has
                复制

                查看资源状态:

                  [root@cjc-db-02 bin]# ./crsctl stat res -t
                  --------------------------------------------------------------------------------
                  Name Target State Server State details
                  --------------------------------------------------------------------------------
                  Local Resources
                  --------------------------------------------------------------------------------
                  ora.DATA.dg
                  ONLINE ONLINE cjc-db-02 STABLE
                  ora.LISTENER.lsnr
                  ONLINE ONLINE cjc-db-02 STABLE
                  ora.asm
                  ONLINE ONLINE cjc-db-02 Started,STABLE
                  ora.ons
                  OFFLINE OFFLINE cjc-db-02 STABLE
                  --------------------------------------------------------------------------------
                  Cluster Resources
                  --------------------------------------------------------------------------------
                  ora.cjc.db
                  1 OFFLINE OFFLINE Instance Shutdown,ST
                  ABLE
                  ora.cssd
                  1 ONLINE ONLINE cjc-db-02 STABLE
                  ora.diskmon
                  1 OFFLINE OFFLINE STABLE
                  ora.evmd
                  1 ONLINE ONLINE cjc-db-02 STABLE
                  --------------------------------------------------------------------------------
                  复制

                  查看磁盘组信息:

                    [grid@cjc-db-02 ~]$ sqlplus  as sysasm
                    set line 300
                    col name for a15
                    select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;
                    GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
                    ------------ --------------- ----------- ------ ---------- ----------
                    1 DATA MOUNTED EXTERN 12288 9740
                    复制

                    查看磁盘信息:

                      col path for a30
                      select GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,OS_MB,TOTAL_MB,FREE_MB,NAME,PATH from v$asm_disk order by 1,2;
                      GROUP_NUMBER DISK_NUMBER STATE REDUNDA OS_MB TOTAL_MB FREE_MB NAME PATH
                      ------------ ----------- -------- ------- ---------- ---------- ---------- --------------- ------------------------------
                      0 0 NORMAL UNKNOWN2048 0 0 dev/sdk
                      0 1 NORMAL UNKNOWN2048 0 0 dev/sdh
                      0 2 NORMAL UNKNOWN2048 0 0 dev/sdi
                      0 3 NORMAL UNKNOWN2048 0 0 dev/sdj
                      1 0 NORMAL UNKNOWN2048 2048 1624 DATA_0000 dev/sdb
                      1 1 NORMAL UNKNOWN2048 2048 1628 DATA_0001 dev/sdc
                      1 2 NORMAL UNKNOWN2048 2048 1620 DATA_0002 dev/sdd
                      1 3 NORMAL UNKNOWN2048 2048 1624 DATA_0003 dev/sde
                      1 4 NORMAL UNKNOWN2048 2048 1632 DATA_0004 dev/sdf
                      1 5 NORMAL UNKNOWN2048 2048 1612 DATA_0005 dev/sdg


                      10 rows selected.
                      复制

                      新建磁盘组 CJC:

                        [grid@cjc-db-02 ~]$ asmca
                        复制

                        查看磁盘组信息:

                          set line 300
                          col name for a15
                          select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;
                          GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
                          ------------ --------------- ----------- ------ ---------- ----------
                          1 DATA MOUNTED EXTERN 12288 9740
                          2 CJC MOUNTED EXTERN 4096 3988
                          复制
                            col path for a30
                            select GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,OS_MB,TOTAL_MB,FREE_MB,NAME,PATH from v$asm_disk order by 1,2;


                            GROUP_NUMBER DISK_NUMBER STATE REDUNDA OS_MB TOTAL_MB FREE_MB NAME PATH
                            ------------ ----------- -------- ------- ---------- ---------- ---------- ------------------------------ ------------------------------
                            0 0 NORMAL UNKNOWN2048 0 0 /dev/sdk
                            0 1 NORMAL UNKNOWN2048 0 0 /dev/sdj
                            1 0 NORMAL UNKNOWN2048 2048 1620 DATA_0000 /dev/sdb
                            1 1 NORMAL UNKNOWN2048 2048 1624 DATA_0001 /dev/sdc
                            1 2 NORMAL UNKNOWN2048 2048 1612 DATA_0002 /dev/sdd
                            1 3 NORMAL UNKNOWN2048 2048 1624 DATA_0003 /dev/sde
                            1 4 NORMAL UNKNOWN2048 2048 1624 DATA_0004 /dev/sdf
                            1 5 NORMAL UNKNOWN2048 2048 1608 DATA_0005 /dev/sdg
                            2 0 NORMAL UNKNOWN2048 2048 1980 CJC_0000 /dev/sdh
                            2 1 NORMAL UNKNOWN2048 2048 1996 CJC_0001 /dev/sdi


                            10 rows selected.
                            复制

                            新增测试数据:

                              [oracle@cjc-db-02 ~]$ sqlplus / as sysdba
                              select name from v$dbfile;
                              NAME
                              --------------------------------------------------------------------------------
                              +DATA/CJC/DATAFILE/users.260.1165422711
                              +DATA/CJC/DATAFILE/undotbs1.259.1165422693
                              +DATA/CJC/DATAFILE/system.257.1165422411
                              +DATA/CJC/DATAFILE/sysaux.258.1165422581
                              复制
                                create tablespace CJC datafile '+CJC' size 10M;
                                create user CJC identified by "a" default tablespace CJC;
                                grant dba to CJC;
                                conn CJC/a
                                create table t1 as select level as id from dual connect by level<=10;
                                select * from t1;
                                ID
                                ----------
                                1
                                2
                                3
                                4
                                5
                                6
                                7
                                8
                                9
                                10


                                10 rows selected.
                                复制

                                备份ASM磁盘头数据,方便后面对比分析及恢复:

                                磁盘头信息备份

                                  kfed read /dev/sdh >/tmp/sdh01
                                  复制

                                  块号1备份

                                    kfed read /dev/sdh blkn=1 >/tmp/sdh02
                                    复制

                                    aun1备份

                                      kfed read /dev/sdh aun=1  >/tmp/sdh03
                                      复制

                                      磁盘头信息备份

                                        kfed read /dev/sdi >/tmp/sdi01
                                        复制

                                        块号1备份

                                          kfed read /dev/sdi blkn=1 >/tmp/sdi02
                                          复制

                                          aun1备份

                                            kfed read /dev/sdi aun=1  >/tmp/sdi03
                                            复制

                                            删除磁盘组,模拟误操作:

                                              [grid@cjc-db-02 ~]$ sqlplus / as sysasm
                                              SQL> drop diskgroup CJC including contents;
                                              ORA-15039: diskgroup not dropped
                                              ORA-15027: active use of diskgroup "CJC" precludes its dismount
                                              复制

                                              停库后继续删除:

                                                [oracle@cjc-db-02 ~]$ srvctl status database -d cjc
                                                Database is running.
                                                [oracle@cjc-db-02 ~]$ srvctl stop database -d cjc
                                                [oracle@cjc-db-02 ~]$ srvctl status database -d cjc
                                                Database is not running.
                                                复制

                                                成功删除掉CJC磁盘组:

                                                  [grid@cjc-db-02 ~]$ sqlplus / as sysasm
                                                  SQL> drop diskgroup CJC including contents;
                                                  Diskgroup dropped.
                                                  复制

                                                  启动数据库报错:

                                                    [oracle@cjc-db-02 ~]$ srvctl start database -d cjc
                                                    PRCR-1079 : Failed to start resource ora.cjc.db
                                                    CRS-2640: Required resource 'ora.CJC.dg' is missing.


                                                    SQL> startup
                                                    ORA-39511: Start of CRS resource for instance '222' failed with error:[CRS-2640: Required resource 'ora.CJC.dg' is missing.
                                                    CRS-0222: Resource 'ora.cjc.db' has dependency error.
                                                    clsr_start_resource:260 status:222
                                                    clsrapi_start_db:start_asmdbs status:222
                                                    复制

                                                    恢复误删除的磁盘组:

                                                    查看磁盘组信息:

                                                      set line 300
                                                      col name for a15
                                                      select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;


                                                      GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
                                                      ------------ --------------- ----------- ------ ---------- ----------
                                                      1 DATA MOUNTED EXTERN 12288 9704
                                                      复制
                                                        col path for a30
                                                        select GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,OS_MB,TOTAL_MB,FREE_MB,NAME,PATH from v$asm_disk order by 1,2;SQL>


                                                        GROUP_NUMBER DISK_NUMBER STATE REDUNDA OS_MB TOTAL_MB FREE_MB NAME PATH
                                                        ------------ ----------- -------- ------- ---------- ---------- ---------- --------------- ------------------------------
                                                        0 0 NORMAL UNKNOWN2048 0 0 /dev/sdk
                                                        0 1 NORMAL UNKNOWN2048 0 0 /dev/sdh
                                                        0 2 NORMAL UNKNOWN2048 0 0 /dev/sdi
                                                        0 3 NORMAL UNKNOWN2048 0 0 /dev/sdj
                                                        1 0 NORMAL UNKNOWN2048 2048 1620 DATA_0000 /dev/sdb
                                                        1 1 NORMAL UNKNOWN2048 2048 1624 DATA_0001 /dev/sdc
                                                        1 2 NORMAL UNKNOWN2048 2048 1612 DATA_0002 /dev/sdd
                                                        1 3 NORMAL UNKNOWN2048 2048 1620 DATA_0003 /dev/sde
                                                        1 4 NORMAL UNKNOWN2048 2048 1620 DATA_0004 /dev/sdf
                                                        1 5 NORMAL UNKNOWN2048 2048 1608 DATA_0005 /dev/sdg


                                                        10 rows selected.
                                                        复制

                                                        恢复

                                                        --备份磁盘头信息

                                                          kfed read /dev/sdh >/tmp/sdh01xxx
                                                          复制

                                                          块号1备份

                                                            kfed read /dev/sdh blkn=1 >/tmp/sdh02xxx
                                                            复制

                                                            aun1备份

                                                              kfed read /dev/sdh aun=1  >/tmp/sdh03xxx
                                                              复制

                                                              磁盘头信息备份

                                                                kfed read /dev/sdi >/tmp/sdi01xxx
                                                                复制

                                                                块号1备份

                                                                  kfed read /dev/sdi blkn=1 >/tmp/sdi02xxx
                                                                  复制

                                                                  aun1备份

                                                                    kfed read /dev/sdi aun=1  >/tmp/sdi03xxx
                                                                    复制

                                                                    通过diff命令进行对比:

                                                                      [grid@cjc-db-02 ~]$ diff -C 1 /tmp/sdh01 /tmp/sdh01xxx|grep "kfdhdb.hdrsts"
                                                                      ! kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
                                                                      ! kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER
                                                                      复制
                                                                        [grid@cjc-db-02 ~]$ diff -C 1 /tmp/sdi01 /tmp/sdi01xxx|grep "kfdhdb.hdrsts"
                                                                        ! kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
                                                                        ! kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER
                                                                        复制

                                                                        通过对比删除前和删除后磁盘头数据的不同部分,可以看到kfdhdb.hdrsts行有差异。

                                                                        尝试改回删除前磁盘头的信息:

                                                                        --修改sdh01xxx文件

                                                                        kfdhdb.hdrsts:状态值从4改为3,状态名称从KFDHDR_FORMER改为KFDHDR_MEMBER

                                                                        --修改sdi01xxx文件

                                                                        kfdhdb.hdrsts:状态值从4改为3,状态名称从KFDHDR_FORMER改为KFDHDR_MEMBER

                                                                          [grid@cjc-db-02 ~]$ vi /tmp/sdh01xxx
                                                                          复制

                                                                            kfdhdb.hdrsts:                        4 ; 0x027: KFDHDR_FORMER
                                                                            复制

                                                                            改成

                                                                              kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
                                                                              复制
                                                                                [grid@cjc-db-02 ~]$ vi /tmp/sdi01xxx
                                                                                复制


                                                                                  kfdhdb.hdrsts:                        4 ; 0x027: KFDHDR_FORMER
                                                                                  复制

                                                                                  改成

                                                                                    kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
                                                                                    复制

                                                                                    合并修改:

                                                                                      [grid@cjc-db-02 ~]$ kfed merge /dev/sdh text=/tmp/sdh01xxx
                                                                                      [grid@cjc-db-02 ~]$ kfed merge /dev/sdi text=/tmp/sdi01xxx
                                                                                      复制

                                                                                      检查磁盘组:可以看到CJC磁盘组了

                                                                                        [grid@cjc-db-02 ~]$ sqlplus / as sysasm
                                                                                        set line 300
                                                                                        col name for a15
                                                                                        select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;
                                                                                        GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
                                                                                        ------------ --------------- ----------- ------ ---------- ----------
                                                                                        0 CJC DISMOUNTED 0 0
                                                                                        1 DATA MOUNTED EXTERN 12288 9704
                                                                                        复制

                                                                                        挂载磁盘组

                                                                                          SQL> alter diskgroup CJC mount;
                                                                                          复制
                                                                                            [grid@cjc-db-02 trace]$ tail -100f alert_+ASM.log 
                                                                                            .....
                                                                                            2024-04-06T14:43:30.834474+08:00
                                                                                            SUCCESS: alter diskgroup CJC mount
                                                                                            WARNING: unknown state for diskgroup resource ora.CJC.dg, Return Value: 3
                                                                                            复制

                                                                                            查看

                                                                                              set line 300
                                                                                              col name for a15
                                                                                              select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;SQL> SQL>


                                                                                              GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
                                                                                              ------------ --------------- ----------- ------ ---------- ----------
                                                                                              1 DATA MOUNTED EXTERN 12288 9704
                                                                                              2 CJC MOUNTED EXTERN 4096 3976
                                                                                              复制

                                                                                              可以正常启动数据库:

                                                                                                [oracle@cjc-db-02 ~]$ sqlplus / as sysdba
                                                                                                SQL*Plus: Release 19.0.0.0.0 - Production on Sat Apr 6 14:44:53 2024
                                                                                                Version 19.22.0.0.0
                                                                                                Copyright (c) 1982, 2023, Oracle.  All rights reserved.
                                                                                                Connected to an idle instance.


                                                                                                SQL> startup
                                                                                                ORACLE instance started.


                                                                                                Total System Global Area 1174402440 bytes
                                                                                                Fixed Size 8938888 bytes
                                                                                                Variable Size 369098752 bytes
                                                                                                Database Buffers 788529152 bytes
                                                                                                Redo Buffers 7835648 bytes
                                                                                                Database mounted.
                                                                                                Database opened.
                                                                                                复制

                                                                                                可以看到CJC磁盘组的数据

                                                                                                  SQL> select * from cjc.t1;


                                                                                                  ID
                                                                                                  ----------
                                                                                                  1
                                                                                                  2
                                                                                                  3
                                                                                                  4
                                                                                                  5
                                                                                                  6
                                                                                                  7
                                                                                                  8
                                                                                                  9
                                                                                                  10


                                                                                                  10 rows selected.
                                                                                                  复制

                                                                                                  参考:微信公众"数据库运维之道"文章:【应知应会】使用kfed运维兵器修复ASM磁盘和磁盘组

                                                                                                  【应知应会】使用kfed运维兵器修复ASM磁盘和磁盘组

                                                                                                  数据库运维之道,公众号:数据库运维之道【应知应会】使用kfed运维兵器修复ASM磁盘和磁盘组

                                                                                                  ###chenjuchao 20240406###

                                                                                                  文章转载自IT小Chen,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                                                                                  评论