经常删库的小伙伴们,经常听说一句话:"删库跑路,三年起步!"
说的就是变更不规范,亲人两行泪的故事。
那么究竟是什么导致故事变成了事故呢,这一切的背后究竟是人性的扭曲、良心的泯灭还是道德的沦丧呢?让我们回顾一下案发现场:
注意:
本故事纯属虚构,如有雷同,纯属巧合,请勿模仿。
又快到下班时间了,突然,DBA小C接到了客户的电话,反馈数据库服务器A系统磁盘空间快不足了,之前给ASM加的两块盘sdh,sdi还没开始用,先踢出ASM磁盘组,临时扩容下系统盘。
接到任务,粗心大意的小C只想快速完成工作,避免加班,所以他匆忙检查了sdh,sdi盘已经被加到新建的+CJC磁盘组里,既然客户说没在用,那就不检查了,直接删除+CJC磁盘组。
说干就干,直接执行drop diskgroup:
发现active diskgroup磁盘删除不了:
[grid@cjc-db-02 ~]$ sqlplus as sysasm
SQL> drop diskgroup CJC including contents;
ORA-15039: diskgroup not dropped
ORA-15027: active use of diskgroup "CJC" precludes its dismount
复制
一不做二不休,小C想到,别耽误我下班,直接停库操作:
[oracle@cjc-db-02 ~]$ srvctl status database -d cjc
Database is running.
[oracle@cjc-db-02 ~]$ srvctl stop database -d cjc
[oracle@cjc-db-02 ~]$ srvctl status database -d cjc
Database is not running.
复制
再次执行,成功删除掉了+CJC磁盘组:
[grid@cjc-db-02 ~]$ sqlplus as sysasm
SQL> drop diskgroup CJC including contents;
Diskgroup dropped.
复制
启动数据库,有报错:
[oracle@cjc-db-02 ~]$ srvctl start database -d cjc
PRCR-1079 : Failed to start resource ora.cjc.db
CRS-2640: Required resource 'ora.CJC.dg' is missing.
SQL> startup
ORA-39511: Start of CRS resource for instance '222' failed with error:[CRS-2640: Required resource 'ora.CJC.dg' is missing.
CRS-0222: Resource 'ora.cjc.db' has dependency error.
clsr_start_resource:260 status:222
clsrapi_start_db:start_asmdbs status:222
复制
尝试重新启动has,发现问题仍然没有解决:
[root@cjc-db-02 bin]# ./crsctl stop has
[root@cjc-db-02 bin]# ./crsctl start has
复制
原来是因为踢盘后,CRS信息还没有更新,需要通过 stvctl modify ...命令更新后才能启动数据库。
正当小C准备更新CRS信息时,客户电话又打来了,先别踢盘了,业务同事反馈,sdh,sdi昨天晚上已经开始写业务数据了,不能删除!
啊,这,,,新上线的库还没有来得及备份啊!
小C抽了根烟冷静了一下,默默打开了BOSS直聘,过了一会又打开了12306,最后又抱着试一试的心态在浏览器上搜索了ASM磁盘组误操作后如何恢复?
终于看到了希望,原来drop diskgroup只是逻辑删除,数据被覆盖之前还可以挽救回来,最终小C通过kfed工具成功的挽回了数据...
本次案例小C深有感触,明白了对生产环境要时刻抱有敬畏之心,并更新了自己的微信签名,从原来的"乾坤未定,你我皆是黑马!"改成了"乾坤已定,你我皆是牛马!"
下面,让我们看下如何使用kfed工具恢复误删除的磁盘组:
环境说明:
DB:Oracle 19.22单机+ASM
OS:Oracle Linux 7.6
复制
启动has:
[root@cjc-db-02 bin]# mount -o remount,size=5G dev/shm
[root@cjc-db-02 bin]# ./crsctl start has
复制
查看资源状态:
[root@cjc-db-02 bin]# ./crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE cjc-db-02 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE cjc-db-02 STABLE
ora.asm
ONLINE ONLINE cjc-db-02 Started,STABLE
ora.ons
OFFLINE OFFLINE cjc-db-02 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cjc.db
1 OFFLINE OFFLINE Instance Shutdown,ST
ABLE
ora.cssd
1 ONLINE ONLINE cjc-db-02 STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.evmd
1 ONLINE ONLINE cjc-db-02 STABLE
--------------------------------------------------------------------------------
复制
查看磁盘组信息:
[grid@cjc-db-02 ~]$ sqlplus as sysasm
set line 300
col name for a15
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
------------ --------------- ----------- ------ ---------- ----------
1 DATA MOUNTED EXTERN 12288 9740
复制
查看磁盘信息:
col path for a30
select GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,OS_MB,TOTAL_MB,FREE_MB,NAME,PATH from v$asm_disk order by 1,2;
GROUP_NUMBER DISK_NUMBER STATE REDUNDA OS_MB TOTAL_MB FREE_MB NAME PATH
------------ ----------- -------- ------- ---------- ---------- ---------- --------------- ------------------------------
0 0 NORMAL UNKNOWN2048 0 0 dev/sdk
0 1 NORMAL UNKNOWN2048 0 0 dev/sdh
0 2 NORMAL UNKNOWN2048 0 0 dev/sdi
0 3 NORMAL UNKNOWN2048 0 0 dev/sdj
1 0 NORMAL UNKNOWN2048 2048 1624 DATA_0000 dev/sdb
1 1 NORMAL UNKNOWN2048 2048 1628 DATA_0001 dev/sdc
1 2 NORMAL UNKNOWN2048 2048 1620 DATA_0002 dev/sdd
1 3 NORMAL UNKNOWN2048 2048 1624 DATA_0003 dev/sde
1 4 NORMAL UNKNOWN2048 2048 1632 DATA_0004 dev/sdf
1 5 NORMAL UNKNOWN2048 2048 1612 DATA_0005 dev/sdg
10 rows selected.
复制
新建磁盘组 CJC:
[grid@cjc-db-02 ~]$ asmca
复制
查看磁盘组信息:
set line 300
col name for a15
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
------------ --------------- ----------- ------ ---------- ----------
1 DATA MOUNTED EXTERN 12288 9740
2 CJC MOUNTED EXTERN 4096 3988
复制
col path for a30
select GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,OS_MB,TOTAL_MB,FREE_MB,NAME,PATH from v$asm_disk order by 1,2;
GROUP_NUMBER DISK_NUMBER STATE REDUNDA OS_MB TOTAL_MB FREE_MB NAME PATH
------------ ----------- -------- ------- ---------- ---------- ---------- ------------------------------ ------------------------------
0 0 NORMAL UNKNOWN2048 0 0 /dev/sdk
0 1 NORMAL UNKNOWN2048 0 0 /dev/sdj
1 0 NORMAL UNKNOWN2048 2048 1620 DATA_0000 /dev/sdb
1 1 NORMAL UNKNOWN2048 2048 1624 DATA_0001 /dev/sdc
1 2 NORMAL UNKNOWN2048 2048 1612 DATA_0002 /dev/sdd
1 3 NORMAL UNKNOWN2048 2048 1624 DATA_0003 /dev/sde
1 4 NORMAL UNKNOWN2048 2048 1624 DATA_0004 /dev/sdf
1 5 NORMAL UNKNOWN2048 2048 1608 DATA_0005 /dev/sdg
2 0 NORMAL UNKNOWN2048 2048 1980 CJC_0000 /dev/sdh
2 1 NORMAL UNKNOWN2048 2048 1996 CJC_0001 /dev/sdi
10 rows selected.
复制
新增测试数据:
[oracle@cjc-db-02 ~]$ sqlplus / as sysdba
select name from v$dbfile;
NAME
--------------------------------------------------------------------------------
+DATA/CJC/DATAFILE/users.260.1165422711
+DATA/CJC/DATAFILE/undotbs1.259.1165422693
+DATA/CJC/DATAFILE/system.257.1165422411
+DATA/CJC/DATAFILE/sysaux.258.1165422581
复制
create tablespace CJC datafile '+CJC' size 10M;
create user CJC identified by "a" default tablespace CJC;
grant dba to CJC;
conn CJC/a
create table t1 as select level as id from dual connect by level<=10;
select * from t1;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
复制
备份ASM磁盘头数据,方便后面对比分析及恢复:
磁盘头信息备份
kfed read /dev/sdh >/tmp/sdh01
复制
块号1备份
kfed read /dev/sdh blkn=1 >/tmp/sdh02
复制
aun1备份
kfed read /dev/sdh aun=1 >/tmp/sdh03
复制
磁盘头信息备份
kfed read /dev/sdi >/tmp/sdi01
复制
块号1备份
kfed read /dev/sdi blkn=1 >/tmp/sdi02
复制
aun1备份
kfed read /dev/sdi aun=1 >/tmp/sdi03
复制
删除磁盘组,模拟误操作:
[grid@cjc-db-02 ~]$ sqlplus / as sysasm
SQL> drop diskgroup CJC including contents;
ORA-15039: diskgroup not dropped
ORA-15027: active use of diskgroup "CJC" precludes its dismount
复制
停库后继续删除:
[oracle@cjc-db-02 ~]$ srvctl status database -d cjc
Database is running.
[oracle@cjc-db-02 ~]$ srvctl stop database -d cjc
[oracle@cjc-db-02 ~]$ srvctl status database -d cjc
Database is not running.
复制
成功删除掉CJC磁盘组:
[grid@cjc-db-02 ~]$ sqlplus / as sysasm
SQL> drop diskgroup CJC including contents;
Diskgroup dropped.
复制
启动数据库报错:
[oracle@cjc-db-02 ~]$ srvctl start database -d cjc
PRCR-1079 : Failed to start resource ora.cjc.db
CRS-2640: Required resource 'ora.CJC.dg' is missing.
SQL> startup
ORA-39511: Start of CRS resource for instance '222' failed with error:[CRS-2640: Required resource 'ora.CJC.dg' is missing.
CRS-0222: Resource 'ora.cjc.db' has dependency error.
clsr_start_resource:260 status:222
clsrapi_start_db:start_asmdbs status:222
复制
恢复误删除的磁盘组:
查看磁盘组信息:
set line 300
col name for a15
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
------------ --------------- ----------- ------ ---------- ----------
1 DATA MOUNTED EXTERN 12288 9704
复制
col path for a30
select GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,OS_MB,TOTAL_MB,FREE_MB,NAME,PATH from v$asm_disk order by 1,2;SQL>
GROUP_NUMBER DISK_NUMBER STATE REDUNDA OS_MB TOTAL_MB FREE_MB NAME PATH
------------ ----------- -------- ------- ---------- ---------- ---------- --------------- ------------------------------
0 0 NORMAL UNKNOWN2048 0 0 /dev/sdk
0 1 NORMAL UNKNOWN2048 0 0 /dev/sdh
0 2 NORMAL UNKNOWN2048 0 0 /dev/sdi
0 3 NORMAL UNKNOWN2048 0 0 /dev/sdj
1 0 NORMAL UNKNOWN2048 2048 1620 DATA_0000 /dev/sdb
1 1 NORMAL UNKNOWN2048 2048 1624 DATA_0001 /dev/sdc
1 2 NORMAL UNKNOWN2048 2048 1612 DATA_0002 /dev/sdd
1 3 NORMAL UNKNOWN2048 2048 1620 DATA_0003 /dev/sde
1 4 NORMAL UNKNOWN2048 2048 1620 DATA_0004 /dev/sdf
1 5 NORMAL UNKNOWN2048 2048 1608 DATA_0005 /dev/sdg
10 rows selected.
复制
恢复
--备份磁盘头信息
kfed read /dev/sdh >/tmp/sdh01xxx
复制
块号1备份
kfed read /dev/sdh blkn=1 >/tmp/sdh02xxx
复制
aun1备份
kfed read /dev/sdh aun=1 >/tmp/sdh03xxx
复制
磁盘头信息备份
kfed read /dev/sdi >/tmp/sdi01xxx
复制
块号1备份
kfed read /dev/sdi blkn=1 >/tmp/sdi02xxx
复制
aun1备份
kfed read /dev/sdi aun=1 >/tmp/sdi03xxx
复制
通过diff命令进行对比:
[grid@cjc-db-02 ~]$ diff -C 1 /tmp/sdh01 /tmp/sdh01xxx|grep "kfdhdb.hdrsts"
! kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
! kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER
复制
[grid@cjc-db-02 ~]$ diff -C 1 /tmp/sdi01 /tmp/sdi01xxx|grep "kfdhdb.hdrsts"
! kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
! kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER
复制
通过对比删除前和删除后磁盘头数据的不同部分,可以看到kfdhdb.hdrsts行有差异。
尝试改回删除前磁盘头的信息:
--修改sdh01xxx文件
kfdhdb.hdrsts:状态值从4改为3,状态名称从KFDHDR_FORMER改为KFDHDR_MEMBER
--修改sdi01xxx文件
kfdhdb.hdrsts:状态值从4改为3,状态名称从KFDHDR_FORMER改为KFDHDR_MEMBER
[grid@cjc-db-02 ~]$ vi /tmp/sdh01xxx
复制
将
kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER
复制
改成
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
复制
[grid@cjc-db-02 ~]$ vi /tmp/sdi01xxx
复制
将
kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER
复制
改成
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
复制
合并修改:
[grid@cjc-db-02 ~]$ kfed merge /dev/sdh text=/tmp/sdh01xxx
[grid@cjc-db-02 ~]$ kfed merge /dev/sdi text=/tmp/sdi01xxx
复制
检查磁盘组:可以看到CJC磁盘组了
[grid@cjc-db-02 ~]$ sqlplus / as sysasm
set line 300
col name for a15
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
------------ --------------- ----------- ------ ---------- ----------
0 CJC DISMOUNTED 0 0
1 DATA MOUNTED EXTERN 12288 9704
复制
挂载磁盘组
SQL> alter diskgroup CJC mount;
复制
[grid@cjc-db-02 trace]$ tail -100f alert_+ASM.log
.....
2024-04-06T14:43:30.834474+08:00
SUCCESS: alter diskgroup CJC mount
WARNING: unknown state for diskgroup resource ora.CJC.dg, Return Value: 3
复制
查看
set line 300
col name for a15
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from v$asm_diskgroup;SQL> SQL>
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB
------------ --------------- ----------- ------ ---------- ----------
1 DATA MOUNTED EXTERN 12288 9704
2 CJC MOUNTED EXTERN 4096 3976
复制
可以正常启动数据库:
[oracle@cjc-db-02 ~]$ sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Sat Apr 6 14:44:53 2024
Version 19.22.0.0.0
Copyright (c) 1982, 2023, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 1174402440 bytes
Fixed Size 8938888 bytes
Variable Size 369098752 bytes
Database Buffers 788529152 bytes
Redo Buffers 7835648 bytes
Database mounted.
Database opened.
复制
可以看到CJC磁盘组的数据
SQL> select * from cjc.t1;
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
复制
参考:微信公众号"数据库运维之道"文章:【应知应会】使用kfed运维兵器修复ASM磁盘和磁盘组
【应知应会】使用kfed运维兵器修复ASM磁盘和磁盘组
数据库运维之道,公众号:数据库运维之道【应知应会】使用kfed运维兵器修复ASM磁盘和磁盘组
###chenjuchao 20240406###