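HDFS provides the fsck command for checking the health of files and directories on HDFS (for example, corrupt data blocks or too few replicas) and for retrieving the block and location information of files. fsck is normally run as the HDFS superuser, since an ordinary user lacks permission to check the whole file system.
HDFS health criterion: the file system is considered healthy if every file meets the minimum replication requirement (dfs.replication.min, reported as dfs.namenode.replication.min in the fsck output).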
(1) Check the health of HDFS files or directories: hdfs fsck /
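The results are interpreted as follows. Missing block: a block for which every replica has been lost (the namenode still holds the mapping between the block and its file, but no datanode reports the block, or every reported copy is damaged). Corrupt block: a block for which the timestamp and block size recorded by the namenode disagree with what the datanodes report for every replica. A corrupt block is also counted as a missing block, which is somewhat unreasonable and easy to confuse; a JIRA with a proposed fix has been filed in the Hadoop HDFS community, see https://issues.apache.org/jira/browse/HDFS-7281
Each dot stands for one file that was checked and found healthy. Under-replicated blocks: blocks for which the datanodes report fewer replicas than the target. For example, the test cluster (183) has 3 datanodes; when a file is uploaded with 4 replicas, two replicas of a block are never placed on the same node, so the check reports "Target Replicas is 4 but found 3 replica(s)". Over-replicated blocks: blocks for which the datanodes report more replicas than the target. This usually follows a network failure: enough replicas exist on the datanodes, but their reports do not reach the namenode, so the namenode treats the replicas as lost and re-replicates the block; once the network recovers, the check finds more replicas than the target and the system automatically deletes the surplus. Minimally replicated blocks: the number of blocks that meet the minimum replication standard (dfs.replication.min), which defaults to 1 replica. A block without a single good replica is counted under UNDER MIN REPL'D BLOCKS (blocks below the minimum replication).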
[root@tdh522-183 ~]# hdfs fsck /
2019-03-20 20:14:45,026 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
Connecting to namenode via http://tdh522-183:50070/fsck?ugi=admin&path=%2F
FSCK started by admin (auth:KERBEROS_SSL) from /192.168.31.183 for path / at Wed Mar 20 20:14:50 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
...................................................................................
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.jar: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743218_2422. Target Replicas is 10 but found 3 replica(s).
.
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.split: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743219_2423. Target Replicas is 10 but found 3 replica(s).
................
.......
/zyq2: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746223_5460. Target Replicas is 4 but found 3 replica(s).
.
/zyq3: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746224_5461. Target Replicas is 10 but found 3 replica(s).
....
/zyqtest: CORRUPT blockpool BP-551875297-192.168.31.183-1548149654359 block blk_1073746400
/zyqtest: MISSING 1 blocks of total size 9 B.Status: CORRUPT
Total size: 335407450 B (Total open files size: 332 B)
Total dirs: 567
Total files: 512
Total symlinks: 0 (Files currently being written: 4)
Total blocks (validated): 437 (avg. block size 767522 B) (Total open file blocks (not validated): 4)
********************************
UNDER MIN REPL'D BLOCKS: 1 (0.22883295 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 9 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 436 (99.771164 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 4 (0.9153318 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9816933
Corrupt blocks: 1
Missing replicas: 22 (1.6591252 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Mar 20 20:14:50 CST 2019 in 143 milliseconds
The filesystem under path '/' is CORRUPT
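The summary counters explained above can also be read by a script rather than by eye. The following is a minimal sketch, assuming the summary keeps the "label: count (percent)" layout shown in this output; the function name fsck_counter is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper: read fsck output on stdin and print the integer that
# follows the given summary label, e.g. "Under-replicated blocks".
fsck_counter() {
  local label="$1"
  awk -v label="$label" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # drop leading indentation
      prefix = label ":"
      if (index(line, prefix) == 1) {      # literal, case-sensitive match
        rest = substr(line, length(prefix) + 1)
        sub(/^[ \t]+/, "", rest)
        split(rest, parts, /[ \t]+/)
        print parts[1]                     # the count, without the percentage
        exit
      }
    }'
}
```

A typical call would be: hdfs fsck / | fsck_counter 'Under-replicated blocks'. Because the match is case-sensitive, 'Corrupt blocks' and 'CORRUPT BLOCKS' select different lines of the summary.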
(2) List the corrupt blocks of files: hdfs fsck / -list-corruptfileblocks
[root@tdh522-183 ~]# hdfs fsck / -list-corruptfileblocks
2019-03-20 20:29:34,265 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
Connecting to namenode via http://tdh522-183:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1073746400 /zyqtest
The filesystem under path '/' has 1 CORRUPT files
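To feed the affected files into another step (a backup list, for instance), the paths can be pulled out of this listing. The sketch below assumes each entry has the form "blk_<id> <path>" as shown above; corrupt_paths is a hypothetical name.

```shell
# Hypothetical helper: read `hdfs fsck <path> -list-corruptfileblocks` output
# on stdin and print each affected file path once.
corrupt_paths() {
  # Keep only "blk_<id> <path>" lines, print the path part, and let
  # sort -u collapse files that have several corrupt blocks.
  sed -n -E 's|^blk_[0-9]+[[:space:]]+(/.*)$|\1|p' | sort -u
}
```

For example: hdfs fsck / -list-corruptfileblocks | corrupt_paths > corrupt_files.txt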
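(3) Move corrupt files to the /lost+found directory: hdfs fsck / -move (this run unexpectedly ended with an internal error, although an earlier -move had succeeded; to be analyzed later)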
[root@tdh522-183 ~]# hdfs fsck / -move
2019-03-20 20:31:20,689 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
Connecting to namenode via http://tdh522-183:50070/fsck?ugi=hdfs&move=1&path=%2F
FSCK started by hdfs (auth:KERBEROS_SSL) from /192.168.31.183 for path / at Wed Mar 20 20:31:22 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
...................................................................................
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.jar: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743218_2422. Target Replicas is 10 but found 3 replica(s).
.
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.split: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743219_2423. Target Replicas is 10 but found 3 replica(s).
................
.......
/zyq2: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746223_5460. Target Replicas is 4 but found 3 replica(s).
.
/zyq3: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746224_5461. Target Replicas is 10 but found 3 replica(s).
....
/zyqtest: CORRUPT blockpool BP-551875297-192.168.31.183-1548149654359 block blk_1073746400
/zyqtest: MISSING 1 blocks of total size 9 B.Status: CORRUPT
Total size: 335407450 B (Total open files size: 332 B)
Total dirs: 567
Total files: 512
Total symlinks: 0 (Files currently being written: 4)
Total blocks (validated): 437 (avg. block size 767522 B) (Total open file blocks (not validated): 4)
********************************
UNDER MIN REPL'D BLOCKS: 1 (0.22883295 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 9 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 436 (99.771164 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 4 (0.9153318 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9816933
Corrupt blocks: 1
Missing replicas: 22 (1.6591252 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Mar 20 20:31:22 CST 2019 in 246 milliseconds
FSCK ended at Wed Mar 20 20:31:22 CST 2019 in 246 milliseconds
fsck encountered internal errors!
Fsck on path '/' FAILED
[root@tdh522-183 ~]#
[root@tdh522-183 ~]# hdfs dfs -ls /lost+found
2019-03-20 20:37:59,423 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
Found 1 items
drw-r--r-- - hdfs hbase 0 2019-03-07 09:38 /lost+found/zyq1
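The -move run above ends with "Fsck on path '/' FAILED" instead of the usual HEALTHY or CORRUPT verdict, so a script that wraps fsck has three outcomes to tell apart. A minimal sketch, assuming the closing lines keep the wording shown in these transcripts (fsck_status is a hypothetical name):

```shell
# Hypothetical helper: read fsck output on stdin and print one of
# FAILED, CORRUPT, HEALTHY or UNKNOWN, based on the closing lines.
fsck_status() {
  local output
  output="$(cat)"
  # FAILED is tested first: a failed run still prints "Status: CORRUPT"
  # in its summary but never reaches the final verdict line.
  if printf '%s\n' "$output" | grep -Eq "^Fsck on path .* FAILED"; then
    echo FAILED
  elif printf '%s\n' "$output" | grep -Eq "^The filesystem under path .* is CORRUPT"; then
    echo CORRUPT
  elif printf '%s\n' "$output" | grep -Eq "^The filesystem under path .* is HEALTHY"; then
    echo HEALTHY
  else
    echo UNKNOWN
  fi
}
```

For example: hdfs fsck / -move | fsck_status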
(4) Delete corrupt files: hdfs fsck / -delete
[root@tdh522-183 ~]# hdfs fsck / -delete
2019-03-20 20:47:02,468 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
Connecting to namenode via http://tdh522-183:50070/fsck?ugi=hdfs&delete=1&path=%2F
FSCK started by hdfs (auth:KERBEROS_SSL) from /192.168.31.183 for path / at Wed Mar 20 20:47:04 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
.......................................................................................
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.jar: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743218_2422. Target Replicas is 10 but found 3 replica(s).
.
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.split: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743219_2423. Target Replicas is 10 but found 3 replica(s).
............
...........
/zyq2: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746223_5460. Target Replicas is 4 but found 3 replica(s).
.
/zyq3: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746224_5461. Target Replicas is 10 but found 3 replica(s).
....
/zyqtest: CORRUPT blockpool BP-551875297-192.168.31.183-1548149654359 block blk_1073746400
/zyqtest: MISSING 1 blocks of total size 9 B.Status: CORRUPT
Total size: 335407814 B (Total open files size: 332 B)
Total dirs: 567
Total files: 516
Total symlinks: 0 (Files currently being written: 4)
Total blocks (validated): 441 (avg. block size 760561 B) (Total open file blocks (not validated): 4)
********************************
UNDER MIN REPL'D BLOCKS: 1 (0.22675736 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 9 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 440 (99.77324 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 4 (0.90702945 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9818594
Corrupt blocks: 1
Missing replicas: 22 (1.6442451 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Mar 20 20:47:04 CST 2019 in 237 milliseconds
The filesystem under path '/' is CORRUPT
After the deletion, run hdfs fsck / again to re-check:
[root@tdh522-183 ~]# hdfs fsck /
2019-03-20 20:50:18,139 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
Connecting to namenode via http://tdh522-183:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:KERBEROS_SSL) from /192.168.31.183 for path / at Wed Mar 20 20:50:19 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
.......................................................................................
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.jar: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743218_2422. Target Replicas is 10 but found 3 replica(s).
.
/yarn1/user/hdfs/.staging/job_1551170298402_0003/job.split: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073743219_2423. Target Replicas is 10 but found 3 replica(s).
............
...........
/zyq2: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746223_5460. Target Replicas is 4 but found 3 replica(s).
.
/zyq3: Under replicated BP-551875297-192.168.31.183-1548149654359:blk_1073746224_5461. Target Replicas is 10 but found 3 replica(s).
...Status: HEALTHY
Total size: 335407805 B (Total open files size: 332 B)
Total dirs: 567
Total files: 515
Total symlinks: 0 (Files currently being written: 4)
Total blocks (validated): 440 (avg. block size 762290 B) (Total open file blocks (not validated): 4)
Minimally replicated blocks: 440 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 4 (0.90909094 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9886363
Corrupt blocks: 0
Missing replicas: 22 (1.6454749 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Mar 20 20:50:20 CST 2019 in 82 milliseconds
The filesystem under path '/' is HEALTHY
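The re-check is HEALTHY, yet four blocks remain under-replicated because their target (4 or 10) exceeds the 3 datanodes in the cluster. One possible follow-up is to lower the target with hdfs dfs -setrep. The sketch below only prints candidate commands and runs nothing; it assumes each "Under replicated" message is on a single line, as fsck prints it, and suggest_setrep is a hypothetical name.

```shell
# Hypothetical helper: read fsck output on stdin and print one
# `hdfs dfs -setrep` suggestion per file whose target replication
# exceeds the given maximum (for example, the number of datanodes).
suggest_setrep() {
  local max="$1" target path
  # Reduce each "Under replicated" line to "<target> <path>".
  sed -n -E 's|^(/.*): Under replicated .*Target Replicas is ([0-9]+) but found [0-9]+ .*$|\2 \1|p' |
  while read -r target path; do
    if [ "$target" -gt "$max" ]; then
      # Paths containing whitespace would need quoting before use.
      printf 'hdfs dfs -setrep %s %s\n' "$max" "$path"
    fi
  done | sort -u
}
```

For example: hdfs fsck / | suggest_setrep 3. Each suggestion should be reviewed before it is run; files under a job's .staging directory are temporary and may not be worth changing.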
(5) Print the block location information of a file: hdfs fsck <path> -files -blocks -locations
[root@tdh522-183 ~]# hdfs fsck /tmp/zyq -files -blocks -locations
2019-03-20 21:00:07,185 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
Connecting to namenode via http://tdh522-183:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&path=%2Ftmp%2Fzyq
FSCK started by hdfs (auth:KERBEROS_SSL) from /192.168.31.183 for path /tmp/zyq at Wed Mar 20 21:00:09 CST 2019
/tmp/zyq <dir>
/tmp/zyq/_SUCCESS 0 bytes, 0 block(s): OK
/tmp/zyq/part-m-00000 655 bytes, 1 block(s): OK
0. BP-551875297-192.168.31.183-1548149654359:blk_1073748753_7999 len=655 repl=3 [DatanodeInfoWithStorage[192.168.31.183:50010,DS-ac290cf6-ea15-4918-adf6-dfe1c018c65c,DISK], DatanodeInfoWithStorage[192.168.31.185:50010,DS-eca4adcc-dddc-4723-bd6a-5b3e0b848a9c,DISK], DatanodeInfoWithStorage[192.168.31.184:50010,DS-15fcb7e9-e0e8-41ab-b1d9-dcb23ad3f539,DISK]]
Status: HEALTHY
Total size: 655 B
Total dirs: 1
Total files: 2
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 655 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Mar 20 21:00:09 CST 2019 in 0 milliseconds
The filesystem under path '/tmp/zyq' is HEALTHY
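The location list is dense, so it can help to reduce it to one "block datanode" pair per replica. A minimal sketch, assuming the DatanodeInfoWithStorage[host:port,storage-id,type] layout shown above and one block entry per line; block_locations is a hypothetical name.

```shell
# Hypothetical helper: read `hdfs fsck <path> -files -blocks -locations`
# output on stdin and print "<block id> <datanode host:port>" per replica.
block_locations() {
  awk '
    /blk_[0-9]+_[0-9]+/ && /DatanodeInfoWithStorage\[/ {
      match($0, /blk_[0-9]+_[0-9]+/)
      block = substr($0, RSTART, RLENGTH)
      rest = $0
      # Each replica starts with "DatanodeInfoWithStorage[" (24 characters),
      # followed by host:port up to the first comma.
      while (match(rest, /DatanodeInfoWithStorage\[[^],]+/)) {
        print block, substr(rest, RSTART + 24, RLENGTH - 24)
        rest = substr(rest, RSTART + RLENGTH)
      }
    }'
}
```

For example: hdfs fsck /tmp/zyq -files -blocks -locations | block_locations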