How to find where a file on HDFS is physically stored

2019-02-27  润土1030

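A file stored on HDFS usually has several replicas (three by default), and we access it through a URL. Which DataNodes the file actually lives on, and where on their disks, is something many people are unsure of. HDFS provides a command that answers this, so let's take a look.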

The hdfs fsck command

HDFS supports the fsck command to check for various inconsistencies. It is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures. By default fsck ignores open files but provides an option to select all files during reporting. The HDFS fsck command is not a Hadoop shell command. It can be run as bin/hdfs fsck. For command usage, see fsck. fsck can be run on the whole file system or on a subset of files.

Usage
hdfs fsck file_path_on_hdfs -files -blocks -locations
Run the command against our file:
[hdfs@dlbdn3 data]$ hdfs fsck /user/ericsson/eop/template_workflow.xml -files -blocks -locations
Connecting to namenode via http://dlbdn3:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.123.4 for path /user/ericsson/eop/template_workflow.xml at Wed Feb 27 17:28:57 CST 2019
/user/ericsson/eop/template_workflow.xml 3685 bytes, 1 block(s):  OK
0. BP-358999289-192.168.123.4-1530520401469:blk_1074308735_568435 len=3685 Live_repl=3 [DatanodeInfoWithStorage[192.168.123.4:7710,DS-c440ebd2-4553-4b87-b2e1-67a8ae1e29c1,DISK], DatanodeInfoWithStorage[192.168.123.3:7710,DS-4c6c7796-0027-4cb9-a476-041a13146dcf,DISK], DatanodeInfoWithStorage[192.168.123.2:7710,DS-83c58757-f199-48e1-9d04-bd09fc996fbc,DISK]]

Status: HEALTHY
 Total size:    3685 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  1 (avg. block size 3685 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 3.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      3
 Number of racks:       1
FSCK ended at Wed Feb 27 17:28:57 CST 2019 in 1 milliseconds


The filesystem under path '/user/ericsson/eop/template_workflow.xml' is HEALTHY
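
For larger files the block list gets long, so it can help to pull the block IDs and DataNode addresses out of the fsck output with a small script. The following is only a sketch: the function names fsck_block_ids and fsck_datanodes are made up for this example, and it assumes the output format shown above (other Hadoop versions may print the block line differently).

```shell
# Sketch: extract block IDs and DataNode addresses from `hdfs fsck` output.
# The function names are illustrative and are not part of Hadoop.

# Print every block ID (blk_<id>_<generation stamp>) read from stdin.
fsck_block_ids() {
  grep -oE 'blk_[0-9]+_[0-9]+'
}

# Print every DataNode ip:port read from stdin, one per line.
fsck_datanodes() {
  grep -oE 'DatanodeInfoWithStorage\[[0-9.]+:[0-9]+' | sed 's/.*\[//'
}

# A shortened copy of the block line from the output above (two replicas kept).
sample='0. BP-358999289-192.168.123.4-1530520401469:blk_1074308735_568435 len=3685 Live_repl=3 [DatanodeInfoWithStorage[192.168.123.4:7710,DS-c440ebd2-4553-4b87-b2e1-67a8ae1e29c1,DISK], DatanodeInfoWithStorage[192.168.123.3:7710,DS-4c6c7796-0027-4cb9-a476-041a13146dcf,DISK]]'

printf '%s\n' "$sample" | fsck_block_ids
printf '%s\n' "$sample" | fsck_datanodes
```

In practice the real command output would be piped in instead of the sample, for example: hdfs fsck /user/ericsson/eop/template_workflow.xml -files -blocks -locations | fsck_datanodes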

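Using the IP addresses listed in the DatanodeInfoWithStorage entries, log in to one of those nodes and run find. Note that the name searched for here includes the generation stamp (_568435), so find matches only the block's .meta checksum file; the block data itself is the file blk_1074308735 in the same directory.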
[root@dlbdn3 subdir166]# find / -name "*blk_1074308735_568435*"
find: ‘/run/user/42/gvfs’: Permission denied
/data/2/dfs/dn/current/BP-358999289-192.168.123.4-1530520401469/current/finalized/subdir8/subdir166/blk_1074308735_568435.meta
[root@dlbdn3 subdir166]# cd /data/2/dfs/dn/current/BP-358999289-192.168.123.4-1530520401469/current/finalized/subdir8/subdir166
[root@dlbdn3 subdir166]# ll | grep blk_1074308735
-rw-r--r-- 1 hdfs hdfs   3685 Feb 27 16:29 blk_1074308735
-rw-r--r-- 1 hdfs hdfs     39 Feb 27 16:29 blk_1074308735_568435.meta
[root@dlbdn3 subdir166]# 
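
Searching from / works but scans the whole machine. A narrower search could start from the DataNode's data directory (the dfs.datanode.data.dir setting; /data/2/dfs/dn in the output above) and look for the block file and its .meta file together. This is a minimal sketch under that assumption; block_file_name and find_block are hypothetical helper names, not Hadoop commands.

```shell
# Strip the generation stamp: blk_1074308735_568435 -> blk_1074308735
block_file_name() {
  printf '%s\n' "$1" | sed -E 's/^(blk_[0-9]+)_[0-9]+$/\1/'
}

# Search one directory tree for a block's data file and its .meta file.
#   $1 = directory to search (a DataNode data directory)
#   $2 = block ID exactly as printed by fsck
find_block() {
  base=$(block_file_name "$2")
  find "$1" -type f \( -name "$base" -o -name "${base}_*.meta" \) 2>/dev/null
}

# Example, using the values from the transcript above:
# find_block /data/2/dfs/dn blk_1074308735_568435
```

Matching the two names explicitly, rather than using a trailing wildcard on the block ID, avoids picking up other blocks whose IDs merely start with the same digits.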

View the contents of the blk file to check whether it is the file we are looking for:
[Screenshots: image.png, image.png]
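This confirms it is the same file, and with that we have found where the HDFS file is physically stored. It is simple and quite practical: this information is needed more often than you might expect.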
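
Instead of comparing the contents by eye, the check can also be done byte for byte. The sketch below assumes the file fits in a single block (as in this example) and is stored as plain replicated data, without erasure coding or encryption; stdin_matches_file is a name invented for this example.

```shell
# Succeeds (exit status 0) when the bytes on stdin equal the contents of file $1.
stdin_matches_file() {
  cmp -s - "$1"
}

# Example, run on the DataNode as a user that can read both the HDFS file
# and the local block file (paths taken from the transcript above):
# blk=/data/2/dfs/dn/current/BP-358999289-192.168.123.4-1530520401469/current/finalized/subdir8/subdir166/blk_1074308735
# hdfs dfs -cat /user/ericsson/eop/template_workflow.xml | stdin_matches_file "$blk" && echo "same content"
```

For a file that spans several blocks, each block file holds only one slice of the data, so this comparison would not apply as written.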