Determining the file type of an HDFS file in Java

2018-11-09 别情书

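Java reads different file formats (text, sequence, ORC, and so on) through different APIs, so I also added a check for the file type. Only text and sequence files are told apart here. The code is below (it is another method of HdfsUtil.class; see the post on Hadoop Kerberos authentication):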

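// Needs these imports at the top of HdfsUtil:
//   java.io.EOFException, java.io.IOException,
//   org.apache.hadoop.fs.FSDataInputStream,
//   org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path

/**
 * Returns the file type of the file at the given path.
 * Only two types are told apart: sequence and text.
 *
 * @param filePath path of the file on HDFS
 * @return SEQUENCE if the file starts with the bytes "SEQ", otherwise TEXT
 * @throws IOException if the file cannot be opened or read
 * @author lshua
 * @date 2018-06-28
 */
public static FileTypeEnum getFileType(String filePath) throws IOException {
    // conf is the Configuration field of HdfsUtil. FileSystem.get returns a
    // cached, shared instance, so only the stream is closed here.
    FileSystem fs = FileSystem.get(conf);

    // try-with-resources closes the stream even when a read fails. An error
    // from open() now reaches the caller instead of being printed and then
    // causing a NullPointerException on the first read.
    try (FSDataInputStream fin = fs.open(new Path(filePath))) {
        // 'S' 'E' (0x5345) followed by 'Q' marks a sequence file.
        if (fin.readShort() == 0x5345 && fin.readByte() == 'Q') {
            return FileTypeEnum.SEQUENCE;
        }
    } catch (EOFException e) {
        // Fewer than three bytes: too short to be a sequence file.
    }
    return FileTypeEnum.TEXT;
}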
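
The leading-byte check itself does not need a cluster, so it can be tried out on in-memory bytes first. Below is a minimal sketch of that idea using only java.io. The class SequenceMagicSketch and its two methods are names I made up for illustration; they are not part of HdfsUtil or Hadoop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of HdfsUtil or Hadoop.
final class SequenceMagicSketch {

    private SequenceMagicSketch() {
    }

    /** Reads up to n leading bytes; returns fewer if the stream ends early. */
    static byte[] readHead(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int filled = 0;
        while (filled < n) {
            int read = in.read(buf, filled, n - filled);
            if (read < 0) {
                break; // end of stream before n bytes were available
            }
            filled += read;
        }
        return Arrays.copyOf(buf, filled);
    }

    /** True if the bytes start with 'S', 'E', 'Q'. */
    static boolean hasSequenceMagic(byte[] head) {
        return head.length >= 3
                && head[0] == 'S'
                && head[1] == 'E'
                && head[2] == 'Q';
    }
}
```

Reading the head into an array first means a file shorter than three bytes is just a short array, so no EOFException handling is needed in the check.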

The sequence-file check is modeled on the source of org.apache.hadoop.fs.shell.Display.Text, which you can read for yourself. The method also relies on a file-type enum, listed here:

/**
 * File types found on HDFS. Only the types I plan to handle are listed.
 *
 * @author lshua
 * @time 2018-06-28
 */
public enum FileTypeEnum {

    SEQUENCE("sequence", 1),
    TEXT("text", 2),
    ORC("orc", 3),
    UNKNOW("unknow", 0);

    // Enum constants are shared singletons, so the fields are final
    // and there are no setters.
    private final String typeName;
    private final int index;

    FileTypeEnum(String typeName, int index) {
        this.typeName = typeName;
        this.index = index;
    }

    public String getTypeName() {
        return typeName;
    }

    public int getIndex() {
        return index;
    }
}
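
The enum lists ORC, but the check above never returns it. Here is a rough sketch of how the same leading-byte idea could be extended, assuming that ORC files begin with the three ASCII bytes "ORC" (please verify this against the ORC format documentation before relying on it). SniffedType and MagicTableSketch are hypothetical names; SniffedType only stands in for FileTypeEnum so that the snippet compiles on its own.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for FileTypeEnum so the sketch compiles on its own.
enum SniffedType { SEQUENCE, ORC, TEXT }

// Hypothetical helper for illustration only.
final class MagicTableSketch {

    private static final byte[] SEQ_MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);
    // Assumption: ORC files begin with the ASCII bytes "ORC".
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    private MagicTableSketch() {
    }

    /** Maps the leading bytes of a file to a type; anything unrecognized is TEXT. */
    static SniffedType classify(byte[] head) {
        if (startsWith(head, SEQ_MAGIC)) {
            return SniffedType.SEQUENCE;
        }
        if (startsWith(head, ORC_MAGIC)) {
            return SniffedType.ORC;
        }
        return SniffedType.TEXT;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

As in the original method, TEXT is the fallback, so a text file that happens to start with one of these prefixes would be misclassified.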

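In fact, org.apache.hadoop.fs.shell.Display.Text already checks the type when it reads a file and returns a matching data stream. So it should be possible to skip the hand-written enum entirely and just read from the stream its method returns. I will look into it and update this post when I have time.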
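
To make the "check the type, then hand back a stream" idea concrete, here is a minimal sketch in plain java.io. It is not the Display.Text code. It uses gzip as the example format because the JDK can decode it, and the gzip magic bytes 0x1f 0x8b come from RFC 1952. StreamDispatchSketch and open are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not the Display.Text implementation.
final class StreamDispatchSketch {

    private StreamDispatchSketch() {
    }

    /** Peeks at the first two bytes and returns a stream of decoded content. */
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);          // remember the start so the peek can be undone
        int b0 = in.read();
        int b1 = in.read();
        in.reset();          // rewind; the caller sees the stream from byte 0

        // 0x1f 0x8b is the gzip magic number (RFC 1952).
        if (b0 == 0x1f && b1 == 0x8b) {
            return new GZIPInputStream(in);
        }
        return in;           // anything else is passed through unchanged
    }
}
```

The caller only ever sees an InputStream, so no type enum is needed on that side. Supporting sequence files the same way would mean returning a stream that decodes records, which is the part I still need to study in the Hadoop source.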
