Why HDFS du and fsck report different space usage for the same directory

2023-10-07 AlienPaul

Symptom

For the same directory, for example /user/abc, running hdfs fsck /user/abc and hdfs dfs -du -s -h /user/abc can report very different space usage. One would normally expect the two commands to agree.

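Root cause

hdfs fsck only counts the current version of the user data in the directory. hdfs dfs -du counts that data too, and by default it also includes the space used by all snapshots in the directory.

To find out where snapshots exist, switch to the HDFS administrator user (the user that started the HDFS service) and run hdfs lsSnapshottableDir to list the directories that have snapshots enabled.

If the directory is indeed snapshottable, use hdfs dfs -du -x <target directory> to measure its disk usage; the -x option excludes the space used by snapshots in the directory. See Apache Hadoop 3.3.6 – Overview. If the du command in your Hadoop version does not support -x, run hdfs dfs -du -s -h <target directory>/.snapshot/ to get the space used by the directory's snapshots. Subtracting the snapshot space from the total gives the space used by the latest version of the data, which should match the hdfs fsck result.

The fsck command also provides an -includeSnapshots option, which takes snapshot data into account when computing the totals.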

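Resolution

If the goal is to check that the data size is the same before and after a migration or backup, use the same command on both sides, for example hdfs fsck.

For other scenarios, consider whether the snapshots need to be kept at all. For snapshot operations, see: Hadoop生态圈（十六）- HDFS Snapshot快照详解-CSDN博客.

Source code analysis

The du command

We start tracing the du command from its entry point, the FsShell::run method. The code of run is shown below, with unrelated code omitted: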

@Override
public int run(String[] argv) {
    // initialize FsShell
    init();
    // ...
    if (argv.length < 1) {
        printUsage(System.err);
    } else {
        // the first argument is the command name,
        // e.g. "-du" when running hdfs dfs -du
        String cmd = argv[0];
        Command instance = null;
        try {
            // look up the Command registered under this name in commandFactory
            instance = commandFactory.getInstance(cmd);
            if (instance == null) {
                throw new UnknownCommandException();
            }
            // ...
            try {
                // then run it with the remaining arguments
                exitCode = instance.run(Arrays.copyOfRange(argv, 1, argv.length));
            } finally {
                scope.close();
            }
        } catch (IllegalArgumentException e) {
            // ...
        } catch (Exception e) {
            // ...
        }
    }
    // ...
    return exitCode;
}

From the code above we can see that the commands supported by hdfs dfs live in commandFactory. Commands are added to CommandFactory through the addClass method, shown below:

public void addClass(Class<? extends Command> cmdClass, String ... names) {
    for (String name : names) classMap.put(name, cmdClass);
}
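The lookup-by-name pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: SketchCommand, SketchDu and SketchCommandFactory are hypothetical stand-ins, not Hadoop's Command and CommandFactory, and reflective instantiation is simply one plausible way to turn a registered class into an instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not the Hadoop classes.
abstract class SketchCommand {
    abstract int run(String[] args);
}

class SketchDu extends SketchCommand {
    @Override
    int run(String[] args) {
        // A real command would parse options and process paths here.
        return 0;
    }
}

class SketchCommandFactory {
    private final Map<String, Class<? extends SketchCommand>> classMap = new HashMap<>();

    // Register one command class under one or more names.
    void addClass(Class<? extends SketchCommand> cmdClass, String... names) {
        for (String name : names) {
            classMap.put(name, cmdClass);
        }
    }

    // Return a new instance for the given name, or null if the name is unknown.
    SketchCommand getInstance(String name) {
        Class<? extends SketchCommand> cls = classMap.get(name);
        if (cls == null) {
            return null;
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        SketchCommandFactory factory = new SketchCommandFactory();
        factory.addClass(SketchDu.class, "-du");
        // A registered name yields an instance; an unknown name yields null.
        System.out.println(factory.getInstance("-du").getClass().getSimpleName());
        System.out.println(factory.getInstance("-nope") == null);
    }
}
```

Returning null for an unknown name matches the null check in run above, which turns it into an UnknownCommandException.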

Tracing the callers of this method leads to the registerCommands method in FsUsage.

public static void registerCommands(CommandFactory factory) {
    factory.addClass(Df.class, "-df");
    factory.addClass(Du.class, "-du");
    factory.addClass(Dus.class, "-dus");
}

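This method registers Du with commandFactory, so Du is the class that handles the du command.

First we look at processOptions, the method that parses the options. It is easy to see that the -x option maps to excludeSnapshots.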

@Override
protected void processOptions(LinkedList<String> args) throws IOException {
    CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE, "h", "s", "v", "x");
    cf.parse(args);
    setHumanReadable(cf.getOpt("h"));
    summary = cf.getOpt("s");
    showHeaderLine = cf.getOpt("v");
    excludeSnapshots = cf.getOpt("x");
    if (args.isEmpty()) args.add(Path.CUR_DIR);
}

Next comes the processing logic in processPath. When the -x option is given, the snapshot figures are subtracted from the totals.

@Override
protected void processPath(PathData item) throws IOException {
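    // fetch the content summary for the path
    ContentSummary contentSummary = item.fs.getContentSummary(item.path);
    // total length of the content in bytes (not a file count)
    long length = contentSummary.getLength();
    // disk space consumed
    long spaceConsumed = contentSummary.getSpaceConsumed();
    // with -x, subtract the snapshot length and the snapshot space consumed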
    if (excludeSnapshots) {
        length -= contentSummary.getSnapshotLength();
        spaceConsumed -= contentSummary.getSnapshotSpaceConsumed();
    }
    getUsagesTable().addRow(formatSize(length),
                            formatSize(spaceConsumed), item);
}

Note: the -x option for excluding snapshots was introduced in HDFS-8986.
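To make the subtraction concrete, here is a minimal sketch that mirrors the arithmetic in processPath. SummarySketch is a hypothetical value holder standing in for the four ContentSummary getters used above, and the numbers are made up for illustration (300 bytes of content, 100 of them attributed to snapshots, replication factor 3).

```java
// Hypothetical holder mirroring the four ContentSummary getters that du reads.
final class SummarySketch {
    final long length;
    final long spaceConsumed;
    final long snapshotLength;
    final long snapshotSpaceConsumed;

    SummarySketch(long length, long spaceConsumed,
                  long snapshotLength, long snapshotSpaceConsumed) {
        this.length = length;
        this.spaceConsumed = spaceConsumed;
        this.snapshotLength = snapshotLength;
        this.snapshotSpaceConsumed = snapshotSpaceConsumed;
    }
}

final class DuUsageSketch {
    // Returns {length, spaceConsumed} the way du would report them.
    static long[] usage(SummarySketch s, boolean excludeSnapshots) {
        long length = s.length;
        long spaceConsumed = s.spaceConsumed;
        if (excludeSnapshots) {
            // Same subtraction as the -x branch in processPath.
            length -= s.snapshotLength;
            spaceConsumed -= s.snapshotSpaceConsumed;
        }
        return new long[] {length, spaceConsumed};
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measured from a real cluster.
        SummarySketch s = new SummarySketch(300, 900, 100, 300);
        long[] plain = usage(s, false);
        long[] excluded = usage(s, true);
        System.out.println(plain[0] + " " + plain[1]);       // prints "300 900"
        System.out.println(excluded[0] + " " + excluded[1]); // prints "200 600"
    }
}
```

With these numbers, the first column drops from 300 to 200 once snapshots are excluded, which is the figure that would be comparable to what fsck reports without -includeSnapshots.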

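The fsck command

The entry point is the run method of DFSck. Tracing further shows that it calls doWork, which assembles the parameters of a /fsck request and sends it to the namenode. So the fsck logic is not in DFSck; it must live wherever the /fsck request is handled.

That handler is the FsckServlet class. Here is its doGet method, with unrelated code omitted.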
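As a rough picture of what "assembling the /fsck request" means, the sketch below builds a query string from a path and a flag. It is not the DFSck code: FsckUrlSketch and buildQuery are hypothetical names, the path parameter name and the value 1 are assumptions, and the real client sends more parameters than shown. Only the includeSnapshots key is confirmed by the NamenodeFsck constructor quoted later in this article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; a simplified illustration, not the DFSck implementation.
final class FsckUrlSketch {
    static String buildQuery(String path, boolean includeSnapshots) {
        StringBuilder url = new StringBuilder("/fsck?path=");
        try {
            // URL-encode the path so that '/' and other characters are safe in a query.
            url.append(URLEncoder.encode(path, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
        if (includeSnapshots) {
            // Assumed value; the server side only checks that the key is present.
            url.append("&includeSnapshots=1");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("/user/abc", false));
        System.out.println(buildQuery("/user/abc", true));
    }
}
```

The point of the sketch is that the client side only translates command-line options into request parameters; all of the accounting happens on the namenode.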

public void doGet(HttpServletRequest request, HttpServletResponse response
                 ) throws IOException {
    @SuppressWarnings("unchecked")
    // ...
    try {
        ugi.doAs((PrivilegedExceptionAction<Object>) () -> {
            // ...
            NamenodeFsck fsck = new NamenodeFsck(conf, nn,
                                                 bm.getDatanodeManager().getNetworkTopology(), pmap, out,
                                                 totalDatanodes, remoteAddress);
            // ...
            boolean success = false;
            try {
                fsck.fsck();
                success = true;
            } finally {
                namesystem.logFsckEvent(success, auditSource, remoteAddress);
            }
            return null;
        });
    } catch (InterruptedException e) {
        response.sendError(400, e.getMessage());
    }
}

So the call ends up in the fsck method of NamenodeFsck. The method is very long; we only look at the parts related to size accounting.

public void fsck() throws AccessControlException {
    final long startTime = Time.monotonicNow();
    String operationName = "fsck";
    try {
        // ...

        // if snapshottableDirs is not null, ask the namenode for the snapshottable directories and add them to snapshottableDirs
        if (snapshottableDirs != null) {
            SnapshottableDirectoryStatus[] snapshotDirs =
                namenode.getRpcServer().getSnapshottableDirListing();
            if (snapshotDirs != null) {
                for (SnapshottableDirectoryStatus dir : snapshotDirs) {
                    snapshottableDirs.add(dir.getFullPath().toString());
                }
            }
        }

        final HdfsFileStatus file = namenode.getRpcServer().getFileInfo(path);
        if (file != null) {

            // ...

            Result replRes = new ReplicationResult(conf);
            Result ecRes = new ErasureCodingResult(conf);
            // collect statistics for the files under path
            check(path, file, replRes, ecRes);

            // ...

        } else {
            // ...
        }
    } catch (Exception e) {
        // ...
    } finally {
        out.close();
    }
}

The key here is the snapshottableDirs variable. Let's trace where it is initialized:

NamenodeFsck(Configuration conf, NameNode namenode,
             NetworkTopology networktopology,
             Map<String, String[]> pmap, PrintWriter out,
             int totalDatanodes, InetAddress remoteAddress) {
    // ...
    else if (key.equals("includeSnapshots")) {
        this.snapshottableDirs = new ArrayList<String>();
    }
    // ...
}

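The fsck command supports the includeSnapshots option. snapshottableDirs is created when this key is present in the request, and in that case the result includes snapshot information.

Next, the check method: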

@VisibleForTesting
void check(String parent, HdfsFileStatus file, Result replRes, Result ecRes)
    throws IOException {
    String path = file.getFullName(parent);
    if ((totalDirs + totalSymlinks + replRes.totalFiles + ecRes.totalFiles)
        % 1000 == 0) {
        out.println();
        out.flush();
    }
    // if the path being checked is a directory,
    // delegate to checkDir
    if (file.isDirectory()) {
        checkDir(path, replRes, ecRes);
        return;
    }
    if (file.isSymlink()) {
        if (showFiles) {
            out.println(path + " <symlink>");
        }
        totalSymlinks++;
        return;
    }
    LocatedBlocks blocks = getBlockLocations(path, file);
    if (blocks == null) { // the file is deleted
        return;
    }

    final Result r = file.getErasureCodingPolicy() != null ? ecRes: replRes;
    // collect file-level statistics
    collectFileSummary(path, file, r, blocks);
    // collect block-level statistics
    collectBlocksSummary(parent, file, r, blocks);
}

From the above, when fsck is run on a directory, the accounting is done by checkDir. Let's look at checkDir.

private void checkDir(String path, Result replRes, Result ecRes) throws IOException {
    // if snapshottableDirs is not null and path is a snapshottable directory,
    // also check the contents of the .snapshot directory under path
    if (snapshottableDirs != null && snapshottableDirs.contains(path)) {
        String snapshotPath = (path.endsWith(Path.SEPARATOR) ? path : path
                               + Path.SEPARATOR)
            + HdfsConstants.DOT_SNAPSHOT_DIR;
        HdfsFileStatus snapshotFileInfo = namenode.getRpcServer().getFileInfo(
            snapshotPath);
        check(snapshotPath, snapshotFileInfo, replRes, ecRes);
    }
    byte[] lastReturnedName = HdfsFileStatus.EMPTY_NAME;
    DirectoryListing thisListing;
    if (showFiles) {
        out.println(path + " <dir>");
    }
    totalDirs++;
    do {
        assert lastReturnedName != null;
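        // list the entries under path page by page; recursion into subdirectories happens via check
        // note that only the current version is listed here, snapshots are not included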
        thisListing = namenode.getRpcServer().getListing(
            path, lastReturnedName, false);
        if (thisListing == null) {
            return;
        }
        HdfsFileStatus[] files = thisListing.getPartialListing();
        for (int i = 0; i < files.length; i++) {
            check(path, files[i], replRes, ecRes);
        }
        lastReturnedName = thisListing.getLastName();
    } while (thisListing.hasMore());
}

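From the above, when snapshot accounting is enabled and the directory being checked is snapshottable, the contents of its .snapshot directory are included in the totals.

References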
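The traversal can be modelled with a small in-memory tree. This is a toy sketch under simplifying assumptions, not the NamenodeFsck logic: DirNode and FsckWalkSketch are hypothetical, a directory counts as snapshottable when it has any snapshot entries, and the includeSnapshots flag plays the role of snapshottableDirs being non-null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a namespace entry.
final class DirNode {
    final String name;
    final boolean directory;
    final long size; // bytes for a file, 0 for a directory
    final List<DirNode> children = new ArrayList<>();        // current version
    final List<DirNode> snapshotEntries = new ArrayList<>(); // visible under .snapshot

    private DirNode(String name, boolean directory, long size) {
        this.name = name;
        this.directory = directory;
        this.size = size;
    }

    static DirNode file(String name, long size) {
        return new DirNode(name, false, size);
    }

    static DirNode dir(String name) {
        return new DirNode(name, true, 0);
    }
}

final class FsckWalkSketch {
    // Sum file sizes under n; snapshot entries are visited only when requested.
    static long totalSize(DirNode n, boolean includeSnapshots) {
        if (!n.directory) {
            return n.size;
        }
        long total = 0;
        if (includeSnapshots) {
            // Mirrors the extra check(...) call on the .snapshot path in checkDir.
            for (DirNode s : n.snapshotEntries) {
                total += totalSize(s, includeSnapshots);
            }
        }
        // Mirrors the listing loop, which only sees the current version.
        for (DirNode c : n.children) {
            total += totalSize(c, includeSnapshots);
        }
        return total;
    }

    public static void main(String[] args) {
        DirNode abc = DirNode.dir("abc");
        abc.children.add(DirNode.file("current.txt", 100));
        // A file that was deleted but is still retained by a snapshot.
        abc.snapshotEntries.add(DirNode.file("deleted.txt", 50));
        System.out.println(totalSize(abc, false)); // prints 100
        System.out.println(totalSize(abc, true));  // prints 150
    }
}
```

In this toy model the default walk reports 100 bytes, and the 50 bytes retained only by the snapshot appear once snapshot entries are included, which is the same kind of gap that shows up between fsck and du.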

Re: HDFS du and fsck command shows different stora... - Cloudera Community - 85144 (bingj.com)

Apache Hadoop 3.3.6 – Overview

Hadoop生态圈（十六）- HDFS Snapshot快照详解-CSDN博客