HDFS Disk Writes and Balancing

2021-11-09  halfempty

1. HDFS Write Policy

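The first replica is written to the local node, the second to a node on a different rack, and the third to a different node on that same remote rack.
Purpose: maximize fault tolerance, protecting not only against a single machine going down but also against a whole rack failing, while keeping writes fast (the local write is quicker).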
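
The placement rule above can be sketched in a few lines of Python. This is only an illustrative model, not HDFS code: the function name `place_replicas`, the `topology` dictionary, and the node and rack names are all hypothetical, and the sketch assumes the writer is itself a DataNode and that at least one other rack has two or more nodes.

```python
import random

def place_replicas(writer, topology, rng=random):
    """Pick three nodes for one block (simplified model of the rule above).

    topology: dict mapping rack name -> list of node names (hypothetical layout).
    writer:   node issuing the write; assumed to be a DataNode in the topology.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # Replica 1: the writer's own node, so the first write stays local.
    first = writer

    # Replica 2: a node on some rack other than the writer's rack.
    # Simplification: only racks with at least two nodes are considered,
    # so that replica 3 can always be placed next to replica 2.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != rack_of[first] and len(nodes) >= 2
    ]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])

    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4", "dn5"]}
replicas = place_replicas("dn1", topology, random.Random(42))
```

With this layout, losing `dn1` or all of `rack1` still leaves two copies on `rack2`, and losing `rack2` still leaves the local copy.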

1.1. Local-Node Preference Setting

dfs.namenode.block-placement-policy.default.prefer-local-node
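Defaults to true. When a put is issued from a machine that is itself a DataNode, the first replica goes to that machine, so its DataNode ends up with a higher storage utilization than the others.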

Controls how the default block placement policy places the first replica of a block. When true, it will prefer the node where the client is running. When false, it will prefer a node in the same rack as the client. Setting to false avoids situations where entire copies of large files end up on a single node, thus creating hotspots.
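
To make the hotspot effect concrete, here is a small simulation of where first replicas land under each value of the flag. It is a rough sketch, not the real placement code: the node names, the helper functions, and the uniform choice within the rack are assumptions, and the sketch does not model exactly whether the real policy may still pick the client's own node when the flag is false.

```python
import random
from collections import Counter

def first_replica_node(client, rack_nodes, prefer_local_node, rng):
    """Choose the node for a block's first replica (simplified model).

    rack_nodes: nodes on the client's rack, including the client itself.
    """
    if prefer_local_node:
        return client                # always the client's own DataNode
    return rng.choice(rack_nodes)    # assumption: uniform pick within the rack

def first_replica_counts(num_blocks, prefer_local_node, seed=0):
    """Count first-replica placements for num_blocks blocks written from dn1."""
    rng = random.Random(seed)
    rack_nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical rack
    return Counter(
        first_replica_node("dn1", rack_nodes, prefer_local_node, rng)
        for _ in range(num_blocks)
    )

local_counts = first_replica_counts(1000, prefer_local_node=True)
spread_counts = first_replica_counts(1000, prefer_local_node=False)
```

With the flag on, `local_counts` contains a single entry for `dn1`; with it off, `spread_counts` is spread over the nodes of the rack.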

2. Disk Volume Selection

Configuration property: dfs.datanode.fsdataset.volume.choosing.policy

2.1. Round-Robin Policy

This is plain round-robin: the DataNode cycles through its volumes in turn for each new block.
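
A minimal sketch of round-robin volume selection is shown below. The class name, the `(name, available_bytes)` tuples, and the exception types are illustrative assumptions rather than the DataNode's actual API; the sketch also assumes that a volume without enough free space for the block is skipped.

```python
class RoundRobinVolumeChooser:
    """Cycle through volumes, skipping any that cannot hold the block."""

    def __init__(self):
        self._next = 0  # index of the volume to try first on the next call

    def choose(self, volumes, block_size):
        """volumes: list of (name, available_bytes) tuples (hypothetical shape)."""
        if not volumes:
            raise ValueError("no volumes configured")
        start = self._next % len(volumes)
        idx = start
        while True:
            name, available = volumes[idx]
            idx = (idx + 1) % len(volumes)
            if available >= block_size:
                self._next = idx  # resume after the chosen volume next time
                return name
            if idx == start:
                # Every volume was tried once and none had enough space.
                raise OSError("no volume has room for the block")

chooser = RoundRobinVolumeChooser()
volumes = [("/data1", 100), ("/data2", 5), ("/data3", 100)]
picks = [chooser.choose(volumes, block_size=10) for _ in range(4)]
```

Because the choice ignores how full each volume is, disks of different sizes or ages can drift apart in utilization over time.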



2.2. Available Space Policy

3. Balancer Command

# hdfs balancer --help
Usage: hdfs balancer
    [-policy <policy>]  the balancing policy: datanode or blockpool
    [-threshold <threshold>]    Percentage of disk capacity
    [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]  Excludes the specified datanodes.
    [-include [-f <hosts-file> | <comma-separated list of hosts>]]  Includes only the specified datanodes.
    [-source [-f <hosts-file> | <comma-separated list of hosts>]]   Pick only the specified datanodes as source nodes.
    [-idleiterations <idleiterations>]  Number of consecutive idle iterations (-1 for Infinite) before exit.
    [-runDuringUpgrade] Whether to run the balancer during an ongoing HDFS upgrade.This is usually not desired since it will not affect used space on over-utilized machines.

3.1. How the Balancer Works

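The Balancer computes the disk utilization of each DataNode and, using the cluster-wide average utilization v1 together with the configured threshold, sorts the DataNodes into four classes.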

Average utilization of the HDFS cluster = sum(DFS Used) * 100 / sum(Capacity)
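
The classification can be sketched as follows. This is a simplified model under stated assumptions: the function names and the sample numbers are hypothetical, the four class names follow the terms commonly used for the Balancer (over-utilized, above-average, below-average, under-utilized), and a threshold of 10 percentage points is used for illustration.

```python
def cluster_average(nodes):
    """nodes: dict of name -> (dfs_used_bytes, capacity_bytes)."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    return total_used * 100 / total_capacity

def classify(nodes, threshold=10.0):
    """Split DataNodes into four classes around the cluster average."""
    avg = cluster_average(nodes)
    groups = {
        "over-utilized": [],   # more than avg + threshold
        "above-average": [],   # above avg, within the threshold
        "below-average": [],   # at or below avg, within the threshold
        "under-utilized": [],  # less than avg - threshold
    }
    for name, (used, cap) in nodes.items():
        utilization = used * 100 / cap
        if utilization > avg + threshold:
            groups["over-utilized"].append(name)
        elif utilization > avg:
            groups["above-average"].append(name)
        elif utilization >= avg - threshold:
            groups["below-average"].append(name)
        else:
            groups["under-utilized"].append(name)
    return groups

# Hypothetical cluster: (DFS Used, Capacity) in arbitrary equal units.
nodes = {"dn1": (90, 100), "dn2": (55, 100), "dn3": (45, 100), "dn4": (10, 100)}
groups = classify(nodes, threshold=10.0)
```

In this model, blocks would move from the nodes above the average toward the nodes below it, with the over-utilized and under-utilized classes being the ones outside the threshold band.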


Related parameters
