Big Data Series (2): HDFS (Hadoop Distributed File System) (Part 1)

2015-11-26  Carlin_entheos

HDFS Design

HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

When HDFS Is Not a Good Fit

HDFS Concepts

Blocks

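Block: a disk's block size is the minimum amount of data that it can read or write. HDFS also has the concept of a block, with a default size of 64 MB (the Hadoop 1.x default; Hadoop 2.x raised it to 128 MB, set through the dfs.blocksize property). Files in HDFS are broken into block-sized chunks, which are stored as independent units.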
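To make the chunking concrete, here is a minimal Python sketch that computes how a file of a given size would be cut into block-sized chunks. It assumes the 64 MB block size quoted above; the helper name split_into_blocks is hypothetical and not part of any Hadoop API.

```python
from typing import List, Tuple

MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 64 * MB  # assumed to match the 64 MB default in the text


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for each block-sized chunk of a file.

    Hypothetical helper for illustration: every chunk is block_size bytes
    except possibly the last one, which holds only the remaining bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    chunks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


if __name__ == "__main__":
    # A 150 MB file becomes two full 64 MB blocks plus a 22 MB tail block.
    for offset, length in split_into_blocks(150 * MB):
        print(f"offset={offset // MB} MB length={length // MB} MB")
```

Note that in HDFS a final block shorter than the block size only occupies as much underlying storage as the data it actually holds.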

Why Is a Block in HDFS So Large?
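A common way to motivate the large block size is seek amortization: if a block is large, the time spent transferring it dwarfs the time spent seeking to its start. The sketch below assumes illustrative hardware figures (a 10 ms seek and a 100 MB/s transfer rate, not measurements) and computes seek time as a fraction of one block's transfer time.

```python
SEEK_S = 0.010       # assumed average seek time, in seconds
RATE_MB_S = 100.0    # assumed sequential transfer rate, in MB/s


def seek_overhead(block_mb: float, seek_s: float = SEEK_S, rate_mb_s: float = RATE_MB_S) -> float:
    """Seek time as a fraction of the time needed to transfer one block."""
    transfer_s = block_mb / rate_mb_s
    return seek_s / transfer_s


def block_size_for_overhead(target: float, seek_s: float = SEEK_S, rate_mb_s: float = RATE_MB_S) -> float:
    """Block size (MB) at which seek time equals `target` times the transfer time."""
    return seek_s * rate_mb_s / target


if __name__ == "__main__":
    for size in (1, 64, 128):
        print(f"{size:>4} MB block: seek is {seek_overhead(size):.2%} of transfer time")
    print(f"1% overhead needs blocks of about {block_size_for_overhead(0.01):.0f} MB")
```

With these assumed figures, a block of about 100 MB keeps seeking to roughly 1% of transfer time, which is the same order of magnitude as the 64 MB default quoted above.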
Benefits

Namenodes and Datanodes

An HDFS cluster has two types of node operating in a master-worker pattern: a namenode (the master) and a number of datanodes (workers).
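A toy model can make this division of labour concrete: the namenode holds only metadata (which blocks make up a file and where they live), while datanodes hold the block bytes. All class and function names below are hypothetical; real HDFS clients talk to these nodes over RPC and stream data, rather than calling in-process Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataNode:
    """Worker: stores raw block contents keyed by block ID."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Master: keeps metadata only, never the file bytes themselves."""
    files: Dict[str, List[int]] = field(default_factory=dict)      # path -> ordered block IDs
    locations: Dict[int, List[str]] = field(default_factory=dict)  # block ID -> datanode names

    def get_block_locations(self, path: str) -> List[Tuple[int, List[str]]]:
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


def read_file(path: str, namenode: NameNode, datanodes: Dict[str, DataNode]) -> bytes:
    """Client: ask the master where the blocks are, then fetch them from workers."""
    data = bytearray()
    for block_id, hosts in namenode.get_block_locations(path):
        data += datanodes[hosts[0]].read_block(block_id)  # read from the first listed replica
    return bytes(data)


if __name__ == "__main__":
    dn1 = DataNode("dn1", {1: b"hello "})
    dn2 = DataNode("dn2", {2: b"hdfs"})
    nn = NameNode(files={"/demo.txt": [1, 2]}, locations={1: ["dn1"], 2: ["dn2"]})
    print(read_file("/demo.txt", nn, {"dn1": dn1, "dn2": dn2}))
```

The point of the design is that file data never flows through the master; it only answers "where", and the workers serve the bytes.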

Namenode
Datanodes

Datanodes store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of blocks that they are storing.
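The periodic block report can be sketched as each datanode sending the full set of block IDs it holds, with the namenode replacing its previous view of that datanode and inverting the reports into a block-to-location map. This is a simplification: message formats, heartbeats, and timing are omitted, and BlockMap and handle_block_report are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class BlockMap:
    """Namenode-side view of which datanodes hold which blocks."""

    def __init__(self) -> None:
        self._reported: Dict[str, Set[int]] = {}  # datanode -> blocks it last reported

    def handle_block_report(self, datanode: str, block_ids: Iterable[int]) -> None:
        # A full report replaces whatever this datanode reported before,
        # so blocks it no longer lists drop out of the map.
        self._reported[datanode] = set(block_ids)

    def locations(self) -> Dict[int, Set[str]]:
        """Invert the reports into block ID -> set of datanodes holding it."""
        result: Dict[int, Set[str]] = defaultdict(set)
        for datanode, blocks in self._reported.items():
            for block_id in blocks:
                result[block_id].add(datanode)
        return dict(result)


if __name__ == "__main__":
    bm = BlockMap()
    bm.handle_block_report("dn1", [1, 2])
    bm.handle_block_report("dn2", [2, 3])
    bm.handle_block_report("dn1", [2])  # dn1 no longer holds block 1
    print(bm.locations())
```

Because block locations can be rebuilt from these reports, the namenode does not need to store them persistently; they are reconstructed from the datanodes when the system starts.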

HDFS Federation

The namenode keeps a reference to every file and block in the filesystem in memory, which means that on very large clusters with many files, memory becomes the limiting factor for scaling. HDFS Federation allows a cluster to scale by adding namenodes, each of which manages a portion of the filesystem namespace.
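A back-of-the-envelope estimate shows why namenode memory limits scale. The sketch assumes the commonly quoted rule of thumb that each file, directory, and block object costs about 150 bytes of namenode heap; the real figure depends on the Hadoop version and JVM, so treat the result as an order of magnitude only. The function name is hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not an exact figure


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1, num_dirs: int = 0) -> int:
    """Rough heap needed to track files, directories, and their blocks."""
    objects = num_files + num_dirs + num_files * blocks_per_file
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    for files in (1_000_000, 100_000_000):
        gb = namenode_heap_bytes(files) / 1e9
        print(f"{files:>11,} single-block files -> about {gb:.1f} GB of namenode heap")
```

Since every namespace object lives in a single namenode's heap, splitting the namespace across several namenodes divides this cost among them.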

Under Federation, each namenode manages a namespace volume, which is made up of the metadata for the namespace, and a block pool containing all the blocks for the files in the namespace.
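Under Federation a client needs some way to decide which namenode owns a path. Hadoop handles this with client-side mount tables (ViewFileSystem and viewfs:// URIs); the Python sketch below only imitates the idea with a prefix lookup. The mount points, namenode names, and block-pool IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NamespaceVolume:
    """One namenode's share of the namespace plus its own block pool."""
    namenode: str
    block_pool_id: str
    files: Set[str] = field(default_factory=set)  # metadata for paths in this volume


# Hypothetical mount table: path prefix -> namespace volume.
MOUNT_TABLE: Dict[str, NamespaceVolume] = {
    "/user": NamespaceVolume("nn1", "BP-user"),
    "/share": NamespaceVolume("nn2", "BP-share"),
}


def resolve(path: str) -> NamespaceVolume:
    """Return the volume whose mount point contains `path`."""
    # Mount points in this sketch do not overlap, so the first match is the only match.
    for prefix, volume in MOUNT_TABLE.items():
        if path == prefix or path.startswith(prefix + "/"):
            return volume
    raise FileNotFoundError(f"no mount point for {path}")


if __name__ == "__main__":
    vol = resolve("/user/alice/data.csv")
    vol.files.add("/user/alice/data.csv")
    print(vol.namenode, vol.block_pool_id)
```

Namespace volumes are independent of each other, so the failure of one namenode does not affect the availability of the namespaces managed by the others.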

HDFS High Availability

The combination of replicating namenode metadata on multiple filesystems and using the secondary namenode to create checkpoints protects against data loss, but it does not provide high availability of the filesystem: the namenode is still a single point of failure (SPOF).

To support HDFS high availability, there is a pair of namenodes in an active-standby configuration. If the active namenode fails, the standby takes over its duties and continues servicing client requests without a significant interruption.
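The active-standby idea can be reduced to a small state machine: one namenode serves requests, the other waits, and a failover step promotes the standby when the active stops responding. This is a deliberately simplified sketch; real HDFS HA also relies on shared edit-log storage, a failover controller (typically backed by ZooKeeper), and fencing so the old active cannot keep serving. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NameNodeState:
    name: str
    state: str = "standby"  # "active" or "standby"
    alive: bool = True


class HAPair:
    """Hypothetical active-standby namenode pair with a health check on each request."""

    def __init__(self, first: str, second: str) -> None:
        self.nodes = [NameNodeState(first, "active"), NameNodeState(second)]

    def active(self) -> NameNodeState:
        return next(n for n in self.nodes if n.state == "active")

    def check_and_failover(self) -> None:
        current = self.active()
        if current.alive:
            return
        standby = next(n for n in self.nodes if n is not current)
        if not standby.alive:
            raise RuntimeError("no healthy namenode to fail over to")
        current.state = "standby"  # stands in for fencing the old active
        standby.state = "active"

    def handle_request(self, op: str) -> str:
        self.check_and_failover()
        return f"{self.active().name} handled {op}"


if __name__ == "__main__":
    pair = HAPair("nn1", "nn2")
    print(pair.handle_request("ls /"))  # prints "nn1 handled ls /"
    pair.nodes[0].alive = False         # simulate the active namenode crashing
    print(pair.handle_request("ls /"))  # prints "nn2 handled ls /"
```

Clients keep issuing the same request; only the node that answers changes, which is what "without a significant interruption" amounts to in this model.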
