6.824 Note 1: MapReduce (2004)

2017-09-12  梦工厂

1. Background

Many computations involve processing huge amounts of data; to finish in an acceptable amount of time, the work has to be spread across hundreds or thousands of machines.

Distributing the data, scheduling the tasks, and handling failures all require a large amount of code.

So implementing a distributed job means writing the code for the computation itself plus a large amount of extra code to make it distributed.

MapReduce was created to solve exactly this problem.

MapReduce is a programming model that hides the details of parallelization, fault tolerance, data distribution, and load balancing.

In other words, the user only describes the simple computation they want to perform, and the MapReduce library automatically parallelizes it and runs it as a distributed job.

2. Programming Model

A MapReduce user expresses the computation as two functions, Map() and Reduce(): Map takes an input key/value pair and emits a set of intermediate key/value pairs, and Reduce merges all the intermediate values that share the same intermediate key.

Word-count example (pseudocode from the paper):

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
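
The pseudocode above comes from the paper. As a concrete illustration, below is a minimal, runnable sketch of the same word count in Go (the language used for the 6.824 labs). The KeyValue type and the Map/Reduce signatures are assumptions modeled loosely on the lab framework rather than the paper's C++ API; the main function simulates the grouping that the MapReduce library performs between the two phases.

// A self-contained word-count sketch; KeyValue, Map, and Reduce are
// hypothetical names in the style of the 6.824 lab framework.
package main

import (
    "fmt"
    "strconv"
    "strings"
    "unicode"
)

type KeyValue struct {
    Key   string
    Value string
}

// Map emits ("word", "1") for every word in the document contents.
func Map(filename string, contents string) []KeyValue {
    words := strings.FieldsFunc(contents, func(r rune) bool {
        return !unicode.IsLetter(r)
    })
    kva := []KeyValue{}
    for _, w := range words {
        kva = append(kva, KeyValue{Key: w, Value: "1"})
    }
    return kva
}

// Reduce sums the counts collected for one word.
func Reduce(key string, values []string) string {
    sum := 0
    for _, v := range values {
        n, _ := strconv.Atoi(v)
        sum += n
    }
    return strconv.Itoa(sum)
}

func main() {
    kva := Map("doc1", "the quick brown fox jumps over the lazy dog the end")
    // Group values by key, which is what the library does between Map and Reduce.
    groups := map[string][]string{}
    for _, kv := range kva {
        groups[kv.Key] = append(groups[kv.Key], kv.Value)
    }
    for k, vs := range groups {
        fmt.Printf("%s %s\n", k, Reduce(k, vs))
    }
}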

3. Implementation

3.1 Execution Overview
[Figure: the MapReduce model]
  1. The MapReduce library in the user program first splits the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece (controllable by the user via an optional parameter). It then starts up many copies of the program on a cluster of machines.

  2. One of the copies of the program is special – the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task.

    Tasks: M + R > number of workers (there are many more tasks than worker machines, so a worker that finishes one task can immediately be assigned another);

  3. A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory.

    Map phase: read the contents of the input split, call the map() function, and write the intermediate files;

  4. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.

    When a map task completes, it reports the locations of its intermediate files back to the master (the map side of this work is sketched in the code after this step list);

  5. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used.

    Reduce phase: fetch all the intermediate files for its key region, sort them to group the values by key, call the reduce() function, and write the output file (the reduce side is sketched in the code after this step list);

  6. The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user’s Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition.

  7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.
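
As a concrete illustration of steps 3–6, here is a minimal Go sketch of the work a map task and a reduce task perform. It reuses the KeyValue type and the Map/Reduce functions from the word-count sketch above, and it assumes a lab-style single-machine setup: the intermediate data for map task X and reduce region Y goes into a local file named mr-X-Y encoded as JSON, and hash(key) mod R is used as the partitioning function. The file naming, the JSON encoding, and the doMap/doReduce helpers are illustrative assumptions, not the paper's actual implementation.

// doMap and doReduce sketch the map-side and reduce-side work of a worker.
// They reuse the KeyValue type and the Map/Reduce functions from the
// word-count sketch above (hypothetical helpers, not the paper's API).
package main

import (
    "encoding/json"
    "fmt"
    "hash/fnv"
    "os"
    "sort"
)

// ihash picks the reduce region for a key: hash(key) mod R (step 4).
func ihash(key string) int {
    h := fnv.New32a()
    h.Write([]byte(key))
    return int(h.Sum32() & 0x7fffffff)
}

// doMap handles one map task: read the input split, call Map, and partition
// the intermediate pairs into R local files named mr-<mapTask>-<region>.
func doMap(mapTask int, filename string, nReduce int) error {
    contents, err := os.ReadFile(filename)
    if err != nil {
        return err
    }
    kva := Map(filename, string(contents))

    // One encoder per reduce region (step 4: partition into R regions).
    encoders := make([]*json.Encoder, nReduce)
    for r := 0; r < nReduce; r++ {
        f, err := os.Create(fmt.Sprintf("mr-%d-%d", mapTask, r))
        if err != nil {
            return err
        }
        defer f.Close()
        encoders[r] = json.NewEncoder(f)
    }
    for _, kv := range kva {
        if err := encoders[ihash(kv.Key)%nReduce].Encode(&kv); err != nil {
            return err
        }
    }
    return nil
}

// doReduce handles one reduce task: read this region's file from every map
// task, sort by key so identical keys are grouped, call Reduce once per key,
// and append the results to the output file for this partition (steps 5-6).
func doReduce(reduceTask int, nMap int) error {
    kva := []KeyValue{}
    for m := 0; m < nMap; m++ {
        f, err := os.Open(fmt.Sprintf("mr-%d-%d", m, reduceTask))
        if err != nil {
            return err
        }
        dec := json.NewDecoder(f)
        for {
            var kv KeyValue
            if dec.Decode(&kv) != nil {
                break // io.EOF (or a decode error) ends this file
            }
            kva = append(kva, kv)
        }
        f.Close()
    }

    // Sorting groups all occurrences of the same key together.
    sort.Slice(kva, func(i, j int) bool { return kva[i].Key < kva[j].Key })

    out, err := os.Create(fmt.Sprintf("mr-out-%d", reduceTask))
    if err != nil {
        return err
    }
    defer out.Close()

    // Walk the sorted pairs, handing each key and its value set to Reduce.
    for i := 0; i < len(kva); {
        j := i
        values := []string{}
        for j < len(kva) && kva[j].Key == kva[i].Key {
            values = append(values, kva[j].Value)
            j++
        }
        fmt.Fprintf(out, "%v %v\n", kva[i].Key, Reduce(kva[i].Key, values))
        i = j
    }
    return nil
}

The real system differs mainly in that a reduce worker fetches these regions over the network (via RPC) from the map workers' local disks, and falls back to an external sort when the intermediate data does not fit in memory.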

3.2 Master Data Structures

The master stores the state of every map task and reduce task (idle, in-progress, or completed) and, for each non-idle task, the identity of the worker machine running it.

The master also stores the locations of the intermediate files; when a map task completes, the corresponding location information is updated and is eventually passed on to the reduce tasks.
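
A minimal Go sketch of this bookkeeping, under the same assumptions as the sketches above; the type and field names are hypothetical, chosen to mirror the description in the paper.

// Hypothetical master-side bookkeeping for M map tasks and R reduce tasks.
package main

// TaskState is one of the three states the master tracks per task.
type TaskState int

const (
    Idle TaskState = iota
    InProgress
    Completed
)

// MapTaskInfo records a map task's state, the worker running it (when it is
// not idle), and the locations of its R intermediate regions once completed.
type MapTaskInfo struct {
    State     TaskState
    Worker    string   // identity of the worker machine
    Locations []string // R intermediate file locations, filled in on completion
}

// ReduceTaskInfo records a reduce task's state and its worker.
type ReduceTaskInfo struct {
    State  TaskState
    Worker string
}

// Master holds the per-task tables. As map tasks complete, their intermediate
// file locations are stored here and forwarded to the reduce tasks.
type Master struct {
    MapTasks    []MapTaskInfo    // length M
    ReduceTasks []ReduceTaskInfo // length R
}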

3.3 Fault Tolerance
3.4 Miscellaneous

4. Summary

MapReduce single-handedly made big cluster computation popular.


[2017.9 梦工厂]
