Principles of Hadoop Distributed Computing

2017-04-26 Ariel_Tian

Raw data in HDFS:
hello a
hello b

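Map phase (turning each input record into key-value pairs):
Input (key = byte offset of the line within the file, value = the text of the line):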
<0,"hello a">
<8,"hello b">
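The keys 0 and 8 are byte offsets. With Hadoop's default TextInputFormat, each record's key is the position of the line's first byte in the file, and "hello a" plus its newline takes 8 bytes, so the second line starts at offset 8. The plain-Java sketch below has no Hadoop dependency and uses a hypothetical class name, OffsetReader. It reproduces this numbering, assuming UTF-8 text and '\n' line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mimics how a line-oriented reader turns raw bytes into
// <byte offset, line> records (assumes UTF-8 and '\n' line endings).
public class OffsetReader {
    public static List<Map.Entry<Long, String>> read(String fileContent) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            // advance past the line's bytes plus the 1-byte '\n'
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // prints <0,"hello a"> then <8,"hello b">
        for (Map.Entry<Long, String> r : read("hello a\nhello b\n")) {
            System.out.println("<" + r.getKey() + ",\"" + r.getValue() + "\">");
        }
    }
}
```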

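Map function (a Hadoop Mapper; the class name is illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();   // reused to avoid allocating per word

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();                 // e.g. "hello a"
            for (String w : line.trim().split("\\s+")) {    // split on tabs or spaces
                if (w.isEmpty()) continue;                  // skip blank lines
                word.set(w);                                // "hello", then "a"
                context.write(word, ONE);                   // emits <hello,1>, <a,1>
            }
        }
    }

Output: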
<hello,1>
<a,1>
<hello,1>
<b,1>
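To try the map step without a cluster, the hypothetical LocalMapper below applies the same logic to one record in plain Java. It splits the line into words and emits a <word, 1> pair for each one. This is a sketch of the idea, not Hadoop's API, and it assumes words are separated by whitespace.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical local stand-in for a Hadoop Mapper: one call per input record.
public class LocalMapper {
    public static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // the offset key is not needed for word count, so it is ignored
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1)); // emit <word, 1>
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        all.addAll(map(0, "hello a"));
        all.addAll(map(8, "hello b"));
        // prints <hello,1> <a,1> <hello,1> <b,1>, one per line
        for (Map.Entry<String, Integer> kv : all) {
            System.out.println("<" + kv.getKey() + "," + kv.getValue() + ">");
        }
    }
}
```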

Reduce phase (the framework first groups and sorts the map output by key):
Input:
<a,{1}>
<b,{1}>
<hello,{1,1}>
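Between the two phases, the shuffle collects all map outputs, groups the values that share a key, and sorts by key. This is why the reducer receives <hello,{1,1}> as a single input. The minimal in-memory sketch below uses a TreeMap as a stand-in for Hadoop's distributed sort. The class name Shuffle is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory shuffle: groups <key, value> pairs by key and keeps
// keys in sorted order (a TreeMap stands in for Hadoop's distributed sort).
public class Shuffle {
    public static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[] words = {"hello", "a", "hello", "b"};   // map output keys, in order
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : words) {
            mapOutput.add(new SimpleEntry<>(w, 1));
        }
        // prints {a=[1], b=[1], hello=[1, 1]}
        System.out.println(group(mapOutput));
    }
}
```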

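Reduce function (a Hadoop Reducer, called once per key; the class name is illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {                // e.g. hello -> {1,1}
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));     // emits <hello,2>
        }
    }

Output:
<a,1>
<b,1>
<hello,2>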
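The plain-Java sketch below puts the phases together and runs map, an in-memory shuffle, and the summing reduce on the two sample lines in a single JVM. LocalWordCount is a hypothetical name, and the code has no Hadoop dependency. It only illustrates the data flow; a real job distributes each phase across many nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process word count: map -> shuffle -> reduce.
public class LocalWordCount {
    // reduce: sum all counts collected for one word
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static TreeMap<String, Integer> run(List<String> lines) {
        // map + shuffle: emit <word,1> and group the 1s under each sorted key
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }
        // reduce: one call per distinct key
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("hello a");
        lines.add("hello b");
        // prints {a=1, b=1, hello=2}
        System.out.println(run(lines));
    }
}
```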
