Big Data Systems Programming Tips

2018-05-16 TechGraver

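Big data systems programming always turns up a few surprises, so I am starting this post to keep notes 📝.

Hadoop MapReduce

What each statement in the driver's main function does: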

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
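        // Create the Hadoop Configuration object. By default its constructor loads
        // core-default.xml and core-site.xml from the classpath; hdfs-site.xml is added
        // once the HDFS client classes are loaded. These files hold the parameters needed
        // to reach HDFS, chiefly fs.default.name (named fs.defaultFS in newer releases),
        // which gives the HDFS address the client connects to. In short, a Configuration
        // object holds Hadoop's configuration settings.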
    
   
        String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
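        // GenericOptionsParser is the Hadoop utility class for parsing command-line
        // arguments. It recognizes the standard generic options, so an application can
        // easily specify the namenode, the jobtracker, and additional configuration
        // resources. getRemainingArgs() returns the application-specific arguments left over.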
       
    
        if(otherArgs.length !=2){
            System.out.println("Usage: Hw2Part1 <in> <out>");
            System.exit(2);
        }

        //Job job = new Job(conf,"Hw2Part1");
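        conf.set("mapred.textoutputformat.separator", " ");
        // MapReduce's default key-value separator is a tab, which makes the output
        // irregular when keys and values contain spaces: key1 key2 <tab> value1 value2 ...
        // This statement makes the final output use a space between key and value instead.
        // It comes before Job.getInstance(conf, ...) because the Job takes its own copy of conf.
        // Newer releases name this property mapreduce.output.textoutputformat.separator.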
    
    
        Job job = Job.getInstance(conf,"Hw2Part1");
    
        job.setJarByClass(Hw2Part1.class);
        job.setMapperClass(SourceMapper.class);
        job.setReducerClass(TimeReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);


        //FileInputFormat.addInputPath(job,new Path(otherArgs[0]));
        //FileOutputFormat.setOutputPath(job,new Path(otherArgs[1]));

        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }

        // add the output path as given by the command line
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        // The statements above register the input paths and the output path with the job.
        
        System.exit(job.waitForCompletion(true) ? 0 : 1);
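        // This statement submits the job. A job is normally submitted through
        // job.waitForCompletion(), which calls job.submit() internally and then waits
        // for the job to finish. Its boolean result (success or failure) becomes the exit code.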


    }
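
The path handling at the end of main can be tried out without a cluster. Below is a minimal plain-Java sketch of the same convention: every remaining argument except the last is an input path, and the last one is the output path. The class name PathArgs, its helper methods, and the sample paths are made up for illustration and are not part of Hadoop. Note that main above only accepts exactly two arguments, so its loop runs once; the sketch assumes the looser "at least two" check so that several inputs can be shown.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, plain Java only: mirrors the argument convention used in main.
class PathArgs {

    // Reject argument lists that cannot hold at least one input and one output.
    static void requireAtLeastTwo(String[] otherArgs) {
        if (otherArgs.length < 2) {
            throw new IllegalArgumentException("Usage: Hw2Part1 <in> [<in>...] <out>");
        }
    }

    // Every argument except the last one is treated as an input path.
    static List<String> inputPaths(String[] otherArgs) {
        requireAtLeastTwo(otherArgs);
        return Arrays.asList(Arrays.copyOfRange(otherArgs, 0, otherArgs.length - 1));
    }

    // The last argument is treated as the output path.
    static String outputPath(String[] otherArgs) {
        requireAtLeastTwo(otherArgs);
        return otherArgs[otherArgs.length - 1];
    }

    public static void main(String[] args) {
        String[] otherArgs = {"in1", "in2", "out"};   // sample paths, assumed for the demo
        System.out.println("inputs = " + inputPaths(otherArgs));   // prints: inputs = [in1, in2]
        System.out.println("output = " + outputPath(otherArgs));   // prints: output = out
    }
}
```

In the real driver, each input string would be wrapped in a Path and passed to FileInputFormat.addInputPath, and the output string to FileOutputFormat.setOutputPath, as main does.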

Reference

http://www.cnblogs.com/caoyuanzhanlang/archive/2013/02/21/2920934.html
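
Coming back to the separator setting in main: the effect is easiest to see on a single output line. The sketch below is plain Java that only imitates the layout of a text output line (key, then separator, then value). It is not Hadoop's own code, and the class name SeparatorSketch and the sample key and value are assumptions for the demo. It shows why a tab between a space-delimited key and a space-delimited value is awkward for anything that later splits the line on spaces.

```java
// Hypothetical demo class, plain Java only: imitates the layout of one text output line.
class SeparatorSketch {

    // A text output line is laid out as: key, separator, value.
    static String formatLine(String key, String value, String separator) {
        return key + separator + value;
    }

    public static void main(String[] args) {
        String key = "key1 key2";           // composite key, fields joined by spaces
        String value = "value1 value2";     // composite value, fields joined by spaces

        String withTab = formatLine(key, value, "\t");    // default-style separator
        String withSpace = formatLine(key, value, " ");   // separator set to a space

        // Splitting on spaces: the tab glues "key2" and "value1" into one field.
        System.out.println(withTab.split(" ").length);    // prints: 3
        System.out.println(withSpace.split(" ").length);  // prints: 4
    }
}
```

The trade-off is that with a space separator the boundary between key and value is no longer marked, so this only suits jobs where the number of key fields is known in advance.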

Which packages to import in MapReduce programs

import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

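The mapred package is Hadoop's old API, while the mapreduce package is the new API.

With mapred, the call is FileInputFormat.setInputPaths(jobConf, in); and the first argument is a JobConf.

With mapreduce, the call is FileInputFormat.setInputPaths(job, in); and the first argument is a Job. Importing FileInputFormat from the wrong package therefore makes the call fail to compile.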
