Several problems encountered running the MapReduce WordCount example from IDEA

2019-02-23  白面葫芦娃92

1. Modeled on the example from the official site, adapted to Scala:

import java.util.StringTokenizer

import scala.collection.JavaConverters._

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val job = Job.getInstance(conf, "word count")
    job.setJarByClass(WordCount.getClass)
    job.setMapperClass(classOf[TokenizerMapper])
    job.setCombinerClass(classOf[IntSumReducer])
    job.setReducerClass(classOf[IntSumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path("/data/ruozeinput.txt"))
    FileOutputFormat.setOutputPath(job, new Path("/out/WCoutput"))
    System.exit(
      if (job.waitForCompletion(true)) 0 else 1
    )
  }

  class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {
    private val one = new IntWritable(1)
    private val word = new Text()

    // `override` plus the full inner-class Context type are needed for this
    // to really override Mapper#map rather than add an unrelated overload.
    override def map(key: Object, value: Text,
                     context: Mapper[Object, Text, Text, IntWritable]#Context): Unit = {
      val itr = new StringTokenizer(value.toString)
      while (itr.hasMoreTokens) {
        word.set(itr.nextToken())
        context.write(word, one)
      }
    }
  }

  class IntSumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
    private val result = new IntWritable()

    // The framework passes a java.lang.Iterable, not a scala.Iterable; the
    // .asScala conversion lets the for-comprehension iterate over it.
    override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                        context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
      var sum = 0
      for (valu <- values.asScala) {
        sum += valu.get()
      }
      result.set(sum)
      context.write(key, result)
    }
  }
}

2. Confirm that HDFS and YARN are running on the VM, and that core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml are all present in the IDEA Resources folder. Running the code then fails: the client cannot connect to the ResourceManager.

19/01/20 17:57:33 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/20 17:57:36 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/01/20 17:57:38 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/01/20 17:57:40 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
.......
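The address 0.0.0.0:8032 is what a client falls back to when nothing overrides the ResourceManager defaults, so the first thing worth checking is which configuration the client actually loaded. A small hedged check (YarnConfiguration is the class that pulls yarn-default.xml and yarn-site.xml in from the classpath, the IDEA Resources folder here):

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintRmAddress {
    public static void main(String[] args) {
        // Prints the ResourceManager address the client resolves.
        // "0.0.0.0:8032" means yarn-site.xml was not picked up, or it
        // does not set the ResourceManager address/hostname.
        YarnConfiguration conf = new YarnConfiguration();
        System.out.println(conf.get("yarn.resourcemanager.address"));
    }
}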

3. Try changing the yarn-site.xml configuration.
The original configuration:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Now changed to:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop001</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop001:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop001:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop001:8031</value>
  </property>
</configuration>
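
For reference, the same addresses can also be set on the client side in code while experimenting; a minimal sketch mirroring the XML above, to drop into the driver before Job.getInstance (not what I ultimately relied on):

// Point the MapReduce client at the real ResourceManager instead of letting
// it fall back to the 0.0.0.0 defaults. These mirror the yarn-site.xml entries.
Configuration conf = new Configuration();
conf.set("yarn.resourcemanager.hostname", "hadoop001");
conf.set("yarn.resourcemanager.address", "hadoop001:8032");
conf.set("yarn.resourcemanager.scheduler.address", "hadoop001:8030");
// resource-tracker.address is what NodeManagers use to register with the RM,
// so it matters on the cluster side rather than for a submitting client:
conf.set("yarn.resourcemanager.resource-tracker.address", "hadoop001:8031");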

4. Running the code again, it still cannot connect. Running jps at this point shows YARN has died; after restarting YARN, the first jps shows a ResourceManager process, but a moment later jps shows the ResourceManager has died again.
Check the daemon's log under $HADOOP_HOME/logs:

/************************************************************
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG:   host = hadoop001/192.168.137.141
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0-cdh5.7.0
STARTUP_MSG:   classpath = ......skipping...
2019-01-20 17:08:22,641 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Update thread interrupted. Exiting.
2019-01-20 17:08:22,658 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2019-01-20 17:08:22,649 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
2019-01-20 17:08:22,650 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2019-01-20 17:08:22,650 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2019-01-20 17:08:22,674 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2019-01-20 17:08:22,676 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:278)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:990)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1090)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1222)
Caused by: java.net.BindException: Port in use: 0.0.0.0:8088
        at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:951)
        at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:887)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:273)
        ... 4 more
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:437)
        at sun.nio.ch.Net.bind(Net.java:429)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
        at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:946)
        ... 6 more
2019-01-20 17:08:22,696 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at hadoop001/192.168.137.141
************************************************************/

Reference: https://blog.csdn.net/caiandyong/article/details/50913268
The "Port in use: 0.0.0.0:8088" BindException means something, presumably a leftover ResourceManager process, was still bound to the web UI port when the new one started. More importantly, it turned out the yarn-site.xml inside the VM still had the old configuration and had never been updated. After making the VM's yarn-site.xml identical to the one in the IDEA Resources folder and restarting YARN, the ResourceManager no longer shut itself down.
5. Run the code again. This time the ResourceManager connection succeeds, but the job fails:

19/01/20 18:11:17 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/20 18:11:22 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/20 18:11:22 WARN JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
19/01/20 18:11:22 INFO FileInputFormat: Total input paths to process : 1
19/01/20 18:11:23 INFO JobSubmitter: number of splits:1
19/01/20 18:11:24 INFO JobSubmitter: Submitting tokens for job: job_1547976418140_0004
19/01/20 18:11:24 INFO YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
19/01/20 18:11:24 INFO YarnClientImpl: Submitted application application_1547976418140_0004
19/01/20 18:11:24 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547976418140_0004/
19/01/20 18:11:24 INFO Job: Running job: job_1547976418140_0004
19/01/20 18:11:29 INFO Job: Job job_1547976418140_0004 running in uber mode : false
19/01/20 18:11:29 INFO Job:  map 0% reduce 0%
19/01/20 18:11:29 INFO Job: Job job_1547976418140_0004 failed with state FAILED due to: Application application_1547976418140_0004 failed 2 times due to AM Container for appattempt_1547976418140_0004_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://hadoop001:8088/proxy/application_1547976418140_0004/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1547976418140_0004_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control

Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
    at org.apache.hadoop.util.Shell.run(Shell.java:478)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
19/01/20 18:11:29 INFO Job: Counters: 0

The error "/bin/bash: line 0: fg: no job control" is the classic symptom of submitting from a Windows client: the ApplicationMaster launch command is generated with Windows shell syntax, which bash on the cluster cannot execute. The fix is the cross-platform submission flag, conf.set("mapreduce.app-submission.cross-platform", "true").
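
A minimal sketch of where the flag goes in the driver (same job setup as before):

Configuration conf = new Configuration();
// Tell Hadoop the submitting client and the cluster may run on different
// platforms, so the container launch command uses portable syntax rather
// than Windows cmd syntax:
conf.set("mapreduce.app-submission.cross-platform", "true");
Job job = Job.getInstance(conf, "word count");

With the flag set, the run gets further: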

19/01/20 23:45:30 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/20 23:45:31 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/20 23:45:31 WARN JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
19/01/20 23:45:31 INFO FileInputFormat: Total input paths to process : 1
19/01/20 23:45:31 INFO JobSubmitter: number of splits:1
19/01/20 23:45:31 INFO JobSubmitter: Submitting tokens for job: job_1547989123887_0006
19/01/20 23:45:32 INFO YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
19/01/20 23:45:32 INFO YarnClientImpl: Submitted application application_1547989123887_0006
19/01/20 23:45:32 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547989123887_0006/
19/01/20 23:45:32 INFO Job: Running job: job_1547989123887_0006
19/01/20 23:45:56 INFO Job: Job job_1547989123887_0006 running in uber mode : false
19/01/20 23:45:56 INFO Job:  map 0% reduce 0%
19/01/20 23:46:10 INFO Job: Task Id : attempt_1547989123887_0006_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    ... 8 more

19/01/20 23:46:22 INFO Job: Task Id : attempt_1547989123887_0006_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    ... 8 more

The error says the class WordCountjava$TokenizerMapper cannot be found. This follows from the earlier warning "No job jar file set": nothing was uploaded for the task JVMs on the cluster to load user classes from. So I packaged the class into a jar and added it to the dependencies; an alternative sketch follows.
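
Instead of relying on setJarByClass() finding a jar on the classpath, the job can also be pointed at a built jar explicitly (the commented-out job.setJar(...) line in the Java code further below hints at the same idea); the path here is a placeholder:

// Name the built jar directly so it is uploaded with the job and the task
// JVMs can load the mapper/reducer classes. Placeholder path:
job.setJar("/path/to/wordcount.jar");

After packaging the jar and re-running: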
19/01/20 21:05:04 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/20 21:05:05 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/20 21:05:07 INFO FileInputFormat: Total input paths to process : 1
19/01/20 21:05:07 INFO JobSubmitter: number of splits:1
19/01/20 21:05:08 INFO JobSubmitter: Submitting tokens for job: job_1547989123887_0001
19/01/20 21:05:09 INFO YarnClientImpl: Submitted application application_1547989123887_0001
19/01/20 21:05:09 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547989123887_0001/
19/01/20 21:05:09 INFO Job: Running job: job_1547989123887_0001
19/01/20 21:05:40 INFO Job: Job job_1547989123887_0001 running in uber mode : false
19/01/20 21:05:40 INFO Job:  map 0% reduce 0%
19/01/20 21:05:57 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: scala.Function1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
    at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

19/01/20 21:06:08 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: scala.Function1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
    at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

19/01/20 21:06:26 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_2, Status : FAILED
Error: java.lang.ClassNotFoundException: scala.Function1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
    at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

19/01/20 21:07:03 INFO Job:  map 100% reduce 100%
19/01/20 21:07:04 INFO Job: Job job_1547989123887_0001 failed with state FAILED due to: Task failed task_1547989123887_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

19/01/20 21:07:04 INFO Job: Counters: 12
    Job Counters 
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=73744
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=73744
        Total vcore-seconds taken by all map tasks=73744
        Total megabyte-seconds taken by all map tasks=75513856
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0

Process finished with exit code 1

This error shows the task JVMs failing to load scala.Function1 while instantiating the combiner, which suggests the compiled Scala classes reference the Scala runtime (the for-comprehension in the reducer compiles down to a foreach call taking a Function1) but scala-library.jar is not on the task classpath on the cluster. I don't know how to solve this one yet =_=!
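
A hedged sketch of one untested idea: upload scala-library.jar to HDFS once, then ship it with the job so the task JVMs can resolve the Scala classes. The HDFS path and version below are placeholders:

// Hypothetical: distribute the Scala runtime with the job so task JVMs can
// load classes such as scala.Function1 referenced by the compiled classes.
job.addFileToClassPath(new Path("/lib/scala-library-2.11.8.jar"));

Building a fat jar that bundles scala-library should achieve the same thing. For now, the Java code from the official example: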

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountjava {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable>{

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.app-submission.cross-platform", "true");
        Job job = Job.getInstance(conf, "word count java");
        job.setJarByClass(WordCountjava.class);
//        job.setJar("hdfs://hadoop001:9000/lib/hadoop-mapreduce.tar.gz");
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
//        FileInputFormat.addInputPath(job, new Path(args[0]));
//        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        FileInputFormat.addInputPath(job, new Path("hdfs://hadoop001:9000/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoop001:9000/out/WCoutput"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
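
One caveat when re-running the job: FileOutputFormat refuses to start if the output directory already exists. A small hedged sketch for clearing it first, assuming the same cluster paths as above (drop it into main before submission; it needs java.net.URI and org.apache.hadoop.fs.FileSystem imports):

// Delete the previous run's output so submission does not fail with
// "Output directory ... already exists" (true = delete recursively):
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop001:9000"), conf);
fs.delete(new Path("/out/WCoutput"), true);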

Running this code succeeds:

19/01/25 21:07:25 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/25 21:07:26 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/25 21:07:27 INFO FileInputFormat: Total input paths to process : 1
19/01/25 21:07:27 INFO JobSubmitter: number of splits:1
19/01/25 21:07:27 INFO JobSubmitter: Submitting tokens for job: job_1548417790455_0008
19/01/25 21:07:27 INFO YarnClientImpl: Submitted application application_1548417790455_0008
19/01/25 21:07:27 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1548417790455_0008/
19/01/25 21:07:27 INFO Job: Running job: job_1548417790455_0008
19/01/25 21:07:49 INFO Job: Job job_1548417790455_0008 running in uber mode : false
19/01/25 21:07:49 INFO Job:  map 0% reduce 0%
19/01/25 21:08:05 INFO Job:  map 100% reduce 0%
19/01/25 21:08:21 INFO Job:  map 100% reduce 100%
19/01/25 21:08:22 INFO Job: Job job_1548417790455_0008 completed successfully
19/01/25 21:08:22 INFO Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=44
        FILE: Number of bytes written=222537
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=156
        HDFS: Number of bytes written=26
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=13929
        Total time spent by all reduces in occupied slots (ms)=13266
        Total time spent by all map tasks (ms)=13929
        Total time spent by all reduce tasks (ms)=13266
        Total vcore-seconds taken by all map tasks=13929
        Total vcore-seconds taken by all reduce tasks=13266
        Total megabyte-seconds taken by all map tasks=14263296
        Total megabyte-seconds taken by all reduce tasks=13584384
    Map-Reduce Framework
        Map input records=3
        Map output records=7
        Map output bytes=72
        Map output materialized bytes=44
        Input split bytes=112
        Combine input records=7
        Combine output records=3
        Reduce input groups=3
        Reduce shuffle bytes=44
        Reduce input records=3
        Reduce output records=3
        Spilled Records=6
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=255
        CPU time spent (ms)=4470
        Physical memory (bytes) snapshot=403714048
        Virtual memory (bytes) snapshot=5511376896
        Total committed heap usage (bytes)=282066944
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=44
    File Output Format Counters 
        Bytes Written=26

Check the output written to the target directory:

[hadoop@hadoop001 lib]$ hadoop fs -ls /out/WCoutput               
Found 2 items
-rw-r--r--   1 zh supergroup          0 2019-01-25 21:08 /out/WCoutput/_SUCCESS
-rw-r--r--   1 zh supergroup         26 2019-01-25 21:08 /out/WCoutput/part-r-00000
[hadoop@hadoop001 lib]$ hadoop fs -text /out/WCoutput/part-r-00000
hello   4
welcome 1
world   2

As for why the Scala code won't run, I'd welcome answers from the experts. Many thanks!
