Several problems encountered running the MapReduce wordcount example from IDEA
2019-02-23
白面葫芦娃92
1. Modeled on the official example, rewritten in Scala:
import java.util.StringTokenizer
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val job = Job.getInstance(conf, "word count")
    job.setJarByClass(WordCount.getClass)
    job.setMapperClass(classOf[TokenizerMapper])
    job.setCombinerClass(classOf[IntSumReducer])
    job.setReducerClass(classOf[IntSumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path("/data/ruozeinput.txt"))
    FileOutputFormat.setOutputPath(job, new Path("/out/WCoutput"))
    System.exit(
      if (job.waitForCompletion(true)) 0 else 1
    )
  }

  class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {
    val one = new IntWritable(1)
    val word = new Text()

    def map(key: Object, value: Text, context: Context): Unit = {
      val itr = new StringTokenizer(value.toString)
      while (itr.hasMoreTokens) {
        word.set(itr.nextToken())
        context.write(word, one)
      }
    }
  }

  class IntSumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
    val result = new IntWritable()

    def reduce(key: Text, values: Iterable[IntWritable], context: Context): Unit = {
      var sum = 0
      for (valu <- values) {
        sum += valu.get()
      }
      result.set(sum)
      context.write(key, result)
    }
  }
}
2. Confirmed that HDFS and YARN were running on the VM, and that the IDEA Resources folder contained the core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml configuration files. Running the code failed with a ResourceManager connection error:
19/01/20 17:57:33 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/20 17:57:36 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/01/20 17:57:38 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
19/01/20 17:57:40 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
.......
3. Tried modifying the yarn-site.xml configuration.
Original configuration:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Changed to:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop001</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop001:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop001:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop001:8031</value>
    </property>
</configuration>
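With the client now pointed at hadoop001:8032, it is worth verifying from the development machine that the hostname resolves and the port is reachable before rerunning (a diagnostic sketch; the hostname and port assume the configuration above):

```shell
# Does the client machine know who hadoop001 is?
# (if not, add it to /etc/hosts or the Windows hosts file)
ping -c 1 hadoop001

# On the server: is the ResourceManager actually listening on its RPC port?
netstat -tlnp | grep 8032

# From the client: probe the port directly (requires nc or telnet)
nc -zv hadoop001 8032
```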
4. Ran the code again; it still could not connect. This time jps showed that YARN had died. After restarting YARN, the first jps showed a resourcemanager process, but a short while later it was gone again.
Checked the process log under $HADOOP_HOME/logs:
/************************************************************
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG: host = hadoop001/192.168.137.141
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0-cdh5.7.0
STARTUP_MSG: classpath = ......skipping...
2019-01-20 17:08:22,641 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Update thread interrupted. Exiting.
2019-01-20 17:08:22,658 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2019-01-20 17:08:22,649 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
2019-01-20 17:08:22,650 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2019-01-20 17:08:22,650 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2019-01-20 17:08:22,674 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2019-01-20 17:08:22,676 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:278)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:990)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1090)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1222)
Caused by: java.net.BindException: Port in use: 0.0.0.0:8088
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:951)
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:887)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:273)
... 4 more
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:437)
at sun.nio.ch.Net.bind(Net.java:429)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:946)
... 6 more
2019-01-20 17:08:22,696 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at hadoop001/192.168.137.141
************************************************************/
Reference: https://blog.csdn.net/caiandyong/article/details/50913268
The "Port in use: 0.0.0.0:8088" BindException suggests a stale ResourceManager (or some other process) was still holding the web UI port. It also turned out that the yarn-site.xml inside the VM still had the old configuration. After changing the VM's yarn-site.xml to match the one in IDEA's Resources folder and restarting YARN, resourcemanager no longer shut itself down.
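For the "Port in use: 0.0.0.0:8088" error specifically, finding and stopping whatever still holds the port is usually enough (a diagnostic sketch run on the server; the PID is a placeholder to fill in from the first command's output):

```shell
# Find the process occupying the RM web UI port 8088
lsof -i :8088            # or: netstat -tlnp | grep 8088

# If it is a stale ResourceManager, stop it (kill -9 only as a last resort)
kill <pid>

# Then restart YARN and confirm the process stays up
$HADOOP_HOME/sbin/start-yarn.sh
jps
```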
5. Ran the code again. The ResourceManager connected successfully, but the job failed:
19/01/20 18:11:17 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/20 18:11:22 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/20 18:11:22 WARN JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
19/01/20 18:11:22 INFO FileInputFormat: Total input paths to process : 1
19/01/20 18:11:23 INFO JobSubmitter: number of splits:1
19/01/20 18:11:24 INFO JobSubmitter: Submitting tokens for job: job_1547976418140_0004
19/01/20 18:11:24 INFO YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
19/01/20 18:11:24 INFO YarnClientImpl: Submitted application application_1547976418140_0004
19/01/20 18:11:24 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547976418140_0004/
19/01/20 18:11:24 INFO Job: Running job: job_1547976418140_0004
19/01/20 18:11:29 INFO Job: Job job_1547976418140_0004 running in uber mode : false
19/01/20 18:11:29 INFO Job: map 0% reduce 0%
19/01/20 18:11:29 INFO Job: Job job_1547976418140_0004 failed with state FAILED due to: Application application_1547976418140_0004 failed 2 times due to AM Container for appattempt_1547976418140_0004_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://hadoop001:8088/proxy/application_1547976418140_0004/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1547976418140_0004_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
19/01/20 18:11:29 INFO Job: Counters: 0
The "/bin/bash: line 0: fg: no job control" error typically appears when a job is submitted from a Windows client to a Linux cluster, because the client generates Windows-style container launch commands. Added conf.set("mapreduce.app-submission.cross-platform", "true") so the client emits cross-platform commands, then ran again:
19/01/20 23:45:30 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/20 23:45:31 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/20 23:45:31 WARN JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
19/01/20 23:45:31 INFO FileInputFormat: Total input paths to process : 1
19/01/20 23:45:31 INFO JobSubmitter: number of splits:1
19/01/20 23:45:31 INFO JobSubmitter: Submitting tokens for job: job_1547989123887_0006
19/01/20 23:45:32 INFO YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
19/01/20 23:45:32 INFO YarnClientImpl: Submitted application application_1547989123887_0006
19/01/20 23:45:32 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547989123887_0006/
19/01/20 23:45:32 INFO Job: Running job: job_1547989123887_0006
19/01/20 23:45:56 INFO Job: Job job_1547989123887_0006 running in uber mode : false
19/01/20 23:45:56 INFO Job: map 0% reduce 0%
19/01/20 23:46:10 INFO Job: Task Id : attempt_1547989123887_0006_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
... 8 more
19/01/20 23:46:22 INFO Job: Task Id : attempt_1547989123887_0006_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
... 8 more
The error says the class WordCountjava$TokenizerMapper cannot be found on the cluster, which matches the earlier warning "No job jar file set. ... See Job or Job#setJar(String)". So packaged the class into a jar and added it to the job's dependencies, then ran again:
19/01/20 21:05:04 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/20 21:05:05 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/20 21:05:07 INFO FileInputFormat: Total input paths to process : 1
19/01/20 21:05:07 INFO JobSubmitter: number of splits:1
19/01/20 21:05:08 INFO JobSubmitter: Submitting tokens for job: job_1547989123887_0001
19/01/20 21:05:09 INFO YarnClientImpl: Submitted application application_1547989123887_0001
19/01/20 21:05:09 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547989123887_0001/
19/01/20 21:05:09 INFO Job: Running job: job_1547989123887_0001
19/01/20 21:05:40 INFO Job: Job job_1547989123887_0001 running in uber mode : false
19/01/20 21:05:40 INFO Job: map 0% reduce 0%
19/01/20 21:05:57 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: scala.Function1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
19/01/20 21:06:08 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: scala.Function1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
19/01/20 21:06:26 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_2, Status : FAILED
Error: java.lang.ClassNotFoundException: scala.Function1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
19/01/20 21:07:03 INFO Job: map 100% reduce 100%
19/01/20 21:07:04 INFO Job: Job job_1547989123887_0001 failed with state FAILED due to: Task failed task_1547989123887_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/01/20 21:07:04 INFO Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=73744
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=73744
Total vcore-seconds taken by all map tasks=73744
Total megabyte-seconds taken by all map tasks=75513856
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Process finished with exit code 1
scala.Function1 is part of the Scala standard library: the for (valu <- values) loop in the reducer compiles to a closure implementing scala.Function1, and scala-library.jar is evidently not on the task classpath in the cluster. Shipping scala-library along with the job, or rewriting the loop as a plain while over values.iterator(), would likely avoid this error (not verified here). For now, the official Java example is used instead:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCountjava {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.app-submission.cross-platform", "true");
        Job job = Job.getInstance(conf, "word count java");
        job.setJarByClass(WordCountjava.class);
        // job.setJar("hdfs://hadoop001:9000/lib/hadoop-mapreduce.tar.gz");
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // FileInputFormat.addInputPath(job, new Path(args[0]));
        // FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileInputFormat.addInputPath(job, new Path("hdfs://hadoop001:9000/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoop001:9000/out/WCoutput"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Running this code succeeded:
19/01/25 21:07:25 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
19/01/25 21:07:26 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/25 21:07:27 INFO FileInputFormat: Total input paths to process : 1
19/01/25 21:07:27 INFO JobSubmitter: number of splits:1
19/01/25 21:07:27 INFO JobSubmitter: Submitting tokens for job: job_1548417790455_0008
19/01/25 21:07:27 INFO YarnClientImpl: Submitted application application_1548417790455_0008
19/01/25 21:07:27 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1548417790455_0008/
19/01/25 21:07:27 INFO Job: Running job: job_1548417790455_0008
19/01/25 21:07:49 INFO Job: Job job_1548417790455_0008 running in uber mode : false
19/01/25 21:07:49 INFO Job: map 0% reduce 0%
19/01/25 21:08:05 INFO Job: map 100% reduce 0%
19/01/25 21:08:21 INFO Job: map 100% reduce 100%
19/01/25 21:08:22 INFO Job: Job job_1548417790455_0008 completed successfully
19/01/25 21:08:22 INFO Job: Counters: 49
File System Counters
FILE: Number of bytes read=44
FILE: Number of bytes written=222537
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=156
HDFS: Number of bytes written=26
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=13929
Total time spent by all reduces in occupied slots (ms)=13266
Total time spent by all map tasks (ms)=13929
Total time spent by all reduce tasks (ms)=13266
Total vcore-seconds taken by all map tasks=13929
Total vcore-seconds taken by all reduce tasks=13266
Total megabyte-seconds taken by all map tasks=14263296
Total megabyte-seconds taken by all reduce tasks=13584384
Map-Reduce Framework
Map input records=3
Map output records=7
Map output bytes=72
Map output materialized bytes=44
Input split bytes=112
Combine input records=7
Combine output records=3
Reduce input groups=3
Reduce shuffle bytes=44
Reduce input records=3
Reduce output records=3
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=255
CPU time spent (ms)=4470
Physical memory (bytes) snapshot=403714048
Virtual memory (bytes) snapshot=5511376896
Total committed heap usage (bytes)=282066944
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=44
File Output Format Counters
Bytes Written=26
Check the output files in the target directory:
[hadoop@hadoop001 lib]$ hadoop fs -ls /out/WCoutput
Found 2 items
-rw-r--r-- 1 zh supergroup 0 2019-01-25 21:08 /out/WCoutput/_SUCCESS
-rw-r--r-- 1 zh supergroup 26 2019-01-25 21:08 /out/WCoutput/part-r-00000
[hadoop@hadoop001 lib]$ hadoop fs -text /out/WCoutput/part-r-00000
hello 4
welcome 1
world 2
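As a sanity check, the job counters above (3 map input records, 7 map output records, 3 reduce output records) are consistent with this output. The same counting logic can be reproduced in plain Java, assuming a hypothetical 3-line input that matches those counters (the real content of the input file is not shown in this post):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // In-memory equivalent of the map + combine + reduce phases above
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like reducer output
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line); // same tokenizer as the mapper
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum); // same sum as the reducer
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input: 3 lines, 7 tokens -- matching the job counters
        String[] lines = {"hello world hello", "hello welcome", "hello world"};
        count(lines).forEach((w, c) -> System.out.println(w + "\t" + c));
        // prints: hello 4 / welcome 1 / world 2, same as part-r-00000
    }
}
```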