
Hadoop 2.x Pseudo-Distributed Mode

2019-03-04  Miracle001

Introduction

Big data:
  Structured data: constrained by a schema
  Semi-structured data
  Unstructured data: no metadata
    Log data is unstructured data
  Search engines: a search component plus an indexing component
    Crawlers/spider programs -- the data they crawl is unstructured or semi-structured

  Tokenizer (word segmenter)

  Storage
  Analysis and processing
  Google's three papers:
    GFS, 2003: The Google File System
    MapReduce, 2004: MapReduce: Simplified Data Processing on Large Clusters
    BigTable, 2006: Bigtable: A Distributed Storage System for Structured Data

  The open-source clones:
  HDFS
  MapReduce
  HBase
  HDFS + MapReduce = Hadoop

Nutch crawled data for Lucene --> the bigger the data, the slower the processing; while a solution was being sought, Google's papers were published

MapReduce is a batch-processing framework, so its speed and performance are comparatively poor

NAS, SAN: shared storage
With only one storage system, the I/O pressure becomes too high, so it does not fit; this is the traditional centralized solution

Distributed storage
  with a central node that stores the metadata: GFS/HDFS
  without a central node


NN: NameNode
SNN: SecondaryNameNode, a second node that keeps the NN, once it goes down, from having to replay the entire transaction log, which would take too long
Metadata persistence: transaction (edit) log --> image, an on-disk snapshot (fsimage), ensuring the metadata is never lost; a toy sketch follows below.
Since Hadoop 2.0, high availability is built on ZooKeeper, with the metadata kept on shared storage such as NFS.
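
A toy sketch of the idea in Python (illustration only, not Hadoop internals): the current namespace equals the last fsimage checkpoint plus every transaction replayed from the edit log, and the SNN periodically folds the log into a fresh fsimage so that replay after a crash stays short.

  # toy model of fsimage + edit log (assumed structure, not Hadoop code)
  fsimage = {"/": "dir"}                                      # last checkpointed namespace
  edit_log = [("mkdir", "/test"), ("create", "/test/fstab")]  # transactions since then

  def replay(image, log):
      """Rebuild the current state: checkpoint + replayed transactions."""
      state = dict(image)
      for op, path in log:
          state[path] = "dir" if op == "mkdir" else "file"
      return state

  # the SNN periodically performs this merge and ships the new fsimage back
  # to the NN, truncating the edit log so crash-recovery replay stays short
  fsimage = replay(fsimage, edit_log)
  edit_log = []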


DN: DataNode
Data replicas keep the data intact and available;
heartbeat
Block lists:
  data-centric: which nodes a given block is stored on;
  node-centric: which blocks a given node holds;
JobTracker, TaskTracker
The program goes to wherever the data is (data locality)
Running the NameNode and the JobTracker together easily creates a system bottleneck
The DataNode and the TaskTracker run together

Functional programming
  passes one function to another function as an argument
  Lisp and ML are functional programming languages; higher-order functions:
map, fold
  map:
    map(f())
    map: takes a function as an argument and applies it to every element of a list, producing a list of results;
  fold:
    takes two arguments: a function and an initial value
      fold(g(), init)
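
A minimal sketch of the two in Python (the lambdas are arbitrary examples; functools.reduce is Python's fold):

  from functools import reduce

  nums = [1, 2, 3, 4]
  doubled = list(map(lambda x: x * 2, nums))        # map: apply f to every element -> [2, 4, 6, 8]
  total = reduce(lambda acc, x: acc + x, nums, 0)   # fold: collapse to one value, starting at init=0 -> 10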

mapreduce:
  mapper, reducer
  shuffle and sort: the transfer and ordering of intermediate data
  k-v (key-value) data
  every record with the same key must go to the same reducer
  a job may pass through MapReduce several times (chained jobs)
mapper --> combiner --> partitioner --> reducer (see the sketch below)
mapper and reducer: their input and output keys differ
combiner: its input and output keys are the same
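
A minimal word-count sketch of this flow in Python (an illustration of the k-v semantics only, not Hadoop's actual API; the grouping in the middle is the shuffle-and-sort step the framework performs for you):

  from itertools import groupby
  from operator import itemgetter

  # mapper: input key is a line offset, output key is a word (keys differ in and out)
  def mapper(line):
      for word in line.split():
          yield (word, 1)

  # reducer: receives every (word, count) pair that shares the same key
  def reducer(word, counts):
      yield (word, sum(counts))

  lines = ["hello world", "hello hadoop"]
  # shuffle and sort, conceptually: order the mapper output and group it by key
  pairs = sorted(kv for line in lines for kv in mapper(line))
  for word, group in groupby(pairs, key=itemgetter(0)):
      for k, v in reducer(word, (count for _, count in group)):
          print(k, v)   # prints: hadoop 1 / hello 2 / world 1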

MRv1 (Hadoop 1) --> MRv2 (Hadoop 2)
MRv1: cluster resource management and data processing in a single framework
MRv2:
  YARN: cluster resource manager
  MRv2: data processing
    MR: batch processing
    Tez: execution engine

    RM: ResourceManager
    NM: NodeManager
    AM: ApplicationMaster
    container: where the MR tasks actually run
    See Figure 1 below

The Hadoop ecosystem (see Figure 2 below)
Sqoop
  pulls data out of relational databases and imports it into Hadoop;
  extracts data from Hadoop and, once structured, loads it into relational databases;
Flume
  collects logs and stores them in Hadoop
Hive
Pig
HBase: column-oriented storage
Data serialization: turning non-stream data into a stream form that can also be restored back again (illustrated in the sketch after this list).
Storm: real-time data statistics and analysis
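
A rough illustration of the serialization round trip in Python, using the standard-library pickle module (Hadoop itself relies on its own formats such as Writable and Avro; this only demonstrates the concept):

  import pickle

  record = {"host": "node4", "blocks": [1, 2, 3]}   # an in-memory, non-stream object
  stream = pickle.dumps(record)                     # serialize: object -> byte stream
  restored = pickle.loads(stream)                   # deserialize: byte stream -> object
  assert restored == record                         # the round trip restores the original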

Hadoop distributions:
  Cloudera: CDH
  Hortonworks: HDP
  other commercial editions:
  Intel: IDH
  MapR

Standalone mode: for testing whether a program can run on Hadoop at all
Pseudo-distributed mode: all daemons run on a single machine
Fully distributed mode: a real cluster

Hadoop is written in Java



Hadoop Pseudo-Distributed Setup

CentOS 7 (1804)
NAT: 192.168.25.14
Host-only: 192.168.50.14
Disable the firewall
Disable SELinux
yum -y install wget vim lrzsz net-tools ntpdate
yum -y install epel-release-latest-7.noarch.rpm
cat /etc/hosts
192.168.25.11 node1.fgq.com node1
192.168.25.12 node2.fgq.com node2
192.168.25.13 node3.fgq.com node3
192.168.25.14 node4.fgq.com node4
192.168.25.15 node5.fgq.com node5
crontab -e
*/5 * * * * ntpdate time3.aliyun.com && hwclock -w

[root@node4 ~]# mkdir -p /fgq/base-env/
[root@node4 ~]# cd /fgq/base-env/
Download the JDK package jdk-8u152-linux-x64.tar.gz
Download the Hadoop package hadoop-2.9.2.tar.gz
Upload both into this directory and unpack them:
[root@node4 base-env]# tar zxf jdk-8u152-linux-x64.tar.gz 
[root@node4 base-env]# tar zxf hadoop-2.9.2.tar.gz
[root@node4 base-env]# ln -s jdk1.8.0_152 jdk
[root@node4 base-env]# ln -s hadoop-2.9.2 hadoop

[root@node4 ~]# vim /etc/profile
Append the following at the bottom:
export JAVA_HOME=/fgq/base-env/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
[root@node4 ~]# source /etc/profile
[root@node4 ~]# java -version
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

[root@node4 ~]# vim /etc/profile.d/hadoop.sh
export HADOOP_HOME=/fgq/base-env/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
[root@node4 ~]# source /etc/profile.d/hadoop.sh

[root@node4 ~]# cd /fgq/base-env/hadoop
[root@node4 hadoop]# groupadd hadoop
[root@node4 hadoop]# useradd -g hadoop yarn
[root@node4 hadoop]# useradd -g hadoop hdfs
[root@node4 hadoop]# useradd -g hadoop mapred
[root@node4 hadoop]# mkdir -p /fgq/data/hadoop/hdfs/{nn,snn,dn}
[root@node4 hadoop]# chown -R hdfs:hadoop /fgq/data/hadoop/hdfs
[root@node4 hadoop]# ll /fgq/data/hadoop/hdfs

[root@node4 hadoop]# mkdir logs
[root@node4 hadoop]# chmod g+w logs  # make sure the hadoop group can write to logs
[root@node4 hadoop]# chown -R yarn:hadoop logs
[root@node4 hadoop]# chown -R yarn:hadoop ./*
[root@node4 hadoop]# ll

[root@node4 hadoop]# cd etc/hadoop/
[root@node4 hadoop]# vim core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://node4:8020</value>
                <final>true</final>
        </property>
</configuration>
[root@node4 hadoop]# vim hdfs-site.xml
<configuration>
   <property>
             <name>dfs.replication</name>
             <value>1</value>
   </property>
   <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:///fgq/data/hadoop/hdfs/nn</value>
   </property>
   <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:///fgq/data/hadoop/hdfs/dn</value>
   </property>
   <property>
             <name>fs.checkpoint.dir</name>
             <value>file:///fgq/data/hadoop/hdfs/snn</value>
   </property>
   <property>
             <name>fs.checkpoint.edits.dir</name>
             <value>file:///fgq/data/hadoop/hdfs/snn</value>
   </property>
</configuration>

[root@node4 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@node4 hadoop]# vim mapred-site.xml
<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>
[root@node4 hadoop]# vim yarn-site.xml
<configuration>
   <property>
       <name>yarn.resourcemanager.address</name>
       <value>node4:8032</value>
   </property>
   <property>
       <name>yarn.resourcemanager.scheduler.address</name>
       <value>node4:8030</value>
   </property>
   <property>
       <name>yarn.resourcemanager.resource-tracker.address</name>
       <value>node4:8031</value>
   </property>
   <property>
       <name>yarn.resourcemanager.admin.address</name>
       <value>node4:8033</value>
   </property>
   <property>
       <name>yarn.resourcemanager.webapp.address</name>
       <value>node4:8088</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
   <property>
       <name>yarn.resourcemanager.scheduler.class</name>
       <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
   </property>
</configuration>
[root@node4 hadoop]# vim slaves
node4
[root@node4 hadoop]# su - hdfs

## Format the NameNode
[hdfs@node4 ~]$ hdfs namenode -format
19/03/02 10:45:03 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node4.fgq.com/192.168.25.14
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.9.2
...  ...
19/03/02 10:45:05 INFO common.Storage: Storage directory /fgq/data/hadoop/hdfs/nn has been successfully formatted.
19/03/02 10:45:05 INFO namenode.FSImageFormatProtobuf: Saving image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
19/03/02 10:45:05 INFO namenode.FSImageFormatProtobuf: Image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
19/03/02 10:45:05 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/03/02 10:45:05 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node4.fgq.com/192.168.25.14
************************************************************/
The word "successfully" in the output means it worked
[hdfs@node4 ~]$ ls /fgq/data/hadoop/hdfs/nn/current/
fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION

## Start the NameNode
[hdfs@node4 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-namenode-node4.fgq.com.out
[hdfs@node4 ~]$ less /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-namenode-node4.fgq.com.log 
[hdfs@node4 ~]$ jps  # jps is Java's ps, listing JVM processes
1769 NameNode
1851 Jps
[hdfs@node4 ~]$ jps -h
illegal argument: -h
usage: jps [-help]
       jps [-q] [-mlvV] [<hostid>]

Definitions:
    <hostid>:      <hostname>[:<port>]
[hdfs@node4 ~]$ jps -v
1879 Jps -Denv.class.path=.:/fgq/base-env/jdk/lib/dt.jar:/fgq/base-env/jdk/lib/tools.jar -Dapplication.home=/fgq/base-env/jdk1.8.0_152 -Xms8m
1769 NameNode -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/fgq/base-env/hadoop-2.9.2/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/fgq/base-env/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=/fgq/base-env/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/fgq/base-env/hadoop-2.9.2/logs -Dhadoop.log.file=hadoop-hdfs-namenode-node4.fgq.com.log -Dhadoop.home.dir=/fgq/base-env/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/fgq/base-env/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS

## Start the SecondaryNameNode
[hdfs@node4 ~]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-secondarynamenode-node4.fgq.com.out
[hdfs@node4 ~]$ jps
1990 Jps
1769 NameNode
1945 SecondaryNameNode

## Start the DataNode
[hdfs@node4 ~]$ hadoop-daemon.sh start datanode
starting datanode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-datanode-node4.fgq.com.out
Normally the name node does not double as a data node, but this is a pseudo-distributed setup
[hdfs@node4 ~]$ jps
1769 NameNode
1945 SecondaryNameNode
2073 DataNode
2155 Jps

[hdfs@node4 ~]$ hdfs dfs -ls /  # the root path has no directories yet; create one
[hdfs@node4 ~]$ hdfs dfs -mkdir /test
[hdfs@node4 ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2019-03-02 11:29 /test
Note the owner and group.
Note: if other users need write access to HDFS, add the following property to hdfs-site.xml:
   <property>
         <name>dfs.permissions</name>
         <value>false</value>
   </property>

## Upload a file
[hdfs@node4 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hdfs@node4 ~]$ hdfs dfs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - hdfs supergroup          0 2019-03-02 11:37 /test
-rw-r--r--   1 hdfs supergroup        501 2019-03-02 11:37 /test/fstab
The file /test/fstab lives on the remote HDFS.
Its location on the local filesystem:
[root@node4 ~]# vim /fgq/data/hadoop/hdfs/dn/current/BP-1435152656-192.168.25.14-1551494705143/current/finalized/subdir0/subdir0/blk_1073741825
When a file is big enough to be split into blocks, those blocks can still be inspected through local filesystem paths, but they may end up in different directories
Viewing it through the dfs interface:
[hdfs@node4 ~]$ hdfs dfs -cat /test/fstab

#
# /etc/fstab
# Created by anaconda on Thu Feb 28 17:13:02 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=aebd58c2-fdc1-44ad-b33a-9b6efdf9488c /                       xfs     defaults        0 0
UUID=2fe4fb6b-aab1-42f7-b024-45be2e7065f5 /boot                   xfs     defaults        0 0
UUID=79f97775-e6bb-494d-827a-1f5aa3423c6d swap                    swap    defaults        0 0

[hdfs@node4 ~]$ exit
logout

## Switch to the yarn user and start the YARN services
[root@node4 hadoop]# su - yarn
[yarn@node4 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-resourcemanager-node4.fgq.com.out
[yarn@node4 ~]$ jps
3376 Jps
3141 ResourceManager
[yarn@node4 ~]$ yarn-daemon.sh start nodemanager
starting nodemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-nodemanager-node4.fgq.com.out
[yarn@node4 ~]$ jps
3141 ResourceManager
3525 Jps
3417 NodeManager

Browsing the Web UIs

HDFS and the YARN ResourceManager each provide a web interface
through which you can view status information for the HDFS and YARN clusters
HDFS NameNode: http://192.168.25.14:50070 (Figure 1 below)
YARN ResourceManager: http://192.168.25.14:8088 (Figure 2 below)
Note: if the yarn.resourcemanager.webapp.address property in yarn-site.xml is set to "localhost:8088", the web UI listens only on port 8088 of 127.0.0.1.

Hadoop运行程序

[root@node4 ~]# cd /fgq/base-env/hadoop/share/hadoop/mapreduce/
[root@node4 mapreduce]# ls
Hadoop YARN ships with many sample programs; among them, hadoop-mapreduce-examples-2.9.2.jar can be used as a MapReduce test program

Note: switch to the hdfs user first
[root@node4 ~]# su - hdfs
[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar
An example name must be passed as the first argument, as the usage message shows:
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
19/03/02 15:49:46 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
19/03/02 15:49:47 INFO input.FileInputFormat: Total input files to process : 1
19/03/02 15:49:48 INFO mapreduce.JobSubmitter: number of splits:1
19/03/02 15:49:48 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/02 15:49:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551512925744_0001
19/03/02 15:49:49 INFO impl.YarnClientImpl: Submitted application application_1551512925744_0001
19/03/02 15:49:49 INFO mapreduce.Job: The url to track the job: http://node4:8088/proxy/application_1551512925744_0001/
19/03/02 15:49:49 INFO mapreduce.Job: Running job: job_1551512925744_0001
19/03/02 15:49:57 INFO mapreduce.Job: Job job_1551512925744_0001 running in uber mode : false
19/03/02 15:49:57 INFO mapreduce.Job:  map 0% reduce 0%
19/03/02 15:50:02 INFO mapreduce.Job:  map 100% reduce 0%
19/03/02 15:50:06 INFO mapreduce.Job:  map 100% reduce 100%
19/03/02 15:50:07 INFO mapreduce.Job: Job job_1551512925744_0001 completed successfully
19/03/02 15:50:07 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=591
        FILE: Number of bytes written=397951
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=594
        HDFS: Number of bytes written=433
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2693
        Total time spent by all reduces in occupied slots (ms)=2021
        Total time spent by all map tasks (ms)=2693
        Total time spent by all reduce tasks (ms)=2021
        Total vcore-milliseconds taken by all map tasks=2693
        Total vcore-milliseconds taken by all reduce tasks=2021
        Total megabyte-milliseconds taken by all map tasks=2757632
        Total megabyte-milliseconds taken by all reduce tasks=2069504
    Map-Reduce Framework
        Map input records=11
        Map output records=54
        Map output bytes=625
        Map output materialized bytes=591
        Input split bytes=93
        Combine input records=54
        Combine output records=38
        Reduce input groups=38
        Reduce shuffle bytes=591
        Reduce input records=38
        Reduce output records=38
        Spilled Records=76
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=193
        CPU time spent (ms)=1250
        Physical memory (bytes) snapshot=462479360
        Virtual memory (bytes) snapshot=4232617984
        Total committed heap usage (bytes)=292028416
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=501
    File Output Format Counters 
        Bytes Written=433

[hdfs@node4 ~]$ hdfs dfs -lsr /test
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r--   1 hdfs supergroup        501 2019-03-02 11:37 /test/fstab
drwxr-xr-x   - hdfs supergroup          0 2019-03-02 15:50 /test/fstab_out
-rw-r--r--   1 hdfs supergroup          0 2019-03-02 15:50 /test/fstab_out/_SUCCESS
-rw-r--r--   1 hdfs supergroup        433 2019-03-02 15:50 /test/fstab_out/part-r-00000
An output directory /test/fstab_out was created, containing two files, _SUCCESS and part-r-00000, which means the job succeeded

## View the word counts
[hdfs@node4 ~]$ hdfs dfs -cat /test/fstab_out/part-r-00000
#   7
'/dev/disk' 1
/   1
/boot   1
/etc/fstab  1
0   6
17:13:02    1
2019    1
28  1
Accessible  1
Created 1
Feb 1
See 1
Thu 1
UUID=2fe4fb6b-aab1-42f7-b024-45be2e7065f5   1
UUID=79f97775-e6bb-494d-827a-1f5aa3423c6d   1
UUID=aebd58c2-fdc1-44ad-b33a-9b6efdf9488c   1
anaconda    1
and/or  1
are 1
blkid(8)    1
by  2
defaults    3
filesystems,    1
findfs(8),  1
for 1
fstab(5),   1
info    1
maintained  1
man 1
more    1
mount(8)    1
on  1
pages   1
reference,  1
swap    2
under   1
xfs 2
Troubleshooting
An error occurred while running the jar:
[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
19/03/02 15:16:11 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
19/03/02 15:16:13 INFO ipc.Client: Retrying connect to server: node4/192.168.25.14:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Error: the client keeps retrying the connection because the YARN ResourceManager is not running

##### Fix
[root@node4 ~]# su - yarn
[yarn@node4 ~]$ jps
5440 Jps
3417 NodeManager
[yarn@node4 ~]$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-resourcemanager-node4.fgq.com.out
[yarn@node4 ~]$ jps
5478 ResourceManager
3417 NodeManager
5711 Jps


## Running the MapReduce job again still fails
[hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
19/03/02 15:29:10 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
19/03/02 15:29:11 INFO input.FileInputFormat: Total input files to process : 1
19/03/02 15:29:11 INFO mapreduce.JobSubmitter: number of splits:1
19/03/02 15:29:11 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/02 15:29:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551511594876_0001
19/03/02 15:29:12 INFO impl.YarnClientImpl: Submitted application application_1551511594876_0001
19/03/02 15:29:12 INFO mapreduce.Job: The url to track the job: http://node4:8088/proxy/application_1551511594876_0001/
19/03/02 15:29:12 INFO mapreduce.Job: Running job: job_1551511594876_0001
19/03/02 15:29:17 INFO mapreduce.Job: Job job_1551511594876_0001 running in uber mode : false
19/03/02 15:29:17 INFO mapreduce.Job:  map 0% reduce 0%
19/03/02 15:29:17 INFO mapreduce.Job: Job job_1551511594876_0001 failed with state FAILED due to: Application application_1551511594876_0001 failed 2 times due to AM Container for appattempt_1551511594876_0001_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2019-03-02 15:29:17.428]Exception from container-launch.
Container id: container_1551511594876_0001_02_000001
Exit code: 1

[2019-03-02 15:29:17.429]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ed300000, 35651584, 0) failed; error='Cannot allocate memory' (errno=12)


[2019-03-02 15:29:17.429]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ed300000, 35651584, 0) failed; error='Cannot allocate memory' (errno=12)


For more detailed output, check the application tracking page: http://node4:8088/cluster/app/application_1551511594876_0001 Then click on links to logs of each attempt.
. Failing the application.
19/03/02 15:29:17 INFO mapreduce.Job: Counters: 0

The error indicates insufficient memory; shut the VM down, increase its memory to 3 GB, then boot it and start the services again
Startup
su - hdfs
hdfs namenode -format  # only needed the first time
hadoop-daemon.sh start namenode
hadoop-daemon.sh start secondarynamenode
hadoop-daemon.sh start datanode
jps
1769 NameNode
1945 SecondaryNameNode
2073 DataNode
2155 Jps
hdfs dfs -mkdir /test  # only needed the first time
hdfs dfs -put /etc/fstab /test/fstab  # only needed the first time
hdfs dfs -lsr /  # only needed the first time


su - yarn
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
jps
3141 ResourceManager
3525 Jps
3417 NodeManager

su - hdfs
yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out

hdfs dfs -lsr /test
hdfs dfs -cat /test/fstab_out/part-r-00000