Installing Hadoop 2.7.7 on DiDi DevCloud (AlphaCloud)

2019-07-25 · 井地儿

These are my personal study notes, based on DiDi's DevCloud (AlphaCloud) environment.

Preparation

Package downloads

To save time, download the installation packages in advance.

Cluster plan

The cluster here runs on DiDi DevCloud; readers can set up an equivalent virtual-machine environment instead.

Node overview

role     ip             hostname        system
master   10.96.81.166   jms-master-01   CentOS 7.2 [4 CPU, 12 GB RAM, 100 GB disk]
node     10.96.113.243  jms-master-02   CentOS 7.2 [4 CPU, 12 GB RAM, 100 GB disk]
node     10.96.85.231   jms-master-03   CentOS 7.2 [4 CPU, 12 GB RAM, 100 GB disk]

User conventions

Use a single group and user, both named hadoop, on every node.

[root@jms-master-01 ~]# groupadd hadoop
[root@jms-master-01 ~]# useradd -g hadoop hadoop
[root@jms-master-01 ~]# grep hadoop /etc/group
hadoop:x:500:
[root@jms-master-01 ~]# grep hadoop /etc/passwd
hadoop:x:500:500::/home/hadoop:/bin/bash

Directory conventions

Software install directory: /home/hadoop/tools
Package directory: /home/hadoop/tools/package

[hadoop@jms-master-01 ~]$ mkdir -p /home/hadoop/tools/package

System configuration

hosts configuration

Note that the 127.0.0.1 mapping for the node's hostname must be commented out, and this change is needed on every node.

[root@jms-master-01 ~]# vim /etc/hosts 
[root@jms-master-01 ~]# cat /etc/hosts 
#       127.0.0.1  jms-master-01 
10.96.81.166 jms-master-01 
10.96.113.243 jms-master-02 
10.96.85.231  jms-master-03
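A quick way to confirm the 127.0.0.1 mapping is really inactive is to grep the file. This is a sketch run against a sample copy of the hosts file (the scratch path /tmp/hosts.sample is only for trying it out; point the grep at /etc/hosts on a real node):

```shell
# Sketch: verify a hosts file has no active 127.0.0.1 -> hostname mapping.
# A sample copy is written first so the check is safe to try anywhere.
cat > /tmp/hosts.sample <<'EOF'
#       127.0.0.1  jms-master-01
10.96.81.166 jms-master-01
10.96.113.243 jms-master-02
10.96.85.231  jms-master-03
EOF

# An uncommented 127.0.0.1 line naming a cluster host would make daemons
# bind to loopback; the grep must find nothing (exit status 1).
if grep -E '^[^#]*127\.0\.0\.1.*jms-master' /tmp/hosts.sample; then
  echo "WARNING: active 127.0.0.1 mapping found"
else
  echo "hosts file OK"
fi
```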

Passwordless SSH

Note that each machine also needs passwordless SSH to itself. Passwordless SSH is per-user as well: keys set up under root do not carry over to the hadoop user, which needs its own. Every node must be configured.

[hadoop@jms-master-01 ~]$ ssh-keygen -t rsa
[hadoop@jms-master-01 ~]$ cat ~/.ssh/id_rsa.pub
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
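The nine ssh-copy-id invocations above follow a single pattern, so a small loop can generate them on each node. The sketch below is a dry run that only prints the commands (remove the echo to actually copy keys); the NODES list is the three IPs from the plan:

```shell
# Sketch: print the ssh-copy-id command for every node in the cluster.
# Run once per node after ssh-keygen; drop "echo" to actually copy keys.
NODES="10.96.81.166 10.96.113.243 10.96.85.231"
for node in $NODES; do
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub "$node"
done
```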

JDK installation

Create the JDK install directory:

[hadoop@jms-master-01 ~]$ mkdir -p /home/hadoop/tools/java

Upload jdk-8u191-linux-x64.tar.gz to /home/hadoop/tools/package:

scp jdk-8u191-linux-x64.tar.gz hadoop@10.96.81.166:~/tools/package/

Extract it:

tar -xzvf jdk-8u191-linux-x64.tar.gz -C /home/hadoop/tools/java/

Configure the environment variables:

[root@jms-master-01 ~]# vim /etc/profile 
[root@jms-master-01 ~]# cat /etc/profile 

# java home

export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191 
export PATH=$JAVA_HOME/bin:$PATH 
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 
export JRE_HOME=$JAVA_HOME/jre

A cleaner alternative is a drop-in file under /etc/profile.d:

sudo vi /etc/profile.d/jdk-1.8.sh 
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191 
export JRE_HOME=${JAVA_HOME}/jre 
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib 
export PATH=${JAVA_HOME}/bin:$PATH

Reload the configuration (source it as both root and hadoop so it takes effect immediately), then verify:

[root@jms-master-01 ~]# source /etc/profile 
[root@jms-master-01 ~]# su hadoop 
[hadoop@jms-master-01 ~]$ source /etc/profile 
[hadoop@jms-master-01 ~]$ java -version 
java version "1.8.0_191" 
Java(TM) SE Runtime Environment (build 1.8.0_191-b12) 
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

We install the JDK only on the master node; the other nodes simply receive copies via scp.

Copy the JDK:

[hadoop@jms-master-01 ~]$ scp -r /home/hadoop/tools/java/jdk1.8.0_191 hadoop@10.96.113.243:/home/hadoop/tools/java/jdk1.8.0_191 
[hadoop@jms-master-01 ~]$ scp -r /home/hadoop/tools/java/jdk1.8.0_191 hadoop@10.96.85.231:/home/hadoop/tools/java/jdk1.8.0_191 

Copy the profile:

[hadoop@jms-master-01 ~]$ scp /etc/profile root@10.96.113.243:/etc/profile 
[hadoop@jms-master-01 ~]$ scp /etc/profile root@10.96.85.231:/etc/profile

Finally, don't forget to verify the JDK on the other two nodes:

[hadoop@jms-master-02 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
[hadoop@jms-master-03 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

With that, the JDK is installed on every node.

Disabling the firewall (can be skipped in the DevCloud environment)

Check the firewall status:

firewall-cmd --state

Stop the firewall:

systemctl stop firewalld.service

Start the firewall:

systemctl start firewalld.service

Disable the firewall at boot:

systemctl disable firewalld.service

Installing Hadoop

Directory plan

item                                  planned directory                                note
Hadoop install dir                    /home/hadoop/tools/hadoop-2.7.7                  consider a symlink for a stable path
Hadoop data root                      /home/hadoop/tools/hadoop_data                   recommended: /data
hdfs-site.xml dfs.namenode.name.dir   /home/hadoop/tools/hadoop_data/hadoop/dfs/name   recommended: /data/hadoop/dfs/name
hdfs-site.xml dfs.datanode.data.dir   /home/hadoop/tools/hadoop_data/hadoop/dfs/data   recommended: /data/hadoop/dfs/data
core-site.xml hadoop.tmp.dir          /home/hadoop/tools/hadoop_temp                   recommended: /temp

Ideally the data directories would live outside the user's home directory (e.g. under /data) to avoid coupling the data to a user account. Since this is only a temporary setup, the home-directory layout is fine here.
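The planned data directories can be created in one pass with a loop. A sketch; TOOLS defaults to a scratch path here so it is safe to try anywhere, while on a real node it would be /home/hadoop/tools:

```shell
# Sketch: create the planned Hadoop data directories in one go.
# TOOLS defaults to a scratch root; the real setup uses /home/hadoop/tools.
TOOLS=${TOOLS:-/tmp/hadoop-tools-demo}
for d in \
    "$TOOLS/hadoop_data/hadoop/dfs/name" \
    "$TOOLS/hadoop_data/hadoop/dfs/data" \
    "$TOOLS/hadoop_temp"; do
  mkdir -p "$d"   # -p creates intermediate directories and is idempotent
done
find "$TOOLS" -type d
```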

Upload and extract

Upload the package and extract it into the planned directory on master:

scp hadoop-2.7.7.tar.gz hadoop@10.96.81.166:~/tools/package/ 
[hadoop@jms-master-01 package]$ tar -xzvf hadoop-2.7.7.tar.gz -C /home/hadoop/tools/

Environment variables

Add the following to /etc/profile:

# hadoop home
export HADOOP_HOME=/home/hadoop/tools/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

If startup fails after installation, enable Hadoop's debug mode with the following variable to get detailed logs on the console:

export HADOOP_ROOT_LOGGER=DEBUG,console 

As with the JDK, a cleaner alternative is a drop-in file under /etc/profile.d:

sudo vi /etc/profile.d/hadoop-2.7.7.sh 
export HADOOP_HOME=/home/hadoop/tools/hadoop-2.7.7 
export PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop 
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Editing the Hadoop configuration files

Seven configuration files need to be edited:

# Hadoop environment variables
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/hadoop-env.sh

# YARN environment variables
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/yarn-env.sh

# slave (worker) node registry
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/slaves

# Hadoop global configuration; its settings can be overridden by the other files
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/core-site.xml

# HDFS configuration; inherits settings from core-site.xml
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/hdfs-site.xml

# MapReduce configuration; inherits settings from core-site.xml
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/mapred-site.xml

# YARN configuration; inherits settings from core-site.xml
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/yarn-site.xml
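Before editing, it is worth checking that all seven files are where you expect (at this point mapred-site.xml will still only exist as the .template). A sketch; CONF defaults to a scratch directory populated with empty demo files so the loop can be tried anywhere, but on a real node you would point it at /home/hadoop/tools/hadoop-2.7.7/etc/hadoop and skip the touch line:

```shell
# Sketch: flag any of the seven config files that are missing from CONF.
CONF=${CONF:-/tmp/hadoop-conf-demo}
mkdir -p "$CONF"
# Demo files only; on a real node the extracted tarball provides them.
touch "$CONF"/hadoop-env.sh "$CONF"/yarn-env.sh "$CONF"/slaves \
      "$CONF"/core-site.xml "$CONF"/hdfs-site.xml \
      "$CONF"/mapred-site.xml "$CONF"/yarn-site.xml

missing=0
for f in hadoop-env.sh yarn-env.sh slaves core-site.xml \
         hdfs-site.xml mapred-site.xml yarn-site.xml; do
  [ -e "$CONF/$f" ] || { echo "missing: $f"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all 7 config files present"
```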

hadoop-env.sh and yarn-env.sh

Just point them at the JDK.

# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hadoop/tools/hadoop-2.7.7/lib/native
export HADOOP_OPTS="-Djava.library.path=/home/hadoop/tools/hadoop-2.7.7/lib"
# Optional (not required for startup):
export HDFS_NAMENODE_USER="hadoop"
export HDFS_DATANODE_USER="hadoop"
export HDFS_SECONDARYNAMENODE_USER="hadoop"
export YARN_RESOURCEMANAGER_USER="hadoop"
export YARN_NODEMANAGER_USER="hadoop"

slaves

Register the worker nodes (note that from Hadoop 3.0 onward this file is named workers instead of slaves):

[hadoop@jms-master-01 package]$ cat /home/hadoop/tools/hadoop-2.7.7/etc/hadoop/slaves 
jms-master-02 
jms-master-03

core-site.xml

<configuration> 
     <property> 
             <name>fs.defaultFS</name> 
             <value>hdfs://jms-master-01:9000</value> 
    </property> 
    <property> 
             <name>io.file.buffer.size</name> 
             <value>131072</value> 
    </property> 
    <property> 
             <name>hadoop.tmp.dir</name> 
             <value>file:/home/hadoop/tools/hadoop_temp</value> 
              <description>Abase for other temporary directories.</description> 
    </property> 
</configuration>
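The value that matters most in this file is fs.defaultFS, since every client and worker resolves the NameNode through it. A sed sketch for pulling a property value out of a *-site.xml, run here against a sample copy (the /tmp path is only for trying it out; GNU sed assumed):

```shell
# Sketch: extract a property value from a Hadoop *-site.xml with sed.
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://jms-master-01:9000</value>
    </property>
</configuration>
EOF

# On the <name> line, read the next line (n) and print its <value> contents.
sed -n '/<name>fs.defaultFS<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' \
    /tmp/core-site-sample.xml
```

This relies on the name and value sitting on adjacent lines, which holds for the hand-written configs in this article; a real XML parser is the robust choice.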

hdfs-site.xml

<configuration> 
    <property> 
             <name>dfs.namenode.secondary.http-address</name> 
             <value>jms-master-01:9001</value> 
     </property> 
     <property> 
             <name>dfs.namenode.name.dir</name> 
             <value>file:/home/hadoop/tools/hadoop_data/hadoop/dfs/name</value> 
    </property> 
     <property> 
             <name>dfs.datanode.data.dir</name> 
             <value>file:/home/hadoop/tools/hadoop_data/hadoop/dfs/data</value> 
    </property> 
    <property> 
             <name>dfs.replication</name> 
             <value>3</value> 
    </property> 
    <property> 
             <name>dfs.webhdfs.enabled</name> 
             <value>true</value> 
    </property> 
</configuration>
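One thing worth double-checking here: dfs.replication should not exceed the number of live DataNodes, otherwise new blocks sit under-replicated until more DataNodes join (writes still succeed; HDFS just cannot place all replicas). A sketch of that sanity check, using a sample slaves file with the two workers registered above:

```shell
# Sketch: warn when dfs.replication exceeds the number of registered DataNodes.
REPLICATION=3          # the dfs.replication value from hdfs-site.xml above
cat > /tmp/slaves.sample <<'EOF'
jms-master-02
jms-master-03
EOF

datanodes=$(grep -c . /tmp/slaves.sample)   # count non-empty lines
if [ "$REPLICATION" -gt "$datanodes" ]; then
  echo "warning: replication $REPLICATION > $datanodes registered DataNodes"
fi
```

Note the jps output later in this article also shows a DataNode on the master, so the effective DataNode count in this particular cluster may be three.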

mapred-site.xml

mapred-site.xml does not exist by default; create it by copying mapred-site.xml.template.

<configuration> 
   <property> 
        <name>mapreduce.framework.name</name> 
        <value>yarn</value> 
   </property> 
   <property> 
        <name>mapreduce.jobhistory.address</name> 
        <value>jms-master-01:10020</value> 
   </property> 
   <property> 
        <name>mapreduce.jobhistory.webapp.address</name> 
        <value>jms-master-01:19888</value> 
    </property> 
</configuration>

yarn-site.xml

<configuration> 
<!-- Site specific YARN configuration properties -->
       <property> 
               <name>yarn.nodemanager.aux-services</name> 
               <value>mapreduce_shuffle</value> 
       </property> 
       <property> 
               <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 
               <value>org.apache.hadoop.mapred.ShuffleHandler</value> 
       </property> 
       <property> 
               <name>yarn.resourcemanager.address</name> 
               <value>jms-master-01:8032</value> 
       </property> 
       <property> 
               <name>yarn.resourcemanager.scheduler.address</name> 
               <value>jms-master-01:8030</value> 
       </property> 
       <property> 
               <name>yarn.resourcemanager.resource-tracker.address</name> 
               <value>jms-master-01:8031</value> 
       </property> 
       <property> 
               <name>yarn.resourcemanager.admin.address</name> 
               <value>jms-master-01:8033</value> 
       </property> 
</configuration>

This completes the Hadoop setup on the master node. Next, copy the extracted Hadoop tree and the profile drop-in from master to the other two nodes:

scp -r /home/hadoop/tools/hadoop-2.7.7 hadoop@10.96.113.243:/home/hadoop/tools/ 
scp -r /home/hadoop/tools/hadoop-2.7.7 hadoop@10.96.85.231:/home/hadoop/tools/ 
scp /etc/profile.d/hadoop-2.7.7.sh root@10.96.113.243:/etc/profile.d/ 
scp /etc/profile.d/hadoop-2.7.7.sh root@10.96.85.231:/etc/profile.d/
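The four scp commands again follow one pattern per worker, so a loop scales better as the cluster grows. The sketch below is a dry run that only prints the commands (remove the echo to execute; the profile copy needs root on the target):

```shell
# Sketch: print the distribution commands for every worker node.
# Drop "echo" to actually copy; the /etc/profile.d copy must land as root.
WORKERS="10.96.113.243 10.96.85.231"
for w in $WORKERS; do
  echo scp -r /home/hadoop/tools/hadoop-2.7.7 "hadoop@$w:/home/hadoop/tools/"
  echo scp /etc/profile.d/hadoop-2.7.7.sh "root@$w:/etc/profile.d/"
done
```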

Don't forget to source /etc/profile so the settings take effect. Then check that the installation works:

[hadoop@jms-master-01 ~]$ hadoop version 
Hadoop 2.7.7 
Subversion Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac 
Compiled by stevel on 2018-07-18T22:47Z 
Compiled with protoc 2.5.0 
From source with checksum 792e15d20b12c74bd6f19a1fb886490 
This command was run using /home/hadoop/tools/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar

Starting Hadoop

Before the first start, format HDFS on the master node:

/home/hadoop/tools/hadoop-2.7.7/bin/hdfs namenode -format testCluster

Start HDFS and YARN:

/home/hadoop/tools/hadoop-2.7.7/sbin/start-dfs.sh 
/home/hadoop/tools/hadoop-2.7.7/sbin/start-yarn.sh

Checking the processes with jps

Master node:

[hadoop@jms-master-01 ~]$ jps 
9651 NodeManager 
9364 SecondaryNameNode 
47268 Jps 
9029 NameNode 
9529 ResourceManager 
9150 DataNode

Worker nodes:

[hadoop@jms-master-02 ~]$ jps 
18643 Jps 
2410 DataNode 
2538 NodeManager 
[hadoop@jms-master-03 ~]$ jps 
16869 Jps 
2232 DataNode 
2360 NodeManager

Web UIs

HDFS web UI: http://10.96.81.166:50070/dfshealth.html#tab-overview
YARN (MR) web UI: http://10.96.81.166:8088/cluster/apps/RUNNING
Open these links to inspect the cluster.

Running a wordcount

vim test1 
aaa bbb ccc ddd 
eee fff ggg hhh 
vim test2 
aaa bbb ccc ddd 111 
eee fff ggg hhh 111
hadoop fs -mkdir -p /user/hadoop/input 
hadoop fs -put test* /user/hadoop/input/
hdfs dfsadmin -safemode leave
[hadoop@jms-master-02 ~]$ yarn jar /home/hadoop/tools/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/hadoop/input /user/hadoop/output 
19/03/15 20:08:39 INFO client.RMProxy: Connecting to ResourceManager at jms-master-01/10.96.81.166:8032 
19/03/15 20:08:40 INFO input.FileInputFormat: Total input paths to process : 2 
19/03/15 20:08:40 INFO mapreduce.JobSubmitter: number of splits:2 
19/03/15 20:08:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552651623473_0002 
19/03/15 20:08:41 INFO impl.YarnClientImpl: Submitted application application_1552651623473_0002 
19/03/15 20:08:41 INFO mapreduce.Job: The url to track the job: http://jms-master-01:8088/proxy/application_1552651623473_0002/ 
19/03/15 20:08:41 INFO mapreduce.Job: Running job: job_1552651623473_0002 
19/03/15 20:08:48 INFO mapreduce.Job: Job job_1552651623473_0002 running in uber mode : false 
19/03/15 20:08:48 INFO mapreduce.Job:  map 0% reduce 0% 
19/03/15 20:08:57 INFO mapreduce.Job:  map 100% reduce 0% 
19/03/15 20:09:06 INFO mapreduce.Job:  map 100% reduce 100% 
19/03/15 20:09:07 INFO mapreduce.Job: Job job_1552651623473_0002 completed successfully 
19/03/15 20:09:07 INFO mapreduce.Job: Counters: 49 
        File System Counters 
                FILE: Number of bytes read=75 
                FILE: Number of bytes written=368945 
                FILE: Number of read operations=0 
                FILE: Number of large read operations=0 
                FILE: Number of write operations=0 
                HDFS: Number of bytes read=274 
                HDFS: Number of bytes written=31 
                HDFS: Number of read operations=9 
                HDFS: Number of large read operations=0 
                HDFS: Number of write operations=2 
        Job Counters 
                Launched map tasks=2 
                Launched reduce tasks=1 
                Data-local map tasks=2 
                Total time spent by all maps in occupied slots (ms)=12594 
                Total time spent by all reduces in occupied slots (ms)=6686 
                Total time spent by all map tasks (ms)=12594 
                Total time spent by all reduce tasks (ms)=6686 
                Total vcore-milliseconds taken by all map tasks=12594 
                Total vcore-milliseconds taken by all reduce tasks=6686 
                Total megabyte-milliseconds taken by all map tasks=12896256 
                Total megabyte-milliseconds taken by all reduce tasks=6846464 
        Map-Reduce Framework 
                Map input records=4 
                Map output records=8 
                Map output bytes=78 
                Map output materialized bytes=81 
                Input split bytes=228 
                Combine input records=8 
                Combine output records=6 
                Reduce input groups=4 
                Reduce shuffle bytes=81 
                Reduce input records=6 
                Reduce output records=4 
                Spilled Records=12 
                Shuffled Maps =2 
                Failed Shuffles=0 
                Merged Map outputs=2 
                GC time elapsed (ms)=306 
                CPU time spent (ms)=2490 
                Physical memory (bytes) snapshot=698810368 
                Virtual memory (bytes) snapshot=6430330880 
                Total committed heap usage (bytes)=556793856 
        Shuffle Errors 
                BAD_ID=0 
                CONNECTION=0 
                IO_ERROR=0 
                WRONG_LENGTH=0 
                WRONG_MAP=0 
                WRONG_REDUCE=0 
        File Input Format Counters 
                Bytes Read=46 
        File Output Format Counters 
                Bytes Written=31
[hadoop@jms-master-01 xiepengjie]$ hadoop fs -ls /user/hadoop/output 
19/03/15 20:16:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable 
Found 2 items 
-rw-r--r--   3 hadoop supergroup          0 2019-03-15 20:15 /user/hadoop/output/_SUCCESS 
-rw-r--r--   3 hadoop supergroup         54 2019-03-15 20:15 /user/hadoop/output/part-r-00000 
[hadoop@jms-master-01 xiepengjie]$ hadoop fs -cat /user/hadoop/output/part-r-00000 
19/03/15 20:16:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable 
111     2 
aaa     2 
bbb     2 
ccc     2 
ddd     2 
eee     2 
fff     2 
ggg     2 
hhh     2
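The same counts can be reproduced locally with a plain shell pipeline, which is a handy cross-check on the job output (using copies of the two test files from above; the /tmp paths are only for the sketch):

```shell
# Sketch: recompute the wordcount locally and compare with the job output.
printf 'aaa bbb ccc ddd\neee fff ggg hhh\n' > /tmp/test1
printf 'aaa bbb ccc ddd 111\neee fff ggg hhh 111\n' > /tmp/test2

# Split on spaces, drop empty tokens, sort, count: map + shuffle + reduce
# collapsed into one pipeline. Each of the 9 distinct words appears twice.
cat /tmp/test1 /tmp/test2 | tr -s ' ' '\n' | grep -v '^$' | sort | uniq -c
```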

With that, the Hadoop cluster is up and running.
