
Setting Up Hadoop on Ubuntu

2017-03-29  Charles__Jiang

Environment

Servers (virtual machines): vm-master (10.211.55.23), vm-slave1 (10.211.55.25), vm-slave2 (10.211.55.24)

Software: Ubuntu, Oracle JDK 8, Hadoop 2.7.3

Step1: Create an account and grant privileges

As root, create a hadoop user with the password 111111:

adduser hadoop
Enter password: 111111
Confirm password: 111111
Press Enter through the remaining prompts...

As root, grant the hadoop user root privileges:

vim /etc/sudoers
Below "root ALL=(ALL:ALL) ALL", add:
hadoop  ALL=(ALL:ALL) ALL

The result:
# User privilege specification
root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL

Note: /etc/sudoers is read-only, so force the save and quit with :wq!
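On Ubuntu, an equivalent way to grant sudo rights, without editing /etc/sudoers directly, is to add the user to the sudo group (run as root):

usermod -aG sudo hadoop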

Step2: Edit the hosts file

vim /etc/hosts
10.211.55.23  vm-master
10.211.55.25  vm-slave1
10.211.55.24  vm-slave2

Note: use ifconfig on each machine to look up its IP address.
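As a quick sanity check that the names resolve as intended:

getent hosts vm-master vm-slave1 vm-slave2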

Step3: Install SSH and set up passwordless login

If the SSH server is not installed yet, install it with: apt-get install ssh

As the hadoop user, generate a public/private key pair:

su hadoop

cd ~

ssh-keygen -t rsa -P ""

Press Enter through the prompts...

When it finishes, the /home/hadoop/.ssh folder will contain id_rsa (the private key) and id_rsa.pub (the public key).

Append the public key to authorized_keys (this file holds the public keys that are allowed to log in as the current user over SSH):

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Restrict the file's permissions:
chmod 600 ~/.ssh/authorized_keys

Edit the SSH daemon configuration with sudo vim /etc/ssh/sshd_config and uncomment the following lines:

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile  .ssh/authorized_keys

Restart the SSH service:
sudo service ssh restart

As the hadoop user, test passwordless login to localhost:
ssh hadoop@localhost (type yes at the host-key confirmation prompt)
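To be sure key-based login is actually in use, BatchMode makes ssh fail instead of falling back to a password prompt:

ssh -o BatchMode=yes hadoop@localhost echo OK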

Step4: Install the JDK

Update the package index:
sudo apt-get update

Install the PPA management tooling:
sudo apt-get install software-properties-common

Add the Java PPA:
sudo add-apt-repository ppa:webupd8team/java

Update again:
sudo apt-get update

Install the JDK:
sudo apt-get install oracle-java8-installer

Check that the installation succeeded:
java -version

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Note: if this step is too slow, you can also download the JDK from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html and install it manually.
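The absolute JDK path will be needed later for hadoop-env.sh; one way to find it, assuming java is on the PATH:

readlink -f "$(which java)"

This prints something like /usr/lib/jvm/java-8-oracle/jre/bin/java; drop the trailing /jre/bin/java to get the JDK home.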

Step5: Set up vm-slave1 and vm-slave2

Repeat all of the steps above on both slave machines.

Step6: Allow vm-master to log in to vm-slave1 and vm-slave2 without a password

Run on vm-master:

su hadoop

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@vm-slave1

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@vm-slave2

Verify the setup:
On vm-master, switch to the hadoop user and run
ssh hadoop@vm-slave1 to check that login works without a password.
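To verify both slaves in one pass (BatchMode fails immediately if a password would still be required):

for h in vm-slave1 vm-slave2; do ssh -o BatchMode=yes hadoop@$h hostname; done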

Step7: Download and install Hadoop

wget http://statics.charlesjiang.com/hadoop-2.7.3.tar.gz

tar zxvf hadoop-2.7.3.tar.gz

sudo mv hadoop-2.7.3 /usr/local/hadoop

Change the owner of the hadoop directory to the hadoop user:
sudo chown -R hadoop:hadoop /usr/local/hadoop
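As a quick sanity check that the archive unpacked correctly (JAVA_HOME is not configured yet at this point, so pass it inline; the path assumes the Step4 install location):

JAVA_HOME=/usr/lib/jvm/java-8-oracle /usr/local/hadoop/bin/hadoop version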

Step8: Configure Hadoop

The following configuration files are involved:

/usr/local/hadoop/etc/hadoop/slaves
/usr/local/hadoop/etc/hadoop/core-site.xml
/usr/local/hadoop/etc/hadoop/hdfs-site.xml
/usr/local/hadoop/etc/hadoop/mapred-site.xml
/usr/local/hadoop/etc/hadoop/yarn-site.xml
/usr/local/hadoop/etc/hadoop/hadoop-env.sh

core-site.xml:

<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://vm-master:9000</value>
        </property>
</configuration>

Note: do not set this value to localhost, or the slave nodes will not be able to reach the NameNode.

mapred-site.xml (create it from the bundled template first):

cp mapred-site.xml.template  ./mapred-site.xml

vim mapred-site.xml

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>hdfs://vm-master:9001</value>
        </property>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Note: fs.default.name belongs in core-site.xml (above) and does not need to be repeated here.

hdfs-site.xml:

<configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/usr/local/hadoop/namenode</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/usr/local/hadoop/datanode</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>
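Hadoop normally creates these directories itself (the NameNode dir during format, the DataNode dir on first start), but since they live under /usr/local/hadoop you can pre-create them as the hadoop user to make the ownership explicit:

mkdir -p /usr/local/hadoop/namenode /usr/local/hadoop/datanode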

yarn-site.xml:

<configuration>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>vm-master:8032</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>vm-master:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>vm-master:8088</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>vm-master:8031</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>vm-master:8033</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>

slaves (one worker hostname per line; listing vm-master here means the master also runs a DataNode and NodeManager):

vm-master
vm-slave1
vm-slave2

hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Note: set this to the absolute path of the JDK.
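A one-line way to apply this edit (the path is the Step4 install location; adjust it if your JDK lives elsewhere):

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-oracle|' /usr/local/hadoop/etc/hadoop/hadoop-env.sh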

Step9: Configure the hadoop user's environment variables

su hadoop
vim /home/hadoop/.bash_profile

Add the following:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark

JAVA_HOME=/usr/lib/jvm/java-8-oracle
JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
CLASSPATH=.:$JAVA_HOME/lib/tools.jar

PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH

export JAVA_HOME CLASSPATH PATH USER LOGNAME MAIL HOSTNAME

Apply the same configuration on vm-slave1 and vm-slave2 as well.

Load the new profile:
source /home/hadoop/.bash_profile
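A quick check that the variables took effect:

echo $HADOOP_HOME $JAVA_HOME

This should print /usr/local/hadoop and /usr/lib/jvm/java-8-oracle.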

Step10: Copy vm-master's Hadoop directory to vm-slave1 and vm-slave2

scp -r /usr/local/hadoop/ hadoop@vm-slave1:/home/hadoop

scp -r /usr/local/hadoop/ hadoop@vm-slave2:/home/hadoop

On vm-slave1 and vm-slave2 respectively, move the copied hadoop directory to /usr/local/hadoop:

sudo mv hadoop /usr/local/hadoop

After the move, change the owner of /usr/local/hadoop to the hadoop user on both slaves:

sudo chown -R hadoop:hadoop  /usr/local/hadoop/
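The whole step can also be scripted from vm-master (a sketch; ssh -t allocates a terminal so sudo can prompt for the hadoop password on each slave):

for h in vm-slave1 vm-slave2; do
  scp -r /usr/local/hadoop/ hadoop@$h:/home/hadoop
  ssh -t hadoop@$h 'sudo mv ~/hadoop /usr/local/hadoop && sudo chown -R hadoop:hadoop /usr/local/hadoop'
done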

Step11: Format HDFS

Run on vm-master:

cd /usr/local/hadoop

./bin/hdfs namenode -format


Output similar to the following means the format succeeded:

17/03/29 15:14:36 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/03/29 15:14:36 INFO util.ExitUtil: Exiting with status 0
17/03/29 15:14:36 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at vm-master.localdomain/127.0.1.1
************************************************************/

Step12: Start HDFS

Run on vm-master:

sbin/start-dfs.sh
After it starts, run jps.

Output like the following indicates success:
2403 DataNode
3188 Jps
3079 SecondaryNameNode
2269 NameNode

Note: type yes at any host-key confirmation prompts.

Step13: Start YARN

Run on vm-master:

sbin/start-yarn.sh
After it starts, run jps.

Output like the following indicates success:
3667 Jps
2403 DataNode
3237 ResourceManager
3079 SecondaryNameNode
2269 NameNode
3391 NodeManager

Then run jps on vm-slave1 or vm-slave2; it should show something like:

2777 Jps
2505 DataNode
2654 NodeManager

Step14: Verify the installation

  1. Check the cluster status:
bin/hdfs dfsadmin -report

You should see output like:

Configured Capacity: 198576648192 (184.94 GB)
Present Capacity: 180282531840 (167.90 GB)
DFS Remaining: 180282449920 (167.90 GB)
DFS Used: 81920 (80 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 10.211.55.24:50010 (vm-slave2)
Hostname: vm-master
Decommission Status : Normal
Configured Capacity: 66192216064 (61.65 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6213672960 (5.79 GB)
DFS Remaining: 59978518528 (55.86 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.61%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 29 16:38:34 CST 2017


Name: 10.211.55.25:50010 (vm-slave1)
Hostname: vm-master
Decommission Status : Normal
Configured Capacity: 66192216064 (61.65 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6213672960 (5.79 GB)
DFS Remaining: 59978518528 (55.86 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.61%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 29 16:38:34 CST 2017


Name: 10.211.55.23:50010 (vm-master)
Hostname: vm-master
Decommission Status : Normal
Configured Capacity: 66192216064 (61.65 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 5866770432 (5.46 GB)
DFS Remaining: 60325412864 (56.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.14%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 29 16:38:34 CST 2017
  2. Open the HDFS web UI in a browser (in Hadoop 2.x the NameNode UI is served on port 50070 by default, i.e. http://vm-master:50070).
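As a final smoke test (the paths below are just examples), write a file into HDFS and list it back, running from /usr/local/hadoop:

bin/hdfs dfs -mkdir -p /user/hadoop
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/
bin/hdfs dfs -ls /user/hadoop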

Common problems

  1. The system displays garbled characters (mojibake), as shown below:

    [screenshot: garbled output]

Fix:

sudo vim /etc/environment
Add:
LANG="en_US.UTF-8"
LANGUAGE="en_US:en"

sudo vim /var/lib/locales/supported.d/local
Add:
en_US.UTF-8 UTF-8

sudo vim /etc/default/locale
Change:
LANG="en_US.UTF-8"
LANGUAGE="en_US:en"

Reboot:
sudo reboot
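On Ubuntu, largely the same result can be had with two commands instead of the manual edits (locale-gen generates the locale, update-locale writes /etc/default/locale):

sudo locale-gen en_US.UTF-8
sudo update-locale LANG=en_US.UTF-8 LANGUAGE=en_US:en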
  2. Network interface not working after cloning a virtual machine
    Fix:
vim /etc/udev/rules.d/70-persistent-net.rules

Delete the line containing eth0, and change eth1 to eth0 in the remaining line.

Then reboot.

  3. The JAVA_HOME environment variable is not found

vm-slave2: Error: JAVA_HOME is not set and could not be found.
vm-slave1: Error: JAVA_HOME is not set and could not be found.

Fix:

In hadoop-env.sh on every server, set export JAVA_HOME= to the absolute path of the JDK.
  4. Cannot open http://vm-master:8088/
Check the hosts file:

127.0.0.1       localhost
#127.0.1.1      vm-master.localdomain   vm-master   (comment this line out)

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.211.55.23  vm-master
10.211.55.25  vm-slave1
10.211.55.24  vm-slave2

Then restart the daemons:

Stop YARN:   sbin/stop-yarn.sh
Stop HDFS:   sbin/stop-dfs.sh
Start HDFS:  sbin/start-dfs.sh
Start YARN:  sbin/start-yarn.sh
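To confirm the ResourceManager is now listening on a routable address rather than 127.0.1.1 (netstat is in the net-tools package):

sudo netstat -tlnp | grep 8088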

Original blog post: http://www.charlesjiang.com/archives/45.html
