(Part 1) Setting up a Hadoop development environment on Ubuntu

2018-06-22  qianlong21st

1 Installing the JDK

(1) Download jdk-8u65-linux-x64.tar.gz
(2) Extract the archive and copy it to the ~/soft/ directory

mkdir ~/soft/
tar -xzvf jdk-8u65-linux-x64.tar.gz
cp -r jdk1.8.0_65 ~/soft/

(3) Create a symbolic link
ln -s ~/soft/jdk1.8.0_65 ~/soft/jdk
(4) Verify that the JDK was installed successfully

cd ~/soft/jdk/bin
./java -version

(5) Configure the JDK environment variables

$>sudo vim /etc/profile
export JAVA_HOME=/home/henry/soft/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
# reload the profile so the variables take effect in the current shell
$>source /etc/profile
$>cd ~
$>java -version

2 Installing Hadoop

(1) Download hadoop-2.7.3.tar.gz
(2) Extract the archive and copy it to the ~/soft directory

$>tar -xzvf hadoop-2.7.3.tar.gz
$>cp -r ./hadoop-2.7.3 ~/soft/

(3) Create a symbolic link
$>ln -s ~/soft/hadoop-2.7.3 ~/soft/hadoop
(4) Verify that Hadoop was installed successfully

$>cd ~/soft/hadoop/bin
henry@s201:~/soft/hadoop/bin$ ./hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /home/henry/soft/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar

(5) Configure the Hadoop environment variables

$>sudo vim /etc/profile
export HADOOP_HOME=/home/henry/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# make the change take effect immediately
$>source /etc/profile

3 Hadoop's three deployment modes

3.1 Standalone (local) mode

In standalone mode Hadoop runs as a single Java process against the local file system; no daemons are started and no configuration changes are needed, so it is useful for a quick smoke test, as shown below.
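A minimal test, assuming the stock example jar that ships with Hadoop 2.7.3 (the exact path depends on where you unpacked Hadoop, and ~/output must not exist beforehand):

$>mkdir ~/input
$>cp ~/soft/hadoop/etc/hadoop/*.xml ~/input
# run the bundled grep example against the local file system
$>hadoop jar ~/soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep ~/input ~/output 'dfs[a-z.]+'
$>cat ~/output/*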

3.2 Pseudo-distributed mode

In pseudo-distributed mode, all of the Hadoop daemons run on a single machine. Configure it as follows.
(1) Go to the ${HADOOP_HOME}/etc/hadoop directory
(2) Edit the core-site.xml file

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost/</value>
    </property>
</configuration>

(3) Edit the hdfs-site.xml file

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

(4) Edit the mapred-site.xml file

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
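Note: Hadoop 2.7.x ships this file only as mapred-site.xml.template. If mapred-site.xml does not exist yet (a step the original omits), create it from the template first:

$>cd ~/soft/hadoop/etc/hadoop
$>cp mapred-site.xml.template mapred-site.xml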

(5) Edit the yarn-site.xml file

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

(6) Configure SSH so that you can log in to the local machine without a password

[Figure: passwordless SSH login diagram]

$>sudo apt-get install openssh-server
$>sudo apt-get install openssh-client
henry@s201:~/soft/hadoop/etc/pseudo$ ps -Af | grep ssh
root        983      1  0 12:20 ?        00:00:00 /usr/sbin/sshd -D
henry      1306   1149  0 12:20 ?        00:00:00 gnome-keyring-daemon --start --components ssh
henry      7171   2140  0 13:21 pts/17   00:00:00 grep --color=auto ssh
# generate a key pair if ~/.ssh/id_rsa does not exist yet
$>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
henry@s201:~/.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
$>cd ~/.ssh
$>cat id_rsa.pub >> authorized_keys
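A quick way to confirm that passwordless login now works (an extra check, not in the original):

$>ssh localhost
# a shell should open without a password prompt; type exit to return
$>exit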

(7) Format HDFS

$>hadoop namenode -format

(8) Edit the Hadoop configuration file to set the JAVA_HOME environment variable explicitly

[${hadoop_home}/etc/hadoop/hadoop-env.sh]
...
export JAVA_HOME=/home/henry/soft/jdk
...

(9) Start all of the Hadoop daemons

$>start-all.sh
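start-all.sh still works in Hadoop 2.x but is deprecated; the equivalent split commands, if you prefer them, are:

$>start-dfs.sh
$>start-yarn.sh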

(10) Use jps to check the running processes

$>jps
33702 NameNode
33792 DataNode
33954 SecondaryNameNode
29041 ResourceManager
34191 NodeManager

(11) Browse the HDFS file system

$>hdfs dfs -ls /

(12) Create a directory in the HDFS file system

hdfs dfs -mkdir -p /home/henry/hadoop
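As a quick sanity check (not in the original), copy a local file into the new directory and list it:

$>hdfs dfs -put /etc/profile /home/henry/hadoop/
$>hdfs dfs -ls /home/henry/hadoop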

(13) Inspect the Hadoop file system through the web UI
Open http://localhost:50070/ in a browser to confirm that the setup succeeded.
(14) Use stop-all.sh to stop all of the daemons
(15) Summary

3.3 Fully distributed mode

This section configures four hosts: one namenode and three datanodes. Their IP addresses and hostnames are:
192.168.2.201 s201 (namenode)
192.168.2.202 s202 (datanode)
192.168.2.203 s203 (datanode)
192.168.2.204 s204 (datanode)

3.3.1 Clone three virtual machines, set their IP addresses, and change their hostnames

(1) Set the IP address

The router's IP address is 192.168.2.1; the operating system is Ubuntu 16.04 running in a VMware virtual machine with bridged networking.
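On Ubuntu 16.04 a static address can be configured in /etc/network/interfaces; a sketch for s201, assuming the interface is named ens33 (check yours with ifconfig):

# /etc/network/interfaces (sketch; the interface name varies per machine)
auto ens33
iface ens33 inet static
    address 192.168.2.201
    netmask 255.255.255.0
    gateway 192.168.2.1
    dns-nameservers 192.168.2.1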

(2) Change the hostname

$> sudo gedit /etc/hostname
s201
$>sudo gedit /etc/hosts
127.0.0.1   localhost
192.168.2.201 s201
192.168.2.202 s202
192.168.2.203 s203
192.168.2.204 s204

(3) Clone three Ubuntu virtual machines and change their hostnames

VM -> Manage -> Clone -> Full clone
Change the hostname and IP address of each of the three cloned machines:
s202: 192.168.2.202
s203: 192.168.2.203
s204: 192.168.2.204

3.3.2 Prepare SSH on the fully distributed hosts

(1) Delete /home/henry/.ssh/* on every host
(2) Generate a key pair on the s201 host

$>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

(3) Copy s201's public key id_rsa.pub to hosts s201 through s204, storing it as /home/henry/.ssh/authorized_keys

$>scp id_rsa.pub henry@s201:/home/henry/.ssh/authorized_keys
$>scp id_rsa.pub henry@s202:/home/henry/.ssh/authorized_keys
$>scp id_rsa.pub henry@s203:/home/henry/.ssh/authorized_keys
$>scp id_rsa.pub henry@s204:/home/henry/.ssh/authorized_keys
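To verify that s201 can now reach every host without a password (an extra check, not in the original):

$>for h in s201 s202 s203 s204; do ssh $h hostname; done
# should print the four hostnames without any password prompts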

3.3.3 Edit the fully distributed configuration files (${hadoop_home}/etc/hadoop/) and start the Hadoop daemons

(1) On the s201 host, configure the core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, and hadoop-env.sh files
[core-site.xml]

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://s201/</value>
    </property>
</configuration>

[hdfs-site.xml]

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

[mapred-site.xml]

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

[yarn-site.xml]

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>s201</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>

[slaves]

s202
s203
s204

[hadoop-env.sh]

...
export JAVA_HOME=/home/henry/soft/jdk
...

(2) On s201, delete all log files under the /home/henry/soft/hadoop/logs directory

$>cd /home/henry/soft/hadoop/logs
$>rm -rf *

(3) Sync the /home/henry/soft/hadoop directory to the s202, s203, and s204 hosts. The trailing slashes matter: they make rsync copy the directory contents (following the hadoop symlink) rather than nesting a second hadoop directory at the destination.

$>rsync -lr /home/henry/soft/hadoop/ henry@s202:/home/henry/soft/hadoop/
$>rsync -lr /home/henry/soft/hadoop/ henry@s203:/home/henry/soft/hadoop/
$>rsync -lr /home/henry/soft/hadoop/ henry@s204:/home/henry/soft/hadoop/

(4) Delete the temporary directories on all four hosts

$>cd /tmp
$>rm -rf hadoop-henry
$>ssh s202 rm -rf /tmp/hadoop-henry
$>ssh s203 rm -rf /tmp/hadoop-henry
$>ssh s204 rm -rf /tmp/hadoop-henry

(5) Format the file system

$>hadoop namenode -format

(6) Start the Hadoop daemons

$>start-all.sh

(7) Check the processes with the jps command

henry@s201:/usr/local/bin$ xcall.sh jps
============= s201 jps =============
5828 ResourceManager
5462 NameNode
8553 Jps
5676 SecondaryNameNode
============= s202 jps =============
32978 Jps
30546 NodeManager
30415 DataNode
============= s203 jps =============
30812 NodeManager
32927 Jps
30687 DataNode
============= s204 jps =============
30289 DataNode
32833 Jps
30426 NodeManager
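The xcall.sh used above is a small helper script that the original never lists; a minimal sketch, assuming passwordless SSH to hosts s201 through s204:

#!/bin/bash
# xcall.sh (sketch, not the author's original): run the given command on every cluster host
for host in s201 s202 s203 s204; do
    echo "============= $host $* ============="
    # source /etc/profile so commands like jps resolve in the non-interactive shell
    ssh $host "source /etc/profile; $*"
done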

(8) Check the cluster through the web UI


4 Changing Hadoop's storage directory

By default, Hadoop stores its data under /tmp/hadoop-username. To change this, add the following property to core-site.xml (the file must be modified on every host) and set the value to the desired directory; a worked example follows the property snippet below.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
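For example, to move storage to /home/henry/hadoop-data (a hypothetical path), create the directory on every host, point hadoop.tmp.dir at it, then re-format HDFS so the namenode initializes the new location. Note that re-formatting erases any existing HDFS data.

$>xcall.sh mkdir -p /home/henry/hadoop-data
$>stop-all.sh
# edit core-site.xml on every host, then:
$>hadoop namenode -format
$>start-all.sh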