Hadoop Notes: Installing a Hadoop Cluster on CentOS
HADOOP Cluster Setup
1. Cluster Overview
A Hadoop cluster is really two clusters: an HDFS cluster and a YARN cluster. They are logically separate but usually run on the same physical machines.
HDFS cluster: stores the massive data; its main roles are NameNode / DataNode.
YARN cluster: schedules resources for computation over that data; its main roles are ResourceManager / NodeManager.
This walkthrough builds a 3-node cluster.
2. Server Preparation
This walkthrough builds the HADOOP cluster on virtual machine servers, using the following software and versions:
VMware® Workstation 12 Pro 12.5.6
CentOS 6.5 64 bit
JDK 1.8 linux 64bit
Hadoop 2.8.1
3. Network Configuration
Use NAT networking. Choose a gateway address, e.g. 192.168.220.1; three node IP addresses, e.g. 192.168.220.128, 192.168.220.129, 192.168.220.130; subnet mask 255.255.255.0.
Without further ado, let's install CentOS in VMware.
Open VMware. (I had already installed one copy earlier; here I am installing CentOS again.)
![](https://img.haomeiwen.com/i8023563/ec0c4c8f8aca5a20.png)
In the menu bar click Edit, choose Virtual Network Editor, and set the subnet and gateway there.
![](https://img.haomeiwen.com/i8023563/7afd989769d67822.png)
![](https://img.haomeiwen.com/i8023563/013257c3629c9b7d.png)
Then create a new virtual machine. (I had previously created a virtual machine called master; here I create master2.)
![](https://img.haomeiwen.com/i8023563/428173dff78879d1.png)
![](https://img.haomeiwen.com/i8023563/3d39c93c5bd1e670.png)
![](https://img.haomeiwen.com/i8023563/b10e85b15209d521.png)
![](https://img.haomeiwen.com/i8023563/0a66a1ef9de115e3.png)
![](https://img.haomeiwen.com/i8023563/705018baca376152.png)
![](https://img.haomeiwen.com/i8023563/a02edfd6d85f8ab0.png)
![](https://img.haomeiwen.com/i8023563/92b6ad6519f9bf2f.png)
![](https://img.haomeiwen.com/i8023563/96b0a591a4290355.png)
![](https://img.haomeiwen.com/i8023563/ce1efe68ca3bbcaf.png)
Install CentOS
![](https://img.haomeiwen.com/i8023563/c2579c8273d3137e.png)
![](https://img.haomeiwen.com/i8023563/73fbec8c0f6453a7.png)
![](https://img.haomeiwen.com/i8023563/ef004bed4f154c22.png)
![](https://img.haomeiwen.com/i8023563/b41a594f5d64968b.png)
![](https://img.haomeiwen.com/i8023563/d077344f6c2760d4.png)
![](https://img.haomeiwen.com/i8023563/a3a721bd63a815ac.png)
![](https://img.haomeiwen.com/i8023563/0df6c990ae526332.png)
![](https://img.haomeiwen.com/i8023563/f4fc7c85b852c3fb.png)
![](https://img.haomeiwen.com/i8023563/fd4004ee141e36c8.png)
![](https://img.haomeiwen.com/i8023563/07c7531a8994f4b6.png)
![](https://img.haomeiwen.com/i8023563/93d4e0879ef0c458.png)
![](https://img.haomeiwen.com/i8023563/01c3c2b8b55cf8a9.png)
![](https://img.haomeiwen.com/i8023563/72efd69eaed6f1d1.png)
![](https://img.haomeiwen.com/i8023563/432a67add1141b76.png)
![](https://img.haomeiwen.com/i8023563/fddb6f0dd56745e1.png)
![](https://img.haomeiwen.com/i8023563/5f899a659a5b2abc.png)
![](https://img.haomeiwen.com/i8023563/e2675c5704caa1c2.png)
![](https://img.haomeiwen.com/i8023563/0ceea0ca647fecfa.png)
![](https://img.haomeiwen.com/i8023563/049b17c5a3323dcb.png)
![](https://img.haomeiwen.com/i8023563/f9c2e5e74b399520.png)
![](https://img.haomeiwen.com/i8023563/6526148422b08fd8.png)
![](https://img.haomeiwen.com/i8023563/1a0cdc65148e1318.png)
![](https://img.haomeiwen.com/i8023563/d2f8caffe7bebf26.png)
![](https://img.haomeiwen.com/i8023563/161137e2e37fd1ec.png)
![](https://img.haomeiwen.com/i8023563/a05af80e4df1b6f8.png)
![](https://img.haomeiwen.com/i8023563/795b64ffc4483835.png)
![](https://img.haomeiwen.com/i8023563/64fb20a67e1c60c7.png)
Congratulations, CentOS is now installed!
4. Prepare an SSH Client
Xshell or SecureCRT (I use the latter)
SecureCRT is a terminal emulator that supports SSH (SSH1 and SSH2); simply put, it is Windows software for logging in to UNIX or Linux server hosts.
Check the VM's IP address:
![](https://img.haomeiwen.com/i8023563/e13d50726a3704ae.png)
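If the DHCP-assigned address does not match the plan in section 3, you can pin a static IP instead. On CentOS 6 the interface configuration looks roughly like this (a sketch only; the device name eth0 and the addresses are this guide's examples, so adjust them to your subnet):

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 (example: master node)
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.220.128
NETMASK=255.255.255.0
GATEWAY=192.168.220.1
DNS1=192.168.220.1
```

Apply it with sudo service network restart.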
To connect SecureCRT to the VM, just fill in the IP address, username, and password; I won't go through the details.
![](https://img.haomeiwen.com/i8023563/a454fffec17e228c.png)
Map hostnames to their IP addresses
sudo vi /etc/hosts (I had configured this once before, so I copied over the earlier master entries)
Apply the same configuration on the slave01 and slave02 machines.
(Then verify connectivity, e.g. ping slave01.)
![](https://img.haomeiwen.com/i8023563/76a3cefd9f67ce38.png)
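For reference, the /etc/hosts entries for this guide's three nodes would look like this. A sketch that stages them in a temporary file first (IPs and names are this guide's examples; adjust to your subnet):

```shell
# Stage the three host entries in a temp file before appending them
cat <<'EOF' > /tmp/hadoop-hosts
192.168.220.128 master
192.168.220.129 slave01
192.168.220.130 slave02
EOF
cat /tmp/hadoop-hosts
# Then, on every node: sudo sh -c 'cat /tmp/hadoop-hosts >> /etc/hosts'
```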
Grant sudo privileges
First switch to root: su root
Edit the /etc/sudoers file and find these two lines:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
Below them, add a line for your own user; mine is hadoop, so: hadoop ALL=(ALL) ALL
Disable the firewall on every node
Disable the VM's firewall:
Stop it now: service iptables stop
Disable it permanently (on boot): chkconfig iptables off
Run both commands, then check that the firewall is stopped:
service iptables status
1 Stop the firewall-------service iptables stop
2 Start the firewall------service iptables start
3 Restart the firewall----service iptables restart
4 Check firewall status---service iptables status
5 Disable on boot---------chkconfig iptables off
6 Re-enable on boot-------chkconfig iptables on
First check the current status:
service iptables status (switch to root, or prefix the command with sudo)
![](https://img.haomeiwen.com/i8023563/d723169fa3accaf6.png)
Permanently disable SELinux: vi /etc/selinux/config (prefix with sudo as a normal user)
Find the SELINUX line and change it to SELINUX=disabled:
![](https://img.haomeiwen.com/i8023563/9c2b1a33109cd46b.png)
Stop the firewall: service iptables stop (with sudo as a normal user); sudo chkconfig iptables off takes effect after a reboot.
After rebooting, check the status again: sudo service iptables status
![](https://img.haomeiwen.com/i8023563/78406c80b7b94ba8.png)
![](https://img.haomeiwen.com/i8023563/d022286206200ef6.png)
Install the JDK
Downloading the JDK needs no walkthrough; I used jdk-8u131-linux-x64.tar.gz.
The install path is up to you. (I installed under /home/hadoop, my user's home directory: mkdir java. Any other path works too.)
![](https://img.haomeiwen.com/i8023563/98d953aee081f247.png)
Upload it via sftp: put followed by the file path
![](https://img.haomeiwen.com/i8023563/d76539fdbad8ec02.png)
![](https://img.haomeiwen.com/i8023563/bc368cdf3b60ffd0.png)
Extract it: tar -zxvf jdk-8u131-linux-x64.tar.gz
Edit the system profile: sudo vi /etc/profile
Find the line export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL and add below it:
export JAVA_HOME=/home/hadoop/java/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
![](https://img.haomeiwen.com/i8023563/2c65c284726ff2c4.png)
Reload the profile: source /etc/profile, then verify with java -version
![](https://img.haomeiwen.com/i8023563/3fb361c610991c17.png)
Configure passwordless SSH login
Enter the .ssh directory (it is hidden; ll -a will show it)
![](https://img.haomeiwen.com/i8023563/20d86588d3fab234.png)
Run ssh-keygen -t rsa and press Enter through all the prompts; this generates two files, a public key and a private key. Then: cp id_rsa.pub authorized_keys
![](https://img.haomeiwen.com/i8023563/21181502f1cd3c58.png)
Change the permissions on authorized_keys: chmod 644 authorized_keys
Now restart the ssh service: sudo service sshd restart
ssh master (type yes at the first connection)
![](https://img.haomeiwen.com/i8023563/073dded41373e4bb.png)
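The key generation above can also be done without interactive prompts; a minimal sketch, assuming the hadoop user has no key pair yet:

```shell
# Create ~/.ssh if missing, generate an RSA key pair without prompts,
# and authorize the public key for logins to this host
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 644 ~/.ssh/authorized_keys
```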
Enable passwordless login to the other nodes
Distribute authorized_keys from the master node to each node (you will be prompted for a password):
scp /home/hadoop/.ssh/authorized_keys slave01:/home/hadoop/.ssh
scp /home/hadoop/.ssh/authorized_keys slave02:/home/hadoop/.ssh
Then run the following on each node (do not skip this step, or the login will fail): chmod 644 authorized_keys
ssh slave01
![](https://img.haomeiwen.com/i8023563/04b2c214d6dc45e0.png)
Install Hadoop
Again use sftp; first create a hadoop directory under /home/hadoop
![](https://img.haomeiwen.com/i8023563/c71a97e189299bb5.png)
Edit the configuration files:
Enter the hadoop-2.8.1/etc/hadoop directory.
hadoop-env.sh:
export JAVA_HOME=/home/hadoop/java/jdk1.8.0_131 (your JDK install path; if unsure, run echo $JAVA_HOME)
![](https://img.haomeiwen.com/i8023563/de4e11ab065f5326.png)
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.8.1/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
![](https://img.haomeiwen.com/i8023563/586da6dcb38fe3e2.png)
hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.8.1/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.8.1/tmp/dfs/data</value>
</property>
</configuration>
![](https://img.haomeiwen.com/i8023563/ce60478f840ffc49.png)
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
![](https://img.haomeiwen.com/i8023563/f25d61fe253d7318.png)
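A single malformed tag in any of these XML files (for example a `<property>` left unclosed) makes the daemons fail at startup with a parse error, so it is worth validating them before moving on. A sketch using xmllint (ships with libxml2, assuming it is installed; shown here against a scratch copy rather than the real file):

```shell
# Write a scratch config and check that it is well-formed XML
cat > /tmp/core-site-check.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF
xmllint --noout /tmp/core-site-check.xml && echo "well-formed"
# On the real files: cd hadoop-2.8.1/etc/hadoop && xmllint --noout *-site.xml
```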
slaves:
master
slave01
slave02
System profile: sudo vi /etc/profile; after editing, run source /etc/profile
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-2.8.1
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
![](https://img.haomeiwen.com/i8023563/004d12dcb4dc549b.png)
Distribute hadoop to the other nodes
scp -r /home/hadoop/hadoop slave01:/home/hadoop
scp -r /home/hadoop/hadoop slave02:/home/hadoop
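With more worker nodes, the per-node scp commands are easier to manage in a loop; a sketch that only prints each command as a dry run (node names are this guide's; drop the echo to actually copy):

```shell
# Dry run: print the copy command for each worker node
for node in slave01 slave02; do
  echo scp -r /home/hadoop/hadoop "$node":/home/hadoop
done
```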
(Then set up the environment variables on each node as well, i.e. the HADOOP_HOME configuration above.)
Format HDFS on the master node
hdfs namenode -format
![](https://img.haomeiwen.com/i8023563/6dcc61fa9e7bd326.png)
Start HDFS:
start-dfs.sh
Start YARN:
start-yarn.sh
Run jps on each host to check the daemons. On master you should see NameNode, SecondaryNameNode, and ResourceManager (plus DataNode and NodeManager, since master is also listed in slaves); on slave01 and slave02, DataNode and NodeManager.
![](https://img.haomeiwen.com/i8023563/e2c9a765a42ad1e8.png)
![](https://img.haomeiwen.com/i8023563/b3d947fff2d1a1cc.png)
![](https://img.haomeiwen.com/i8023563/71e8676f357b1206.png)
Web UI (the NameNode status page):
http://master:50070/
![](https://img.haomeiwen.com/i8023563/540aa3b131ba0966.png)
Congratulations, it works!
This was my second time through the setup; here are the pitfalls I hit the first time:
1. I had given each host a different regular username, and passwordless SSH kept failing; the username must be the same on every host.
2. I forgot to disable the firewall.
3. The permissions on the generated authorized_keys file must be changed (chmod 644).
4. I first put the hadoop directory under /usr, owned by root, and various operations kept failing; moving it under /home, owned by my regular user, fixed it.