Hadoop Environment Setup
I. Cluster nodes and services

Hostname | Role | Services | Install directory |
---|---|---|---|
tinygao1 | Master | NameNode, SecondaryNameNode, ResourceManager | /data/program/hadoop-2.8.0/ |
tinygao2 | Slave | DataNode, NodeManager | /data/program/hadoop-2.8.0/ |
tinygao3 | Slave | DataNode, NodeManager | /data/program/hadoop-2.8.0/ |
II. Environment preparation
1. Set the hostname and configure hosts
- Set the hostname:
# hostnamectl set-hostname tinygao1
# hostnamectl status   # verify that it shows "Static hostname: tinygao1"
Alternatively, edit /etc/hostname and reboot.
- Add the cluster nodes to /etc/hosts (on every node):
192.168.17.128 tinygao1
192.168.17.129 tinygao2
192.168.17.130 tinygao3
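The hosts entries above have to be present on every node; a small idempotent loop can append them. This is a sketch that writes to a local demo file (`./hosts.demo` is a stand-in; on the real nodes the target would be /etc/hosts):

```shell
# Append each cluster entry unless it is already present.
# HOSTS_FILE is a demo path; on the real nodes it would be /etc/hosts.
HOSTS_FILE=./hosts.demo
touch "$HOSTS_FILE"
for entry in "192.168.17.128 tinygao1" \
             "192.168.17.129 tinygao2" \
             "192.168.17.130 tinygao3"; do
  grep -qxF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done
```

Because of the `grep` guard, re-running the loop never duplicates entries.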
2. Passwordless SSH login
The client stores its public key on the server; the server then verifies the client's identity by encrypting a random challenge string and checking that the client can decrypt it with the matching private key, which removes the need to type a password.
Example: tinygao1 logs in to tinygao2 and tinygao3 without a password.
On tinygao1:
# ssh-keygen -t rsa
# Press Enter through all prompts; two files are generated: id_rsa (private key) and id_rsa.pub (public key)
# ssh-copy-id root@tinygao2
# ssh-copy-id root@tinygao3
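Since the same command runs once per slave, key distribution can be driven by a loop. A dry-run sketch (it only prints the commands; on tinygao1 you would drop the `echo` to actually execute them):

```shell
# Dry run: print the key-distribution command for each slave.
# The host list matches the example above.
for host in tinygao2 tinygao3; do
  echo "ssh-copy-id root@$host"
done
```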
3. Disable the firewall
Two options:
- Trust the cluster subnet:
# iptables -A INPUT -i ens33 -s 192.168.17.0/24 -j ACCEPT
- Stop the firewall entirely (blunt, but acceptable on a test cluster):
# systemctl stop firewalld
4. Set environment variables (Hadoop as an example)
# touch /etc/profile.d/hadoop.sh
# Add the following and save:
export HADOOP_HOME=/data/program/hadoop-2.8.0
export PATH=$PATH:$HADOOP_HOME/bin
# . /etc/profile   # apply the change
# env | grep -i hadoop   # verify the variables are set
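The steps above can be rehearsed against a local demo file before touching /etc/profile.d (`./hadoop.sh` here is a stand-in for /etc/profile.d/hadoop.sh):

```shell
# Write the profile snippet to a demo file, then source it and check the result.
cat > ./hadoop.sh <<'EOF'
export HADOOP_HOME=/data/program/hadoop-2.8.0
export PATH=$PATH:$HADOOP_HOME/bin
EOF
. ./hadoop.sh
echo "$HADOOP_HOME"
```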
III. Cluster setup
1. Edit the configuration files (minimal configuration)
In $HADOOP_HOME/etc/hadoop on every node, edit:
- slaves
One hostname per line. tinygao1 must be able to ssh to these machines without a password so that the start scripts can launch their services in one go:
tinygao2
tinygao3
- core-site.xml
Reference: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml. A minimal configuration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/data/hadoop/tmpdir</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://tinygao1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
```
- hdfs-site.xml
Reference: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml. Note that with only two DataNodes, HDFS can place at most two replicas, so a dfs.replication of 3 will leave blocks reported as under-replicated:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/data/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>20</value>
  </property>
</configuration>
```
- mapred-site.xml

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>tinygao1:10020</value>
    <description>MapReduce JobHistory Server host:port, default port is 10020</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>tinygao1:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
  </property>
</configuration>
```
- yarn-site.xml
Reference: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>tinygao1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>tinygao1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>tinygao1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>tinygao1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>tinygao1:8088</value>
  </property>
</configuration>
```
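After editing the files on tinygao1, the whole config directory must match on every node. A dry-run sketch of syncing it with scp (host list and install path taken from the sections above; drop the `echo` on tinygao1 to actually copy):

```shell
# Dry run: print the scp command that would sync the config dir to each slave.
CONF=/data/program/hadoop-2.8.0/etc/hadoop
for host in tinygao2 tinygao3; do
  echo "scp -r $CONF/ root@$host:$CONF/"
done
```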
2. Start the Hadoop services
- On tinygao1, format HDFS. This initializes the NameNode metadata and creates a current directory under /data/data/hadoop/namenode:
# hdfs namenode -format
# cd $HADOOP_HOME/sbin
# ./start-dfs.sh   # starts the NameNode locally, then reads the slaves file to start the DataNodes on tinygao2 and tinygao3
# ./start-yarn.sh   # starts the ResourceManager locally and the NodeManagers on the slaves
- Run `jps -ml` on each node; it should show:
tinygao1:
10288 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
9897 org.apache.hadoop.hdfs.server.namenode.NameNode
10109 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
tinygao2:
2755 org.apache.hadoop.hdfs.server.datanode.DataNode
2874 org.apache.hadoop.yarn.server.nodemanager.NodeManager
tinygao3:
11184 org.apache.hadoop.yarn.server.nodemanager.NodeManager
10994 org.apache.hadoop.hdfs.server.datanode.DataNode
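To eyeball just the daemon names instead of full class paths, the `jps -ml` output can be trimmed with awk. A sketch, fed here with the tinygao1 sample listing (on a live node you would pipe `jps -ml` directly):

```shell
# Keep only the last component of each daemon's class name.
# On a live node: jps -ml | awk '{n=split($2,a,"."); print a[n]}'
sample='10288 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
9897 org.apache.hadoop.hdfs.server.namenode.NameNode
10109 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
echo "$sample" | awk '{n=split($2,a,"."); print a[n]}'
```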