Hadoop fully distributed HA cluster with automatic failover: environment setup

2019-11-05  余长生

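I read a number of blog posts, combined what they described, and got my own cluster working. This article records the process, and I hope it helps others.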

Environment preparation

Prepare and configure the virtual machines yourself (VM setup notes: http://note.youdao.com/noteshare?id=1cd5582fbe7351bb04e45f9e740ecd6f)

Linux: CentOS-7-x86_64-Minimal-1810
JDK: jdk-8u231 (any JDK 8+)
ZooKeeper: zookeeper-3.4.14
Hadoop: hadoop-2.7.7

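Uninstall the JDK that ships with the system and use JDK 1.8 or later:

rpm -qa | grep java          # list the installed Java packages
rpm -e java-xxx              # uninstall with rpm (replace java-xxx with each package name)
rpm -e --nodeps java-xxx     # force uninstall, ignoring dependencies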

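I. Distributed cluster plan

| Node    | IP address   | NameNode (NN) | DataNode (DN) | JournalNode (JN) | ZKFC | ZooKeeper (ZK) |
|---------|--------------|---------------|---------------|------------------|------|----------------|
| hadoop1 | 172.16.0.161 | namenode1     | datanode1     | journalnode      | zkfc | zookeeper      |
| hadoop2 | 172.16.0.162 | namenode2     | datanode2     | journalnode      | zkfc | zookeeper      |
| hadoop3 | 172.16.0.163 |               | datanode3     | journalnode      |      | zookeeper      |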

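II. Network and IP plan

2.1 Change the hostname

On CentOS 7, either edit the hostname file directly:

vi /etc/hostname

or run the matching command on each of the three machines:
hostnamectl set-hostname hadoop1   # on hadoop1
hostnamectl set-hostname hadoop2   # on hadoop2
hostnamectl set-hostname hadoop3   # on hadoop3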

Addendum: changing the hostname on CentOS 6
vi /etc/sysconfig/network
and change its contents to:
NETWORKING=yes
HOSTNAME=hadoop1

2.2 Update /etc/hosts accordingly
Check the IP address with: ip addr

Run vi /etc/hosts and add the host mappings:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6


172.16.0.161 hadoop1   # my VM's IP was already taken, so I changed it to 172.16.0.170
172.16.0.162 hadoop2
172.16.0.163 hadoop3
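Editing the hosts file by hand on three machines is easy to get wrong. The hypothetical helper add_host_entry below is a minimal sketch that appends a mapping only when it is not already present, so it can safely be re-run on every node. HOSTS_FILE is an assumed variable that lets you try it on a scratch file first. The IPs come from the cluster plan, so substitute your own (for example 172.16.0.170).

```shell
# Hypothetical helper: append "IP hostname" unless that exact mapping already exists.
# HOSTS_FILE (an assumption for this sketch) defaults to /etc/hosts.
add_host_entry() {
  local ip="$1" name="$2" file="${HOSTS_FILE:-/etc/hosts}"
  # awk exits 0 only if some line starts with this IP and lists this hostname.
  if awk -v ip="$ip" -v name="$name" '
      $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }' "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Mappings from the cluster plan (adjust to your own network).
add_cluster_hosts() {
  add_host_entry 172.16.0.161 hadoop1
  add_host_entry 172.16.0.162 hadoop2
  add_host_entry 172.16.0.163 hadoop3
}
```

Run `add_cluster_hosts` as root on each of hadoop1, hadoop2 and hadoop3. Running it a second time leaves the file unchanged.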

Reboot the system: reboot

Restart the network: service network restart

III. Disable the firewall

On CentOS 7 the default firewall is firewalld, not iptables, so disable it as follows:

systemctl stop firewalld.service            # stop firewalld
systemctl disable firewalld.service        # do not start firewalld at boot

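Addendum: disabling the firewall on CentOS 6
service iptables status             # show the firewall status
service iptables stop               # stop the firewall (it comes back after a reboot)
chkconfig iptables --list           # show the firewall's autostart settings
chkconfig iptables off              # disable autostart
chkconfig iptables --list           # check again: every runlevel should now show off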

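IV. Set up passwordless SSH login

Passwordless SSH must work between every pair of hosts, including from each host to itself. This can be done as root. Afterwards the public key is at /root/.ssh/id_rsa.pub. Run the following on hadoop1, then repeat it on hadoop2 and hadoop3: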

[root@hadoop1 ~]# ssh-keygen -t rsa
[root@hadoop1 ~]# ssh-copy-id hadoop1
[root@hadoop1 ~]# ssh-copy-id hadoop2
[root@hadoop1 ~]# ssh-copy-id hadoop3

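Passwordless login sometimes fails with a permission error. This usually means the permissions on id_rsa are too open, and tightening them fixes it.

Problem:
Permissions 0644 for '/root/.ssh/id_rsa' are too open.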
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "/root/.ssh/id_rsa": bad permissions
root@192.168.108.130's password:
Permission denied, please try again.
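I did change it: I ran chmod 600 /root/.ssh/id_rsa and could see that the permissions had been lowered. However, running the script again still produced the error above, and after I entered the root password it hung.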
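The case above is left unresolved. One possible cause, not confirmed here, is that the private key is not the only file being checked. With sshd's default StrictModes setting, the target host ignores authorized_keys if ~/.ssh or the key file is writable by others. The sketch below tightens all of these files at once. fix_ssh_perms is a hypothetical helper name, and the modes are the conventional ones: 700 for the directory, 600 for the private key and authorized_keys, and 644 for the public key.

```shell
# Hypothetical helper: apply conventional permissions to an SSH directory.
# Defaults to $HOME/.ssh; pass another directory to try it safely.
fix_ssh_perms() {
  local dir="${1:-$HOME/.ssh}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  chmod 700 "$dir"                                              # only the owner may enter
  [ -f "$dir/id_rsa" ] && chmod 600 "$dir/id_rsa"               # private key: owner only
  [ -f "$dir/id_rsa.pub" ] && chmod 644 "$dir/id_rsa.pub"       # public key may be readable
  [ -f "$dir/authorized_keys" ] && chmod 600 "$dir/authorized_keys"
  return 0
}
```

Because sshd checks the files on the host being logged into, run `fix_ssh_perms` on every node, not only on the one where you ran ssh-keygen.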

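V. Install the JDK

All software is installed under /usr/local/hadoop so that it can be managed in one place.

  1. Under /usr/local, create the hadoop directory and open up its permissions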
    mkdir hadoop
    sudo chmod -R 777 /usr/local/hadoop

  2. Extract jdk-8u231 in /usr/local/hadoop
    tar -zxvf jdk-8u231-linux-x64.tar.gz

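  3. Configure the environment variables with vi /etc/profile (the jdk-8u231 archive unpacks to jdk1.8.0_231)

export JAVA_HOME=/usr/local/hadoop/jdk1.8.0_231
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
  4. Reload the configuration so it takes effect
    source /etc/profile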
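A wrong JAVA_HOME, such as one naming a different JDK update from the one actually unpacked, usually only surfaces later when the Hadoop scripts fail to start. The sketch below is an early check. It assumes a JDK is valid when $JAVA_HOME/bin/java is executable, and check_java_home is a hypothetical helper name.

```shell
# Hypothetical helper: verify that a JAVA_HOME (argument or $JAVA_HOME) contains bin/java.
check_java_home() {
  local home="${1:-$JAVA_HOME}"
  if [ -z "$home" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$home/bin/java" ]; then
    echo "no executable java under $home/bin" >&2
    return 1
  fi
  echo "OK: $home"
}
```

After `source /etc/profile`, run `check_java_home` followed by `java -version` on each node.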

VI. Install ZooKeeper

  1. Extract ZooKeeper in /usr/local/hadoop/
    tar -zxvf zookeeper-3.4.14.tar.gz
  2. Configure the environment variables with vi /etc/profile
export ZOOKEEPER_HOME=/usr/local/hadoop/zookeeper-3.4.14/
export PATH=$PATH:$ZOOKEEPER_HOME/bin
  3. Reload the configuration so it takes effect
    source /etc/profile

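  4. Set up the ZooKeeper ensemble
    4.1 In /usr/local/hadoop/zookeeper-3.4.14/conf, run: cp zoo_sample.cfg zoo.cfg
    4.2 Edit zoo.cfg as shown below.
    Note: each node needs a myid file in its dataDir containing that node's server number.

For example, on hadoop1 with dataDir /data/zookeeper (write 2 on hadoop2 and 3 on hadoop3):
cd /data/zookeeper
echo 1 > myid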

# 1, 2 and 3 are the values stored in each node's myid file in dataDir (/data/zookeeper/myid)
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# Set this to your own directory
dataDir=/data/zookeeper
dataLogDir=/logs/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
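The number in each node's myid must match that node's server.N line in zoo.cfg. The sketch below derives both from the hostname. It assumes hostnames follow the hadoopN pattern from the cluster plan. zk_myid, write_myid and zk_server_lines are hypothetical helper names, and the ports 2888:3888 are the ones used above.

```shell
# Hypothetical helper: "hadoop2" -> 2 (assumes the hadoopN naming scheme).
zk_myid() {
  local host="${1:-$(hostname)}"
  case "$host" in
    hadoop[0-9]*) echo "${host#hadoop}" ;;
    *) echo "unexpected hostname: $host" >&2; return 1 ;;
  esac
}

# Write the myid file into dataDir (default /data/zookeeper, as in zoo.cfg above).
write_myid() {
  local host="$1" datadir="${2:-/data/zookeeper}" id
  id=$(zk_myid "$host") || return 1
  mkdir -p "$datadir"
  echo "$id" > "$datadir/myid"
}

# Print the server.N lines for zoo.cfg; N matches each host's myid.
zk_server_lines() {
  local host id
  for host in "$@"; do
    id=$(zk_myid "$host") || return 1
    echo "server.${id}=${host}:2888:3888"
  done
}
```

On each node, `write_myid "$(hostname)"` creates the right file. `zk_server_lines hadoop1 hadoop2 hadoop3` prints the three server lines shown above.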

VII. Install Hadoop

  1. Extract hadoop-2.7.7 in /usr/local/hadoop/
    tar -zxvf hadoop-2.7.7.tar.gz
  2. Configure the environment variables with vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
  3. Reload the configuration so it takes effect
    source /etc/profile

  4. Edit the Hadoop configuration files in /usr/local/hadoop/hadoop-2.7.7/etc/hadoop
    4.1 In hadoop-env.sh, set JAVA_HOME to an absolute path

#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/hadoop/jdk1.8.0_231

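Note: the comments in the configuration files below are only there to explain what each property means. It is safest to remove them in your actual setup to avoid unexpected problems.
4.2 core-site.xml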

<configuration>
<!-- Default file system: the HA nameservice rather than a single NameNode host -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
<!-- Base directory for Hadoop's temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
<!-- Hosts from which the proxy user hduser may connect (any host) -->
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
<!-- Groups whose users hduser may impersonate (all groups) -->
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
<!-- ZooKeeper ensemble addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
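Hand-typed property names are easy to misspell, and Hadoop typically just ignores a key it does not recognise. The sketch below generates the <property> blocks from name/value pairs instead. hadoop_property and core_site are hypothetical names. Values are written as-is, so the sketch assumes they contain no XML special characters (&, <, >).

```shell
# Hypothetical helper: print one Hadoop <property> element.
hadoop_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Hypothetical helper: wrap name/value argument pairs in <configuration>.
core_site() {
  echo '<configuration>'
  while [ "$#" -ge 2 ]; do
    hadoop_property "$1" "$2"
    shift 2
  done
  echo '</configuration>'
}
```

For example, `core_site fs.defaultFS hdfs://mycluster ha.zookeeper.quorum hadoop1:2181,hadoop2:2181,hadoop3:2181 > core-site.xml`. Once the file is in place, `hdfs getconf -confKey fs.defaultFS` prints the value Hadoop actually loaded.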

4.3 hdfs-site.xml

<configuration>
<!-- HA configuration -->
<!-- Name the HDFS nameservice mycluster -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
<!-- namenode1 RPC address -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop1:9000</value>
  </property>
<!-- namenode2 RPC address -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop2:9000</value>
  </property>
<!-- namenode1 HTTP address -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop1:50070</value>
  </property>
<!-- namenode2 HTTP address -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop2:50070</value>
  </property>
<!-- Enable automatic failover for nameservice mycluster -->
  <property>
    <name>dfs.ha.automatic-failover.enabled.mycluster</name>
    <value>true</value>
  </property>
<!-- JournalNode configuration -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.namenode.edits.journal-plugin.qjournal</name>
    <value>org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager</value>
  </property>
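<!-- On failover, the standby node must kill the unhealthy NameNode on the old active node; this is called fencing. sshfence uses ssh to run fuser remotely, which finds and kills the active node's NameNode process. If sshfence fails, shell(/bin/true) is tried next and always succeeds. -->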
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
sshfence
shell(/bin/true)
    </value>
  </property>
<!-- SSH private key used by sshfence -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
<!-- JournalNode edits storage directory -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/ha/jn</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
<!-- Class clients use to find the active NameNode during failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir.restore</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/dfsdata/name</value>
  </property>

  <property>
    <name>dfs.blocksize</name>
    <value>67108864</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/dfsdata/data</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
<!-- Enable WebHDFS so HDFS can be accessed over the web (REST) interface -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
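Before starting any daemons, it can help to confirm that every key HA depends on is present and spelled correctly. This is a minimal text-based sketch. check_hdfs_ha_keys is hypothetical. It assumes each key appears literally as <name>KEY</name>, as in the file above, and it does not validate values. The key list covers only properties used in this file.

```shell
# Hypothetical helper: list HA keys missing from an hdfs-site.xml; returns 1 if any are missing.
check_hdfs_ha_keys() {
  local file="$1" ns="${2:-mycluster}" key missing=0
  for key in \
    dfs.nameservices \
    "dfs.ha.namenodes.$ns" \
    "dfs.namenode.rpc-address.$ns.nn1" \
    "dfs.namenode.rpc-address.$ns.nn2" \
    dfs.namenode.shared.edits.dir \
    "dfs.client.failover.proxy.provider.$ns" \
    dfs.ha.fencing.methods \
    dfs.ha.automatic-failover.enabled; do
    # -F: exact text match, so a stray space or typo inside <name> is reported.
    if ! grep -qF "<name>${key}</name>" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_hdfs_ha_keys etc/hadoop/hdfs-site.xml mycluster` from the Hadoop directory. No output means every listed key was found.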

4.4 mapred-site.xml

Copy mapred-site.xml.template to mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

4.5 yarn-site.xml

<!-- Interval between attempts to reconnect to a lost ResourceManager -->
<configuration>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
<!-- Enable ResourceManager HA (default false) -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
<!-- Logical IDs of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop1</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop2</value>
  </property>
<!-- Enable automatic ResourceManager failover -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
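<!-- Set rm1 on hadoop1 and rm2 on hadoop2. Note: configured files are usually copied to the other machines, but on the second ResourceManager (hadoop2) this value must be changed to rm2. Machines that do not run a ResourceManager should not set it. -->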
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
<!-- Enable ResourceManager state recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
<!-- Class used to persist ResourceManager state -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
<!-- ZooKeeper ensemble addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
<!-- Wait interval before reconnecting to the scheduler after losing the connection -->
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>
<!-- Cluster ID -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>mycluster</value>
  </property>
<!-- rm1 client RPC address -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>hadoop1:8132</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>hadoop1:8130</value>
  </property>

<!-- RM web UI address:port -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop1:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>hadoop1:8131</value>
  </property>

<!-- RM admin interface address:port -->
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>hadoop1:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>hadoop1:23142</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop2:8132</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop2:8130</value>
  </property>


  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop2:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop2:8131</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>hadoop2:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>hadoop2:23142</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/dfsdata/yarn/local</value>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/hadoop/dfsdata/logs</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
    <description>Memory available on each node, in MB</description>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>258</value>
    <description>Minimum memory a single container can request (default 1024 MB)</description>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>512</value>
    <description>Maximum memory a single container can request (default 8192 MB)</description>
  </property>

  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:8042</value>
  </property>
</configuration>
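The three memory values above have to fit together. This is a rough, hedged check that ignores vcores and the scheduler's rounding of requests. The number of minimum-size containers one NodeManager can hold is bounded by memory-mb / minimum-allocation-mb, using integer division. With the values above, that is 1024 / 258 = 3 (with 256 it would be 4). The helper names are hypothetical.

```shell
# Hypothetical helper: upper bound on minimum-size containers per node (integer division).
max_min_containers() {
  local node_mb="$1" min_mb="$2"
  echo $(( node_mb / min_mb ))
}

# Hypothetical helper: flag combinations that are usually misconfigurations.
check_yarn_memory() {
  local node_mb="$1" min_mb="$2" max_mb="$3"
  if [ "$min_mb" -gt "$max_mb" ]; then
    echo "minimum-allocation-mb ($min_mb) > maximum-allocation-mb ($max_mb)" >&2
    return 1
  fi
  # A container larger than every node's memory could never be placed.
  if [ "$max_mb" -gt "$node_mb" ]; then
    echo "maximum-allocation-mb ($max_mb) > nodemanager memory-mb ($node_mb)" >&2
    return 1
  fi
  echo "up to $(max_min_containers "$node_mb" "$min_mb") containers of ${min_mb} MB per node"
}
```

`check_yarn_memory 1024 258 512` prints `up to 3 containers of 258 MB per node`.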

4.6 Edit the slaves file

vi slaves
and add:

hadoop1
hadoop2
hadoop3

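VIII. Create the required directories on every node and distribute the Hadoop files to the other nodes

  1. Create the directories and grant read/write permissions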
mkdir -p /data/zookeeper
mkdir -p /logs/zookeeper
mkdir -p /data/hadoop/dfsdata/name
mkdir -p /data/hadoop/dfsdata/data
mkdir -p /data/hadoop/dfsdata/logs
mkdir -p /data/hadoop/dfsdata/yarn/local
mkdir -p /data/hadoop/ha/jn

sudo chmod -R 777 /data
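All three nodes need the same set of directories. The sketch below creates them in one go. make_cluster_dirs is a hypothetical name. The optional ROOT argument is an assumption that lets you try the function under a scratch directory. /data/hadoop/tmp (hadoop.tmp.dir in core-site.xml) has been added to the list above.

```shell
# Hypothetical helper: create every local directory used by zoo.cfg and the *-site.xml files.
# ROOT (first argument) is an optional prefix; leave it empty to create the real paths.
make_cluster_dirs() {
  local root="${1:-}" d
  for d in \
    /data/zookeeper \
    /logs/zookeeper \
    /data/hadoop/tmp \
    /data/hadoop/dfsdata/name \
    /data/hadoop/dfsdata/data \
    /data/hadoop/dfsdata/logs \
    /data/hadoop/dfsdata/yarn/local \
    /data/hadoop/ha/jn; do
    mkdir -p "${root}${d}" || return 1
  done
}
```

Run `make_cluster_dirs` with no argument as root on each node, then apply the chmod above.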
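  2. Distribute the files: from /usr/local, run
scp -r hadoop root@hadoop2:/usr/local
scp -r hadoop root@hadoop3:/usr/local

Also add the same /etc/profile settings on hadoop2 and hadoop3, give each node its own myid, and change yarn.resourcemanager.ha.id to rm2 on hadoop2.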

IX. Start the cluster

  1. Start the ZooKeeper ensemble, one node at a time
[root@hadoop1 zookeeper-3.4.14]# ./bin/zkServer.sh start

[root@hadoop2 zookeeper-3.4.14]# ./bin/zkServer.sh start

[root@hadoop3 zookeeper-3.4.14]# ./bin/zkServer.sh start
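Once all three servers are up, running `./bin/zkServer.sh status` on each node should report one leader and two followers. The small sketch below reads that result. It assumes the 3.4.x status output, which ends with a line `Mode: <role>`. zk_mode is a hypothetical name.

```shell
# Hypothetical helper: read `zkServer.sh status` output on stdin and print the role.
# Exits non-zero when no "Mode:" line is present (e.g. the server is not running).
zk_mode() {
  awk -F': *' '$1 == "Mode" { print $2; found = 1 } END { exit !found }'
}
```

On each node: `./bin/zkServer.sh status 2>/dev/null | zk_mode`.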

  2. Initialize Hadoop, working in /usr/local/hadoop/hadoop-2.7.7
    2.1 Format the HA state in ZooKeeper (run once, on hadoop1)
[root@hadoop1 hadoop-2.7.7]# ./bin/hdfs zkfc -formatZK

2.2 Start the JournalNode daemons on hadoop1, hadoop2 and hadoop3

[root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start journalnode

[root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start journalnode

[root@hadoop3 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start journalnode

2.3 Format the NameNode on namenode1 (hadoop1)

[root@hadoop1 hadoop-2.7.7]# ./bin/hdfs namenode -format

2.4 Start the DataNodes on hadoop1, hadoop2 and hadoop3

[root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start datanode
[root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start datanode
[root@hadoop3 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start datanode

2.5 Start the NameNodes
2.5.1 namenode1

[root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start namenode

2.5.2 namenode2 (first copy namenode1's metadata with -bootstrapStandby, then start it)

[root@hadoop2 hadoop-2.7.7]# ./bin/hdfs namenode -bootstrapStandby
[root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start namenode

At this point both namenode1 and namenode2 are in standby state:
http://172.16.0.170:50070/dfshealth.html#tab-overview


http://172.16.0.164:50070/dfshealth.html#tab-overview


2.6 Start the ZKFC service

Run the following on both namenode1 and namenode2:
[root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start zkfc
[root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start zkfc

Once ZKFC is running, namenode1 and namenode2 automatically elect an active node.
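To confirm that each host runs the roles from the cluster plan, compare `jps` output with the expected daemon names. These are QuorumPeerMain (ZooKeeper), JournalNode, NameNode, DataNode and DFSZKFailoverController (ZKFC). missing_daemons is a hypothetical helper that reads jps output on stdin.

```shell
# Hypothetical helper: report expected daemons absent from `jps` output read on stdin.
# Returns 1 if any are missing.
missing_daemons() {
  local running expected missing=0
  running=$(awk '{ print $2 }')          # second column of jps is the main-class name
  for expected in "$@"; do
    if ! printf '%s\n' "$running" | grep -qx "$expected"; then
      echo "not running: $expected"
      missing=1
    fi
  done
  return "$missing"
}
```

On hadoop1 and hadoop2, run `jps | missing_daemons QuorumPeerMain JournalNode NameNode DataNode DFSZKFailoverController`. On hadoop3, run `jps | missing_daemons QuorumPeerMain JournalNode DataNode`.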



X. Verification

  1. Create hello.txt under /data and upload it to HDFS
[root@hadoop1 hadoop-2.7.7]# cd /data
[root@hadoop1 data]# echo hello world > hello.txt
[root@hadoop1 data]# cd /usr/local/hadoop/hadoop-2.7.7
[root@hadoop1 hadoop-2.7.7]# ./bin/hdfs dfs -mkdir /test
[root@hadoop1 hadoop-2.7.7]# ./bin/hdfs dfs -put /data/hello.txt /test
[root@hadoop1 hadoop-2.7.7]# ./bin/hdfs dfs -cat /test/hello.txt

  2. Automatic HA failover: on hadoop1, find the NameNode PID with jps and kill it
    [root@hadoop1 hadoop-2.7.7]# jps
    [root@hadoop1 hadoop-2.7.7]# kill -9 pid   # replace pid with the NameNode's PID

Check the node status in the web UI:
http://172.16.0.170:50070/dfshealth.html#tab-overview
The namenode1 page can no longer be reached.
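Instead of refreshing the web UI, you can poll the surviving NameNode with `hdfs haadmin -getServiceState nn2`, which prints `active` or `standby`. This is a minimal sketch of such a loop. wait_for_active, the 60-second default timeout and the HAADMIN override variable are all assumptions for illustration. The override exists only so the loop can be tried without a cluster.

```shell
# Hypothetical helper: poll a NameNode's HA state until it is "active" or the timeout expires.
# HAADMIN defaults to "hdfs haadmin" and is deliberately left unquoted so it splits into words.
wait_for_active() {
  local nn="${1:-nn2}" timeout="${2:-60}" state waited=0
  local haadmin="${HAADMIN:-hdfs haadmin}"
  while [ "$waited" -lt "$timeout" ]; do
    state=$($haadmin -getServiceState "$nn" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$nn is active after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$nn did not become active within ${timeout}s (last state: ${state:-unknown})" >&2
  return 1
}
```

After killing the NameNode on hadoop1, run `wait_for_active nn2` on hadoop2.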
