Spark Development -- HA Cluster

2019-12-03  无剑_君

I. Overview

A Spark Standalone cluster uses a Master--Slaves architecture and, like most Master--Slaves clusters, it has a single point of failure at the Master. There are two approaches to this problem:

  1. File-system based single-point recovery
      Mainly for development or test environments. Spark provides a directory in which it saves the registration information of Spark Applications and Workers and writes their recovery state. If the Master fails, the registered Applications and Workers can be recovered simply by restarting the Master process (sbin/start-master.sh); in other words, you have to restart the Master yourself. A minimal configuration sketch follows this list.
  2. Standby Masters with ZooKeeper
      Mainly for production. The basic idea is to use ZooKeeper to elect one active Master while the other Masters stay in standby. Connect the Spark cluster to the same ZooKeeper ensemble and start several Masters; ZooKeeper's election and state-storage facilities ensure that one Master is elected as the live Master while the others remain in standby. If the current Master dies, another Master is elected, recovers the old Master's state, and then resumes operation. The whole recovery can take one to two minutes.
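For reference, a minimal spark-env.sh sketch of mode 1; the recovery directory /spark/recovery is an example path that you create yourself:

# FILESYSTEM recovery: a restarted Master re-reads Application and Worker state from a local directory
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/spark/recovery"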

II. Environment Setup

  1. Cluster plan
| Server | IP address      | Software                     | Services       | Notes                       |
| ------ | --------------- | ---------------------------- | -------------- | --------------------------- |
| master | 192.168.247.131 | JDK, Scala, Spark, ZooKeeper | master, worker | primary master              |
| slave1 | 192.168.247.132 | JDK, Scala, Spark, ZooKeeper | master, worker | standby master, worker node |
| slave2 | 192.168.247.130 | JDK, Scala, Spark, ZooKeeper | master, worker | standby master, worker node |
  2. Host configuration
192.168.247.131  master
192.168.247.132  slave1
192.168.247.130  slave2
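These three mappings go into /etc/hosts on every node.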

  3. Passwordless SSH
# Generate a key pair (on all hosts)
root@master:~# cd .ssh
root@master:~/.ssh# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:hHzY7V1EXy+RiWPOkZhJ+BekfIZGpbNwQI8pyE8QJyQ root@master
The key's randomart image is:
+---[RSA 2048]----+
|  E.=...ooo*o++o.|
|   o * +.O++B.ooo|
|    o * B.@+o+o o|
|     o + =.*+. . |
|      . S o..    |
|                 |
|                 |
|                 |
|                 |
+----[SHA256]-----+
root@master:~/.ssh# 
# Copy the public key to slave1
root@master:~/.ssh# ssh-copy-id slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave1's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave1'"
and check to make sure that only the key(s) you wanted were added.
# Copy the public key to slave2
root@master:~/.ssh# ssh-copy-id slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave2's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave2'"
and check to make sure that only the key(s) you wanted were added.

# Test passwordless login
root@master:~/.ssh# ssh slave1
root@master:~/.ssh# ssh slave2
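Because master itself is also listed in the slaves file configured later, start-all.sh will ssh back into master to launch its local Worker, so it is worth authorising the key there as well:

# Authorise the key on master itself (start-all.sh uses ssh even for the local Worker)
root@master:~/.ssh# ssh-copy-id master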

III. Prerequisites

1. Install the JDK

root@master:~# apt install openjdk-8-jdk -y
# Verify
root@master:~# java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

2. Install Scala
# Download
root@master:~# wget https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz

# Extract
root@master:~# tar -zxvf scala-2.12.10.tgz -C /usr/local
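Optionally verify the unpacked Scala by calling its binary with the full path (PATH is only extended in the environment-variable step later):

# Should print the Scala 2.12.10 version banner
root@master:~# /usr/local/scala-2.12.10/bin/scala -version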

3. Install ZooKeeper

# Create the config file (copy the template)
root@master:~# cp /usr/local/zookeeper-3.4.14/conf/zoo_sample.cfg /usr/local/zookeeper-3.4.14/conf/zoo.cfg
# Edit the configuration:
root@master:~# vi /usr/local/zookeeper-3.4.14/conf/zoo.cfg

# zoo.cfg contents
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/zookeeper/data
clientPort=2181
# Quorum members; server.N must match each node's myid below
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

# Create the data and datalog directories (on every node)
root@master:~# mkdir -p /zookeeper/data
root@master:~# mkdir -p /zookeeper/datalog

# Pack up the installation and distribute it to the other nodes
root@master:/usr/local# tar -cvf zookeeper.tar zookeeper-3.4.14/
root@master:/usr/local# scp zookeeper.tar root@slave1:/root
root@master:/usr/local# scp zookeeper.tar root@slave2:/root
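The archive still has to be unpacked on the slaves so that /usr/local/zookeeper-3.4.14 (the ZOOKEEPER_HOME configured later) exists there too; a sketch, assuming the tarball landed in /root as above:

root@slave1:~# tar -xvf zookeeper.tar -C /usr/local
root@slave2:~# tar -xvf zookeeper.tar -C /usr/local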

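# Write each node's unique myid; the value must match its server.N entry in zoo.cfg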
root@master:~# cd /zookeeper/data
root@master:/zookeeper/data# echo 1 > myid
root@slave1:/zookeeper/data# echo 2 > myid
root@slave2:/zookeeper/data# echo 3 > myid
4. Configure environment variables (covered in step (4) of the configuration section below)

IV. Download and Install Spark

Download page: http://spark.apache.org/downloads.html

# Download the binary package from the page above (the rest of this article uses spark-2.4.4-bin-hadoop2.7)
root@master:~# ls
spark-2.4.4-bin-hadoop2.7.tgz

# Extract
root@master:~# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local

# Optionally create a symlink
root@master:~# cd /usr/local
root@master:/usr/local# ln -s spark-2.4.4-bin-hadoop2.7/ spark
root@master:/usr/local# ll

V. Configuration

(1) spark-env.sh

# Copy spark-env.sh.template to spark-env.sh
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
# Append the following configuration at the end of the file
root@master:~# vi /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_WORKER_MEMORY=500m
export SPARK_WORKER_CORES=1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/spark"

ZooKeeper keeps all of the Spark cluster's state, including the information of every Worker, every Application, and every Driver, so that a newly elected Master can recover it.

Parameter notes:
1. spark.deploy.recoveryMode: the recovery mode used when the Master restarts. Three values are supported: ZOOKEEPER, FILESYSTEM, and NONE.
-Dspark.deploy.recoveryMode=ZOOKEEPER means the entire cluster state is maintained in, and recovered from, ZooKeeper; in other words, ZooKeeper provides the HA for Spark. When the active Master dies, a standby Master must first read the full cluster state from ZooKeeper and restore all Worker, Driver, and Application information before it can become the active Master.
2. spark.deploy.zookeeper.url: the addresses of the ZooKeeper servers.
-Dspark.deploy.zookeeper.url should list all of the ZooKeeper servers, so that every machine that may become the active Master can reach the ensemble.
3. spark.deploy.zookeeper.dir: the ZooKeeper directory (znode) in which the cluster metadata is kept, covering Workers, Drivers, and Applications.
-Dspark.deploy.zookeeper.dir holds Spark's metadata, i.e. the recorded state of the running cluster.

Note:
In normal (non-HA) mode, you only need to run start-all.sh on the master host to start the Spark cluster.
In HA mode, first run start-all.sh on any one node, then start the additional Master(s) separately on the other master node(s) with the command: start-master.sh
(2) Copy slaves.template to slaves

root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/slaves.template  /usr/local/spark-2.4.4-bin-hadoop2.7/conf/slaves
root@master:~# vi /usr/local/spark-2.4.4-bin-hadoop2.7/conf/slaves

# Add the following entries
master
slave1
slave2

(3) Distribute the configuration files to the other nodes
(This assumes the Spark directory has already been unpacked at the same path, /usr/local/spark-2.4.4-bin-hadoop2.7, on slave1 and slave2.)

# Distribute spark-env.sh
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-env.sh                                                                                                                                                           100% 4556     6.2MB/s   00:00 
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
spark-env.sh                                                                                                                                                           100% 4556     9.2MB/s   00:00 

# Distribute the slaves file
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/slaves root@slave1:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
slaves                                                                                                                                                                 100%  877   109.2KB/s   00:00 
root@master:~# scp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/slaves root@slave2:/usr/local/spark-2.4.4-bin-hadoop2.7/conf/
slaves                                                                                                                                                                 100%  877     3.1MB/s   00:00 

(4) Environment variables

# Required on every node (e.g. appended to /etc/profile)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SCALA_HOME=/usr/local/scala-2.12.10
export CLASSPATH=.:${JAVA_HOME}/lib
export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7
export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.14
export PATH=$PATH:${JAVA_HOME}/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$ZOOKEEPER_HOME/bin
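Assuming these exports live in /etc/profile (any login-shell profile will do), reload them on every node:

root@master:~# source /etc/profile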

VI. Startup

1. Start the ZooKeeper cluster first
Run on every node:

root@master:~# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.14/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@master:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower

root@slave1:~# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.14/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@slave1:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: leader

root@slave2:~# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.14/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@slave2:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower

2. Start the Spark cluster
Run on any one node:

# Must use start-all.sh
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# ./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
86337 Jps
86247 Worker
86041 Master
62157 QuorumPeerMain

# Start the standby Masters
root@slave1:~# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-slave1.out
root@slave1:~# jps
60006 QuorumPeerMain
84871 Master
86089 Jps
84527 Worker

root@slave2:~# start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.4.4-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-slave2.out
root@slave2:~# jps
82962 QuorumPeerMain
113554 Jps
112855 Master
106846 Worker

3. Check the status in the Web UI (the Master Web UI listens on port 8080 by default)


(Screenshots: Master Web UI on the primary node master and on the standby nodes slave1 and slave2.)

VII. Verification

1. Check the Master status in the Web UI
master: ALIVE
slave1: STANDBY
slave2: STANDBY
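The same state can be read without a browser from the Master web UI's JSON view (a quick check, assuming the default web UI port 8080):

# "status" reports ALIVE on the active Master and STANDBY on the others
root@master:~# curl -s http://master:8080/json | grep status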

2. Verify HA failover
Manually kill the Master process on the master node and watch whether a standby Master takes over automatically.

root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# kill -9 90584
root@master:/usr/local/spark-2.4.4-bin-hadoop2.7# jps
86247 Worker
93513 Jps
62157 QuorumPeerMain


(Screenshots: the standby Master has switched to ALIVE; the other Master remains in STANDBY.)

Note:
Failover takes a little time; the Web UI may switch within a couple of seconds, though full state recovery can take one to two minutes as noted above. Wait a moment and then refresh the page.
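You can also confirm that ZooKeeper is holding the cluster state by listing the /spark directory configured in spark-env.sh (a sketch using the zkCli.sh that ships with ZooKeeper):

root@master:~# zkCli.sh -server master:2181,slave1:2181,slave2:2181
[zk: master:2181,slave1:2181,slave2:2181(CONNECTED) 0] ls /spark
[leader_election, master_status]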

VIII. Common Problems
