Spark Cluster Installation
2021-12-02 · 大胖圆儿小姐
1. Installation Overview
This article continues the setup of my virtual machines. It assumes a working Hadoop cluster, on top of which the Spark cluster is installed; the Hadoop setup and the details of my virtual machines are described here: https://www.jianshu.com/p/1bbfbb3968b6. The JDK and Hadoop are prerequisites for the Spark cluster and are not covered again. The chosen Spark version is 3.0.3, and Spark 3.0+ is built against Scala 2.12, so Scala 2.12 also needs to be installed.
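Before starting, it is worth confirming on each node that the prerequisites from the previous article are in place. A minimal check, assuming the JDK and Hadoop are already on the PATH as described there:
java -version     # should report the installed JDK
hadoop version    # should report Hadoop 2.10.1
echo $JAVA_HOME   # should point at the JDK directory, e.g. /usr/local/java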
2. Software Selection
- Choosing the Spark version; the official download page is http://spark.apache.org/downloads.html
- Choosing the Scala version; the download page for 2.12.2 is https://www.scala-lang.org/download/2.12.2.html
3. Installing Scala (run the following steps on all three nodes)
- First, upload the Scala tarball to the hadoop user's home directory (one way to upload is shown after the listing below)
[hadoop@hadoop01 ~]$ ll
total 633388
drwxrwxrwx. 11 hadoop hadoop 173 Nov 13 09:08 hadoop-2.10.1
-rw-r--r--. 1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
-rw-r--r--. 1 hadoop hadoop 19596088 Nov 30 17:14 scala-2.12.2.tgz
[hadoop@hadoop01 ~]$
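How the tarball reaches the server is up to you; one option is scp from your local machine (the local path below is only an illustration):
scp /path/to/scala-2.12.2.tgz hadoop@hadoop01:~   # hypothetical local path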
- Optional step: the same method applies if this permission problem shows up with files uploaded later. When a file is not owned by hadoop, use chown to change the ownership; this command must be run as root.
[hadoop@hadoop01 ~]$ exit
exit
You have new mail in /var/spool/mail/root
[root@hadoop01 ~]# cd /home/hadoop/
[root@hadoop01 hadoop]# ll
total 633388
drwxrwxrwx. 11 hadoop hadoop 173 Nov 13 09:08 hadoop-2.10.1
-rw-r--r--. 1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
-rw-r--r--. 1 hadoop hadoop 19596088 Nov 30 17:14 scala-2.12.2.tgz
[root@hadoop01 hadoop]# chown -R hadoop:hadoop scala-2.12.2.tgz
[root@hadoop01 hadoop]# su hadoop
- Extract the Scala tarball
[hadoop@hadoop01 ~]$ tar -zxvf scala-2.12.2.tgz
[hadoop@hadoop01 ~]$ cd scala-2.12.2
[hadoop@hadoop01 scala-2.12.2]$ pwd
/home/hadoop/scala-2.12.2
- Add Scala to the environment variables
[hadoop@hadoop01 scala-2.12.2]$ vim ~/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
export SCALA_HOME=/home/hadoop/scala-2.12.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin
- Apply the environment variables
[hadoop@hadoop01 scala-2.12.2]$ source ~/.bashrc
- Verify that Scala is installed (an optional one-liner check follows the output below)
[hadoop@hadoop01 scala-2.12.2]$ scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
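Besides checking the version, a one-line expression run through the scala launcher confirms the runtime itself works (purely an optional sanity check):
scala -e 'println("scala " + util.Properties.versionNumberString + " ok")'   # should print something like: scala 2.12.2 ok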
4. Installing Spark
- First, upload the Spark tarball to the hadoop user's home directory on the hadoop01 virtual machine
[hadoop@hadoop01 ~]$ ll
total 633388
drwxrwxrwx. 11 hadoop hadoop 173 Nov 13 09:08 hadoop-2.10.1
-rw-r--r--. 1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
drwxrwxr-x. 6 hadoop hadoop 50 Apr 13 2017 scala-2.12.2
-rw-r--r--. 1 hadoop hadoop 19596088 Nov 30 17:14 scala-2.12.2.tgz
-rw-r--r--. 1 hadoop hadoop 220400553 Nov 30 17:14 spark-3.0.3-bin-hadoop2.7.tgz
- Extract the Spark tarball
[hadoop@hadoop01 ~]$ tar -zxvf spark-3.0.3-bin-hadoop2.7.tgz
[hadoop@hadoop01 ~]$ cd spark-3.0.3-bin-hadoop2.7
[hadoop@hadoop01 spark-3.0.3-bin-hadoop2.7]$ pwd
/home/hadoop/spark-3.0.3-bin-hadoop2.7
- Configure the slaves file (an SSH check follows this step)
[hadoop@hadoop01 spark-3.0.3-bin-hadoop2.7]$ cd conf
[hadoop@hadoop01 conf]$ mv slaves.template slaves
[hadoop@hadoop01 conf]$ vim slaves
# Replace localhost with the following three node names
hadoop01
hadoop02
hadoop03
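start-all.sh will reach every host listed in slaves over SSH, so the hadoop user on hadoop01 needs passwordless SSH to all three nodes (normally already in place from the Hadoop setup). A quick check, assuming the hostnames resolve:
for h in hadoop01 hadoop02 hadoop03; do ssh $h hostname; done   # each hostname should print without a password prompt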
- Configure the spark-env.sh file (a slightly fuller sketch follows below)
[hadoop@hadoop01 conf]$ mv spark-env.sh.template spark-env.sh
[hadoop@hadoop01 conf]$ vim spark-env.sh
export MASTER=spark://172.16.100.26:7077
export SPARK_MASTER_IP=172.16.100.26
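For comparison, a slightly fuller spark-env.sh sketch; the current standalone documentation uses SPARK_MASTER_HOST for the master address, and the values below are assumptions matching this cluster, so adjust as needed:
export JAVA_HOME=/usr/local/java                                 # JDK location from the earlier setup
export SPARK_MASTER_HOST=hadoop01                                # or the IP 172.16.100.26
export SPARK_MASTER_PORT=7077
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.10.1/etc/hadoop     # lets Spark pick up the HDFS/YARN configuration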
- Configure Spark's environment variables
[hadoop@hadoop01 conf]$ vim ~/.bashrc
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
export SCALA_HOME=/home/hadoop/scala-2.12.2
export SPARK_HOME=/home/hadoop/spark-3.0.3-bin-hadoop2.7
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin
- Apply the environment variables
[hadoop@hadoop01 conf]$ source ~/.bashrc
- Copy the Spark installation from hadoop01 into the home directories on hadoop02 and hadoop03
[hadoop@hadoop01 ~]$ scp -r spark-3.0.3-bin-hadoop2.7 hadoop@hadoop02:~
[hadoop@hadoop01 ~]$ scp -r spark-3.0.3-bin-hadoop2.7 hadoop@hadoop03:~
- Update the environment variables on hadoop02 and hadoop03 using the same commands as the two environment-variable steps above (edit ~/.bashrc, then source it); a sketch of doing this over SSH follows.
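A sketch of doing that over SSH; it assumes the same paths on every node and simply appends the same export lines shown above to each remote ~/.bashrc (the variables take effect in the next shell session on those nodes):
for h in hadoop02 hadoop03; do
  ssh $h 'cat >> ~/.bashrc' <<'EOF'
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
export SCALA_HOME=/home/hadoop/scala-2.12.2
export SPARK_HOME=/home/hadoop/spark-3.0.3-bin-hadoop2.7
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin
EOF
done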
- Start the Spark cluster from $SPARK_HOME/sbin, stopping any previously running instance first (a jps check follows the output below)
[hadoop@hadoop01 sbin]$ ./stop-all.sh
hadoop02: stopping org.apache.spark.deploy.worker.Worker
hadoop03: stopping org.apache.spark.deploy.worker.Worker
hadoop01: no org.apache.spark.deploy.worker.Worker to stop
stopping org.apache.spark.deploy.master.Master
[hadoop@hadoop01 sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop01.out
hadoop01: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
hadoop03: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop03.out
hadoop02: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop02.out
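Once the scripts report success, jps (shipped with the JDK) is a quick way to confirm the daemons are actually running; with the slaves file above, hadoop01 should show a Master and a Worker, and hadoop02/hadoop03 a Worker each, alongside any Hadoop processes:
jps   # e.g. on hadoop01: Master, Worker, ...; on hadoop02/03: Worker, ...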
- Verify the startup result through the web UI at http://172.16.100.26:8080
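As a further smoke test, the bundled SparkPi example can be submitted to the standalone master; the examples jar name below follows the usual naming for this build, so check the exact file name under $SPARK_HOME/examples/jars first:
spark-submit --master spark://hadoop01:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.3.jar 100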
5. A Problem I Ran Into
- When installing the JDK I put its environment variables in /etc/profile, intending to use the JDK globally, so when configuring the current user's environment variables in the Spark section above I did not set JAVA_HOME at first. Startup then failed with the following error:
[hadoop@hadoop01 sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop01.out
hadoop03: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop03.out
hadoop02: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop02.out
hadoop01: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
hadoop01: failed to launch: nice -n 0 /home/hadoop/spark-3.0.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop01:7077
hadoop01: JAVA_HOME is not set
hadoop01: full log in /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
Clearly Java's environment variable could not be read, so I added it to the current user's .bashrc and the cluster started successfully. I still don't understand why the variable set globally in /etc/profile did not take effect; I'll treat it as a small Spark loose end and come back to it when I get the chance!
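A likely explanation, though not verified here: start-all.sh launches each Worker over a non-interactive SSH session, and such shells generally read ~/.bashrc but not /etc/profile, which is only sourced by login shells. A commonly recommended alternative is to set JAVA_HOME in conf/spark-env.sh on every node, so the daemons see it regardless of how the shell was started:
# in /home/hadoop/spark-3.0.3-bin-hadoop2.7/conf/spark-env.sh (each node)
export JAVA_HOME=/usr/local/java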