vmware12 pro & ubuntu16.04 & spark
The most detailed tutorial on installing Spark, bar none!
install vmware
Just keep clicking through with Enter (pick a custom install path if you want; search online for a license key yourself).
install ubuntu
step1: Create a new virtual machine -> Custom -> Next
1.PNG
step2: Next
2.PNG
step3: I will install the operating system later -> Next
3.PNG
step4: Linux -> pick your Ubuntu version -> Next
4.PNG
step5: Set a custom name and location -> Next
5.PNG
step6: Allocate CPU cores -> Next
6.PNG
step7: Allocate memory -> 2 GB (on an 8 GB host) -> Next
7.PNG
step8: NAT network mode (share the host's IP to reach the network) -> Next
8.PNG
step9: Next
9.PNG
step10: Next
10.PNG
step11: Next
11.PNG
step12: 30 GB of disk (on a 1 TB host) -> Next
12.PNG
step13: Since three VMs will run at the same time alongside the host, open Customize Hardware and lower the memory to 1 GB -> Finish.
13.PNG
step14:
14.PNG
step15: Edit virtual machine settings -> CD/DVD -> Use ISO image file
15.PNG
step16: Options -> add a shared folder between the host and the VM -> Add -> just press Enter through the wizard -> OK -> Finish
17.PNG
step17: Power on this virtual machine
18.PNG
step18: OK
19.PNG 20.PNG
step19: Install Ubuntu
21.PNG
step20: continue
22.PNG
step21: Default partitioning method
23.PNG
step22: Choose the time zone -> keyboard layout
24.PNG 26.PNG
When choosing the keyboard layout the Continue button may not be visible; drag the installer window until it appears.
27.PNG
step23:define user info
28.PNG
continue
29.PNG
Installation finished.
30.PNG
step24: Install VMware Tools
Connect the VM to the internet.
Start the VMware services shown in blue manually. Note: because I use the campus Ruijie client, the services get killed again even after you start them, but this does not happen on a wireless network.
31.PNG 32.PNG
Preparation
Log in as root
Root privileges are needed constantly later on, so just log in to the system as root.
Install the Sogou input method
1. fcitx
My understanding: it is something like an input-method management framework, and there are many ways to install it.
2. Sogou
Download the deb package and cd into the directory containing it:
dpkg -i xxx.deb    (then reboot the system)
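A minimal sketch of the whole input-method install, assuming the downloaded package is called sogoupinyin_amd64.deb (an illustrative name; use whatever you actually downloaded):
apt-get install -y fcitx             # the input-method framework Sogou relies on
dpkg -i sogoupinyin_amd64.deb        # illustrative filename
apt-get -f install -y                # pull in any dependencies dpkg complained about
reboot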
Google Pinyin also works well.
Lantern
Building Spark needs a connection over the firewall; just grab the matching Lantern release from GitHub.
Download JDK, Scala, Hadoop, and Spark
Download versions that match the requirements on the Spark website, otherwise you are only digging a pit for yourself!
Do the downloads on the host machine and put the downloaded files into the shared folder set up earlier, so they can be used directly inside the VM.
I extract everything under the home directory; I created an apps folder there, and the extraction commands are as follows.
tar -xzvf spa* -C /root/apps/
tar -xzvf jdk* -C /root/apps/
...
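A fuller sketch of the extraction, assuming the four archives have been copied from the shared folder into the current directory (the archive names below match the versions used in this guide but are otherwise illustrative):
mkdir -p /root/apps
tar -xzvf jdk-8u121-linux-x64.tar.gz -C /root/apps/
tar -xzvf scala-2.11.8.tgz -C /root/apps/
tar -xzvf hadoop-2.7.3.tar.gz -C /root/apps/
tar -xzvf spark-2.1.0-bin-hadoop2.7.tgz -C /root/apps/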
Environment variables
Edit /etc/profile with gedit and remember to source it after configuring (you can also put the environment variables in ~/.bashrc, ~/.profile and so on; to avoid unnecessary trouble I simply put everything in /etc/profile, which applies to all users and shells). The listing below uses /root/Software as the install prefix; adjust the paths to wherever you actually extracted the archives (above they went to /root/apps).
############################user environment#####################################
# jdk
export JAVA_HOME=/root/Software/jdk1.8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH
# scala
export SCALA_HOME=/root/Software/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
# set hadoop env
export HADOOP_HOME=/root/Software/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native/:$LD_LIBRARY_PATH
# set spark environment
export SPARK_HOME=/root/Software/spark-2.1.0-bin-hadoop2.7
export PATH=${SPARK_HOME}/bin:${SPARK_HOME}/sbin:$PATH
Then run java -version and scala -version to check that it worked; Spark and Hadoop will be verified later.
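A minimal check, assuming the paths in /etc/profile point at the right directories:
source /etc/profile
java -version        # should report something like java version "1.8.0_121"
scala -version       # should report Scala 2.11.8
echo $JAVA_HOME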
Passwordless SSH login
- Install the server
apt-get install openssh-server
- Install the client
apt-get install openssh-client
apt-get install ssh
- Edit the configuration file and disable the firewall (then restart sshd, as sketched below)
root@sparkmaster:~# gedit /etc/ssh/sshd_config
# PermitRootLogin prohibit-password
PermitRootLogin yes
root@sparkmaster:~# ufw disable
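If sshd is already running, restart it so that the PermitRootLogin change takes effect (a minimal sketch; the service is also started explicitly further below):
/etc/init.d/ssh restart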
- Clone two more virtual machines (select the VM -> right-click -> Manage -> Clone)
34.PNG
- Start the three VMs and rename each machine. The cluster management machine is master; the worker nodes are slave1 and slave2. Edit the hostname on each machine (a short sketch follows):
gedit /etc/hostname
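A sketch for the master, if you prefer not to reboot after editing the file; repeat on the clones with slave1 and slave2:
hostnamectl set-hostname master      # use slave1 / slave2 on the other two machines
cat /etc/hostname                    # hostnamectl rewrites this file as well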
- Configure the network
Click the network icon at the top right of the guest -> Edit Connections -> Add -> Create (choose Ethernet as the type) -> IPv4 Settings ->
master
36.PNG
slave1
37.PNG
slave2
address: 192.168.2.112 (master and slave1 get 192.168.2.100 and 192.168.2.111 respectively, matching the hosts file below). Reboot all of the VMs for the addresses to take effect, then switch each machine onto the newly configured connection and check the result with the command below:
ifconfig
- Edit /etc/hosts: delete the line with the machine's original name and IP (for example the original sparkmaster entry on this machine) and add the entries below, but do not delete the localhost line.
192.168.2.100 master
192.168.2.111 slave1
192.168.2.112 slave2
Check that it works: on the master, ping each slave (Ctrl+C ends the test); do the same on every slave, pinging the master and the other slave. Make sure the VMware NAT service is running and that every VM has been switched to the connection configured above before testing.
root@master:~# ping slave1
PING slave1 (192.168.2.111) 56(84) bytes of data.
64 bytes from slave1 (192.168.2.111): icmp_seq=1 ttl=64 time=0.470 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=2 ttl=64 time=0.528 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=3 ttl=64 time=0.534 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=4 ttl=64 time=0.569 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=5 ttl=64 time=0.541 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=6 ttl=64 time=0.486 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=7 ttl=64 time=0.211 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=8 ttl=64 time=0.250 ms
64 bytes from slave1 (192.168.2.111): icmp_seq=9 ttl=64 time=0.543 ms
^C
--- slave1 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8171ms
rtt min/avg/max/mdev = 0.211/0.459/0.569/0.126 ms
root@master:~# ping slave2
PING slave2 (192.168.2.112) 56(84) bytes of data.
64 bytes from slave2 (192.168.2.112): icmp_seq=1 ttl=64 time=0.884 ms
64 bytes from slave2 (192.168.2.112): icmp_seq=2 ttl=64 time=0.329 ms
64 bytes from slave2 (192.168.2.112): icmp_seq=3 ttl=64 time=0.366 ms
64 bytes from slave2 (192.168.2.112): icmp_seq=4 ttl=64 time=0.559 ms
64 bytes from slave2 (192.168.2.112): icmp_seq=5 ttl=64 time=0.500 ms
64 bytes from slave2 (192.168.2.112): icmp_seq=6 ttl=64 time=0.497 ms
- Start the SSH service on every VM
root@master:~# /etc/init.d/ssh start
[ ok ] Starting ssh (via systemctl): ssh.service.
- Generate a key pair on every VM (just keep pressing Enter, or use the non-interactive form sketched below)
ssh-keygen -t rsa
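Equivalently, the prompts can be skipped entirely; a one-liner to run on master, slave1, and slave2:
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa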
- Copy each slave's public key to the master
root@slave2:~# cd /root/.ssh
root@slave2:~/.ssh# ls
id_rsa id_rsa.pub
root@slave2:~/.ssh# scp id_rsa.pub root@master:~/.ssh/id_rsa.pub.slave2
The authenticity of host 'master (192.168.2.100)' can't be established.
ECDSA key fingerprint is SHA256:OLQqNgObhdHVF4jgUi+IqsOTfgrWblbmXlcdqcF6IDg.
Are you sure you want to continue connecting (yes/no)?
Host key verification failed.
lost connection
root@slave2:~/.ssh# scp id_rsa.pub root@master:~/.ssh/id_rsa.pub.slave2
The authenticity of host 'master (192.168.2.100)' can't be established.
ECDSA key fingerprint is SHA256:OLQqNgObhdHVF4jgUi+IqsOTfgrWblbmXlcdqcF6IDg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.2.100' (ECDSA) to the list of known hosts.
root@master's password:
id_rsa.pub 100% 393 0.4KB/s 00:00
root@slave2:~/.ssh#
root@slave1:~/.ssh# scp id_rsa.pub root@master:~/.ssh/id_rsa.pub.slave1
The authenticity of host 'master (192.168.2.100)' can't be established.
ECDSA key fingerprint is SHA256:OLQqNgObhdHVF4jgUi+IqsOTfgrWblbmXlcdqcF6IDg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.2.100' (ECDSA) to the list of known hosts.
root@master's password:
id_rsa.pub 100% 393 0.4KB/s 00:00
root@slave1:~/.ssh#
Check on the master:
root@master:~/.ssh# ls
id_rsa id_rsa.pub id_rsa.pub.slave1 id_rsa.pub.slave2
root@master:~/.ssh#
- 合并认证并分发
root@master:~/.ssh# cat id_rsa.pub* >> authorized_keys
root@master:~/.ssh# ls
authorized_keys id_rsa id_rsa.pub id_rsa.pub.slave1 id_rsa.pub.slave2
root@master:~/.ssh# scp authorized_keys root@slave1:~/.ssh/
The authenticity of host 'slave1 (192.168.2.111)' can't be established.
ECDSA key fingerprint is SHA256:OLQqNgObhdHVF4jgUi+IqsOTfgrWblbmXlcdqcF6IDg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1,192.168.2.111' (ECDSA) to the list of known hosts.
root@slave1's password:
authorized_keys 100% 1179 1.2KB/s 00:00
root@master:~/.ssh# scp authorized_keys root@slave2:~/.ssh/
The authenticity of host 'slave2 (192.168.2.112)' can't be established.
ECDSA key fingerprint is SHA256:OLQqNgObhdHVF4jgUi+IqsOTfgrWblbmXlcdqcF6IDg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave2,192.168.2.112' (ECDSA) to the list of known hosts.
root@slave2's password:
authorized_keys 100% 1179 1.2KB/s 00:00
root@master:~/.ssh#
Check that every slave now has the authorized_keys file. After that, the machines can switch between each other freely without a password:
root@master:~/.ssh# ssh slave1
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
427 packages can be updated.
205 updates are security updates.
root@slave1:~# ssh master
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
427 packages can be updated.
205 updates are security updates.
root@master:~# ssh slave2
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
427 packages can be updated.
205 updates are security updates.
root@slave2:~# ssh master
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
427 packages can be updated.
205 updates are security updates.
Last login: Tue Oct 10 10:04:24 2017 from 192.168.2.111
Environment variables revisited (Spark and Hadoop)
hadoop
- In ~/apps/hadoop-2.7.3/etc/hadoop/hadoop-env.sh, add on every VM:
export JAVA_HOME=/root/apps/jdk1.8.0_121
- In ~/apps/hadoop-2.7.3/etc/hadoop/yarn-env.sh, add:
export JAVA_HOME=/root/apps/jdk1.8.0_121
- In ~/apps/hadoop-2.7.3/etc/hadoop/slaves, comment out localhost and append the hostnames (or IP addresses) of all the slaves, as shown below.
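For this cluster the file simply lists the two workers:
# localhost
slave1
slave2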
- core-site.xml (in the same directory as the files above): the values to change are the default filesystem and the temp directory; also create a tmp folder under the Hadoop root directory (see the mkdir sketch after hdfs-site.xml).
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/root/apps/hadoop-2.7.3/tmp</value>
  </property>
</configuration>
- hdfs-site.xml: create an hdfs folder under the Hadoop root directory with two subfolders, name and data (again, see the mkdir sketch below), and set the replication factor to 3 (choose your own value).
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/root/apps/hadoop-2.7.3/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/root/apps/hadoop-2.7.3/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
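The local directories referenced in core-site.xml and hdfs-site.xml must exist before formatting; a small sketch using the paths above:
mkdir -p /root/apps/hadoop-2.7.3/tmp
mkdir -p /root/apps/hadoop-2.7.3/hdfs/name /root/apps/hadoop-2.7.3/hdfs/data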
- Copy mapred-site.xml.template to create mapred-site.xml (the cp command is shown after the file contents):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
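As noted above, mapred-site.xml is created from the template that ships with Hadoop, for example:
cd ~/apps/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml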
- yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
At this point Hadoop is finally configured. Every VM needs the same configuration; my advice is to configure one machine and then scp the directory to each slave. I fell into this pit and configured all three machines by hand! A copy sketch follows, and after that we check the result.
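A sketch of the copy from the master, assuming the same /root/apps layout on every machine (the passwordless SSH set up earlier makes this painless):
scp -r /root/apps/hadoop-2.7.3 root@slave1:/root/apps/
scp -r /root/apps/hadoop-2.7.3 root@slave2:/root/apps/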
Initialization: this only needs to run before the first start. If formatting fails and you have to initialize again, remember to delete and recreate the hdfs directories created earlier!
bin/hadoop namenode -format
Start all the Hadoop daemons
sbin/start-all.sh
Check the running processes
root@master:~/apps/hadoop-2.7.3# jps
4337 ResourceManager
4616 Jps
4188 SecondaryNameNode
3678 NameNode
master:8088 (the YARN web UI)
38.PNG
spark
Path: ~/apps/spark-2.1.0-bin-hadoop2.7/conf
- Copy spark-env.sh.template to create spark-env.sh:
#!/usr/bin/env bash
export SCALA_HOME=/root/apps/scala-2.11.8
export JAVA_HOME=/root/apps/jdk1.8.0_121
export HADOOP_HOME=/root/apps/hadoop-2.7.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/root/apps/spark-2.1.0-bin-hadoop2.7
SPARK_DRIVER_MEMORY=512M
- Copy slaves.template to create slaves, comment out localhost, and add the slaves:
# localhost
slave1
slave2
Every VM needs the same Spark configuration; the copy sketch below does it from the master.
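A sketch of copying the configured Spark directory to the slaves, again assuming the /root/apps layout used here:
scp -r /root/apps/spark-2.1.0-bin-hadoop2.7 root@slave1:/root/apps/
scp -r /root/apps/spark-2.1.0-bin-hadoop2.7 root@slave2:/root/apps/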
- Start Spark (with the Hadoop daemons from the previous section still running)
root@master:~/apps/spark-2.1.0-bin-hadoop2.7# sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /root/apps/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /root/apps/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /root/apps/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
slave1: failed to launch: nice -n 0 /root/apps/spark-2.1.0-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
slave1: full log in /root/apps/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave2: failed to launch: nice -n 0 /root/apps/spark-2.1.0-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
slave2: full log in /root/apps/spark-2.1.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
root@master:~/apps/spark-2.1.0-bin-hadoop2.7# jps
5168 Jps
4337 ResourceManager
5083 Master
4188 SecondaryNameNode
3678 NameNode
root@master:~/apps/spark-2.1.0-bin-hadoop2.7#
master:8080 (the Spark master web UI)
39.PNG
master:4040 can only be opened while an application is running.
cd spark_home/bin
./spark-shell
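A quick smoke test against the standalone cluster (a sketch; the examples jar name is the one bundled with the 2.1.0 binary package, so adjust it if yours differs):
cd /root/apps/spark-2.1.0-bin-hadoop2.7
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 examples/jars/spark-examples_2.11-2.1.0.jar 10
The driver should print a line like "Pi is roughly 3.14...", and the finished application then shows up on master:8080.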
For the IDE I chose IDEA; I have used it before and strongly recommend it.
1. Download the archive first.
2. Run the installer script (make it executable, then run it):
root@ubuntu:~/Software/idea-IC-171.4694.23/bin# ls
appletviewer.policy fsnotifier-arm idea.sh printenv.py
format.sh idea64.vmoptions idea.vmoptions restart.py
fsnotifier idea.png inspect.sh
fsnotifier64 idea.properties log.xml
root@ubuntu:~/Software/idea-IC-171.4694.23/bin# chmod +x ./idea.sh
root@ubuntu:~/Software/idea-IC-171.4694.23/bin# ./idea.sh
Sep 09, 2017 1:58:47 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Sep 09, 2017 1:58:47 PM java.util.prefs.FileSystemPrefere
3. Follow the setup wizard; the last step offers the Scala plugin, so remember to tick it. If you forget, you can still install it later from inside the IDE.