Spark Development: Local Mode
2019-12-02
无剑_君
I. Local Mode
Local mode is the simplest way to run Spark: a single node executes everything with multiple threads (one per CPU core). It is essentially out-of-the-box (OOTB) — you only need to export JAVA_HOME in spark-env.sh, with no other configuration required — which makes it well suited to development and learning.
In local mode, all Spark processes run inside a single JVM, and parallelism is achieved through multiple threads. By default, local mode starts as many threads as the local machine has CPU cores. To control the level of parallelism, specify the master as local[N], where N is the number of threads to use.
Usage:
./spark-shell --master local[n]
where n is the number of threads.
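Besides passing --master on the command line, the same setting can be made in application code. A minimal sketch (LocalModeDemo is an illustrative name, not part of the Spark distribution):

import org.apache.spark.sql.SparkSession

object LocalModeDemo {
  def main(args: Array[String]): Unit = {
    // local[4] runs everything in this JVM with 4 worker threads;
    // local[*] would use one thread per CPU core.
    val spark = SparkSession.builder()
      .appName("LocalModeDemo")
      .master("local[4]")
      .getOrCreate()
    // A trivial job to confirm the local scheduler works.
    val sum = spark.sparkContext.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum") // prints sum = 5050
    spark.stop()
  }
}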
II. Prerequisites
1. Install Java 8
Download: http://openjdk.java.net/
https://adoptopenjdk.net/releases.html
root@master:~# apt install openjdk-8-jdk -y
# Verify
root@master:~# java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
# JDK path
root@master:~# whereis java
java: /usr/bin/java /usr/share/java /usr/share/man/man1/java.1.gz
# Locate the java command
root@master:~# which java
/usr/bin/java
# Alternatively, download the JDK tarball and install manually
root@master:~# wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u232-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u232b09.tar.gz
2. Install Scala (2.12.10)
1) Download
Download: https://www.scala-lang.org/download/
# Download
root@master:~# wget https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz
# Extract
root@master:~# tar -zxvf scala-2.12.10.tgz -C /usr/local
Note: the latest release is 2.13.1, but Spark 2.4.4 does not support Scala 2.13; use 2.12.10 for development. (The pre-built Spark 2.4.4 shell bundles its own Scala 2.11.12, as the startup banner below shows.)
2) Configure environment variables
root@master:~# vi /etc/profile
# Append at the end of the file:
# Environment variables
export SCALA_HOME=/usr/local/scala-2.12.10
export PATH=$PATH:$SCALA_HOME/bin
# Apply immediately
root@master:~# source /etc/profile
# Verify
root@master:~# scala -version
Scala code runner version 2.12.10 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
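As a further sanity check, a one-file script can be run with the scala command. A minimal sketch (the file name Hello.scala is arbitrary):

// Save as Hello.scala and run with: scala Hello.scala
object Hello extends App {
  // Prints the Scala library version the runner picked up.
  println(s"Hello from Scala ${util.Properties.versionNumberString}")
}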
III. Download and Install Spark
- Download and install
Download: http://spark.apache.org/downloads.html
# Download
root@master:~# wget https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
# Extract
root@master:~# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local
- Configure environment variables
# Append to /etc/profile:
export SCALA_HOME=/usr/local/scala-2.12.10
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
# Apply immediately
root@master:~# source /etc/profile
- Configure Spark
# Copy the configuration template
root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
# Set the server IP
root@master:~# vi /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
# Add the following line (use your own host's address)
SPARK_LOCAL_IP=192.168.247.131
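If you prefer not to fix the address globally in spark-env.sh, the same can be set per application through the builder. A sketch, assuming Spark 2.1+ (where spark.driver.bindAddress is available) and the host address used in this article:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("BindAddressDemo") // illustrative name
  .master("local[2]")
  .config("spark.driver.bindAddress", "192.168.247.131") // address the driver binds to
  .config("spark.driver.host", "192.168.247.131")        // address advertised to executors
  .getOrCreate()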
IV. Test Run
- Example commands
1) Single thread
root@master:~# spark-shell --master local
19/12/01 18:03:58 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.1.1; using 192.168.247.131 instead (on interface ens33)
19/12/01 18:03:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/12/01 18:03:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.247.131:4040
Spark context available as 'sc' (master = local, app id = local-1575194657764).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
# Check the JVM processes
root@master:~# jps
128720 SparkSubmit
129448 Jps
Web UI: http://192.168.247.131:4040/
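With the shell running, you can submit a small job and watch it appear in the Web UI. A sketch of a classic word count, assuming any readable local file (here /etc/hosts):

// Inside spark-shell, 'sc' (SparkContext) and 'spark' (SparkSession) already exist.
val lines  = sc.textFile("file:///etc/hosts")
val words  = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
val counts = words.map((_, 1)).reduceByKey(_ + _)
counts.take(5).foreach(println) // the completed job shows up under the Jobs tab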
2) Multiple threads
# local[4] runs the application with 4 concurrent threads (one core per thread).
root@master:~# spark-shell --master local[4]
19/12/01 18:06:31 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.1.1; using 192.168.247.131 instead (on interface ens33)
19/12/01 18:06:31 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/12/01 18:06:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.247.131:4040
Spark context available as 'sc' (master = local[4], app id = local-1575194812191).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Web UI: http://192.168.247.131:4040/
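From inside the shell you can confirm the parallelism level; for example:

sc.master             // res0: String = local[4]
sc.defaultParallelism // res1: Int = 4
// A 4-partition job runs its tasks concurrently on the 4 threads:
sc.parallelize(1 to 1000000, 4).map(_ * 2L).sum()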
Running in local mode is very simple: extract the Spark package and adjust a few common settings, and it is ready to use. There is no need to start Spark's Master and Worker daemons (those are only required for standalone cluster mode), nor any Hadoop services (unless you use HDFS). This is what distinguishes local mode from the other deployment modes.
3) Running an example
This example estimates the value of π with a Monte Carlo method: it samples a large number of random points, and the fraction that lands inside the unit circle converges to a fairly accurate approximation of π.
The argument 10 is the number of partitions (slices), i.e. 10 tasks are created — the log below confirms "10 output partitions".
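The heart of SparkPi is only a few lines. A simplified sketch of the idea (not the exact example source shipped with Spark):

import scala.math.random
import org.apache.spark.sql.SparkSession

object MonteCarloPi {
  def main(args: Array[String]): Unit = {
    val spark  = SparkSession.builder().appName("MonteCarloPi").master("local[2]").getOrCreate()
    val slices = 10               // number of partitions, like the '10' argument above
    val n      = 100000L * slices // total number of random points
    val inside = spark.sparkContext
      .parallelize(1L to n, slices)
      .map { _ =>
        // A random point in the square [-1,1] x [-1,1] falls inside
        // the unit circle with probability pi / 4.
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x * x + y * y <= 1) 1 else 0
      }
      .reduce(_ + _)
    println(s"Pi is roughly ${4.0 * inside / n}")
    spark.stop()
  }
}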
# Run Spark's bundled Pi example
# Note: options such as --master must come before the class name
# (run-example --master local[2] SparkPi 10); placed after it, as here,
# they are passed to the example itself and ignored.
root@master:~# run-example SparkPi 10 --master local[2]
19/12/01 18:15:39 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.1.1; using 192.168.247.131 instead (on interface ens33)
19/12/01 18:15:39 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/12/01 18:15:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/01 18:15:41 INFO SparkContext: Running Spark version 2.4.4
19/12/01 18:15:41 INFO SparkContext: Submitted application: Spark Pi
19/12/01 18:15:42 INFO SecurityManager: Changing view acls to: root
19/12/01 18:15:42 INFO SecurityManager: Changing modify acls to: root
19/12/01 18:15:42 INFO SecurityManager: Changing view acls groups to:
19/12/01 18:15:42 INFO SecurityManager: Changing modify acls groups to:
19/12/01 18:15:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/12/01 18:15:42 INFO Utils: Successfully started service 'sparkDriver' on port 35983.
19/12/01 18:15:42 INFO SparkEnv: Registering MapOutputTracker
19/12/01 18:15:43 INFO SparkEnv: Registering BlockManagerMaster
19/12/01 18:15:43 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/01 18:15:43 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/01 18:15:43 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-be449821-e746-46b1-82a1-a85348a1d7c4
19/12/01 18:15:43 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
19/12/01 18:15:43 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/01 18:15:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/12/01 18:15:44 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.247.131:4040
19/12/01 18:15:44 INFO SparkContext: Added JAR file:///usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/scopt_2.11-3.7.0.jar at spark://192.168.247.131:35983/jars/scopt_2.11-3.7.0.jar with timestamp 1575195344159
19/12/01 18:15:44 INFO SparkContext: Added JAR file:///usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar at spark://192.168.247.131:35983/jars/spark-examples_2.11-2.4.4.jar with timestamp 1575195344167
19/12/01 18:15:44 INFO Executor: Starting executor ID driver on host localhost
19/12/01 18:15:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46305.
19/12/01 18:15:44 INFO NettyBlockTransferService: Server created on 192.168.247.131:46305
19/12/01 18:15:44 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/01 18:15:44 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.247.131, 46305, None)
19/12/01 18:15:44 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.247.131:46305 with 413.9 MB RAM, BlockManagerId(driver, 192.168.247.131, 46305, None)
19/12/01 18:15:44 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.247.131, 46305, None)
19/12/01 18:15:44 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.247.131, 46305, None)
19/12/01 18:15:46 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
19/12/01 18:15:46 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
19/12/01 18:15:46 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/12/01 18:15:46 INFO DAGScheduler: Parents of final stage: List()
19/12/01 18:15:46 INFO DAGScheduler: Missing parents: List()
19/12/01 18:15:46 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/12/01 18:15:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB)
19/12/01 18:15:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 MB)
19/12/01 18:15:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.247.131:46305 (size: 1256.0 B, free: 413.9 MB)
19/12/01 18:15:47 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
19/12/01 18:15:47 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
19/12/01 18:15:47 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
19/12/01 18:15:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:47 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/12/01 18:15:47 INFO Executor: Fetching spark://192.168.247.131:35983/jars/spark-examples_2.11-2.4.4.jar with timestamp 1575195344167
19/12/01 18:15:47 INFO TransportClientFactory: Successfully created connection to /192.168.247.131:35983 after 100 ms (0 ms spent in bootstraps)
19/12/01 18:15:47 INFO Utils: Fetching spark://192.168.247.131:35983/jars/spark-examples_2.11-2.4.4.jar to /tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/fetchFileTemp4550729342539992157.tmp
19/12/01 18:15:48 INFO Executor: Adding file:/tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/spark-examples_2.11-2.4.4.jar to class loader
19/12/01 18:15:48 INFO Executor: Fetching spark://192.168.247.131:35983/jars/scopt_2.11-3.7.0.jar with timestamp 1575195344159
19/12/01 18:15:48 INFO Utils: Fetching spark://192.168.247.131:35983/jars/scopt_2.11-3.7.0.jar to /tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/fetchFileTemp8485759279339378034.tmp
19/12/01 18:15:48 INFO Executor: Adding file:/tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/scopt_2.11-3.7.0.jar to class loader
19/12/01 18:15:48 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 867 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
19/12/01 18:15:48 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 867 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1056 ms on localhost (executor driver) (1/10)
19/12/01 18:15:48 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 125 ms on localhost (executor driver) (2/10)
19/12/01 18:15:48 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 824 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 78 ms on localhost (executor driver) (3/10)
19/12/01 18:15:48 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 867 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 77 ms on localhost (executor driver) (4/10)
19/12/01 18:15:48 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 867 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 98 ms on localhost (executor driver) (5/10)
19/12/01 18:15:48 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
19/12/01 18:15:48 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 824 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 150 ms on localhost (executor driver) (6/10)
19/12/01 18:15:48 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
19/12/01 18:15:48 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 824 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 58 ms on localhost (executor driver) (7/10)
19/12/01 18:15:48 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
19/12/01 18:15:48 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 824 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 55 ms on localhost (executor driver) (8/10)
19/12/01 18:15:48 INFO Executor: Running task 8.0 in stage 0.0 (TID 8)
19/12/01 18:15:48 INFO Executor: Finished task 8.0 in stage 0.0 (TID 8). 824 bytes result sent to driver
19/12/01 18:15:48 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 7866 bytes)
19/12/01 18:15:48 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 42 ms on localhost (executor driver) (9/10)
19/12/01 18:15:48 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
19/12/01 18:15:49 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 910 bytes result sent to driver
19/12/01 18:15:49 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 64 ms on localhost (executor driver) (10/10)
19/12/01 18:15:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/12/01 18:15:49 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.359 s
19/12/01 18:15:49 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.735655 s
Pi is roughly 3.1434631434631433
19/12/01 18:15:49 INFO SparkUI: Stopped Spark web UI at http://192.168.247.131:4040
19/12/01 18:15:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/01 18:15:49 INFO MemoryStore: MemoryStore cleared
19/12/01 18:15:49 INFO BlockManager: BlockManager stopped
19/12/01 18:15:49 INFO BlockManagerMaster: BlockManagerMaster stopped
19/12/01 18:15:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/01 18:15:49 INFO SparkContext: Successfully stopped SparkContext
19/12/01 18:15:49 INFO ShutdownHookManager: Shutdown hook called
19/12/01 18:15:49 INFO ShutdownHookManager: Deleting directory /tmp/spark-deeed5e7-f7ef-4e86-9e50-fdee889328fe
19/12/01 18:15:49 INFO ShutdownHookManager: Deleting directory /tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5