Spark Standalone Dynamic Resource Allocation

2018-05-27, by 金刚_30bf

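Version: 2.3.0

In standalone mode, configure the following in conf/spark-defaults.conf:

spark.dynamicAllocation.enabled: set to true.
spark.shuffle.service.enabled: set to true.
(Dynamic allocation requires the external shuffle service. In standalone mode it is enough to start the workers with this parameter set to true.)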

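spark.dynamicAllocation.schedulerBacklogTimeout: how long tasks must stay backlogged before the first request for executors is triggered. Default: 1s.

spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: the interval at which further requests are triggered while the backlog persists after the first request. Default: the same value as schedulerBacklogTimeout.

The first round requests one executor; each later round requests exponentially more (1, 2, 4, 8, and so on).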
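The ramp-up described above can be sketched in plain Scala. This is only a simplified model, not Spark's scheduler code: the object name RampUpSketch and the function targets are made up for illustration, and the sketch assumes the backlog persists through every round and that only maxExecutors limits the total.

```scala
// Hypothetical model of the request ramp-up; not Spark's actual implementation.
object RampUpSketch {
  /** Cumulative executor target after each request round, capped at maxExecutors. */
  def targets(rounds: Int, maxExecutors: Int): Seq[Int] = {
    // Round k (0-based) asks for 2^k additional executors: 1, 2, 4, 8, ...
    val perRound = Seq.iterate(1, rounds)(_ * 2)
    // Running total of requested executors, clipped to the configured ceiling.
    perRound.scanLeft(0)(_ + _).tail.map(n => math.min(n, maxExecutors))
  }

  def main(args: Array[String]): Unit = {
    // With a ceiling of 6, the running totals are 1, 3, 6, 6.
    println(targets(rounds = 4, maxExecutors = 6))
  }
}
```

Under these assumptions the first round fires after schedulerBacklogTimeout and each later round after another sustainedSchedulerBacklogTimeout, so with the 1s defaults a ceiling of 6 would be reached on the third round.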

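spark.dynamicAllocation.executorIdleTimeout: an executor that has been idle for this long is removed and its resources are returned. Default: 60s.
spark.dynamicAllocation.cachedExecutorIdleTimeout: the idle timeout applied to an executor that holds cached data blocks; once it has been idle for this long it is removed. Default: infinity.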
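The difference between the two idle timeouts may be easier to see as a small decision function. The sketch below is a hypothetical model in plain Scala: ExecutorState, IdleTimeoutSketch and expired are illustrative names, durations are plain seconds, and the minExecutors floor is deliberately left out.

```scala
// Hypothetical model; the names below are illustrative and not part of Spark's API.
final case class ExecutorState(id: String, idleSeconds: Long, hasCachedBlocks: Boolean)

object IdleTimeoutSketch {
  /** Ids of executors whose idle time has reached the applicable timeout.
    * A cachedIdleTimeout of None stands for the default "infinity". */
  def expired(
      executors: Seq[ExecutorState],
      idleTimeout: Long,
      cachedIdleTimeout: Option[Long]): Seq[String] =
    executors.filter { e =>
      if (e.hasCachedBlocks) cachedIdleTimeout.exists(t => e.idleSeconds >= t) // never expires when None
      else e.idleSeconds >= idleTimeout
    }.map(_.id)
}
```

With the defaults (60s and infinity), an executor holding cached blocks is never selected in this model, no matter how long it has been idle.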

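The following parameters can also be set:
spark.dynamicAllocation.minExecutors: the lower bound on the number of executors under dynamic allocation.
spark.dynamicAllocation.maxExecutors: the upper bound on the number of executors under dynamic allocation.
spark.dynamicAllocation.initialExecutors: the initial number of executors under dynamic allocation; defaults to the value of spark.dynamicAllocation.minExecutors. If --num-executors (or spark.executor.instances) is set and is larger than this value, that value is used instead.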
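The rule for the initial executor count amounts to taking the largest of the configured values. A minimal sketch in plain Scala, assuming unset options are passed as None; InitialExecutorsSketch and its parameter names are illustrative and not Spark's API:

```scala
// Hypothetical helper mirroring the rule described above; not Spark's own code.
object InitialExecutorsSketch {
  def initialExecutors(
      minExecutors: Int,
      initialExecutors: Option[Int],   // spark.dynamicAllocation.initialExecutors, if set
      executorInstances: Option[Int]   // spark.executor.instances / --num-executors, if set
  ): Int = {
    // initialExecutors falls back to minExecutors when it is not configured.
    val configuredInitial = initialExecutors.getOrElse(minExecutors)
    // The largest of the three values is used as the starting point.
    Seq(minExecutors, configuredInitial, executorInstances.getOrElse(0)).max
  }
}
```

For example, with minExecutors = 2, initialExecutors = 4 and spark.executor.instances = 8, this sketch yields 8; with nothing but minExecutors = 2 set, it yields 2.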

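Set the port of the external shuffle service (the default is 7337):
spark.shuffle.service.port=7338
(I also have the YARN spark_shuffle auxiliary service configured, and the NodeManager already occupies port 7337, so I switched to another port.)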

Start the shell with:

./bin/spark-shell --master spark://node202.hmbank.com:7077

The web UI shows that the application currently has 0 executors.

Run a command that triggers an action:

val r1 = sc.textFile("hdfs://hmcluster/password/input/ruokouling/*.txt")
r1.count()

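The number of executors quickly grows to 6.

[Screenshot: executor list in the web UI]

Because the screenshot was taken some time later, it shows these executors as already dead, that is, reclaimed.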

spark-defaults.conf

# Example:
spark.master                     spark://node202.hmbank.com:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hmcluster/user/spark/eventLog
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              1g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
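# Dynamic allocation settings
# spark.dynamicAllocation.enabled: set to true.
# spark.shuffle.service.enabled: set to true.
# (Dynamic allocation requires the external shuffle service. In standalone mode it is enough to start the workers with this parameter set to true.)

# spark.dynamicAllocation.schedulerBacklogTimeout: how long tasks must stay backlogged before the first request for executors is triggered. Default: 1s.

# spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: the interval at which further requests are triggered while the backlog persists. Default: the same value as schedulerBacklogTimeout.

# The first round requests one executor; each later round requests exponentially more.

# spark.dynamicAllocation.executorIdleTimeout: an executor that has been idle for this long is removed. Default: 60s.
# spark.dynamicAllocation.cachedExecutorIdleTimeout: the idle timeout for an executor that holds cached data blocks. Default: infinity.
# The following parameters can also be set:
# spark.dynamicAllocation.minExecutors: the lower bound on the number of executors under dynamic allocation.
# spark.dynamicAllocation.maxExecutors: the upper bound on the number of executors under dynamic allocation.
# spark.dynamicAllocation.initialExecutors: the initial number of executors; defaults to spark.dynamicAllocation.minExecutors. If `--num-executors` (or `spark.executor.instances`) is set and is larger than this value, that value is used instead.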
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true

# 7337 is the default port of the external shuffle service
spark.shuffle.service.port=7338
