Spark documentation notes

2018-08-27  大大大大大大大熊

Receiver reliability

While the received data is being backed up (replicated), if a failure occurs, no ACK is sent for the data still in the buffer, so the source will resend it afterwards.
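This ACK-and-resend behavior can be sketched as a tiny simulation (plain Python, not the Spark receiver API; the `Source` and `Receiver` classes here are hypothetical): the source keeps each batch buffered until the receiver acknowledges it, so a batch lost to a mid-backup failure is simply sent again.

```python
# Minimal simulation of reliable-receiver semantics (not the Spark API):
# the source keeps data buffered until it receives an ACK, so a batch
# that fails before being ACKed is re-sent rather than lost.

class Source:
    def __init__(self, batches):
        self.pending = list(batches)    # buffered until ACKed

    def send(self):
        return self.pending[0] if self.pending else None

    def ack(self):
        self.pending.pop(0)             # safe to drop only after ACK

class Receiver:
    def __init__(self, fail_once=False):
        self.stored = []
        self.fail_once = fail_once

    def receive(self, batch):
        if self.fail_once:              # crash before ACK: no data lost,
            self.fail_once = False      # the source still has it buffered
            return False
        self.stored.append(batch)       # replicate/store, then ACK
        return True

source = Source(["batch-1", "batch-2"])
receiver = Receiver(fail_once=True)

while (batch := source.send()) is not None:
    if receiver.receive(batch):
        source.ack()                    # ACK only after a successful store

print(receiver.stored)                  # both batches arrive exactly once
```

Note the ordering: the receiver stores (replicates) the data first and only then ACKs; acknowledging before the store completes is exactly what would make a failure lose the buffered batch.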

repartition(numPartitions)    # Changes the level of parallelism in this DStream by creating more or fewer partitions.
reduceByKey(func, [numTasks]) # Uses the default parallelism from spark.default.parallelism; the optional numTasks argument sets the number of reduce tasks explicitly.
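The role of numTasks can be sketched in plain Python (this is a simulation of the idea, not the Spark API): keys are hash-partitioned into numTasks buckets, and each bucket can then be reduced by an independent task.

```python
from collections import defaultdict

# Plain-Python sketch (not the Spark API) of how reduceByKey(func, numTasks)
# distributes work: keys are hash-partitioned into num_tasks buckets, and
# each bucket is reduced independently (in Spark, one task per partition).

def reduce_by_key(pairs, func, num_tasks):
    partitions = [defaultdict(list) for _ in range(num_tasks)]
    for key, value in pairs:
        partitions[hash(key) % num_tasks][key].append(value)

    result = {}
    for part in partitions:             # each partition = one reduce task
        for key, values in part.items():
            acc = values[0]
            for v in values[1:]:
                acc = func(acc, v)      # fold the values with func
            result[key] = acc
    return result

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
print(reduce_by_key(pairs, lambda x, y: x + y, num_tasks=4))
# {'a': 4, 'b': 6} (key order may vary)
```

A larger num_tasks gives more, smaller buckets (more parallelism per stage); the values computed per key are unchanged.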

Note: when running in local mode, do not set the master to local or local[1]. That allocates only one CPU for tasks, and if a receiver is running on it there is no resource left to process the received data. Use at least local[2] so there are cores for both receiving and processing.


Writing the WordCount results to a file:

wordCounts.foreachRDD(rdd -> {
    if (!rdd.isEmpty()) {
        // coalesce(1) merges all partitions so a single part file is written
        rdd.coalesce(1).saveAsTextFile("E:\\SS_result"); // output directory
    }
});
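Why coalesce(1) yields a single part file can be sketched in plain Python (a simulation of the idea, not the Spark API; the helper names here are hypothetical): saveAsTextFile writes one part-NNNNN file per partition, so merging all partitions into one first produces exactly one output file.

```python
import os
import tempfile

# Plain-Python sketch (not the Spark API): saveAsTextFile writes one
# part-NNNNN file per partition, so coalescing to 1 partition first
# leaves exactly one part-00000 in the output directory.

def coalesce(partitions, n):
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)      # merge partitions, no reshuffle
    return merged

def save_as_text_file(partitions, path):
    os.makedirs(path)
    for i, part in enumerate(partitions):
        with open(os.path.join(path, f"part-{i:05d}"), "w") as f:
            f.writelines(line + "\n" for line in part)

counts = [["(hello,2)"], ["(spark,1)"], ["(world,1)"]]  # 3 partitions
out = os.path.join(tempfile.mkdtemp(), "SS_result")
save_as_text_file(coalesce(counts, 1), out)
print(sorted(os.listdir(out)))          # only part-00000 is written
```

Without the coalesce(1), each of the three partitions would produce its own part file (part-00000, part-00001, part-00002).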

Contents of the output directory:

[Screenshot: the SS_result folder]

Opening part-00000:

[Screenshot: the part-00000 file]

Using spark-submit

# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster (--deploy-mode can also be client for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

Some commonly used options:

--class            your application's entry point (e.g. org.apache.spark.examples.SparkPi)
--master           the master URL for the cluster (local[n], spark://host:port, yarn, mesos://host:port)
--deploy-mode      whether to launch the driver locally (client, the default) or on a worker node (cluster)
--conf             arbitrary Spark configuration in key=value format
--executor-memory  memory per executor (e.g. 20G)
--num-executors / --total-executor-cores   resource sizing on YARN / on standalone and Mesos
