First Spark Job Submission (Scala and Python)

2021-12-16  抬头挺胸才算活着
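The same word count job is written twice below: first in Scala against a local master, then in Python with PySpark.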
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // setMaster sets the master URL; "local[2]" means run locally with 2 worker threads
    val sparkConf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val sc = new SparkContext(sparkConf)

    val lines: RDD[String] = sc.textFile("C:/java/spark_practise/src/main/resources/input/word.txt")
    wordCount1(lines)
    sc.stop()
  }

  def wordCount1(lines: RDD[String]): Unit = {
    // split lines into words, pair each word with 1, then sum the counts per word
    val words: RDD[String] = lines.flatMap(_.split(" "))
    val wordToOne: RDD[(String, Int)] = words.map((_, 1))
    val wordToCount: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
    wordToCount.foreach(println)
  }
}
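The object above runs directly from the IDE because the master is hard-coded to local[2]. To submit it as a real job, the project is typically packaged into a jar and handed to spark-submit; a minimal sketch, where the jar name is a placeholder for whatever the build actually produces:

spark-submit --class WordCount --master "local[2]" spark_practise.jar

The same word count in Python (PySpark):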
import sys

from pyspark import SparkContext, SparkConf

if __name__ == "__main__":

  # create Spark context with Spark configuration
  conf = SparkConf().setAppName("Word Count - Python").set("spark.hadoop.yarn.resourcemanager.address", "192.168.0.104:8032")
  sc = SparkContext(conf=conf)

  # read in text file and split each document into words
  words = sc.textFile("C:/java/spark_practise/src/main/resources/input/word.txt").flatMap(lambda line: line.split(" "))

  # count the occurrences of each word
  wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

  # collect() pulls all results back to the driver, which is fine for a small test file
  print("spark python output................")
  print(wordCounts.collect())
  sc.stop()
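The configuration sets spark.hadoop.yarn.resourcemanager.address, which suggests the script is meant to run on a YARN cluster rather than purely locally. A minimal submission sketch, assuming the code above is saved as word_count.py (the file name is a placeholder) and the client machine has the cluster's Hadoop configuration available:

spark-submit --master yarn word_count.py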