Implementing WordCount in Spark with Scala

2020-09-28  羋学僧

Create a Project


Choose sbt 1.0.4

Choose Scala 2.11.8

Configure the paths

Project Sources


Dependencies
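
For reference, a minimal build.sbt matching these choices might look like the sketch below. The Spark version is an assumption taken from the spark-2.1.0-bin-hadoop2.7 installation used later in this post, and the project name is taken from the SparkScalaWork.jar built at the end; adjust both as needed.

name := "SparkScalaWork"

version := "0.1"

scalaVersion := "2.11.8"

// spark-core is enough for the RDD-based WordCount in this post
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"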

Create a new object

MyScalaWordCount.scala

Local mode

import org.apache.spark.{SparkConf, SparkContext}

object MyScalaWordCount {

  def main(args: Array[String]): Unit = {

    // Run with a local master; the app name shows up in the Spark UI
    val conf = new SparkConf().setAppName("MyScalaWordCount").setMaster("local")

    // Create a SparkContext
    val sc = new SparkContext(conf)

    // WordCount: split each line into words, map each word to (word, 1),
    // then sum the counts per word
    val result = sc.textFile("hdfs://bigdata02:9000/wordcount.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    // Print the results to the console
    result.foreach(println)

    // Release resources
    sc.stop()
  }

}
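
To see what each step of the pipeline does, here is a small self-contained sketch. The object name and the two input lines are made up for illustration and stand in for wordcount.txt.

import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountSketch").setMaster("local"))

    // Made-up input standing in for wordcount.txt
    val lines = sc.parallelize(Seq("hello spark", "hello scala"))

    val counts = lines
      .flatMap(_.split(" "))  // "hello spark" -> "hello", "spark"
      .map((_, 1))            // "hello" -> ("hello", 1)
      .reduceByKey(_ + _)     // ("hello", 1) + ("hello", 1) -> ("hello", 2)

    // Prints (hello,2), (spark,1), (scala,1) in some order
    counts.collect().foreach(println)

    sc.stop()
  }
}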


Export a jar and run it on the server

MyScalaWordCount.scala

Build the jar

import org.apache.spark.{SparkConf, SparkContext}

object MyScalaWordCount {

  def main(args: Array[String]): Unit = {

    // No setMaster here: the master is supplied by spark-submit
    val conf = new SparkConf().setAppName("MyScalaWordCount")

    // Create a SparkContext
    val sc = new SparkContext(conf)

    // WordCount on the input path passed as the first argument
    val result = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    // Print the results (in cluster mode this goes to the executors' stdout,
    // not the driver's console)
    result.foreach(println)

    // Save the result to the output path passed as the second argument;
    // saveAsTextFile fails if the output directory already exists
    result.saveAsTextFile(args(1))

    // Release resources
    sc.stop()
  }

}

Packaging

Project Sources


Artifacts
For the detailed packaging steps, see the reference.

Build Artifacts


Export succeeded

Upload the jar to the server and run it

cd /home/bigdata/apps/spark-2.1.0-bin-hadoop2.7

./bin/spark-submit \
  --master spark://bigdata02:7077 \
  --class nx.MyScalaWordCount \
  /home/bigdata/data/SparkScalaWork.jar \
  hdfs://bigdata02:9000/wordcount.txt \
  hdfs://bigdata02:9000/output/spark/wc0928
hdfs dfs -cat /output/spark/wc0928/part-00000
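
The two paths after the jar are the program's arguments: args(0) is the input file and args(1) is the output directory. saveAsTextFile writes each (word, count) pair as the string form of a tuple, one per line, so for a made-up input like the sketch above the output would look something like:

(hello,2)
(spark,1)
(scala,1)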