19. Hadoop: Spark Hello World Experiment

2020-07-12  負笈在线

Main content of this section:

Spark Hello World experiment:

Use Spark to count how many times each word appears.

WordCount data preparation

       # echo "Hello World Bye World" > /file0

       # echo "Hello Hadoop Goodbye Hadoop" > /file1

       # sudo -u hdfs hdfs dfs -mkdir -p /user/spark/wordcount/input

       # sudo -u hdfs hdfs dfs -put /tmp/file* /user/spark/wordcount/input

       # sudo -u hdfs hdfs dfs -chmod 1777 /user/spark/wordcount/input

       # sudo -u hdfs hdfs dfs -chown -R spark:spark /user/spark/wordcount/input

Enter spark-shell and run the script

       # sudo -u spark spark-shell

Setting default log level to "WARN".

scala>

scala> val file = sc.textFile("hdfs://cluster1/user/spark/wordcount/input")  // define the variable file, pointing at the input path in HDFS

scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)  // split each line on spaces with flatMap, map every word to (word, 1), then sum the counts per word with reduceByKey

scala> counts.saveAsTextFile("hdfs://cluster1/user/spark/wordcount/output")  // write the results to HDFS; this action triggers the job
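
To check the result without leaving the shell, you can also materialize the RDD on the driver. This is an optional sketch: collect() pulls the whole result to the driver, which is fine for a toy dataset like this one, and the order of the pairs may differ from what is shown below.

scala> counts.collect().foreach(println)

(Bye,1)
(Hello,2)
(World,2)
(Goodbye,1)
(Hadoop,2)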

View the results in Pig

       # sudo -u hdfs pig

grunt> ls

hdfs://cluster1/user/spark/wordcount/output/_SUCCESS<r 3> 0

hdfs://cluster1/user/spark/wordcount/output/part-00000<r 3> 28

hdfs://cluster1/user/spark/wordcount/output/part-00001<r 3> 23

grunt> cat part-00000

(Bye,1)

(Hello,2)

(World,2)

grunt> cat part-00001

(Goodbye,1)

(Hadoop,2)

grunt>
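
The same logic can also be packaged as a standalone application and run with spark-submit instead of being typed into spark-shell. Below is a minimal sketch under that assumption; the object name WordCount is illustrative, and the input/output paths match the ones used above. Note that saveAsTextFile refuses to overwrite an existing output directory, so delete the earlier output or choose a new path before re-running.

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The master URL is normally supplied by spark-submit, so only the app name is set here.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    // Same pipeline as the spark-shell session above.
    val file = sc.textFile("hdfs://cluster1/user/spark/wordcount/input")
    val counts = file.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://cluster1/user/spark/wordcount/output")

    sc.stop()
  }
}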
