Spark Application Development: Hello World

2017-12-24  wangqiaoshi

Preparation

Code example
1. Install the Scala plugin
Development tool: IntelliJ IDEA

[image: Scala plugin in IntelliJ IDEA]
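2. Build file
This example uses Maven as the build tool. In our experience sbt is slow to fetch dependencies, and every time a dependency is added or updated it walks through and checks all of them again, which uses a lot of CPU and gets in the way of development. We recommend Maven.

Plugins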

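<!-- goes inside <properties> -->
<scala.version>2.11.8</scala.version>

<!-- goes inside <build><plugins> -->
<plugin>
    <groupId>org.scala-tools</groupId>
    <artifactId>maven-scala-plugin</artifactId>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <scalaVersion>${scala.version}</scalaVersion>
        <args>
            <!-- was jvm-1.5; Spark 2.2.x requires Java 8 -->
            <arg>-target:jvm-1.8</arg>
        </args>
    </configuration>
</plugin>

<!-- Second declaration of the same plugin. If kept, it presumably belongs
     under <reporting><plugins>, since Maven warns about duplicate plugin
     declarations inside <build>. -->
<plugin>
    <groupId>org.scala-tools</groupId>
    <artifactId>maven-scala-plugin</artifactId>
    <configuration>
        <scalaVersion>${scala.version}</scalaVersion>
    </configuration>
</plugin>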
Dependencies

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.2.1</version>
</dependency>

3. Write the code
Data: people.json

{"name":"zhangsan","age":25}
{"name":"wangwu","age":20}
{"name":"lisi","age":28}
{"name":"mazi","age":18}

Create a Scala object named HelloWorld.

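import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    // Local session running on two worker threads
    val spark = SparkSession
      .builder()
      .master("local[2]")
      .appName("hello world")
      .config("spark.some.config.option", "some-value") // placeholder option
      .getOrCreate()
    // Brings the encoders needed by map and toDF into scope
    import spark.implicits._

    // One JSON object per line; age is inferred as a Long
    val peopleDF = spark.read.json("src/main/resources/people.json")

    // For each person, compute how many years they are past 18
    val newPeopleDF = peopleDF.map { row =>
      val name = row.getAs[String]("name")
      val age = row.getAs[Long]("age")
      (name, age - 18)
    }.toDF("name", "yearsOver18")

    newPeopleDF.show()

    spark.stop()
  }
}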
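A possible variation, not taken from the original post: the same job can be written against a typed Dataset, so fields are read as `p.name` and `p.age` instead of `row.getAs[...]`. The names `Person`, `TypedHelloWorld` and `yearsOver18` are hypothetical, and the sketch assumes the same people.json path and the Spark 2.2.x dependencies listed earlier.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type matching one line of people.json.
// It is defined at top level so Spark can derive an encoder for it.
final case class Person(name: String, age: Long)

object TypedHelloWorld {
  // Same mapping as the Row-based version, expressed on typed records.
  def yearsOver18(people: Dataset[Person]): DataFrame = {
    import people.sparkSession.implicits._
    people.map(p => (p.name, p.age - 18)).toDF("name", "yearsOver18")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[2]")
      .appName("typed hello world")
      .getOrCreate()
    import spark.implicits._

    // Columns are matched to the case class fields by name.
    val people = spark.read.json("src/main/resources/people.json").as[Person]
    yearsOver18(people).show()

    spark.stop()
  }
}
```

The trade-off is mostly about where mistakes surface: a misspelled field name is a compile error here, while `row.getAs[String]("nmae")` only fails at run time.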

Output:

[image: result of newPeopleDF.show()]
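As a rough cross-check of what the job should print, the same mapping can be applied to the four sample records using plain Scala collections, with no Spark involved. The names `AgeDiffCheck`, `Person` and `yearsOver18` are made up for this sketch, and it only reproduces the row values, not the table layout that `show()` draws.

```scala
object AgeDiffCheck {
  // Hypothetical record type mirroring one line of people.json
  final case class Person(name: String, age: Long)

  // The four sample records from people.json
  val people: Seq[Person] = Seq(
    Person("zhangsan", 25L),
    Person("wangwu", 20L),
    Person("lisi", 28L),
    Person("mazi", 18L)
  )

  // Same arithmetic as the map step in HelloWorld: age minus 18
  def yearsOver18(p: Person): (String, Long) = (p.name, p.age - 18)

  def main(args: Array[String]): Unit =
    // prints (zhangsan,7), (wangwu,2), (lisi,10), (mazi,0), one per line
    people.map(yearsOver18).foreach(println)
}
```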