Spark Notes
2017-06-16
lansane
1. User-defined functions (UDF)
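import org.apache.spark.sql.functions._
// Load the source DataFrame (parquetFile is deprecated since Spark 1.4; read.parquet replaces it)
val myDF = sqlContext.read.parquet("hdfs:/to/my/file.parquet")
// Plain Scala function that labels an amount as "little" or "big"
val coder: (Int => String) = (arg: Int) => {if (arg < 100) "little" else "big"}
// Wrap the Scala function as a Spark SQL UDF
val sqlfunc = udf(coder)
// withColumn returns a new DataFrame, so keep the result
val codedDF = myDF.withColumn("Code", sqlfunc(col("Amt")))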
2. User-defined aggregate functions (UDAF)
http://www.jianshu.com/p/833b72adb2b6
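For quick reference, here is a minimal sketch of what a UDAF can look like. It is not taken from the linked article. It assumes the untyped UserDefinedAggregateFunction API available in Spark 1.5 through 2.x (deprecated in Spark 3.0 in favor of Aggregator), and the names MyAverage and averagedDF are made up for illustration. It computes the average of a double column, reusing myDF and the "Amt" column from section 1 and assuming that column can be cast to double.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Hypothetical example UDAF: average of a double column
class MyAverage extends UserDefinedAggregateFunction {
  // One input column of type double
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  // Intermediate state: running sum and row count
  def bufferSchema: StructType =
    StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  // Type of the final result
  def dataType: DataType = DoubleType
  // Same input always gives the same output
  def deterministic: Boolean = true

  // Reset the state at the start of each group
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0L
  }

  // Fold one input row into the state, skipping nulls
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  // Combine two partial states computed on different partitions
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // Produce the final value; null when the group had no non-null rows
  def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null else buffer.getDouble(0) / buffer.getLong(1)
}

// Usage (assumes myDF from section 1 with a numeric "Amt" column)
val myAverage = new MyAverage
val averagedDF = myDF.agg(myAverage(col("Amt").cast("double")).as("AvgAmt"))
```

The split into update and merge is what lets Spark aggregate each partition independently and then combine the partial results.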