【Spark】Spark DataFrame Schema Conversion Methods

2017-08-24  PowerMe

For example, suppose the original table's schema is as follows:


[screenshot of the original table's schema omitted]

Now we want to convert the DataFrame's schema to:
id: String
goods_name: String
price: Array<String>

  1. SQL conversion
    spark.sql("create table speedup_tmp_test_spark_schema_parquet12 using parquet as select cast(id as string) as id, cast(goods_name as string) as goods_name, cast(price as array<string>) as price from tmp_test_spark_schema_parquet")

  2. case class transformation
    case class newSchemaClass(id: String, goods_name: String, price: Array[String])

// needed for .toDF() on an RDD of case class instances
import spark.implicits._

// original DataFrame
val df = spark.sql("select * from tmp_test_spark_schema_parquet")

// new DataFrame: map each Row into the target case class, then convert back
val newDF = df.rdd.map { r =>
  newSchemaClass(r(0).toString, r(1).toString, r.getSeq[Int](2).map(_.toString).toArray)
}.toDF()

// fetch a concrete value: the price column (index 2) of the third row
newDF.collect()(2).getList[String](2)
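
To confirm the conversion, you can also print the new schema and read the array column back by name instead of by position. A small sketch based on the newDF above:

// sketch: verify the converted schema
newDF.printSchema()
// root
//  |-- id: string (nullable = true)
//  |-- goods_name: string (nullable = true)
//  |-- price: array (nullable = true)
//  |    |-- element: string (containsNull = true)

// reading by field name avoids depending on the column index
val price: Seq[String] = newDF.collect()(2).getAs[Seq[String]]("price")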
