Spark SQL Window Functions

2020-03-23  麦穗一足
  1. When people talk about SQL window functions, Hive usually comes up first, because window functions are a Hive feature; Spark SQL is Hive-compatible, so it supports them as well. So let's take a look at what a window function is.
  2. What is a window function?
  3. First, a requirement. Below is a class score table named A, where name is the student's name, class is the class number, and score is the score. The requirement: for each class, find the student(s) with the highest score.


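For concreteness, the same sample data that the Scala example further below builds can be laid out as table A:

name  class  score
a     1      80
b     1      78
c     1      95
d     2      74
e     2      92
f     3      99
g     3      99
h     3      45
i     3      55
j     3      78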
select a.name, b.class, b.max from A a,
     (select class, max(score) max from A group by class) b
where a.score = b.max and a.class = b.class
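On this sample data, the inner query returns the per-class maximums (class 1: 95, class 2: 92, class 3: 99), so the overall query returns:

name  class  max
c     1      95
e     2      92
f     3      99
g     3      99

Class 3 has two students tied at 99, so both rows are returned.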
  4. Window functions (rank(), dense_rank(), row_number())
select name, class, score, rank() over(partition by class order by score desc) rank from A

select * from
(select name, class, score, rank() over(partition by class order by score desc) rank from A) as t
where t.rank = 1
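Against the same data, the first query adds a rank column that restarts for every class; the two 99s in class 3 share rank 1 and the next rank jumps to 3 (row order in the output may vary):

name  class  score  rank
c     1      95     1
a     1      80     2
b     1      78     3
e     2      92     1
d     2      74     2
f     3      99     1
g     3      99     1
j     3      78     3
i     3      55     4
h     3      45     5

Filtering on t.rank = 1 then yields the same four rows as the group-by approach. The full Scala program below demonstrates both approaches.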
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Score is not shown in the original snippet; a minimal case class is assumed here.
case class Score(name: String, clazz: Int, score: Int)

object OverFunction extends App {

  val sparkConf = new SparkConf().setAppName("over").setMaster("local[*]")

  val spark = SparkSession.builder().config(sparkConf).getOrCreate()

  import spark.implicits._
  println("//***************  原始的班级表  ****************//")
  val scoreDF = spark.sparkContext.makeRDD(Array( Score("a", 1, 80),
    Score("b", 1, 78),
    Score("c", 1, 95),
    Score("d", 2, 74),
    Score("e", 2, 92),
    Score("f", 3, 99),
    Score("g", 3, 99),
    Score("h", 3, 45),
    Score("i", 3, 55),
    Score("j", 3, 78))).toDF("name","class","score")
  scoreDF.createOrReplaceTempView("score")
  scoreDF.show()

  println("//***************  求每个班最高成绩学生的信息  ***************/")
  println("    /*******  开窗函数的表  ********/")
  spark.sql("select name,class,score, rank() over(partition by class order by score desc) rank from score").show()

  println("    /*******  计算结果的表  *******")
  spark.sql("select * from " +
    "( select name,class,score,rank() over(partition by class order by score desc) rank from score) " +
    "as t " +
    "where t.rank=1").show()

  //spark.sql("select name,class,score,row_number() over(partition by class order by score desc) rank from score").show()
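  // Note: unlike rank(), row_number() numbers tied rows consecutively, so only one of the
  // two students tied at 99 in class 3 would receive row_number = 1 with this variant.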

  println("/**************  求每个班最高成绩学生的信息(groupBY)  ***************/")

  spark.sql("select class, max(score) max from score group by class").show()

  spark.sql("select a.name, b.class, b.max from score a, " +
    "(select class, max(score) max from score group by class) as b " +
    "where a.score = b.max").show()

  spark.stop()
}
  5. Commonly used functions
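The example above only exercises rank(). As a minimal sketch of how the three ranking functions differ (assuming the spark session and the score temp view registered in the example above are still in scope), the query below computes all three side by side: for the two 99s in class 3, rank() yields 1, 1 and then skips to 3, dense_rank() yields 1, 1 and then 2, while row_number() simply numbers the rows 1, 2, 3 in order.

// Compare the three ranking functions over the same window.
spark.sql(
  """select name, class, score,
    |       rank()       over (partition by class order by score desc) as rnk,
    |       dense_rank() over (partition by class order by score desc) as dense_rnk,
    |       row_number() over (partition by class order by score desc) as row_num
    |from score""".stripMargin).show()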