Hive中row_number的使用

2017-03-21  本文已影响7432人  跨界师

1、hive的分组和组内排序---语法

语法:
row_number() over (partition by 字段a order by 计算项b desc ) rank

2、hive的分组和组内排序 --- 实例

要取top10品牌,各品牌的top10渠道,各品牌的top10渠道中各渠道的top10档期

1、取top10品牌

select “品牌” , sum/count/其他() as num from "table_name" order by num desc limit 10;

2、取top10品牌下各品牌的top10渠道

select a.* from (select "品牌","渠道",sum/count() as num, row_number () over (partition by "品牌" order by num desc) rank from “table_name” where 品牌限制条件 group by “品牌”,“渠道” ) a having a.rank <= 10;

3、 取top10品牌下各品牌的top10渠道中各渠道的top10档期

select a.* from (select "品牌","渠道","档期",sum/count/其他() as num row_number() over (partition by "档期" order by num desc) rank from "table_name" where 品牌限制条件 group by “品牌”,“渠道) a Having a.rank <= 10;

row_number的使用在hive和spark的实时计算中常常会用到计算分区中的排序问题,所以使用好row_number是很重要的。

上一篇 下一篇

猜你喜欢

热点阅读