hive-sql求最大连续值
2021-10-12 本文已影响0人
堂哥000
- 原始测试数据准备
| name | seq_num |
|---|---|
| a | 1 |
| a | 2 |
| b | 4 |
| b | 5 |
| c | 1 |
| c | 3 |
| c | 4 |
- 创建测试表
create table if not exists tmp.rows_preced_test_20210526
(
name string comment '用户名'
,seq_num int comment '使用产品编码'
)partitioned by (pt string comment 'YYMMDD数据入库时间')
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS orcfile
- 结果计算
select name, max(l_cnt) as m_cnt from
(
select name ,continue ,count(1) as l_cnt
from
(
select name
,seq_num
-- 伪列排序与原始值差值的绝对值
,abs(row_number() over(partition by name order by name,seq_num ) - seq_num ) as continue
from
( -- 对原始数据初步处理, 求次数无需group by , 重复数据算一次需groupby 去重
select name,seq_num
from normal_stg_tmp.rows_preced_test_20210526
group by name,seq_num
) a
) a group by name,continue
) b group by name
- 预计过程结果
| name | seq_num | 伪列值 | 伪列与原始值差值 |
|---|---|---|---|
| a | 1 | 1 | 0 |
| a | 2 | 2 | 0 |
| b | 4 | 1 | 3 |
| b | 5 | 2 | 3 |
| c | 1 | 1 | 0 |
| c | 3 | 2 | 1 |
| c | 4 | 3 | 1 |
对 伪列与原始值差值 计算每个name 下的count(1) 最大值可得连续最大值
可得
连续最大值.png