hive处理json数据-get_json_object(),k

2020-05-30  本文已影响0人  阿童89

数据分析小伙伴应该经常会遇到各种奇葩的源数据,尤其是业务复杂的公司,底层表的备注信息里会有大量的业务信息(大多是json格式),因为产品经理和开发人员的习惯和喜好不同,备注字段的格式和内容也是千奇百怪。。。本文列经常遇到的备注信息和hive常用的json解析函数,以后如果碰到新的奇葩内容,我也会持续进行更新。。。

假设现有源数据:analyst.test_biao

area goods_id info1 info2 info3
a,b 1,2,3 [{"source":"7fresh","monthSales":4900}] score=9.9,arg={"source":"7fresh","monthSales":4900}] source=7fresh,monthSales=4900

1、get_json_object()函数

1.1 提取info列的source值

碰到info1这种其实很幸运了,此时应该先用regexp_repalce()函数将'['和']'替换为空('['符号需要转义,所以这里需要写成'['),然后再用get_json_object()函数取出source的值

select 
        get_json_object(regexp_replace(regexp_replace(info1,'\\[',''),']',''),$source)  as info1_source 
from  analyst.test_biao 

运行结果:

info1_source
7fresh

1.2 提取info2列source的值

info2不是一个标准的json字符串,此时需要通过正则表达式先取出arg={...}这部分内容,然后再用get_json_object()函数取出source的值

select 
        get_json_object(regexp_extract(info2,'arg=(\{.*\})'),$source)  as info1_source 
from  analyst.test_biao 

运行结果:

info1_source
7fresh

2、keyvalue()函数

2.1 提取info3列source的值

select 
        keyvalue(info3,'=',source)  as info1_source 
from  analyst.test_biao 

运行结果:

info1_source
7fresh

3、explode()和 LATERAL VIEW()函数

explode()函数常常和LATERAL VIEW()函数一起使用,比如想要goods_id列按逗号拆分成3行,可以运行下面的代码:

select 
        goods_id2
       ,info1
from  analyst.test_biao 
LATERAL VIEW(explode(split(goods_id,','))) goods as goods_id2 

运行结果:

goods_id2 info1
1 {"source":"7fresh","monthSales":4900}]
2 {"source":"7fresh","monthSales":4900}]
3 {"source":"7fresh","monthSales":4900}]

也可以按照area和goods_id两列进行拆分,拆分后的area和goods_id两列做笛卡尔积

select 
        goods_id2
       ,info1
from  analyst.test_biao 
LATERAL VIEW(explode(split(area,','))) area1  as area1 
LATERAL VIEW(explode(split(goods_id,','))) goods_id2  as goods_id2 

运行结果:

area1 goods_id2 info1
a 1 {"source":"7fresh","monthSales":4900}]
a 2 {"source":"7fresh","monthSales":4900}]
a 3 {"source":"7fresh","monthSales":4900}]
b 1 {"source":"7fresh","monthSales":4900}]
b 2 {"source":"7fresh","monthSales":4900}]
b 3 {"source":"7fresh","monthSales":4900}]
上一篇下一篇

猜你喜欢

热点阅读