hive处理json数据-get_json_object(),k
2020-05-30 本文已影响0人
阿童89
数据分析小伙伴应该经常会遇到各种奇葩的源数据,尤其是业务复杂的公司,底层表的备注信息里会有大量的业务信息(大多是json格式),因为产品经理和开发人员的习惯和喜好不同,备注字段的格式和内容也是千奇百怪。。。本文列经常遇到的备注信息和hive常用的json解析函数,以后如果碰到新的奇葩内容,我也会持续进行更新。。。
假设现有源数据:analyst.test_biao
area | goods_id | info1 | info2 | info3 |
---|---|---|---|---|
a,b | 1,2,3 | [{"source":"7fresh","monthSales":4900}] | score=9.9,arg={"source":"7fresh","monthSales":4900}] | source=7fresh,monthSales=4900 |
1、get_json_object()函数
1.1 提取info列的source值
碰到info1这种其实很幸运了,此时应该先用regexp_repalce()函数将'['和']'替换为空('['符号需要转义,所以这里需要写成'['),然后再用get_json_object()函数取出source的值
select
get_json_object(regexp_replace(regexp_replace(info1,'\\[',''),']',''),$source) as info1_source
from analyst.test_biao
运行结果:
info1_source |
---|
7fresh |
1.2 提取info2列source的值
info2不是一个标准的json字符串,此时需要通过正则表达式先取出arg={...}这部分内容,然后再用get_json_object()函数取出source的值
select
get_json_object(regexp_extract(info2,'arg=(\{.*\})'),$source) as info1_source
from analyst.test_biao
运行结果:
info1_source |
---|
7fresh |
2、keyvalue()函数
2.1 提取info3列source的值
select
keyvalue(info3,'=',source) as info1_source
from analyst.test_biao
运行结果:
info1_source |
---|
7fresh |
3、explode()和 LATERAL VIEW()函数
explode()函数常常和LATERAL VIEW()函数一起使用,比如想要goods_id列按逗号拆分成3行,可以运行下面的代码:
select
goods_id2
,info1
from analyst.test_biao
LATERAL VIEW(explode(split(goods_id,','))) goods as goods_id2
运行结果:
goods_id2 | info1 |
---|---|
1 | {"source":"7fresh","monthSales":4900}] |
2 | {"source":"7fresh","monthSales":4900}] |
3 | {"source":"7fresh","monthSales":4900}] |
也可以按照area和goods_id两列进行拆分,拆分后的area和goods_id两列做笛卡尔积
select
goods_id2
,info1
from analyst.test_biao
LATERAL VIEW(explode(split(area,','))) area1 as area1
LATERAL VIEW(explode(split(goods_id,','))) goods_id2 as goods_id2
运行结果:
area1 | goods_id2 | info1 |
---|---|---|
a | 1 | {"source":"7fresh","monthSales":4900}] |
a | 2 | {"source":"7fresh","monthSales":4900}] |
a | 3 | {"source":"7fresh","monthSales":4900}] |
b | 1 | {"source":"7fresh","monthSales":4900}] |
b | 2 | {"source":"7fresh","monthSales":4900}] |
b | 3 | {"source":"7fresh","monthSales":4900}] |