hive查询

2020-01-10 本文已影响0人 echolvan

hive很大部分和sql语法一致。这里就介绍hive特有的（会有的是sql里的内容）
在hive中有数据表是分区表，在分区表中，查询一定要用where条件语句对分区字段进行限制

执行顺序
from → where → group by → having → select → order by

优先级	命令
1	FROM <left table>
2	ON <join condition>
3	JOIN
4	WHERE
5	GROUP BY
6	WITH
7	HAVING<having condition>
8	SELECT
9	DISTINCT
10	ORDER BY
11	LIMIT

常用函数

如何把时间戳转换为日期？

from_unixtime(timestamp, format_str)
eg:from_unixtime(1983686556, 'yyyy-MM-dd hh:mm:ss')

将日期转换为时间戳

unix_timestamp(date_str)

计算时间间隔

datediff(string enddate, string startdate)
结束日期减去开始日期的天数

日期增加函数

date_add(string startdate, int days)

日期减少函数

date_sub(string startdate, int days)

条件函数

case when

-- 案例：统计用户每个年龄段人数
select 
  case when age<20 then '<20' 
          when age>=20 and age<30 then '20-30'
          when age>=30 and age<40 then '30-40'
          else '40<' end as age_type,
   count(distinct user_id) user_num
   from user_info
   group by 
      case when age<20 then '<20'
              when age>=20 and age<30 then '20-30'
              when age>=30 and age<40 then '30-40'
             else '40<' end;

在上面这个案例中一定要注意select中的case when语句一定要和group by下的一致。
if

select sex,
      if(level>5, '高级用户', '普通用户') as level_type,
      count(distinct user_id) user_num
from user_info
      group by sex,
      if(level>5, '高级用户', '普通用户');

字符串函数

substr(str raw_string, int start, int end)
如果不指定截取长度，则从开始一直截取到最后！

-- 案例需求：每个月激活的用户数
select substr(firstactivetime, 1, 7) as month, count(distinct user_id) user_num
from user_info
group by substr(firstactivetime, 1, 7);

如果我们是一个json字符串怎么办？
get_json_object(str raw_string, '$.key')

-- 不同手机品牌的用户数
select get_json_object(extra1, '$.phonebrand') as phone_brand,
count(distinct user_id) user_num
from user_info
group by get_json_object(extra1, '$.phonebrand');

如何取出map类型的数据呢

案例如上题一致

select extra2['phonebrand'] as phone_brand, count(distinct user_id) user_num
from user_info
group by extra2['phonebrand'];

取map就像取python里的字典一样