2021-04-26

2021-04-26 虎不知

1. Hive Summary: Common Functions

Study reference: https://zhuanlan.zhihu.com/p/82601425

1.1. The basic no-brainers:

1.1.1. String length function: length

hive> select length('abced') from dual;          5

1.1.2. String reverse function: reverse

hive> select reverse('abcedfg') from dual;        gfdecba

1.1.3. Uppercase conversion function: upper, ucase

hive> select ucase('abCd') from dual;    ABCD

1.1.4. String concatenation function: concat

hive> select concat('abc','def') from dual;        abcdef

Accepts any number of input strings.

1.1.5. Lowercase conversion function: lower, lcase

hive> select lcase('abCd') from dual;        abcd

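1.1.6. Whitespace trimming function: trim

hive> select trim(' abc ') from dual;        abc

Note: removes the spaces on both sides of the string.

1.1.6.1. Left trim function: ltrim

hive> select ltrim(' abc ') from dual;        abc

Note: removes leading spaces only; the trailing space is kept.

1.1.6.2. Right trim function: rtrim

hive> select rtrim(' abc ') from dual;        abc

Note: removes trailing spaces only; the leading space is kept.

String concatenation with a separator: concat_ws

hive> select concat_ws(',','abc','def','gh') from dual;        abc,def,gh

1.1.7. Space string function: space

Note: returns a string made of n spaces.

hive> select space(10) from dual;

hive> select length(space(10)) from dual;        10

1.1.8. String repeat function: repeat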

hive> select repeat('abc',5) from dual;        abcabcabcabcabc

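1.2. Find, replace, and extract

1.2.1. Substring function: substr, substring

hive> select substr('abcde',3) from dual;        cde

hive> select substring('abcde',3) from dual;        cde

hive> select substr('abcde',-1) from dual;        e

Note: returns the part of string A from position start to the end. Positions are 1-based, and a negative start counts from the end of the string.

1.2.2. Substring function with a length: substr, substring

hive> select substr('abcde',3,2) from dual;        cd

hive> select substring('abcde',3,2) from dual;        cd

hive> select substring('abcde',-2,2) from dual;        de

Note: returns the part of string A that starts at position start and is len characters long.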
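The 1-based and negative-start rules above can be modeled in a few lines of Python. This is only a rough sketch of the semantics, not Hive's implementation: the helper name hive_substr is hypothetical, treating start = 0 like start = 1 is an assumption, and out-of-range start values are deliberately not modeled.

```python
def hive_substr(s, start, length=None):
    """Rough model of Hive's substr(A, start[, len]) for in-range arguments."""
    if start > 0:
        begin = start - 1          # Hive positions are 1-based
    elif start < 0:
        begin = len(s) + start     # a negative start counts from the end
    else:
        begin = 0                  # assumption: start = 0 behaves like start = 1
    end = len(s) if length is None else begin + length
    return s[begin:end]


print(hive_substr('abcde', 3))      # cde
print(hive_substr('abcde', -1))     # e
print(hive_substr('abcde', 3, 2))   # cd
print(hive_substr('abcde', -2, 2))  # de
```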

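1.2.3. Regular expression replace function: regexp_replace

hive> select regexp_replace('foobar', 'oo|ar', '') from dual;        fb

hive> select regexp_replace(split(labels,'\\.')[0], '\\.|\\{|\\}|\\"', '') as labels;

(Escape characters are used here: a double backslash, \\, in front of each special character.)

Note: replaces the parts of string A that match the Java regular expression B with C. In some cases the escape character \ is required. This is similar to regexp_replace in Oracle.

1.2.4. Regular expression extraction function: regexp_extract

hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from dual;        the

hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) from dual;        bar

hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) from dual;        foothebar

hive> select regexp_extract('中国abc123!','[\\u4e00-\\u9fa5]+',0) from dual; -- handy: match only the Chinese characters

hive> select regexp_replace('中国abc123','[\\u4e00-\\u9fa5]+','') from dual; -- handy: strip the Chinese characters

Syntax: regexp_extract(string subject, string pattern, int index)

Returns: string

Note: matches the string subject against the regular expression pattern and returns the capture group selected by index.

Third argument:

0 returns the whole matched string

1 returns what the first pair of parentheses matched

2 returns what the second pair of parentheses matched

Note that escape characters are needed in some cases: inside a Hive string literal, a regex backslash has to be written as a double backslash (\\), and the pattern itself follows Java regular expression rules.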
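The meaning of the index argument can be checked with a small Python model built on the standard re module. This is a sketch under assumptions rather than Hive's code: the function name simply mirrors the Hive one, returning an empty string when nothing matches is an assumption, and Python needs only a single backslash in a pattern where a Hive string literal needs two.

```python
import re


def regexp_extract(subject, pattern, index):
    """Return capture group `index` of the first match (0 means the whole match)."""
    match = re.search(pattern, subject)
    # Assumption: an empty string stands in for "no match".
    return match.group(index) if match else ''


print(regexp_extract('foothebar', r'foo(.*?)(bar)', 1))          # the
print(regexp_extract('foothebar', r'foo(.*?)(bar)', 2))          # bar
print(regexp_extract('foothebar', r'foo(.*?)(bar)', 0))          # foothebar
print(regexp_extract('中国abc123!', '[\u4e00-\u9fa5]+', 0))       # 中国
print(re.sub('[\u4e00-\u9fa5]+', '', '中国abc123'))               # abc123
```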

1.3. Parsing string contents

1.3.1. URL parsing function: parse_url

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])

Returns: string

Note: returns the requested part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from dual;        facebook.com

hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') from dual;        v1
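To see which piece of the URL each part name refers to, here is a partial Python model using urllib.parse. It is a sketch, not Hive's implementation: only HOST, PATH, QUERY, REF, and PROTOCOL are covered, and returning None for anything else is an assumption made for brevity.

```python
from urllib.parse import urlsplit, parse_qs


def parse_url(url, part, key=None):
    """Partial model of Hive's parse_url for a few partToExtract values."""
    parts = urlsplit(url)
    if part == 'HOST':
        return parts.hostname
    if part == 'PATH':
        return parts.path
    if part == 'QUERY':
        if key is None:
            return parts.query
        values = parse_qs(parts.query).get(key)
        return values[0] if values else None
    if part == 'REF':
        return parts.fragment
    if part == 'PROTOCOL':
        return parts.scheme
    return None  # AUTHORITY, FILE, USERINFO are not modeled in this sketch


url = 'http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1'
print(parse_url(url, 'HOST'))         # facebook.com
print(parse_url(url, 'QUERY', 'k1'))  # v1
print(parse_url(url, 'PATH'))         # /path1/p.php
print(parse_url(url, 'REF'))          # Ref1
```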

1.3.2. JSON parsing function: get_json_object

Syntax: get_json_object(string json_string, string path)

Returns: string

Note: parses the JSON string json_string and returns the content selected by path. If the input JSON string is invalid, NULL is returned.

hive> select get_json_object('{"nation":"china"}','$.nation') from dual;        china
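The path lookup and the NULL-on-invalid-input rule can be illustrated with a very small Python model. This is a simplified sketch: it only understands dotted object keys such as $.a.b (no array indexes or wildcards), and it returns the Python value rather than a string as Hive does.

```python
import json


def get_json_object(json_string, path):
    """Follow a dotted path like '$.a.b'; None plays the role of NULL."""
    try:
        node = json.loads(json_string)
    except (TypeError, ValueError):
        return None                      # invalid JSON gives NULL
    for key in path.split('.')[1:]:      # skip the leading '$'
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(get_json_object('{"nation":"china"}', '$.nation'))  # china
print(get_json_object('{"nation":', '$.nation'))          # None
```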

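1.3.3. First-character ASCII function: ascii; returns: int

hive> select ascii('abcde') from dual;        97

Note: returns the ASCII code of the first character of string str.

1.3.4. Left padding function: lpad (commonly used to pad strings that are too short, before taking a substring)

hive> select lpad('abc',10,'td') from dual;        tdtdtdtabc

1.3.4.1. Right padding function: rpad

hive> select rpad('abc',10,'td') from dual;        abctdtdtdt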
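The way the pad string is repeated and cut to fit can be sketched in Python as follows. This is an illustrative model, not Hive's code; it assumes a non-empty pad string and that a target length shorter than the input truncates the input.

```python
def _fill(need, pad):
    """Repeat `pad` and cut it to exactly `need` characters (pad must be non-empty)."""
    return (pad * (need // len(pad) + 1))[:need]


def lpad(s, length, pad):
    """Pad on the left up to `length`; truncate when `s` is already longer."""
    if length <= len(s):
        return s[:length]
    return _fill(length - len(s), pad) + s


def rpad(s, length, pad):
    """Pad on the right up to `length`; truncate when `s` is already longer."""
    if length <= len(s):
        return s[:length]
    return s + _fill(length - len(s), pad)


print(lpad('abc', 10, 'td'))  # tdtdtdtabc
print(rpad('abc', 10, 'td'))  # abctdtdtdt
print(lpad('abc', 2, 'td'))   # ab
```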

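1.3.5. String splitting function: split, which works like the split method used in the underlying MR code. Returns: array

hive> select split('abtcdtef','t') from dual;        ["ab","cd","ef"]

1.3.6. Set lookup function: find_in_set; returns: int

Note: returns the position of the first occurrence of str in strlist, where strlist is a comma-separated string. Returns 0 if str is not found.

hive> select find_in_set('ab','ef,ab,de') from dual;        2

hive> select find_in_set('at','ef,ab,de') from dual;        0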
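A short Python model makes the 1-based position and the 0-for-missing rule concrete. It is a sketch only; the handling of None and of a search string that itself contains a comma follows my reading of the Hive documentation and should be treated as an assumption.

```python
def find_in_set(s, strlist):
    """1-based position of `s` in a comma-separated list, 0 if absent."""
    if s is None or strlist is None:
        return None          # assumption: NULL in, NULL out
    if ',' in s:
        return 0             # assumption: a comma in the search string never matches
    items = strlist.split(',')
    return items.index(s) + 1 if s in items else 0


print(find_in_set('ab', 'ef,ab,de'))  # 2
print(find_in_set('at', 'ef,ab,de'))  # 0
```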

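1.3.7. Searches a string for a given substring and returns the position where it is first found: instr(string str, string substr). (The four-argument form INSTR(C1, C2, I, J) is the Oracle signature; Hive's instr takes two arguments.)

hive> select instr("abcde",'b') from dual;        2

1.3.8. Splits text into key-value pairs using two delimiters: str_to_map(text[, delimiter1, delimiter2]). Returns: map

Delimiter1 splits the text into K-V pairs, and Delimiter2 splits each K-V pair into key and value. The default for delimiter1 is ',' and the default for delimiter2 is ':'.

hive> select str_to_map('aaa:123&bbb:456', '&', ':') from dual;        {"bbb":"456","aaa":"123"}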
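The two-level split can be sketched in Python like this. It is a simplified model, not Hive's implementation: the delimiters are treated as plain strings, and giving a key with no second delimiter a value of None is an assumption.

```python
def str_to_map(text, delimiter1=',', delimiter2=':'):
    """Split into pairs on delimiter1, then split each pair on delimiter2."""
    result = {}
    for pair in text.split(delimiter1):
        key, sep, value = pair.partition(delimiter2)
        result[key] = value if sep else None   # assumption: missing value maps to None
    return result


print(str_to_map('aaa:123&bbb:456', '&', ':'))  # {'aaa': '123', 'bbb': '456'}
print(str_to_map('a:1,b:2'))                    # {'a': '1', 'b': '2'}
```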

1.4. Time functions

1.4.1. unix_timestamp() returns the current Unix timestamp in seconds. current_timestamp() also returns the current time, but as a timestamp value rather than a number of seconds.

hive> select unix_timestamp();        1568552090

hive> select unix_timestamp('2020-01-01 01:01:01');        1577811661

hive> select unix_timestamp('2020-01-01','yyyy-MM-dd');        1577808000

1.4.2. from_unixtime(int/bigint timestamp)

Returns the date and time for the Unix timestamp, in the format yyyy-MM-dd HH:mm:ss by default.

hive> select from_unixtime(1577811661);        2020-01-01 01:01:01

hive> select from_unixtime(1577811661,'yyyy/MM/dd HH');        2020/01/01 01
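The numbers above can be reproduced outside Hive with Python's datetime module. This sketch assumes the session time zone is UTC+8, which is what the sample outputs imply; the function names only mirror the Hive ones, and Python uses strftime codes (%Y, %m, %d, %H, %M, %S) instead of Java-style patterns.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # assumption: session time zone is UTC+8


def unix_timestamp(text, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse a local date-time string and return seconds since the epoch."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=TZ).timestamp())


def from_unixtime(seconds, fmt='%Y-%m-%d %H:%M:%S'):
    """Format seconds since the epoch as a local date-time string."""
    return datetime.fromtimestamp(seconds, tz=TZ).strftime(fmt)


print(unix_timestamp('2020-01-01 01:01:01'))        # 1577811661
print(unix_timestamp('2020-01-01', '%Y-%m-%d'))     # 1577808000
print(from_unixtime(1577811661))                    # 2020-01-01 01:01:01
print(from_unixtime(1577811661, '%Y/%m/%d %H'))     # 2020/01/01 01
```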

1.4.3. current_date(): the current date

hive> select current_date();        2021-04-25

hive> select current_timestamp();        2021-04-25 16:56:49.274

hive> select unix_timestamp();        1619341037

hive> select from_unixtime(unix_timestamp(),'yyyy-MM-dd');        2021-04-25

hive> select from_unixtime(unix_timestamp(),'yyyyMMdd');        20210425

hive> select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss');

Note: in the pattern, mm is minutes and dd is the day of the month, so the time part must be written HH:mm:ss, not HH:dd:ss.

1.4.4. date_add(), date_sub()

hive> select date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),1);        2021-04-24

hive> select date_add(current_date,-1);        2021-04-24
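Both calls above shift a date by a whole number of days, and date_sub is just date_add with the sign flipped. A minimal Python sketch of that relationship, assuming yyyy-MM-dd input strings (the function names only mirror the Hive ones):

```python
from datetime import date, timedelta


def date_add(start, days):
    """Shift a yyyy-MM-dd date string by `days` (may be negative)."""
    return (date.fromisoformat(start) + timedelta(days=days)).isoformat()


def date_sub(start, days):
    """Subtracting days is adding the negated amount."""
    return date_add(start, -days)


print(date_sub('2021-04-25', 1))   # 2021-04-24
print(date_add('2021-04-25', -1))  # 2021-04-24
print(date_add('2021-04-30', 1))   # 2021-05-01
```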

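1.4.5. date_format()

Converting between timestamps and dates: from_unixtime(), to_unix_timestamp()

hive> select from_unixtime(1517725479,'yyyy-MM-dd HH:mm:ss');        2018-02-04 14:24:39

hive> select to_unix_timestamp('2017-01-01 12:12:12','yyyy-MM-dd HH:mm:ss');        1483243932

Note: the pattern must use mm for minutes; writing HH:dd:ss puts the day of the month in the minutes position. The results shown assume a UTC+8 session time zone.

date_format outputs a standard time format:

hive> select from_unixtime(unix_timestamp());

hive> select to_date('2017-01-01 12:12:12');        2017-01-01

hive> select date_format(current_timestamp(),'yyyy-MM-dd HH:mm:ss');        2021-04-25 17:21:26

hive> select date_format(current_date(),'yyyyMMdd');        20210425

UTC time conversion:

hive> select to_utc_timestamp(current_timestamp(),'GMT+8');

hive> select from_utc_timestamp(current_timestamp(),'GMT+8');

Note: the second argument is a time zone name given as a string, such as 'GMT+8' or 'Asia/Shanghai', not a numeric offset.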
