Hive 2: Hive SQL in Practice

2020-04-29 勇于自信

Hive CREATE TABLE statements:

1. Creating a managed (internal) table

hive (badou)> create table udata(userid int,item_id int,rating int,`timestamp` timestamp) row format delimited fields terminated by '\t';

OK
Time taken: 2.254 seconds

hive (badou)> show tables;

OK
udata

create table  if not exists inner_test (
aisle_id string,                                      
aisle_name string     
)
row format delimited fields terminated by ',' lines terminated by '\n'  
stored as textfile  
location '/data/inner';

2. Creating an external table

create external table  if not exists ext_test (
aisle_id string,                                      
aisle_name string     
)
row format delimited fields terminated by ',' lines terminated by '\n'  
stored as textfile  
location '/data/ext';

3. Creating a partitioned table

create table partition_test(
order_id string,                                      
user_id string,                                      
eval_set string,                                      
order_number string,  
order_hour_of_day string,                                      
days_since_prior_order string
)partitioned by(order_dow string)
row format delimited fields terminated by '\t';

Loading data into Hive:

1. Loading data from a file into a table

hive (badou)> load data local inpath '/home/dongdong/hive/u.data' overwrite into table udata;

Loading data to table badou.udata
OK
Time taken: 2.335 seconds

2. Inserting data into a partitioned table

insert overwrite table partition_test partition (order_dow='1')
select order_id,user_id,eval_set,order_number,order_hour_of_day,days_since_prior_order from orders where order_dow='1' limit 10;

Dropping a column from a Hive table

CREATE TABLE test (
creatingTs BIGINT,
a STRING,
b BIGINT,
c STRING,
d STRING,
e BIGINT,
f BIGINT
);

Hive has no DROP COLUMN statement, so to drop column f, redefine the column list with REPLACE COLUMNS and leave f out:

ALTER TABLE test REPLACE COLUMNS (
creatingTs BIGINT,
a STRING,
b BIGINT,
c STRING,
d STRING,
e BIGINT
);

Adding a column to a Hive table

(1) Create a test table:

use mart_flow_test;
create table if not exists mart_flow_test.detail_flow_test
(
union_id          string    comment 'unique device identifier'
) comment 'test table'
partitioned by (
    partition_date    string    comment 'log generation date'
) stored as orc;

(2) Add a column:

use mart_flow_test;

alter table detail_flow_test add columns(original_union_id string);

(3) Change the column comment:

use mart_flow_test;

alter table detail_flow_test change column original_union_id original_union_id string comment 'original unique device identifier';

Deleting data from a Hive table

Method 1: delete only the data and keep the table structure

truncate table table_name;

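(truncate removes every row in the table and should be treated as irreversible.)
or: delete from table_name where 1 = 1;

(delete removes the rows that match a condition; where 1 = 1 matches every row, so this deletes them all. In Hive, delete works only on transactional (ACID) tables.)

truncate cannot be used on an external table, because Hive does not manage an external table's data: the metastore holds only the table's metadata, and the files stay at the external location.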

Method 2: drop the whole table

drop table table_name;
To delete the table permanently, with no intention of restoring it:
drop table table_name purge;

Running SQL from a file instead of the interactive shell:

hive -f create_table.sql

Word count in Hive SQL

select word, count(*) as cnt
from (
select
explode(split(sentence, ' '))
as word
from article
) t
group by word;
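
To make the logic of the query above concrete, here is a rough Python sketch of the same computation on plain strings. It assumes sentences are separated into words by single spaces; the function name `word_count` and the sample rows are made up for illustration and do not come from the original article.

```python
from collections import Counter


def word_count(sentences, sep=" "):
    """Count words the way the Hive query does, on plain Python strings."""
    # explode(split(sentence, ' ')): one row per token of every sentence
    words = [word for sentence in sentences for word in sentence.split(sep)]
    # group by word with count(*): number of rows per distinct word
    return dict(Counter(words))


# Made-up sample rows standing in for the `article` table
sample = ["hello hive", "hello world"]
print(word_count(sample))  # {'hello': 2, 'hive': 1, 'world': 1}
```

The inner list comprehension plays the role of the subquery `t`, and `Counter` plays the role of `group by word`.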

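Hive UDF and UDTF functions

UDF practice:

1. Write the Java code and package it as a jar:

2. Add the jar to Hive and create a temporary function:

3. Use the function, and the result:

UDTF practice:

1. Write the Java code and package it as a jar:

2. Add the jar to Hive and create a temporary function:

3. Load the data and create the Hive table:

4. Use the function, and the result:

transform practice (scripts can be written in shell or Python)

1. Shell practice:

1. Write the awk script:

2. Add it to Hive:

3. Use the awk script, and the result: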

2. Python practice:

Example 1:

Add the .py file to Hive:

Usage and result:
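
The script used in this example is not shown, so the following is only a minimal sketch of what a Python transform script generally looks like. Hive streams each input row to the script's standard input as one tab-separated line and reads tab-separated lines back from its standard output. The per-row logic here (upper-casing the first column) and the names `transform_row` and `run` are hypothetical, chosen purely for illustration.

```python
import sys


def transform_row(line):
    """Turn one tab-separated input row into one output row.

    Hypothetical logic: upper-case the first column, keep the rest as-is.
    """
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].upper()
    return "\t".join(fields)


def run(stdin, stdout):
    """Apply transform_row to every row read from stdin."""
    for line in stdin:
        stdout.write(transform_row(line) + "\n")


# In the real script file, the last line would be:
#     run(sys.stdin, sys.stdout)
```

After `add file`, such a script would typically be invoked from a query with `select transform(...) using 'python script.py' as (...)`, where the script name and column lists depend on the table at hand.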

Word count practice:

1. Create the Hive table:

2. Load the data:

Create a second table (to store the results):

Create map.py and red.py and add them to Hive:
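
The contents of map.py and red.py are not shown, so the sketch below is an assumption about what such a pair usually does for word count, not the author's actual scripts. Both steps are written as functions in one block so they can be read together: `map_lines` would be the body of map.py and `reduce_lines` the body of red.py. The reduce step assumes its input arrives grouped by word (for example, by using `cluster by word` between the two transform stages).

```python
import sys
from itertools import groupby


def map_lines(lines):
    """map.py logic: emit one 'word<TAB>1' line per word of every input line."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """red.py logic: sum the counts of consecutive lines that share a word.

    Assumes the input is already grouped by word.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(pair[1]) for pair in group)
        yield f"{word}\t{total}"


# In the real scripts, each file would end with a loop such as:
#     for out in map_lines(sys.stdin):      # map.py
#         print(out)
#     for out in reduce_lines(sys.stdin):   # red.py
#         print(out)
```

Because `groupby` only merges adjacent rows, feeding `reduce_lines` unsorted input would yield several partial counts for the same word, which is why the grouping assumption matters.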

Usage and result:

Insert the word count result into the target table: