电商项目(Hive实现)
2021-11-29 本文已影响0人
V_6619
- 外部表的使用
- 基于ETL的数据加载到数据仓库
- 使用Hive进行统计分析
*对比 MR 和 Hive
查看表:
desc formatted <table_name>
MANAGED_TABLE:内部表
删除内部表:HDFS上的数据被删除 & Meta也被删除
创建外部表:
CREATE EXTERNAL TABLE emp_external(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/external/emp/';
删除外部表: HDFS上的数据不被删除 & Meta被删除
上次ETL的数据:
create external table vvtrack_info(
ip string,
country string,
province string,
city string,
url string,
time string,
page string
) partitioned by (day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/project/vvtrackinfo/';
LOAD DATA INPATH 'hdfs://hadoop000:8020/project/input/etl' OVERWRITE INTO TABLE track_info partition(day='2013-07-21');
统计浏览量
select count(*) from vvtrack_info where day='2013-07-21';
统计省份:
select province, count(*) from vvtrack_info where day='2013-07-21 group by province;
省份统计表
create table vvtrack_info_province_stat(
province string,
cnt bigint
) partitioned by (day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
insert overwrite table vvtrack_info_province_stat partition(day='2013-07-21')
select province,count(*) as cnt from vvtrack_info where day='2013-07-21' group by province ;
1)ETL
2)把ETL输出的数据加载到track_info分区表里
3)各个维度统计结果的数据输出到各自维度的表里(track_info_province_stat)
4)将数据导出(optional)
查看函数
show functions
查看具体函数用法
desc function <function_name>
查看具体例子
desc function extended <function_name>