22.基于hive的应用实例(wordcount和基站掉话率)

2019-12-18  本文已影响0人  文茶君

一实现wordcount

1.装入数据.png
hello world hive
hadoop hello hive
hello
world
2.把数据装入hdfs.png

创建表
因为先有数据再有表
所以创建外部表
不知一行有几行单词,把整行当成一个列来存

create external table wc(
line string
);
location '/usr/'
3.创建外部表.png

创建结果表

create table wc_result(
word string,
ct int
);
4查看数据结构1.png

查看是数组结构


5查看数据结构2.png

再切割

from (select explode(split(line,' ')) word(这里给别名)  from wc) t1
 insert into wc_result
select t1.word,count(t1.word) group by t1.word;
6执行MR
7.结果.png

基站掉话率

基站掉话率,找出掉线率最高的前10名
1.数据格式
record_time,imei,cell,ph_num,call_num,drop_num,duration,drop_rate,net_type,erl
2011-07-13 00:00:00+08,356966,29448-37062,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,352024,29448-51331,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,353736,29448-51331,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,353736,29448-51333,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,351545,29448-51333,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,G,0
2011-07-13 00:00:00+08,359681,29448-51462,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,354707,29448-51462,0,0,0,0,0,G,0
2011-07-13 00:00:00+08,356137,29448-51470,0,0,0,0,0,G,0

排名需求
record time:通话时间
imei:基站编号
cell:手机编号
drop_num:掉话的描述
duration:通话持续总秒数

建表

create table call_monitor (
record_time string,
imei string,
cell string,
ph_num string,
call_num string,
drop_num int,
duration int,
drop_rate double,
net_type string,
erl string
)
row format delimited fields terminated by ',';
2.装上数据文件 3.创建表并加载数据 4查看前几行数据,结果正常(出现null是因为结果类型不匹配,设置的是int)
create table call_result(
imei string,
drop_num int,
duration int,
drop_rate double
)

查询

5执行却报错.png

出错原因在于sdura是别名,却在此写sdrop/sdura。

from call_monitor cm
insert into call_result 
select cm.imei,sum(cm.drop_num) sdrop,
sum(cm.duration) sdura,
sum(cm.drop_num)/sum(cm.duration) s_rate group by  cm.imei order by s_rate desc;

6运行.png 7,结果.png
上一篇下一篇

猜你喜欢

热点阅读