Hive日志分析实战
2018-03-14 本文已影响66人
明明德撩码
对日志进行分析思路
![](https://img.haomeiwen.com/i4176128/6c32949505b674c8.png)
日志内容格式
"27.38.5.159" "-" "31/Aug/2015:00:04:37 +0800" "GET /course/view.php?id=27 HTTP/1.1" "303" "440" - "http://www.ibeifeng.com/user.php?act=mycourse" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" "-" "learn.ibeifeng.com"
注意440 后面的- 没有被引号包围。创建
创建利用正则格式化数据的表
CREATE TABLE bf_log_src (
host STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
size STRING,
referer STRING,
agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^]) ([^]) ([^]) (-|\[^\]\]) ([^ "]|"[^"]") (-|[0-9]) (-|[0-9])(?: ([^ "]|".") ([^ "]|"."))?"
)
STORED AS TEXTFILE;
导入数据
![](https://img.haomeiwen.com/i4176128/87bf4fe4523b0720.png)
针对不同业务创建子表
![](https://img.haomeiwen.com/i4176128/c90f770bd934dfe6.png)
![](https://img.haomeiwen.com/i4176128/f7592057ae2f8e90.png)
向子表导入数据
![](https://img.haomeiwen.com/i4176128/82f4538c451f4ce2.png)
执行测试语句
![](https://img.haomeiwen.com/i4176128/12fdd30a67406bd0.png)
![](https://img.haomeiwen.com/i4176128/05f2558492370778.png)
用户自定义时间转换函数(UDF)
![](https://img.haomeiwen.com/i4176128/9f761dd40fd386c8.png)
![](https://img.haomeiwen.com/i4176128/b1feb976d54e5b76.png)
![](https://img.haomeiwen.com/i4176128/ec40126c854da55c.png)
打jar并加入到hive中
![](https://img.haomeiwen.com/i4176128/7a733bf60c09b7fd.png)
创建日期格式转换函数
![](https://img.haomeiwen.com/i4176128/3e77493ba7cdde14.png)
使用日期格式转换函数,并重新覆盖数据
![](https://img.haomeiwen.com/i4176128/dc95408f83fa6f8c.png)
分析时间
分析用户访问网站的时间段
针对销售来说,合理安排值班,销售课程
![](https://img.haomeiwen.com/i4176128/673107d19d9f37bb.png)
-
执行结果