Hive的serde,beeline,jdbc
2018-11-29 本文已影响18人
geekAppke
Hive SerDe - Serializer and Deserializer
- SerDe 用于做序列化和反序列化。
- 构建在数据存储和执行引擎之间,对两者实现解耦。
- Hive通过
ROW FORMAT DELIMITED
以及SERDE
进行内容的读写
row_format
: DELIMITED
[FIELDS TERMINATED BY char [ESCAPED BY char]]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]
[LINES TERMINATED BY char]
: SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
Hive正则匹配
CREATE TABLE logtbl (
host STRING,
identity STRING,
t_user STRING,
time STRING,
request STRING,
referer STRING,
agent STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-9]*)"
)
STORED AS TEXTFILE;
把各个字段取出来,提取时间,修改提取规则
满足读取规则就展示,不满足这行全是NULL
写数据你随便,hive是度检查
上传数据还检查格式,浪费时间、资源
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-9]*)
hive> load data local inpath '/root/log' into table logtbl;
hive> select * from logtbl;
OK
192.168.57.4 - - 29/Feb/2016:18:14:35 +0800 GET /bg-upper.png HTTP/1.1 304 -
192.168.57.4 - - 29/Feb/2016:18:14:35 +0800 GET /bg-nav.png HTTP/1.1 304 -
Tomcat日志
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
Hive Beeline
- Beeline 要与HiveServer2配合使用
- 服务端启动hiveserver2
- 客户的通过beeline两种方式连接到hive
node002既是服务端,有时客户端,开多个连接
[root@node002 ~]# hiveserver2
让它后台启动!
查看10000端口有没有启动
# ss -nal
[root@node002 ~]# beeline -u jdbc:hive2://node002:10000/default -n root
0: jdbc:hive2://node002:10000/default>!quit
默认 用户名、密码不验证(但不能不输!)权限控制!
[root@node002 ~]# beeline
beeline> !connect jdbc:hive2://node002:10000/default root 123
0: jdbc:hive2://node002:10000/default> select * from psn;
+---------+-----------+-------------------------+-------------------------------------------+--+
| psn.id | psn.name | psn.likes | psn.address |
+---------+-----------+-------------------------+-------------------------------------------+--+
| 1 | 小明1 | ["lol","book","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 2 | 小明2 | ["lol","book","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 3 | 小明3 | ["lol","book","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 4 | 小明4 | ["lol","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 5 | 小明5 | ["lol","book","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 6 | 小明6 | ["lol","book","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 7 | 小明7 | ["lol","book","game"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 8 | 小明8 | ["lol","book","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
| 9 | 小明9 | ["lol","book","movie"] | {"shenzhen":"luohu","shanghai":"pudong"} |
+---------+-----------+-------------------------+-------------------------------------------+--+
9 rows selected (0.219 seconds)
0: jdbc:hive2://node002:10000/default> !quit
Closing: 0: jdbc:hive2://node002:10000/default
[root@node002 ~]#
更加以易读的格式输出而已,2种不同客户端的访问方式!
beeline的本质是jdbc
Hive JDBC
Hive JDBC运行方式
服务端启动hiveserver2后,在java代码中通过调用hive的jdbc访问默认端口10000进行连接、访问