Hive Primer (Part 2): Basic Operations

2019-04-02, 迷糊的小竹笋

Shell commands

hive -h
usage: hive
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
Example:
hive -e "show databases;"

Configuration parameters

Main configuration properties:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
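1. Hive's main configuration files (scope: all jobs):
hive-default.xml: the default configuration file
hive-site.xml: the user configuration file; any property set here overrides the value from hive-default.xml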
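The override rule can be illustrated with a small Python sketch. Both files use Hadoop's configuration/property/name/value XML layout; the file contents below are made up for illustration, and load_properties is a hypothetical helper, not part of Hive:

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text: str) -> dict[str, str]:
    """Hypothetical helper: parse a Hadoop-style <configuration> document into {name: value}."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


# Made-up file contents, for illustration only.
DEFAULT_XML = """<configuration>
  <property><name>hive.exec.parallel</name><value>false</value></property>
  <property><name>hive.cli.print.header</name><value>false</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>hive.cli.print.header</name><value>true</value></property>
</configuration>"""

# hive-site.xml is applied last, so it overrides hive-default.xml key by key;
# properties it does not mention keep their default values.
effective = {**load_properties(DEFAULT_XML), **load_properties(SITE_XML)}
print(effective)  # {'hive.exec.parallel': 'false', 'hive.cli.print.header': 'true'}
```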
2. On the command line (scope: this invocation only, i.e. the session)

bin/hive --hiveconf hive.root.logger=INFO,console
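The value must not contain a space after the comma: the shell splits arguments on whitespace, so INFO, console would reach Hive as two separate arguments. Python's shlex.split follows POSIX shell splitting rules and makes this visible; parse_hiveconf is a hypothetical helper for illustration only:

```python
import shlex


def parse_hiveconf(command_line: str) -> dict[str, str]:
    """Hypothetical helper: collect key=value pairs given via --hiveconf, split as a POSIX shell would."""
    args = shlex.split(command_line)          # whitespace separates arguments
    conf = {}
    for i, arg in enumerate(args[:-1]):
        if arg == "--hiveconf":
            key, _, value = args[i + 1].partition("=")
            conf[key] = value
    return conf


good = "bin/hive --hiveconf hive.root.logger=INFO,console"
bad = "bin/hive --hiveconf hive.root.logger=INFO, console"

print(parse_hiveconf(good))  # {'hive.root.logger': 'INFO,console'}
print(parse_hiveconf(bad))   # {'hive.root.logger': 'INFO,'}
print(shlex.split(bad))      # 'console' ends up as a separate, stray argument
```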

3. In HQL with SET (scope: the current session)

set mapred.reduce.tasks=100;
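The three methods stack, and the narrower scope wins. Hive's admin documentation gives the precedence, highest first, as: the SET command, then --hiveconf, then hive-site.xml, then hive-default.xml, then the underlying Hadoop configuration. Python's collections.ChainMap models exactly this "first layer that has the key wins" lookup; all property values below are illustrative only:

```python
from collections import ChainMap

# Illustrative values only; one dict per configuration layer.
set_cmd = {"mapred.reduce.tasks": "100"}                 # SET inside the session
hiveconf_opt = {"hive.exec.parallel": "true"}            # bin/hive --hiveconf ...
hive_site = {"hive.cli.print.header": "true",
             "mapred.reduce.tasks": "10"}                # hive-site.xml
hive_default = {"hive.cli.print.header": "false",
                "hive.exec.parallel": "false"}           # hive-default.xml
hadoop_conf = {"mapred.reduce.tasks": "1"}               # Hadoop's own configuration

# Highest precedence first: a lookup returns the value from the first layer containing the key.
effective = ChainMap(set_cmd, hiveconf_opt, hive_site, hive_default, hadoop_conf)

print(effective["mapred.reduce.tasks"])    # 100  (SET beats hive-site.xml and Hadoop)
print(effective["hive.exec.parallel"])     # true (--hiveconf beats hive-default.xml)
print(effective["hive.cli.print.header"])  # true (hive-site.xml beats hive-default.xml)
```

Dropping the SET layer (effective.parents) shows what the session would see without the SET statement: mapred.reduce.tasks falls back to the hive-site.xml value.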

Functions

1. Built-in functions
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringOperators
2. User-defined functions (UDF)
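Put simply, when the logic is too complex to express comfortably in HQL, you can process the data in another language (e.g. Python) instead.
For example, suppose the data of table person_tb is stored in person.txt. We could filter and process it with a complicated SELECT, or we could get the same target data with cat person.txt | python deal.py, in which case deal.py plays the role of the UDF. To run such a script inside Hive we use TRANSFORM (strictly speaking, Hive calls this a transform script; Hive's UDFs proper are Java classes).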
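As an illustration of what deal.py might contain, here is a minimal Python sketch. It assumes a hypothetical person_tb layout of three columns (name, age, city) and a made-up rule (keep people aged 18 or over, upper-case their names). The one thing it relies on from Hive is the default streaming format: one row per line, columns separated by tabs, NULL written as \N.

```python
import sys
from typing import Iterable, Iterator

NULL = "\\N"  # how Hive writes a NULL column to the script's stdin

# Assumed (hypothetical) person_tb columns: name, age, city.


def process(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical deal.py logic: keep adults and upper-case their names."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")   # one row per line, tab-separated
        if len(fields) != 3:
            continue                             # skip malformed rows
        name, age, city = fields
        if age == NULL or not age.isdigit():
            continue                             # drop rows with a NULL or non-numeric age
        if int(age) >= 18:
            yield "\t".join([name.upper(), age, city])  # output is tab-separated too


def main() -> None:
    """Stream stdin to stdout, as in: cat person.txt | python deal.py"""
    for out in process(sys.stdin):
        sys.stdout.write(out + "\n")


# Saved as deal.py, the file would end with the usual entry point:
#     if __name__ == "__main__":
#         main()
```

The row logic lives in process() so it can be tested without piping any data; the streaming loop in main() only connects it to stdin and stdout.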
a. First register the script with Hive using the ADD FILE command.
b. Then invoke it with TRANSFORM ... USING ... AS ...

hive> ADD FILE deal.py;
hive> SELECT TRANSFORM (<columns>)
USING 'python deal.py'
AS (<columns>)
FROM person_tb;
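To make the data flow of TRANSFORM concrete, here is a local Python simulation. Per Hive's Transform documentation, the selected columns are converted to strings, joined with tabs (NULL becomes the literal \N) and fed to the script's stdin; each line the script prints is split on tabs back into the AS columns, with \N read back as NULL. This is a simplified model, not Hive's implementation; DEAL_PY is a hypothetical two-column stand-in for deal.py, and simulate_transform is a hypothetical helper:

```python
import subprocess
import sys
from typing import Optional, Sequence

# Hypothetical stand-in for deal.py (columns: name, age): upper-cases the name.
DEAL_PY = r'''
import sys
for line in sys.stdin:
    name, age = line.rstrip("\n").split("\t")
    print(name.upper() + "\t" + age)
'''


def to_field(value: Optional[object]) -> str:
    return "\\N" if value is None else str(value)    # NULL is written as \N


def from_field(text: str) -> Optional[str]:
    return None if text == "\\N" else text           # \N is read back as NULL


def simulate_transform(rows: Sequence[tuple], script: str,
                       as_columns: Sequence[str]) -> list[dict]:
    """Mimic SELECT TRANSFORM (...) USING ... AS (...) on local rows."""
    # 1. Serialize each input row as one tab-separated line.
    stdin_text = "".join("\t".join(to_field(v) for v in row) + "\n" for row in rows)
    # 2. Stream the rows through the script, like USING 'python deal.py'.
    result = subprocess.run([sys.executable, "-c", script], input=stdin_text,
                            capture_output=True, text=True, check=True)
    # 3. Split each output line on tabs and name the fields as in AS (...).
    return [dict(zip(as_columns, (from_field(f) for f in line.split("\t"))))
            for line in result.stdout.splitlines()]


print(simulate_transform([("alice", 30), ("bob", None)], DEAL_PY, ["name", "age"]))
# [{'name': 'ALICE', 'age': '30'}, {'name': 'BOB', 'age': None}]
```

Note that the script always sees strings: the integer 30 arrives as the text "30", which is why a transform script has to parse numeric columns itself.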
