Hive学习笔记(5)导入数据

2019-04-02  本文已影响0人  井地儿

load data

load data的实质是把文件复制或移到表的目录中,从而把数据导入Hive的表(或分区)。

insert overwrite

这种插入会覆盖原有数据。

insert overwrite table target_table partition ( dt = '20190402') select col1, col2 from source_table;

insert into

这种插入会追加数据。

insert into table target_table partition (dt = '20190402') select col1, col2 from source_table;

动态分区插入

insert overwrite table target_table partition(dt) select col1, col2, dt from source_table;

通常情况下,动态分区插入是关闭的,要开启需要设置 hive.exec.dynamic.partition为true。

多表插入

更加清晰的插入sql写法:

from source_table
insert overwrite table target_table select col1, col2;

如果从源表读取数据插入到多个表中,可以在上述语句中使用多个insert语句,这样效率更高,因为只需要扫描一遍源表即可。

from source_table
insert overwrite table target_table1 select col1;
insert overwrite table target_table2 select col2;

create table ... as select (CTAS)

新表的列的定义和select查询的源表列保持一致。

create table new_table as select col1, col2 from source_table;

hive> create table new_table as select id, name from partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190402221824_a9feb375-8dd1-4f9c-aaba-015f6ab58cb2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1552651623473_0006, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0006/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-02 22:18:31,075 Stage-1 map = 0%,  reduce = 0%
2019-04-02 22:18:36,347 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.48 sec
MapReduce Total cumulative CPU time: 2 seconds 480 msec
Ended Job = job_1552651623473_0006
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/.hive-staging_hive_2019-04-02_22-18-24_643_3904332483007604326-1/-ext-10002
Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/new_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.48 sec   HDFS Read: 3896 HDFS Write: 171 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 480 msec
OK
Time taken: 13.836 seconds
hive> select * from new_table;
OK
1   郭靖
2   黄蓉
3   杨康
4   穆念慈
5   东邪
6   西毒
7   黄老邪
8   杨铁心
Time taken: 0.146 seconds, Fetched: 8 row(s)
hive> desc new_table;
OK
id                      int
name                    string
Time taken: 0.197 seconds, Fetched: 2 row(s)

CTAS是原子操作,如果select查询失败,则新表不会被创建。

上一篇 下一篇

猜你喜欢

热点阅读