HIVE(2)
2023-02-20 本文已影响0人
山猪打不过家猪
3. Hive Table Operations
3.1 Load
-
you should use load when you finish the data cleaning.
image.png
3.1.1 filepath

3.1.2 LOCAL
- add LOCAL: It means finding files in the local system; relative path or unrelative path
file:///user/hive/project/data1..
- No LOCAL field: It uses fs.default.name(unexpected is HDFS)
-
where is LOCAL
local file system refers to the local Linux file system of the machine where the Hiveserver2 is located rather than the local file system where the HIVE client is located.
image.png
3.1.3 EXERCISE
- load data from local files. Data is located at HS2(node1), equal to
hadoop fs -put
LOAD DATA LOCAL INPATH '/root/hivedata/students.txt' INTO TABLE student_local;
- Load data from HDFS to partition and a specified partition. The data is in the root of the HDFS file system.
- 1st: use put uploading the data to htfs
hadoop fs -put /root/hivedata/students.txt
- 2end: use LOAD
LOAD DATA INPATH '/students.txt' INTO TABLE student_HDFS_p partition(country="China")
3.1.4 hive 3.0 new characters
- create a table
CREATE TABLE if not exists tab1 (col1 int, col2 int)
PARTITIONED BY (col3 int)
row format delimited fields terminated by ',';
>>
11,22
33,44
- before 3.0, loading data
LOAD DATA LOCAL INPATH '/root/hivedata/xxx.txt' INTO TABLE tab1 partition(col3="1");
- After 3.0
LOAD DATA LOCAL INPATH '/root/hivedata/tab1.txt' INTO TABLE tab1;
3.2 insert
Using insert like the sql in the hive will take a long time. So you can use it in
insert+select makes you insert the data which you selected.
insert into table student_from_insert
select num, name from student
3.2.1 multiple insert
- normal insert
insert into student_insert1
select num from student;
insert into student_insert2
select name from student;
- scan once, insert multipule
--多重插入 一次扫描 多次插入
from student
insert overwrite table student_insert1
select num
insert overwrite table student_insert2
select name;
3.3 export data
- export to hdfs,use default delimiter
\001
insert overwrite directory '/tmp/hive_export/e1' select num,name,age from student limit 10;
- export to local system
insert overwrite local directory '/root/hive_export/e1' select * from student
- Using specified dilimiters to export data
insert overwrite directory '/tmp/hive_export/e2' row format delimited fields terminated by ','
select * from student
3.4 Select
3.4.1 Covid data exercise
- create a normal table
drop table if exists t_usa_covid19;
CREATE TABLE t_usa_covid19(
count_date string,
county string,
state string,
fips int,
cases int,
deaths int)
row format delimited fields terminated by ",";
- load data into the
t_usa_covid19
- load data into the
load data local inpath '/root/hivedata/us-covid19-counties.dat' into table t_usa_covid19;
select * from t_usa_covid19;
- create a partitioned table based on count_data, state
CREATE TABLE if not exists t_usa_covid19_p(
county string,
fips int,
cases int,
deaths int)
partitioned by(count_date string,state string)
row format delimited fields terminated by ",";
- load
t_usa_covid19_p
using dynamic partitioned table
- load
set hive.exec.dynamic.partition.mode = nonstrict;
insert into table t_usa_covid19_p partition (count_date,state)
select county,fips,cases,deaths,count_date,state from t_usa_covid19;