HIVE(2)

2023-02-20 本文已影响0人山猪打不过家猪

3. Hive Table Operations

3.1 Load

you should use load when you finish the data cleaning.

image.png

3.1.1 filepath

image.png

3.1.2 LOCAL

add LOCAL: It means finding files in the local system; relative path or unrelative path file:///user/hive/project/data1..
No LOCAL field: It uses fs.default.name(unexpected is HDFS)
where is LOCAL
local file system refers to the local Linux file system of the machine where the Hiveserver2 is located rather than the local file system where the HIVE client is located.

image.png

3.1.3 EXERCISE

load data from local files. Data is located at HS2(node1), equal to hadoop fs -put

LOAD DATA LOCAL INPATH '/root/hivedata/students.txt' INTO TABLE student_local;

Load data from HDFS to partition and a specified partition. The data is in the root of the HDFS file system.

1st: use put uploading the data to htfs

hadoop fs -put /root/hivedata/students.txt

2end: use LOAD

LOAD DATA INPATH '/students.txt' INTO TABLE student_HDFS_p partition(country="China")

3.1.4 hive 3.0 new characters

create a table

CREATE TABLE if not exists tab1 (col1 int, col2 int)
PARTITIONED BY (col3 int)
row format delimited fields terminated by ',';
>>
11,22
33,44

before 3.0, loading data

LOAD DATA LOCAL INPATH '/root/hivedata/xxx.txt' INTO TABLE tab1 partition(col3="1");

After 3.0

LOAD DATA LOCAL INPATH '/root/hivedata/tab1.txt' INTO TABLE tab1;

3.2 insert

Using insert like the sql in the hive will take a long time. So you can use it in
insert+select makes you insert the data which you selected.

insert into table student_from_insert 
select num, name from student

3.2.1 multiple insert

normal insert

insert into student_insert1
select num  from student;

insert into student_insert2
select name  from student;

scan once, insert multipule

--多重插入  一次扫描 多次插入
from student
insert overwrite table student_insert1
select num
insert overwrite table student_insert2
select name;

3.3 export data

export to hdfs,use default delimiter\001

insert overwrite directory '/tmp/hive_export/e1' select num,name,age from student limit 10;

export to local system

insert overwrite local directory '/root/hive_export/e1' select * from student

Using specified dilimiters to export data

insert overwrite directory '/tmp/hive_export/e2' row format delimited fields terminated by ','
select * from student

3.4 Select

3.4.1 Covid data exercise

1. create a normal table

drop table if exists t_usa_covid19;
CREATE TABLE t_usa_covid19(
       count_date string,
       county string,
       state string,
       fips int,
       cases int,
       deaths int)
row format delimited fields terminated by ",";

1. load data into the t_usa_covid19

load data local inpath '/root/hivedata/us-covid19-counties.dat' into table t_usa_covid19;

select * from t_usa_covid19;

1. create a partitioned table based on count_data, state

CREATE TABLE if not exists t_usa_covid19_p(
     county string,
     fips int,
     cases int,
     deaths int)
partitioned by(count_date string,state string)
row format delimited fields terminated by ",";

1. loadt_usa_covid19_p using dynamic partitioned table

set hive.exec.dynamic.partition.mode = nonstrict;

insert into table t_usa_covid19_p partition (count_date,state)
select county,fips,cases,deaths,count_date,state from t_usa_covid19;

HIVE(2)

3. Hive Table Operations

3.1 Load

3.1.1 filepath

3.1.2 LOCAL

3.1.3 EXERCISE

3.1.4 hive 3.0 new characters

3.2 insert

3.2.1 multiple insert

3.3 export data

3.4 Select

3.4.1 Covid data exercise

猜你喜欢

热点阅读