hive shell基本命令
2019-02-23 本文已影响0人
熵_运维
查看表操作
SHOW TABLES
SHOW TABLES 'part.*';
SHOW TABLES '.*ext';
SHOW PARTITIONS part_t1;
DESCRIBE t1;
SELECT id FROM t1 WHERE a.ds='2019-02-23';
DESCRIBE EXTENDED part_t1 PARTITION (ds='2019-02-23');
内部表
- 创建内部表
CREATE TABLE t1 (id int,ms string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED as TEXTFILE;
- 载入数据
t1.txt: 1 one 11 one 111 one 1111 one
- 直接put文件
hdfs dfs -put /tmp/t1.txt /user/hive/warehouse/test1.db/t1/
- load data
overwrite: 会覆盖原本数据load data local inpath '/tmp/t1.txt' overwrite into table t1;
外部表
外部表和内部表的主要区别是,删除表时数据不会删。数据可以已经存在,也可以先关联到外部目录,再载入数据,载入方法和内部表一样。
- 创建外部表
关键字: EXTERNAL 和 LOCATIONCREATE EXTERNAL TABLE ext_t1 (id int,ms string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED as TEXTFILE LOCATION '/user/hive/warehouse/test1.db/ext_t1';
分区表
创建表时可以同时为表创建一个或多个分区(如日期),查询数据时指定分区减少数据查询量。
- 创建分区表
CREATE TABLE part_t1 (id int,ms string) PARTITIONED BY (ds string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED as TEXTFILE;
hdfs dfs -ls /user/hive/warehouse/test1.db/part_t1 drwxrwxrwt - hdfs hive 0 2019-02-23 11:07 /user/hive/warehouse/test1.db/part_t1/ds=2018 drwxrwxrwt - hdfs hive 0 2019-02-23 11:09 /user/hive/warehouse/test1.db/part_t1/ds=2019
- 查询数据
load data local inpath '/tmp/t2018.txt' into table part_t1 partition (ds='2018'); load data local inpath '/tmp/t2019.txt' into table part_t1 partition (ds='2019'); select * from part_t1 where ds='2018'; 2018 1 2018 2018 2 2018 2018 3 2018 select * from part_t1 where ds='2019'; 2019 1 2019 2019 2 2019 2019 3 2019 select * from part_t1; 2018 1 2018 2018 2 2018 2018 3 2018 2019 1 2019 2019 2 2019 2019 3 2019
桶表
分区表虽然可以分区,不过数据分布不均衡时还是会影响查询效率,桶表对数据哈希取模,目的让数据更均衡分布到不同的数据文件。桶表专门用来抽样查询。