Hive学习笔记(6)DDL

2019-04-03  本文已影响0人  井地儿

官方参考文档 LanguageManual DDL

创建/删除/更改/使用数据库

在hive sql中database关键词和 schema关键词可以互换,意思是一样的

创建数据库

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];

小牛试刀:
hive> create database if not exists test;
OK
Time taken: 0.131 seconds
hive> create schema if not exists test_schema;
OK
Time taken: 0.07 seconds
hive> show databases;
OK
default
test
test_schema
Time taken: 0.255 seconds, Fetched: 3 row(s)
hive> create database if not exists stefan comment 'just for test';
OK
Time taken: 0.047 seconds
hive> desc database stefan;
OK
stefan  just for test   hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db hadoop  USER
Time taken: 0.068 seconds, Fetched: 1 row(s)

删除数据库

删除数据库时,默认行为是RESTRICT,这种情况下如果数据库不为空,则删除动作失败。如果需要删除库以及库里的表可以使用CASCADE关键字。

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

小牛试刀:
hive> drop database test;
OK
Time taken: 0.242 seconds
hive> drop database if exists test;
OK
Time taken: 0.011 seconds
hive> drop database if exists test_schema cascade;
OK
Time taken: 1.518 seconds
hive> show databases;
OK
default
prop_database
stefan

修改数据库

需要注意下述命令的生效版本号。

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

alter database ... set location 语句不会将数据库当前目录的内容移动到新指定的位置。它不会更改当前数据库下已经存在的任何表/分区关联的位置,仅改变在该数据库下新建表的默认父目录。

小牛试刀:

添加属性:

hive> show create database stefan;
OK
CREATE DATABASE `stefan`
COMMENT
  'just for test'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
Time taken: 0.222 seconds, Fetched: 5 row(s)
hive> alter database stefan set dbproperties ('owner'='stefan', 'create_time'='20190403');
OK
Time taken: 0.07 seconds
hive> show create database stefan;
OK
CREATE DATABASE `stefan`
COMMENT
  'just for test'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
WITH DBPROPERTIES (
  'create_time'='20190403',
  'owner'='stefan')
Time taken: 0.196 seconds, Fetched: 8 row(s)

修改owner:

hive> alter schema stefan set owner user stefan_test;
OK
Time taken: 0.084 seconds
hive> desc database stefan;
OK
stefan  just for test   hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db stefan_test USER
Time taken: 0.145 seconds, Fetched: 1 row(s)

切换数据库

USE database_name;
USE DEFAULT;

hive> use stefan;
OK
Time taken: 0.019 seconds

查看库

show databases;

hive> show databases;
OK
default
Time taken: 5.987 seconds, Fetched: 1 row(s)

查看当前库

show current_database();

hive> select current_database();
OK
default
Time taken: 0.852 seconds, Fetched: 1 row(s)

查看库的属性信息

desc database extended default;

hive> desc database extended default;
OK
default Default Hive database   hdfs://jms-master-01:9000/user/hive/warehouse   public  ROLE
Time taken: 0.361 seconds, Fetched: 1 row(s)

查看建库语句

show create database default;

hive> show create database default;
OK
CREATE DATABASE `default`
COMMENT
  'Default Hive database'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse'
Time taken: 0.525 seconds, Fetched: 5 row(s)

建库时携带属性

create database if not exists prop_database comment 'test properties' with dbproperties ('prop1'='aaa', 'prop2'='bbb');

hive> create database if not exists prop_database comment 'test properties database' with dbproperties('prop1'='aaa', 'prop2'='bbb');
OK
Time taken: 0.121 seconds
hive> show create database prop_database;
OK
CREATE DATABASE `prop_database`
COMMENT
  'test properties database'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/prop_database.db'
WITH DBPROPERTIES (
  'prop1'='aaa',
  'prop2'='bbb')
Time taken: 0.181 seconds, Fetched: 8 row(s)

创建/删除/清空表

建表

根据指定表明创建表。当表明存在时抛出异常,可以使用 IF NOT EXISTS 跳过该异常。
*表和列的注释comment格式是字符串,使用单引号。
*表名和列名不区分大小写。
*使用external创建的表是外部表,不使用默认创建的表是托管表(我习惯称之为内部表)。查看一个表是托管表还是外部表,可以通过describe extended table_name;查看其中的tableType属性值。

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];

data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)

primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)

array_type
: ARRAY < data_type >

map_type
: MAP < primitive_type, data_type >

struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)

row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| JSONFILE -- (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE

创建托管表:
hive> create table manage_table (id int, name string);
OK
Time taken: 0.674 seconds

创建外部表:

hive> create external table external_table (id int, name string)  location '/user/hadoop/tmp/hive/default.db/external_table';
OK
Time taken: 0.449 seconds

查看表是托管表还是外部表:
可以看出托管表的tableType是MANAGED_TABLE;外部表的tableType是EXTERNAL_TABLE。

hive> describe extended manage_table;
OK
id                      int
name                    string

Detailed Table Information  Table(tableName:manage_table, dbName:default, owner:hadoop, createTime:1554341982, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hive/warehouse/manage_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1554341982}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)
Time taken: 0.329 seconds, Fetched: 4 row(s)
hive> describe extended external_table;
OK
id                      int
name                    string

Detailed Table Information  Table(tableName:external_table, dbName:default, owner:hadoop, createTime:1554342083, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hadoop/tmp/hive/default.db/external_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1554342083}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, rewriteEnabled:false)
Time taken: 0.143 seconds, Fetched: 4 row(s)

创建分区表

hive> create table partition_table (id int, name string) partitioned by (dt string);
OK
Time taken: 0.393 seconds
hive> show create table partition_table;
OK
CREATE TABLE `partition_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/partition_table'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554344052')
Time taken: 0.275 seconds, Fetched: 15 row(s)

create table as select(CTAS)

CTAS语法限制:

hive> select * from partition_table;
OK
1   郭靖  20190402    china
2   黄蓉  20190402    china
3   杨康  20190402    china
4   穆念慈 20190402    china
5   东邪  20190402    china
6   西毒  20190402    china
7   黄老邪 20190402    china
8   杨铁心 20190402    china
Time taken: 1.499 seconds, Fetched: 8 row(s)
hive> create table new_key_value_store row format serde "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" stored as RCFile as select (id * 10) new_key, concat(id, name) key_value_pair from partition_table sort by new_key, key_value_pair;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190404103533_902fb48c-2471-4e2d-bc79-5cba46b8c710
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1552651623473_0007, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0007/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-04-04 10:35:45,795 Stage-1 map = 0%,  reduce = 0%
2019-04-04 10:35:50,537 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.39 sec
2019-04-04 10:35:56,978 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.27 sec
MapReduce Total cumulative CPU time: 5 seconds 270 msec
Ended Job = job_1552651623473_0007
Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.27 sec   HDFS Read: 8487 HDFS Write: 248 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 270 msec
OK
Time taken: 24.897 seconds
hive> select * from  new_key_value_store;
OK
10  1郭靖
20  2黄蓉
30  3杨康
40  4穆念慈
50  5东邪
60  6西毒
70  7黄老邪
80  8杨铁心
Time taken: 0.21 seconds, Fetched: 8 row(s)
hive> show create table new_key_value_store;
OK
CREATE TABLE `new_key_value_store`(
  `new_key` int,
  `key_value_pair` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554345358')
Time taken: 0.299 seconds, Fetched: 13 row(s)

克隆表(create table ... like)

create table ... like 会创建一个和源表结构完全一致的空表。

hive> create table like_new_key_value_store like new_key_value_store;
OK
Time taken: 0.149 seconds
hive> show create table like_new_key_value_store;
OK
CREATE TABLE `like_new_key_value_store`(
  `new_key` int,
  `key_value_pair` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/like_new_key_value_store'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554345937')
Time taken: 1.801 seconds, Fetched: 13 row(s)
hive> select * from like_new_key_value_store;
OK
Time taken: 0.252 seconds

分桶排序表

创建一个分桶排序表
hive> create table bucketed_table (id int, name string) partitioned by (dt string, country string) clustered by (id) sorted by (name) into 4 buckets row format delimited fields terminated by '\001' collection items terminated by '\002' map keys terminated by '\003' stored as sequencefile;
OK
Time taken: 0.241 seconds
hive> show create table bucketed_table;
OK
CREATE TABLE `bucketed_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string,
  `country` string)
CLUSTERED BY (
  id)
SORTED BY (
  name ASC)
INTO 4 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'colelction.delim'='',
  'field.delim'='',
  'mapkey.delim'='',
  'serialization.format'='')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/bucketed_table'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554346570')
Time taken: 0.268 seconds, Fetched: 26 row(s)

向多个分区多动态插入,需要设置参数
set hive.exec.dynamici.partition=true; #开启动态分区,默认是false
set hive.exec.dynamic.partition.mode=nonstrict; #开启允许所有分区都是动态的,否则必须要有静态分区才能使用。

hive> insert into bucketed_table partition(dt, country) select id, name, dt, country from default.partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190404110332_1e7c3bf3-e5e6-4ff1-af30-2a5bcf0278e6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1552651623473_0008, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0008/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2019-04-04 11:03:38,849 Stage-1 map = 0%,  reduce = 0%
2019-04-04 11:03:43,157 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.47 sec
2019-04-04 11:03:48,331 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 4.05 sec
2019-04-04 11:03:50,399 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 9.22 sec
2019-04-04 11:03:51,435 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 12.12 sec
MapReduce Total cumulative CPU time: 12 seconds 120 msec
Ended Job = job_1552651623473_0008
Loading data to table stefan.bucketed_table partition (dt=null, country=null)

Loaded : 1/1 partitions.
     Time taken to load dynamic partitions: 0.243 seconds
     Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 4   Cumulative CPU: 12.12 sec   HDFS Read: 21718 HDFS Write: 937 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 120 msec
OK
Time taken: 20.806 seconds
hive> select * from bucketed_table;
OK
8   杨铁心 20190402    china
4   穆念慈 20190402    china
5   东邪  20190402    china
1   郭靖  20190402    china
6   西毒  20190402    china
2   黄蓉  20190402    china
3   杨康  20190402    china
7   黄老邪 20190402    china
Time taken: 0.225 seconds, Fetched: 8 row(s)

数据倾斜表(Skewed Tables)

暂时没接触过也没研究过。

临时表

临时表只在当前会话可见。数据存储在用户的临时目录总,会话结束时删除。

create temporary table ...

小牛试刀
hive> create temporary table temp_table(id int, name string);
OK
Time taken: 6.14 seconds
hive> insert into temp_table select id, name from partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190408171623_2c787f42-e5e5-460d-8a21-90a5a5c9afc3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1552651623473_0010, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0010/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-08 17:16:32,625 Stage-1 map = 0%,  reduce = 0%
2019-04-08 17:16:38,994 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.1 sec
MapReduce Total cumulative CPU time: 2 seconds 100 msec
Ended Job = job_1552651623473_0010
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://jms-master-01:9000/tmp/hive-hadoop/hadoop/8dbf32dc-51a3-4caa-b5be-9d682ccdb29c/_tmp_space.db/09e10ccb-5d5f-45ab-b07a-afe5590b2352/.hive-staging_hive_2019-04-08_17-16-23_750_5995661068890136721-1/-ext-10000
Loading data to table default.temp_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.1 sec   HDFS Read: 4605 HDFS Write: 155 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 100 msec
OK
Time taken: 16.73 seconds
hive> select * from temp_table;
OK
1   郭靖
2   黄蓉
3   杨康
4   穆念慈
5   东邪
6   西毒
7   黄老邪
8   杨铁心
Time taken: 0.161 seconds, Fetched: 8 row(s)

临时表插入数据不支持insert overwrite语法。

hive> insert overwrite temp_table select id, name from partition_table;
NoViableAltException(24@[])
    at org.apache.hadoop.hive.ql.parse.HiveParser.destination(HiveParser.java:38762)
    at org.apache.hadoop.hive.ql.parse.HiveParser.insertClause(HiveParser.java:38531)
    at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36478)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35710)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2284)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
FAILED: ParseException line 1:17 cannot recognize input near 'temp_table' 'select' 'id' in destination specification

退出当前会话,临时表会被删除。

hive> quit;
[hadoop@jms-master-01 ~]$ hive
which: no hbase in (/usr/share/svensudo/bin:/home/hadoop/tools/scala-2.12.8/bin:/home/hadoop/tools/spark-2.4.0-bin-hadoop2.7/bin:/home/hadoop/tools/java/jdk1.8.0_191/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin/:/alc/tool/bin:/alc/tool/bin:/home/hadoop/tools/hadoop-2.7.7/bin:/home/hadoop/tools/apache-hive-2.3.4-bin/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/home/hadoop/tools/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> select * from temp_table;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'temp_table'

事务表(Transactional Tables)

从Hive1.4.0+版本开始支持事务表。Hive提供了支持ACID语义的事务表,可以完成增删改操作。
关于事务表,展开又是一个专题了。可以参考官方事务表文档

主键约束

从Hive2.1.0版本开始,支持约束。Hive支持未经验证的主键和外键约束,注意是未经验证的,所以主键的正确性需要上游数据来保证。

删除表

drop table [if exists] table_name [purge];
通常情况下,删除托管表,hive会删除元数据信息和数据文件,删除文件相当于执行了hadoop fs -rm,也就是说数据文件会被移动到hdfs系统下是trash回收站中,在误操作的情况下,在一定时效期间还可以找回数据。如果指定了purge选项,则不会移动到回收站下,而是直接永久删除。所以drop动作一定要谨慎。

Truncate表

truncate table table_name [partition partition_spec]
这里partition_spec指(partition_column=partition_col_value,partition_column=partition_col_value, ...)

相信如果对关系型数据库sql语言比较熟悉的话,以上操作都可以理解。

Alter table/partition/column

Alter table
重命名表

alter table table_name rename to new_table_name;

hive> show tables;
OK
first_table
Time taken: 5.51 seconds, Fetched: 9 row(s)
hive> alter table first_table rename to second_table;
OK
Time taken: 0.406 seconds
hive> show tables;
OK
second_table
Time taken: 0.05 seconds, Fetched: 9 row(s)
修改表的属性

alter table table_name set tblproperties table_properties;
table_properties:
(property_name = property_value, property_name = property_value, ...)

上一篇下一篇

猜你喜欢

热点阅读