Hive元数据常用表介绍

2019-05-02  本文已影响0人  喵星人ZC

Hive版本:1.1.0-cdh5.7.0
一、VERSION表

mysql> select * from VERSION ;  
+--------+----------------+-----------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT                         |
+--------+----------------+-----------------------------------------+
|      1 | 1.1.0          | Set by MetaStore hadoop@192.168.245.100 |
+--------+----------------+-----------------------------------------+

此表存着Hive的版本信息,有且只有一条数据。可以尝试删除此条信息和新增一条信息,都会导致Hive不可用

二 、数据库有关的元数据表
1、DBS表存储Hive中数据库的信息,default 为自带数据,HDFS路径为/user/hive/warehouse。g6_hadoop 为自建数据,HDFS路径为/user/hive/warehouse/g6_hadoop.db

mysql> select * from DBS ;
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                                        | NAME      | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+
|     1 | Default Hive database | hdfs://hadoop000:8020/user/hive/warehouse              | default   | public     | ROLE       |
|     2 | NULL                  | hdfs://hadoop000:8020/user/hive/warehouse/g6_hadoop.db | g6_hadoop | hadoop     | USER       |
+-------+-----------------------+--------------------------------------------------------+-----------+------------+------------+

2、DATABASE_PARAMS存储数据的属性信息,与DBS关联主键为DB_ID

mysql> desc DATABASE_PARAMS;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| DB_ID       | bigint(20)    | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(180)  | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+

三、与数据表有关的元数据表
1、TBLS存储表的信息

mysql> select * from TBLS \G;
*************************** 1. row ***************************
            TBL_ID: 11
       CREATE_TIME: 1554218873
             DB_ID: 2
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 16
          TBL_NAME: g6_access
          TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
*************************** 2. row ***************************
            TBL_ID: 16
       CREATE_TIME: 1555690888
             DB_ID: 2
  LAST_ACCESS_TIME: 0
             OWNER: hue
         RETENTION: 0
             SD_ID: 26
          TBL_NAME: views
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL

与DBS关联的主键为DB_ID,与SDS关联的主键为SD_ID。

2、SDS存储的是表的输入和输出格式,也就是存储格式

mysql> select * from SDS where SD_ID =16 \G;
*************************** 1. row ***************************
                    SD_ID: 16
                    CD_ID: 11
             INPUT_FORMAT: org.apache.hadoop.mapred.TextInputFormat
            IS_COMPRESSED:  
IS_STOREDASSUBDIRECTORIES:  
                 LOCATION: hdfs://hadoop000:8020/g6/hadoop/access/clear
              NUM_BUCKETS: -1
            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                 SERDE_ID: 16

SDS与CDS关联的主键为CD_ID,与SERDES(存储表使用的序列化类的信息)和SERDE_PARAMS(存储表使用的序列化的一些属性、格式信息,比如:行、列分隔符)关联的主键为SERDE_ID。

3、SERDES存储表使用的序列化类的信息
SERDE_PARAMS存储表使用的序列化的一些属性、格式信息,比如:行、列分隔符

mysql> select * from SERDES where SERDE_ID=16;  
+----------+------+----------------------------------------------------+
| SERDE_ID | NAME | SLIB                                               |
+----------+------+----------------------------------------------------+
|       16 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+----------+------+----------------------------------------------------+


mysql> select * from SERDE_PARAMS where SERDE_ID=16;
+----------+----------------------+-------------+
| SERDE_ID | PARAM_KEY            | PARAM_VALUE |
+----------+----------------------+-------------+
|       16 | field.delim          |                |
|       16 | serialization.format |                |
+----------+----------------------+-------------+

4、TABLE_PARAMS该表存储表/视图的属性信息

mysql> select * from TABLE_PARAMS where TBL_ID = 11;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY             | PARAM_VALUE |
+--------+-----------------------+-------------+
|     11 | EXTERNAL              | TRUE        |
|     11 | transient_lastDdlTime | 1554218873  |
+--------+-----------------------+-------------+

与TBLS关联主键为TBL_ID

5、TBL_PRIVS该表存储表/视图的授权信息,与TBLS关联主键为TBL_ID

6、COLUMNS_V2该表存储表对应的字段信息

mysql> select * from COLUMNS_V2 where  CD_ID = 11;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|    11 | NULL    | cdn         | string    |           0 |
|    11 | NULL    | domain      | string    |           5 |
|    11 | NULL    | ip          | string    |           4 |
|    11 | NULL    | level       | string    |           2 |
|    11 | NULL    | region      | string    |           1 |
|    11 | NULL    | time        | string    |           3 |
|    11 | NULL    | traffic     | bigint    |           7 |
|    11 | NULL    | url         | string    |           6 |
+-------+---------+-------------+-----------+-------------+

COLUMNS_V2要先通过SDS表CD_ID关联取到SD_ID,再用SD_ID与TBLS关联。

Hive元数据常用表的UML图如下:


image.png
上一篇下一篇

猜你喜欢

热点阅读