Hive Remote Mode

2018-05-22, by 金刚_30bf

Hive version: 2.3.3

Configure the MySQL database that backs the metastore:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.30.16.201:3306/hivemetaremote?createDatabaseIfNotExist=true</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive123</value>
  </property>
  <!-- Enable metastore schema version verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>true</value>
  </property>
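The properties above only take effect when they sit inside a single <configuration> root in hive-site.xml, with every <property> properly closed. As a minimal sketch (the helper name build_hive_site is hypothetical, not part of Hive), the file can be generated with Python's standard xml.etree.ElementTree, which also escapes characters such as & automatically if more JDBC URL parameters are appended later:

```python
import xml.etree.ElementTree as ET

# Metastore properties from the section above (hypothetical dict layout).
# Note the Connector/J parameter is spelled "createDatabaseIfNotExist".
METASTORE_PROPS = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://10.30.16.201:3306/hivemetaremote?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hiveuser",
    "javax.jdo.option.ConnectionPassword": "hive123",
    "hive.metastore.schema.verification": "true",
}


def build_hive_site(props):
    """Return hive-site.xml text: one <configuration> root holding one
    <property> (with <name> and <value>) per dict entry."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value  # '&' is escaped on output
    if hasattr(ET, "indent"):  # ET.indent exists on Python 3.9+
        ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(build_hive_site(METASTORE_PROPS))
```

The printed text would still need to be merged with the metastore and HiveServer2 properties that follow before being saved as hive-site.xml.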

Configure the metastore Thrift URI:

<property>
 <name>hive.metastore.uris</name>
 <value>thrift://node203.hmbank.com:9083</value>
 <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
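hive.metastore.uris accepts a comma-separated list of thrift://host:port URIs, which is useful when more than one metastore instance runs. The sketch below (parse_metastore_uris is a hypothetical helper, not a Hive API) checks that a value is well-formed before it goes into hive-site.xml:

```python
from urllib.parse import urlparse


def parse_metastore_uris(value):
    """Split a comma-separated hive.metastore.uris value into (host, port)
    pairs, rejecting entries that are not thrift://host:port."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        parsed = urlparse(item)
        if parsed.scheme != "thrift" or parsed.hostname is None or parsed.port is None:
            raise ValueError(f"not a thrift://host:port URI: {item!r}")
        endpoints.append((parsed.hostname, parsed.port))
    return endpoints


if __name__ == "__main__":
    print(parse_metastore_uris("thrift://node203.hmbank.com:9083"))
```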

Enable concurrency support (Hive's table lock manager):

<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>

HiveServer2 configuration:

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>node203.hmbank.com</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is ‘binary’.</description>
  </property>

  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is ‘http’.</description>
  </property>

  <property>
    <name>hive.server2.thrift.client.user</name>
    <value>hadoop</value>
    <description>Username to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.thrift.client.password</name>
    <value>hadoop</value>
    <description>Password to use against thrift client</description>
  </property>
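Clients reach HiveServer2 through a JDBC URL whose shape depends on hive.server2.transport.mode: binary mode uses port 10000 as configured above, while http mode uses port 10001 plus the transportMode and httpPath parameters (httpPath defaults to cliservice). A small helper, hypothetical and for illustration only, makes the difference explicit:

```python
def hive2_jdbc_url(host, port, database="default", transport_mode="binary",
                   http_path="cliservice"):
    """Build a HiveServer2 JDBC URL (hypothetical helper).

    binary: jdbc:hive2://host:port/db
    http:   jdbc:hive2://host:port/db;transportMode=http;httpPath=<path>
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if transport_mode == "http":
        url += f";transportMode=http;httpPath={http_path}"
    elif transport_mode != "binary":
        raise ValueError(f"unsupported transport mode: {transport_mode!r}")
    return url


if __name__ == "__main__":
    print(hive2_jdbc_url("node203.hmbank.com", 10000))
    print(hive2_jdbc_url("node203.hmbank.com", 10001, transport_mode="http"))
```

The resulting string is what would be passed to beeline's !connect.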

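Initialize the metastore schema with schematool:
      schematool -dbType mysql -initSchema
Then start the services:

1. Start the metastore first:
      hive --service metastore
2. Then start HiveServer2:
      hiveserver2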
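Because HiveServer2 depends on the metastore, it can help to wait until the metastore's Thrift port (9083 above) accepts connections before launching hiveserver2. A minimal polling sketch using only the Python standard library (wait_for_port is a hypothetical helper; it only checks that the TCP port is open, not that the metastore is fully healthy):

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds
    pass. Returns True if the port became reachable, False otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ok = wait_for_port("node203.hmbank.com", 9083, timeout=120)
    print("metastore reachable" if ok else "metastore not reachable")
```

For example, call wait_for_port("node203.hmbank.com", 9083) after starting the metastore and launch hiveserver2 only if it returns True.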

Verify with beeline:

-bash-4.1$ beeline 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/apacheori/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.3 by Apache Hive

beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hive
Enter password for jdbc:hive2://localhost:10000: 
Connected to: Apache Hive (version 2.3.3)
Driver: Hive JDBC (version 2.3.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.334 seconds)

Possible error:

beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hadoop
Enter password for jdbc:hive2://localhost:10000: ******
18/05/22 14:33:21 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session:
 java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
 User: root is not allowed to impersonate hadoop (state=08S01,code=0)

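Cause: the Hive services were started as root, so when Hive interacts with HDFS the real user is root, while the user entered at connect time is hadoop. HDFS does not allow root to act on behalf of (impersonate) the hadoop user.
Given this, would entering root as the user at !connect work? In practice it fails too: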

User: root is not allowed to impersonate root(state=08S01,code=0)

The same error appears no matter which user we connect as.

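Cause: root has not been configured as a proxy user in Hadoop, so HDFS rejects every impersonation request from it, even one for root itself.
Fix: add the following to the HDFS configuration file core-site.xml: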

   <!-- Configure root as a proxy user -->
   <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
   </property>

   <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
   </property>

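This configuration allows root to impersonate users in any group, connecting from any host. HiveServer2, running as root, then acts on behalf of the connecting user, so HDFS shows the user who submitted the job rather than root. Restart HDFS and YARN (or run hdfs dfsadmin -refreshSuperUserGroupsConfiguration and yarn rmadmin -refreshSuperUserGroupsConfiguration) for the change to take effect.
Reference: Hadoop's proxy user (impersonation) mechanism.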
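The rule Hadoop applies can be summarized as: a superuser may impersonate another user only if that user belongs to a group listed in hadoop.proxyuser.<superuser>.groups and the request comes from a host listed in hadoop.proxyuser.<superuser>.hosts, with * matching anything. The sketch below is a simplified model of that check, not Hadoop's actual code (the real implementation also accepts IP addresses and CIDR ranges in the hosts list, for example); the function name and arguments are hypothetical:

```python
def can_impersonate(conf, real_user, effective_user_groups, client_host):
    """Simplified model of Hadoop's proxy-user check.

    conf: dict of core-site.xml properties (name -> value)
    real_user: the superuser running the service (e.g. "root")
    effective_user_groups: groups of the user being impersonated
    client_host: host the request comes from
    """
    prefix = f"hadoop.proxyuser.{real_user}."

    def as_set(key):
        # Comma-separated list; a missing property means "nothing allowed".
        raw = conf.get(prefix + key, "")
        return {v.strip() for v in raw.split(",") if v.strip()}

    groups = as_set("groups")
    hosts = as_set("hosts")
    group_ok = "*" in groups or bool(groups & set(effective_user_groups))
    host_ok = "*" in hosts or client_host in hosts
    return group_ok and host_ok


if __name__ == "__main__":
    core_site = {
        "hadoop.proxyuser.root.groups": "*",
        "hadoop.proxyuser.root.hosts": "*",
    }
    print(can_impersonate(core_site, "root", ["hadoop"], "node203.hmbank.com"))
```

With no proxyuser entries for root, every call returns False, which mirrors the "root is not allowed to impersonate" errors above regardless of the connecting user.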

In addition, since the authentication mode is NONE, you only need to enter a username; the connection succeeds without a password.

Create a table with beeline:

create table test3(sid int , sname string);

Insert data with beeline:

0: jdbc:hive2://localhost:10000> insert into test3 values(1, 'xx');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. 
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

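The warning says that Hive on MapReduce is deprecated in Hive 2 and may not be available in future versions, and recommends switching the execution engine to Spark or Tez, or using Hive 1.x.

(The insert did not succeed.)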

Using the Hive CLI:

hive> insert into test2 values( 1, 'xx', 3.01);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20180522200745_a5738c95-2cc0-47e2-b3b1-8a7ac496a701
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1525767620603_0046, Tracking URL = http://node203.hmbank.com:54315/proxy/application_1525767620603_0046/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1525767620603_0046
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-05-22 20:07:54,306 Stage-1 map = 0%,  reduce = 0%
2018-05-22 20:08:00,724 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
MapReduce Total cumulative CPU time: 1 seconds 720 msec
Ended Job = job_1525767620603_0046
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hmcluster/user/hive/warehouse/hivecluster.db/test2/.hive-staging_hive_2018-05-22_20-07-45_383_6231255496761759560-1/-ext-10000
Loading data to table hivecluster.test2
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.72 sec   HDFS Read: 4510 HDFS Write: 83 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
Time taken: 17.491 seconds

The Hive CLI prints the same warning that Hive on MapReduce is deprecated in Hive 2.x (suggesting Spark, Tez, or Hive 1.x),
but here the insert completes successfully.

Usernames shown in HDFS for files created through Hive:

[Figure: HDFS listing of Hive-created files, showing their owners and permissions]

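With beeline, the file owner is the username entered at connect time.
With the Hive CLI, the file owner is the OS user running the CLI.
The figure shows that files created by Hive all have rwxrwxrwx permissions, so operations can also be performed as a different user.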
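In rwxrwxrwx the three triples are the owner, group, and other permissions, i.e. octal 777: every user may read, write, and traverse. A short sketch (hypothetical helper names) converts such strings, including the 10-character form with a leading type letter such as d that HDFS listings show, and checks the "others can write" bit; special bits such as the sticky t are not handled and raise an error:

```python
def symbolic_to_octal(perm):
    """Convert 'rwxr-xr-x' (or 'drwxr-xr-x') to an int such as 0o755."""
    if len(perm) == 10:
        perm = perm[1:]  # drop the file-type letter ('d', '-')
    if len(perm) != 9:
        raise ValueError(f"expected 9 permission characters, got {perm!r}")
    value = 0
    for ch, expected in zip(perm, "rwx" * 3):
        value <<= 1
        if ch == expected:
            value |= 1
        elif ch != "-":
            raise ValueError(f"unsupported character {ch!r} in {perm!r}")
    return value


def others_can_write(perm):
    """True if the 'other' write bit (0o002) is set."""
    return bool(symbolic_to_octal(perm) & 0o002)


if __name__ == "__main__":
    print(oct(symbolic_to_octal("drwxrwxrwx")), others_can_write("drwxrwxrwx"))
```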

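Difference between remote mode and the local and embedded modes:

In remote mode you need to create a database first and then USE it before you can work with tables.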
