
Hive Remote Mode

2018-05-22  金刚_30bf

Version: 2.3.3

  1. Configure the MySQL database:
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.30.16.201:3306/hivemetaremote?createDatabaseIfNotExist=true</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive123</value>
  </property>
 <!-- Enable schema version verification -->
 <property>
   <name>hive.metastore.schema.verification</name>
   <value>true</value>
 </property>
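
Note: the MySQL side must be prepared before Hive can use it: the MySQL JDBC driver jar (e.g. mysql-connector-java) has to be on Hive's classpath under $HIVE_HOME/lib, and the hiveuser account needs privileges on the metastore database. A minimal sketch of that setup, assuming the credentials above (the '%' host wildcard is an assumption; restrict it to your Hive hosts as needed):

    -- run as a MySQL administrator
    CREATE DATABASE IF NOT EXISTS hivemetaremote;
    CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hive123';
    GRANT ALL PRIVILEGES ON hivemetaremote.* TO 'hiveuser'@'%';
    FLUSH PRIVILEGES;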
  2. Configure the metastore Thrift service:
<property>
 <name>hive.metastore.uris</name>
 <value>thrift://node203.hmbank.com:9083</value>
 <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
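
On machines that only act as metastore clients (for example a hive CLI on another node), hive.metastore.uris is typically the only metastore setting needed; the JDBC credentials stay on the metastore host. A minimal client-side hive-site.xml sketch under that assumption:

    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node203.hmbank.com:9083</value>
      </property>
    </configuration>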

  3. Enable concurrent execution

<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>
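
Note: the Table Lock Manager that this switch enables is ZooKeeper-based by default (with the legacy, non-ACID transaction manager), so Hive also needs to know where the ZooKeeper ensemble lives. A sketch, with hypothetical ZooKeeper hostnames:

    <property>
      <name>hive.zookeeper.quorum</name>
      <value>zk1.hmbank.com,zk2.hmbank.com,zk3.hmbank.com</value>
      <description>ZooKeeper ensemble used by the default lock manager.</description>
    </property>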
  4. HiveServer2 configuration
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>node203.hmbank.com</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>

  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
  </property>

  <property>
    <name>hive.server2.thrift.client.user</name>
    <value>hadoop</value>
    <description>Username to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.thrift.client.password</name>
    <value>hadoop</value>
    <description>Password to use against thrift client</description>
  </property>
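
A related HiveServer2 setting worth knowing about is hive.server2.enable.doAs, which defaults to true and makes HiveServer2 execute queries as the connecting user; that behavior is what triggers the impersonation error shown later. If per-user execution is not required, disabling it is an alternative to the proxy-user configuration described below; a sketch:

    <property>
      <name>hive.server2.enable.doAs</name>
      <value>false</value>
      <description>Run queries as the HiveServer2 process user instead of the connecting user.</description>
    </property>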

  5. Initialize the metastore schema with schematool:
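For a MySQL-backed metastore the command looks like this (the -dbType value must match the database configured above):

    schematool -dbType mysql -initSchema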
  6. Start the services:
1. Start the metastore first:
      hive --service metastore
2. Then start HiveServer2:
    hiveserver2
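
In practice both services are usually left running in the background; a sketch using nohup (the log file names are arbitrary):

    nohup hive --service metastore > metastore.log 2>&1 &
    nohup hiveserver2 > hiveserver2.log 2>&1 &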
  7. Verify with beeline:
-bash-4.1$ beeline 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/apacheori/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.3 by Apache Hive

beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hive
Enter password for jdbc:hive2://localhost:10000: 
Connected to: Apache Hive (version 2.3.3)
Driver: Hive JDBC (version 2.3.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.334 seconds)

A possible error:

beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hadoop
Enter password for jdbc:hive2://localhost:10000: ******
18/05/22 14:33:21 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session:
 java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
 User: root is not allowed to impersonate hadoop (state=08S01,code=0)

Cause of the error: the Hive service was started as the root user, so when Hive interacts with HDFS it does so as root, while the username entered at connect time was hadoop. As far as HDFS is concerned, root is not allowed to impersonate the hadoop user.
Given that, would entering root as the username at !connect time work? In practice it fails as well:

User: root is not allowed to impersonate root(state=08S01,code=0)

Whichever user we connect as, the same error is reported.

Cause: HDFS is not configured to let root act as a proxy for other users.
Modify the HDFS configuration file core-site.xml:

   <!-- Configure root as a proxy user -->
   <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
   </property>

   <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
   </property>

With the configuration above, root may impersonate users in any group connecting from any host, so jobs submitted through Hive appear in HDFS under the submitting user rather than root.
Reference: Hadoop's proxy-user (impersonation) mechanism.
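
core-site.xml is read by the HDFS and YARN daemons, so the new proxy-user settings have to be reloaded after the change. Restarting the cluster works; on a running cluster the mappings can usually also be refreshed in place:

    hdfs dfsadmin -refreshSuperUserGroupsConfiguration
    yarn rmadmin -refreshSuperUserGroupsConfiguration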

Also, since the configured authentication mode is NONE, entering a username with no password is enough to connect.

  8. Create a table with beeline:
create table test3(sid int, sname string);
  9. Insert data with beeline:
0: jdbc:hive2://localhost:10000> insert into test3 values(1, 'xx');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. 
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

The warning says that Hive-on-MapReduce is deprecated in this version and the statement may not succeed; it recommends a different execution engine such as Spark or Tez, or using a Hive 1.x release.

(The insert did not succeed!)
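
Following the warning's advice, the execution engine can be switched per session, assuming Tez (or Spark) is actually installed on the cluster, which this article does not cover; a sketch:

    set hive.execution.engine=tez;
    insert into test3 values(1, 'xx');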

  10. Using the hive CLI:
hive> insert into test2 values( 1, 'xx', 3.01);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20180522200745_a5738c95-2cc0-47e2-b3b1-8a7ac496a701
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1525767620603_0046, Tracking URL = http://node203.hmbank.com:54315/proxy/application_1525767620603_0046/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1525767620603_0046
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-05-22 20:07:54,306 Stage-1 map = 0%,  reduce = 0%
2018-05-22 20:08:00,724 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
MapReduce Total cumulative CPU time: 1 seconds 720 msec
Ended Job = job_1525767620603_0046
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hmcluster/user/hive/warehouse/hivecluster.db/test2/.hive-staging_hive_2018-05-22_20-07-45_383_6231255496761759560-1/-ext-10000
Loading data to table hivecluster.test2
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.72 sec   HDFS Read: 4510 HDFS Write: 83 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
Time taken: 17.491 seconds

The hive CLI prints the same warning that Hive-on-MapReduce is deprecated in 2.x and recommends 1.x, but here the insert operation completes successfully.

  11. The username shown in the HDFS filesystem when using Hive:


(screenshot omitted)
  12. Differences between remote mode and the local and embedded modes:
In embedded mode, the metastore and a Derby database run inside the Hive process itself, and only one session can connect at a time. In local mode, the metastore still runs inside the Hive process but stores its data in an external database such as MySQL over JDBC. In remote mode, as configured here, the metastore runs as a standalone Thrift service that the hive CLI and HiveServer2 reach through hive.metastore.uris, so multiple clients can share one metastore.