Hive Remote Mode
2018-05-22
金刚_30bf
Version: 2.3.3
- Configure the MySQL database:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.30.16.201:3306/hivemetaremote?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive123</value>
</property>
<!-- Enable metastore schema version verification -->
<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>
- Configure the metastore Thrift service:
<property>
<name>hive.metastore.uris</name>
<value>thrift://node203.hmbank.com:9083</value>
<description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
- Enable concurrency support:
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
- HiveServer2 configuration:
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>node203.hmbank.com</value>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is ‘binary’.</description>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is ‘http’.</description>
</property>
<property>
<name>hive.server2.thrift.client.user</name>
<value>hadoop</value>
<description>Username to use against thrift client</description>
</property>
<property>
<name>hive.server2.thrift.client.password</name>
<value>hadoop</value>
<description>Password to use against thrift client</description>
</property>
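Because these fragments are pasted into hive-site.xml by hand, a stray or unclosed <property> tag is easy to introduce. The following is a minimal sketch of a well-formedness check, assuming the fragments sit inside the usual <configuration> root element. The names load_properties and SAMPLE are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse a Hadoop-style configuration document into a {name: value} dict.

    Raises xml.etree.ElementTree.ParseError when the XML is not well-formed,
    for example when a <property> tag is opened but never closed.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None or value is None:
            raise ValueError("every <property> needs a <name> and a <value>")
        props[name.strip()] = value.strip()
    return props


# Illustrative sample only; the values are copied from the settings above.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://node203.hmbank.com:9083</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for key, val in load_properties(SAMPLE).items():
        print(key, "=", val)
```

To check a real file, read its contents and pass the text to load_properties; a parse error then points at the line and column of the broken tag.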
- Initialize the metastore schema with schematool
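The step above does not show the command. For a MySQL-backed metastore the usual invocation is schematool -dbType mysql -initSchema. The sketch below wraps that call, assuming schematool (from $HIVE_HOME/bin) is on PATH and the MySQL JDBC driver jar is already in Hive's lib directory. The function names are hypothetical.

```python
import shutil
import subprocess


def schematool_command(db_type="mysql"):
    """Build the argument list that initializes the metastore schema."""
    return ["schematool", "-dbType", db_type, "-initSchema"]


def init_metastore_schema(db_type="mysql"):
    """Run schematool once against an empty metastore database."""
    cmd = schematool_command(db_type)
    if shutil.which(cmd[0]) is None:
        # Assumption: $HIVE_HOME/bin is expected to be on PATH.
        raise FileNotFoundError("schematool not found on PATH")
    # check=True raises CalledProcessError if schematool exits non-zero.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Show the command without running it.
    print(" ".join(schematool_command()))
```

With hive.metastore.schema.verification set to true, the metastore refuses to start against a database whose schema has not been initialized, so this step has to come before starting the services.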
- Start the services:
1. Start the metastore first
hive --service metastore
2. Then start hiveserver2
hiveserver2
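The start order matters because hiveserver2 connects to the metastore Thrift endpoint configured in hive.metastore.uris. One way to script the ordering is to wait until the metastore port accepts TCP connections before launching hiveserver2. This is a minimal sketch; wait_for_port is a hypothetical helper, and an open port only shows that the process is listening, not that the metastore is fully healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Host and port taken from hive.metastore.uris above.
    if wait_for_port("node203.hmbank.com", 9083):
        print("metastore is listening; hiveserver2 can be started")
    else:
        print("metastore did not come up in time")
```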
- Verify with beeline:
-bash-4.1$ beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/apacheori/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.3 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hive
Enter password for jdbc:hive2://localhost:10000:
Connected to: Apache Hive (version 2.3.3)
Driver: Hive JDBC (version 2.3.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (1.334 seconds)
A possible error:
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hadoop
Enter password for jdbc:hive2://localhost:10000: ******
18/05/22 14:33:21 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session:
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
User: root is not allowed to impersonate hadoop (state=08S01,code=0)
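Cause: the Hive services were started as root, so Hive talks to HDFS as root, while the user entered at !connect was hadoop. HDFS does not allow root to act on behalf of the hadoop user.
So would it work if we entered root as the user at !connect? In practice that fails as well: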
User: root is not allowed to impersonate root(state=08S01,code=0)
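Whichever user we connect as, the same error is reported.
Cause: Hadoop has no proxy-user configuration for root, so root is not allowed to impersonate any user.
Edit the Hadoop configuration file core-site.xml: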
<!-- Configure root as a proxy user -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
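With this configuration, root (the user running the Hive services) may impersonate users in any group, connecting from any host. The user shown in HDFS is then the user who submitted the job, not root.
Reference: Hadoop's proxy user mechanism
Also, because the authentication mode is set to NONE, the connection succeeds once a user name is entered; no password is needed.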
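The two proxy-user properties above follow the pattern hadoop.proxyuser.<user>.groups and hadoop.proxyuser.<user>.hosts, where <user> is the account that runs the Hive services. The sketch below renders that pair for any service user and lets the wildcard be narrowed. The function name and the "hive" service user in the usage line are hypothetical; the host name is the HiveServer2 bind host from the configuration above.

```python
from xml.sax.saxutils import escape


def proxyuser_properties(user, hosts="*", groups="*"):
    """Render the core-site.xml properties that let `user` impersonate others."""
    entries = [
        ("hadoop.proxyuser.%s.groups" % user, groups),
        ("hadoop.proxyuser.%s.hosts" % user, hosts),
    ]
    lines = []
    for name, value in entries:
        lines.append("<property>")
        lines.append("<name>%s</name>" % escape(name))
        lines.append("<value>%s</value>" % escape(value))
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical: services run as "hive", and only the HiveServer2 host may proxy.
    print(proxyuser_properties("hive", hosts="node203.hmbank.com"))
```

Using * for both values is convenient on a test cluster; restricting hosts to the HiveServer2 machine is a tighter choice elsewhere. Hadoop has to reload core-site.xml before the change takes effect, typically by restarting the NameNode and ResourceManager or, on versions that support it, by running hdfs dfsadmin -refreshSuperUserGroupsConfiguration and yarn rmadmin -refreshSuperUserGroupsConfiguration.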
- Create a table with beeline:
create table test3(sid int , sname string);
- Insert data with beeline:
0: jdbc:hive2://localhost:10000> insert into test3 values(1, 'xx');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions.
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
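The warning says that Hive-on-MR is deprecated in Hive 2 and may not be available in future versions. It suggests using Spark or Tez as the execution engine, or a Hive 1.x release.
(The insert did not succeed!)
- Using the Hive CLI: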
hive> insert into test2 values( 1, 'xx', 3.01);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20180522200745_a5738c95-2cc0-47e2-b3b1-8a7ac496a701
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1525767620603_0046, Tracking URL = http://node203.hmbank.com:54315/proxy/application_1525767620603_0046/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1525767620603_0046
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-05-22 20:07:54,306 Stage-1 map = 0%, reduce = 0%
2018-05-22 20:08:00,724 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.72 sec
MapReduce Total cumulative CPU time: 1 seconds 720 msec
Ended Job = job_1525767620603_0046
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hmcluster/user/hive/warehouse/hivecluster.db/test2/.hive-staging_hive_2018-05-22_20-07-45_383_6231255496761759560-1/-ext-10000
Loading data to table hivecluster.test2
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.72 sec HDFS Read: 4510 HDFS Write: 83 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
Time taken: 17.491 seconds
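The Hive CLI prints the same warning that Hive-on-MR is deprecated in 2.x and suggests a different execution engine or a 1.x release.
The insert nevertheless succeeds.
- User names shown in HDFS when using Hive:
[Figure: 图片.png, screenshot of the HDFS file listing]
- With beeline, the file owner is the user name entered at !connect.
- With the Hive CLI, the file owner is the operating-system user running the CLI.
- As the figure above shows, the files Hive creates all have rwxrwxrwx permissions, so operations can be run as a different user.
- Difference between remote mode and the local and embedded modes:
- In remote mode you must first create a database, then switch to it with use database, before any table operations.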