Deploying Hive 3 with a Custom Password Authentication Mechanism, and Configuring Hadoop 3 proxy user

2021-03-15  旋转马达

This article walks through the problems encountered while deploying Hive 3 and their solutions.

1: Deploying Hive

For the installation steps, see the companion post "Hive3整合Hadoop3的安装配置" (installing and configuring Hive 3 with Hadoop 3).

2: Installing Tez as Hive's execution engine

Download the Tez release package from the official download page, extract it into the installation directory, and consult the official install guide (in English).

A brief outline of the installation steps:
1: Make sure Hadoop is deployed before Tez, and that its version is 2.7.0 or later.
2: Build Tez. If you downloaded the prebuilt bin release this step can be skipped; we use the bin release here.
3: Copy the Tez tarball to HDFS and configure the tez-site.xml file:

hadoop fs -mkdir /user/tez
hadoop fs -put ${TEZ_HOME}/tez.tar.gz /user/tez

3.1 Set the tez.lib.uris parameter in tez-site.xml, pointing it at the HDFS path we just uploaded to:

        <property>
                <name>tez.lib.uris</name>
                <value>/user/tez/tez.tar.gz</value> <!-- points to the tez.tar.gz package on HDFS -->
        </property>

Make sure tez.use.cluster.hadoop-libs is not set in tez-site.xml; if it is set, its value should be false.
4: To run MapReduce jobs on top of Tez, change the following parameter in Hadoop's mapred-site.xml configuration file:

        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn-tez</value>
        </property>

5: Update the client-node configuration so that the Tez libraries are on Hadoop's classpath.
Edit hadoop-env.sh and append the following at the end of the file:

TEZ_CONF_DIR=/opt/programs/hadoop-3.2.2/etc/hadoop   # the directory containing tez-site.xml, not the file itself
TEZ_JARS=/opt/programs/apache-tez-0.9.2-bin
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*

Note the "*": it is required when putting a directory of jar files on the classpath, since a bare directory entry only picks up .class files, not jars.

6: tez-examples.jar contains a basic example of an MRR job; see OrderedWordCount.java in the source. To run this example:

hadoop jar tez-examples.jar orderedwordcount <input> <output>

3: Multi-user Hive: adding a custom authentication mechanism and the matching Hadoop configuration changes

By default Hive performs no authentication, so anyone can access the data directly, which is very insecure. Here we add a custom authentication mechanism, after which connections through beeline or JDBC must log in.

The code implements the PasswdAuthenticationProvider interface, providing its Authenticate method with the authentication logic:

import javax.security.sasl.AuthenticationException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.service.auth.PasswdAuthenticationProvider;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BasicUsernamePasswdAuthenticator implements PasswdAuthenticationProvider {
    private static final Logger LOGGER = LoggerFactory.getLogger(BasicUsernamePasswdAuthenticator.class);
    // property key pattern: the login user name is substituted for %s
    private static final String HIVE_JDBC_PASSWD_AUTH_PREFIX = "hive.jdbc.passwd.%s";

    private Configuration conf = null;

    @Override
    public void Authenticate(String user, String password) throws AuthenticationException {
        LOGGER.info("user: {} try login.", user);
        String passwdFromConf = getConf().get(String.format(HIVE_JDBC_PASSWD_AUTH_PREFIX, user));
        if (passwdFromConf == null) {
            String message = "user's ACL configuration is not found. user:" + user;
            LOGGER.info(message);
            throw new AuthenticationException(message);
        }
        if (!passwdFromConf.equals(password)) {
            // do not log the passwords themselves
            String message = "user name and password do not match. user:" + user;
            LOGGER.error(message);
            throw new AuthenticationException(message);
        }
        LOGGER.info("authentication succeeded for user: {}", user);
    }

    public Configuration getConf() {
        if (conf == null) {
            this.conf = new Configuration(new HiveConf());
        }
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }
}
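One caveat worth noting: comparing passwords with String.equals can in principle leak information through response timing. If that matters in your environment, the JDK offers a constant-time comparison; a minimal sketch (this helper is an illustration, not part of the original authenticator):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ConstantTimeCheck {
    // compares the two strings in constant time, avoiding a timing side channel
    static boolean passwordMatches(String expected, String given) {
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                given.getBytes(StandardCharsets.UTF_8));
    }
}
```

Swapping this in for the equals check above keeps the rest of the logic unchanged.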

Package this class into a jar, copy it into Hive's lib directory, and add the following to Hive's configuration:

 <property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
    <description>
      Expects one of [nosasl, none, ldap, kerberos, pam, custom].
      Client authentication types.
        NONE: no authentication check
        LDAP: LDAP/AD based authentication
        KERBEROS: Kerberos/GSSAPI authentication
        CUSTOM: Custom authentication provider
                (Use with property hive.server2.custom.authentication.class)
        PAM: Pluggable authentication module
        NOSASL:  Raw transport
    </description>
  </property>
  <property>
    <name>hive.server2.custom.authentication.class</name>
    <value>org.puppy.hive.auth.basic.BasicUsernamePasswdAuthenticator</value>
  </property>
  <property>
    <name>hive.jdbc.passwd.hadoop</name>
    <value>123456789</value>
  </property>

This configures a single user named hadoop with password 123456789; we use this account to connect to Hive and work with the data on HDFS. The username hadoop is substituted into the property-key pattern via String.format in the code, so one property per user is all that is needed.
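As a quick illustration, the per-user key lookup in the authenticator reduces to a single String.format call; a plain-Java sketch with no Hive dependencies:

```java
public class PasswdKey {
    // same pattern as in the authenticator above
    static final String HIVE_JDBC_PASSWD_AUTH_PREFIX = "hive.jdbc.passwd.%s";

    // builds the hive-site.xml property key for a given login user
    static String keyFor(String user) {
        return String.format(HIVE_JDBC_PASSWD_AUTH_PREFIX, user);
    }
}
```

So keyFor("hadoop") yields "hive.jdbc.passwd.hadoop", the exact property name configured above; adding another user is just another such property.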

If we stop at this point, start Hive, and try to connect with beeline:

$ nohup hiveserver2 >> /opt/programs/apache-hive-3.1.2-bin/logs/hive.log &
$ beeline
Beeline version 3.1.2 by Apache Hive
beeline> !connect jdbc:hive2://hadoop000:10000
Enter username for jdbc:hive2://hadoop000:10000: hadoop
Enter password for jdbc:hive2://hadoop000:10000: *********

pressing Enter produces an error:

Error: Could not open client transport with JDBC Uri:
 jdbc:hive2://hadoop000:10000: Failed to open new session: java.lang.RuntimeException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): 
User: puppy is not allowed to impersonate hadoop (state=08S01,code=0)

The HiveServer2 log in the background shows an exception like this:

puppy is not allowed to impersonate hadoop

To explain: the user puppy (the OS user that HiveServer2 runs as) is not allowed to impersonate the user hadoop (the user the client logged in as). Why does this happen?
This is Hadoop's security mechanism at work: not just any user may operate on HDFS on behalf of others; a designated superuser has to act in place of the client user. Hadoop provides an impersonation mechanism for this, documented as
"Superusers Acting On Behalf Of Other Users": a configured superuser may submit jobs under the identity of a proxied user. The following configuration changes are needed:
hive-site.xml

<property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
</property>

core-site.xml

<property>
        <name>hadoop.proxyuser.puppy.hosts</name>
        <value>hadoop000,172.24.163.174,localhost,127.0.0.1</value>
</property>
<property>
        <name>hadoop.proxyuser.puppy.groups</name>
        <value>supergroup</value>
</property>
<property>
        <name>hadoop.proxyuser.puppy.users</name>
        <value>hadoop,bob,joe</value>
</property>
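Conceptually, the check Hadoop performs for each proxied request boils down to validating both the originating host and the impersonated user against these lists. A simplified plain-Java illustration of that check, using the values from the config above (this is a sketch of the idea, not Hadoop's actual implementation):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ProxyUserCheck {
    // stand-ins for hadoop.proxyuser.puppy.hosts and hadoop.proxyuser.puppy.users
    static final Set<String> ALLOWED_HOSTS = new HashSet<>(
            Arrays.asList("hadoop000", "172.24.163.174", "localhost", "127.0.0.1"));
    static final Set<String> ALLOWED_USERS = new HashSet<>(
            Arrays.asList("hadoop", "bob", "joe"));

    // a proxied request is accepted only if the superuser, the client host,
    // and the impersonated user all match the configuration
    static boolean mayImpersonate(String superUser, String clientHost, String proxiedUser) {
        return "puppy".equals(superUser)
                && ALLOWED_HOSTS.contains(clientHost)
                && ALLOWED_USERS.contains(proxiedUser);
    }
}
```

This is why both a wrong host list and a missing user entry produce the "not allowed to impersonate" errors shown in this article.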

Refresh the proxy-user configuration:

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration

To be safe, you can also restart Hadoop; after that the connection works.

If the hadoop.proxyuser.puppy.hosts setting above is misconfigured, you will see an error like this:

Caused by: org.apache.hadoop.ipc.RemoteException: Unauthorized connection for super-user: puppy from IP /172.24.163.174
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562) ~[hadoop-common-3.2.2.jar:?]
        at org.apache.hadoop.ipc.Client.call(Client.java:1508) ~[hadoop-common-3.2.2.jar:?]
        at org.apache.hadoop.ipc.Client.call(Client.java:1405) ~[hadoop-common-3.2.2.jar:?]

4: Connecting to Hive from a JDBC client

The code is as follows:

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import com.alibaba.druid.pool.DruidDataSource;
import com.alibaba.druid.pool.DruidPooledConnection;

public class HiveJdbcDemo {

    public static void main(String[] args) throws SQLException {
        DruidDataSource source = new DruidDataSource();
        source.setUrl("jdbc:hive2://hadoop000:10000");
        source.setDbType("hive");
        source.setUsername("hadoop");
        source.setPassword("123456789");

        // try-with-resources closes the statement before the connection,
        // i.e. in the reverse order of acquisition
        try (DruidPooledConnection connection = source.getConnection();
             PreparedStatement statement =
                     connection.prepareStatement("select * from test_db.u_data")) {
            System.out.println("Got connection: " + connection);
            ResultSet resultSet = statement.executeQuery();
            while (resultSet.next()) {
                String phone = resultSet.getString("phone");
                System.out.println("phone: " + phone);
            }
        }
    }
}

Maven dependencies:

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>3.1.2</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.2.5</version>
        </dependency>