Enabling Kerberos for Hadoop (based on CDH 6.3)
Steps
OS environment: CentOS 7.9
User: root
I. Setting up the Kerberos environment
1. Set up the Kerberos server
1. Installation
yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation openldap-clients -y
Running this command generates three files: /etc/krb5.conf, /var/kerberos/krb5kdc/kadm5.acl, and /var/kerberos/krb5kdc/kdc.conf.
2. Edit the configuration and change the realm to HADOOP.COM (adjust as needed)
- /etc/krb5.conf
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
# default_realm = EXAMPLE.COM
default_realm = HADOOP.COM
# It is recommended to remove this setting; it can cause client authentication failures
# default_ccache_name = KEYRING:persistent:%{uid}
[realms]
# EXAMPLE.COM = {
# kdc = kerberos.example.com
# admin_server = kerberos.example.com
# }
HADOOP.COM = {
# Hostname of the KDC server; it must be resolvable in advance, e.g. via /etc/hosts
kdc = kdc-server-host1
admin_server = kdc-server-host1
}
[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM
.example.com = HADOOP.COM
example.com = HADOOP.COM
- /var/kerberos/krb5kdc/kadm5.acl
*/admin@HADOOP.COM *
- /var/kerberos/krb5kdc/kdc.conf
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
HADOOP.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
3. Initialize Kerberos
- Create the Kerberos database
kdb5_util create -r HADOOP.COM -s
You will be prompted for a password; enter it twice.
- Create the Kerberos admin account
Run kadmin.local and create the principal admin/admin@HADOOP.COM (see the sketch after this list).
You will be prompted for a password; enter it twice.
- Start the krb5kdc and kadmin services
systemctl enable krb5kdc
systemctl enable kadmin
systemctl start krb5kdc
systemctl start kadmin
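The admin account above can be created non-interactively with kadmin.local -q, and the setup can be verified with kinit/klist once the services are running; a minimal sketch (the principal name matches the kadm5.acl entry above):
kadmin.local -q "addprinc admin/admin@HADOOP.COM"
# after krb5kdc and kadmin are up, confirm that a ticket can be obtained
kinit admin/admin@HADOOP.COM
klist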
2. Kerberos client access
It is recommended to install the Kerberos client on every machine used by Hadoop and the related microservices, and to copy the Kerberos server's /etc/krb5.conf to the same path on each of them.
- Install the client
# RHEL
yum install krb5-workstation krb5-libs
# SUSE
zypper install krb5-client
# Ubuntu, Debian
apt-get install krb5-user
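A sketch of distributing the configuration and sanity-checking a client node (the KDC hostname and admin principal are the ones used earlier):
# run on each client node: copy the server's krb5.conf and try to obtain a ticket
scp kdc-server-host1:/etc/krb5.conf /etc/krb5.conf
kinit admin/admin@HADOOP.COM
klist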
II. Hadoop environment configuration
1. HDFS configuration
Roles that need changes: DataNode, NameNode, SecondaryNameNode
1) Generate and initialize the Kerberos credentials
On the machine hosting the Kerberos server (as root), generate the HDFS-related principals; every HDFS node needs them for internal component communication. The example below assumes 1 NameNode and 3 DataNodes.
- Principal names
hdfs/hadoop-node1@HADOOP.COM
hdfs/hadoop-node2@HADOOP.COM
hdfs/hadoop-node3@HADOOP.COM
HTTP/hadoop-node1@HADOOP.COM
HTTP/hadoop-node2@HADOOP.COM
HTTP/hadoop-node3@HADOOP.COM
- Run the commands below to create principals for every machine; you will be prompted for the password twice each time
kadmin.local:
add_principal hdfs/hadoop-node1@HADOOP.COM
add_principal hdfs/hadoop-node2@HADOOP.COM
....
Use list_principals to list the existing principals.
- Export each principal into an hdfs.keytab file and upload it to the same directory on the corresponding node (a batching sketch follows the commands below)
kadmin.local:
ktadd -k /tmp/hadoop-node1/hdfs.keytab hdfs/hadoop-node1@HADOOP.COM
ktadd -k /tmp/hadoop-node2/hdfs.keytab hdfs/hadoop-node2@HADOOP.COM
.....
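Creating the principals and exporting the keytabs can be batched; a sketch, assuming the example hostnames above and the keytab path used in the configuration below (-randkey skips the password prompts, and note that ktadd re-randomizes each principal's key when exporting):
for host in hadoop-node1 hadoop-node2 hadoop-node3; do
  kadmin.local -q "addprinc -randkey hdfs/${host}@HADOOP.COM"
  kadmin.local -q "addprinc -randkey HTTP/${host}@HADOOP.COM"
  mkdir -p /tmp/${host}
  kadmin.local -q "ktadd -k /tmp/${host}/hdfs.keytab hdfs/${host}@HADOOP.COM HTTP/${host}@HADOOP.COM"
  klist -kt /tmp/${host}/hdfs.keytab   # verify the keytab contents
  scp /tmp/${host}/hdfs.keytab ${host}:/path/to/hdfs.keytab
done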
2) core-site.xml
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>java.security.krb5.conf</name>
<value>/path/to/krb5.conf</value>
</property>
3) hdfs-site.xml
- NameNode
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/path/to/hdfs.keytab</value>
</property>
- DataNode
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/path/to/hdfs.keytab</value>
</property>
- Secondary Namenode
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>/path/to/hdfs.keytab</value>
</property>
After Kerberos is enabled, the DataNode may be required to listen on privileged ports (below 1024). If the DataNode fails to start, try adding the following configuration to hdfs-site.xml on the DataNode nodes to set the DataNode data-transfer and HTTP ports:
<property>
<name>dfs.datanode.address</name>
<value>datanode-host1:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>datanode-host1:1006</value>
</property>
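After restarting, one way to confirm the DataNode is listening on the privileged ports (a sketch; use netstat -lntp on systems without ss):
ss -lntp | grep -E ':1004|:1006'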
Kerberos for the HDFS HTTP web console (optional)
core-site.xml
<property>
<name>hadoop.http.filter.initializers</name>
<value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
<name>hadoop.http.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.http.authentication.signature.secret.file</name>
<value>/opt/http-auth-signature-secret</value>
</property>
<property>
<name>hadoop.http.authentication.cookie.domain</name>
<value></value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.keytab</name>
<value>/path/to/hdfs.keytab</value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.principal</name>
<value>HTTP/_HOST@HADOOP.COM</value>
</property>
hdfs-site.xml
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/path/to/hdfs.keytab</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@HADOOP.COM</value>
</property>
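The file referenced by hadoop.http.authentication.signature.secret.file must exist on every node that serves a web UI and be readable by the HDFS service user; a minimal sketch (the hdfs:hadoop owner is an assumption, match your deployment):
dd if=/dev/urandom of=/opt/http-auth-signature-secret bs=1024 count=1
chown hdfs:hadoop /opt/http-auth-signature-secret
chmod 400 /opt/http-auth-signature-secret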
After HTTP Kerberos authentication is enabled, the HDFS web UIs can no longer be opened directly; they must be accessed via hostname. Accessing them by IP returns a 401 or 403 error.
- curl
curl --negotiate -u : http://hdfs-node1:9870
- Firefox
1. Firefox is used because it is the simplest to configure. In the address bar, open about:config and adjust two settings: set network.negotiate-auth.trusted-uris=hdfs-node1,hdfs-node2 (the hostnames of the target machines) and set network.negotiate-auth.using-native-gsslib=true.
2. Download the Windows build of MIT Kerberos, edit its krb5.ini (the /etc/krb5.conf from the Linux KDC can be copied over), then click Get Ticket and enter the account and password.
3. Visit http://hdfs-node1:9870
Restart all HDFS nodes, then pick a machine (with the Kerberos client installed) for verification:
kdestroy
kinit -kt /tmp/hadoop-node1/hdfs.keytab hdfs/hadoop-node1@HADOOP.COM
hadoop fs -ls /
If the command produces normal output, the setup is working.
2. YARN configuration
Roles involved: NodeManager, ResourceManager, JobHistory Server
YARN likewise needs Kerberos keytab files generated in advance; refer to the HDFS section (a batching sketch follows the principal list below).
yarn/yarn-node1@HADOOP.COM
yarn/yarn-node2@HADOOP.COM
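A sketch of batching the YARN principals and keytabs, analogous to the HDFS one (hostnames are examples; the HTTP/<host> principal is included because the SPNEGO settings below point at yarn.keytab, and since ktadd re-randomizes keys, re-export any other keytab that already contains the same principal):
for host in yarn-node1 yarn-node2; do
  kadmin.local -q "addprinc -randkey yarn/${host}@HADOOP.COM"
  mkdir -p /tmp/${host}
  kadmin.local -q "ktadd -k /tmp/${host}/yarn.keytab yarn/${host}@HADOOP.COM HTTP/${host}@HADOOP.COM"
  scp /tmp/${host}/yarn.keytab ${host}:/path/to/yarn.keytab
done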
core-site.xml
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
hdfs-site.xml
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@HADOOP.COM</value>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/_HOST@HADOOP.COM</value>
</property>
yarn-site.xml
- ResourceManager
<!-- resource manager -->
<property>
<name>yarn.resourcemanager.principal</name>
<value>yarn/_HOST@HADOOP.COM</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.spnego-principal</name>
<value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
<name>yarn.resourcemanager.keytab</name>
<value>/path/to/yarn.keytab</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.spnego-keytab-file</name>
<value>/path/to/yarn.keytab</value>
</property>
- NodeManager
<!-- nodemanager -->
<property>
<name>yarn.nodemanager.principal</name>
<value>yarn/_HOST@HADOOP.COM</value>
</property>
<property>
<name>yarn.nodemanager.webapp.spnego-principal</name>
<value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
<name>yarn.nodemanager.keytab</name>
<value>/path/to/yarn.keytab</value>
</property>
<property>
<name>yarn.nodemanager.webapp.spnego-keytab-file</name>
<value>/path/to/yarn.keytab</value>
</property>
Kerberos for the YARN HTTP web console (optional)
Prerequisite: Kerberos authentication has already been enabled for the HDFS web console
- core-site.xml
<property>
<name>hadoop.http.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.http.authentication.signature.secret.file</name>
<value>/opt/http-auth-signature-secret</value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.principal</name>
<value>HTTP/_HOST@HADOOP.COM</value>
</property>
<property>
<name>hadoop.http.authentication.cookie.domain</name>
<value></value>
</property>
<property>
<name>hadoop.http.authentication.kerberos.keytab</name>
<value>/path/to/yarn.keytab</value>
</property>
After YARN Kerberos authentication is enabled, the YARN web UIs can no longer be opened directly; they must be accessed via hostname. Accessing them by IP returns a 401 or 403 error.
- curl
curl --negotiate -u : http://yarn-node1:8088/cluster
- Firefox
1. Firefox is used because it is the simplest to configure. In the address bar, open about:config and adjust two settings: set network.negotiate-auth.trusted-uris=yarn-node1,yarn-node2 (the hostnames of the target machines) and set network.negotiate-auth.using-native-gsslib=true.
2. Download the Windows build of MIT Kerberos, edit its krb5.ini (the /etc/krb5.conf from the Linux KDC can be copied over), then click Get Ticket and enter the account and password.
3. Visit http://yarn-node1:8088
III. Kerberos adaptation
- Flink script changes
# obtain the credentials
kdestroy
kinit -kt /tmp/hdfs.keytab hdfs@HADOOP.COM
# run the job
export KRB5_CONFIG=/tmp/krb5.conf
$FLINK_HOME/bin/flink run -d \
-yD security.kerberos.krb5-conf.path=/tmp/krb5.conf \
-yD security.kerberos.login.keytab=/tmp/hdfs.keytab \
-yD security.kerberos.login.principal=hdfs@HADOOP.COM \
....
- hdfs
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.AnnotatedSecurityInfo;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHdfsClient {
    public static void main(String[] args) throws Exception {
        String localDir = "/tmp/";
        String userName = "hdfs/hdfs@HADOOP.COM";
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://xxx:8020");
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // point the JVM at the same krb5.conf used by the cluster
        System.setProperty("java.security.krb5.conf", localDir + "krb5.conf");
        UserGroupInformation.setConfiguration(conf);
        // keep the returned UGI: loginUserFromKeytabAndReturnUGI does not set the static login user
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                userName, localDir + "hdfs.keytab");
        SecurityUtil.setSecurityInfoProviders(new AnnotatedSecurityInfo());
        FileSystem fs = ugi.doAs(
                (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
    }
}
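A sketch for compiling and running the example against the cluster classpath (the class name KerberosHdfsClient comes from the snippet above):
javac -cp "$(hadoop classpath)" KerberosHdfsClient.java
java -cp ".:$(hadoop classpath)" KerberosHdfsClient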
IV. Handling common exceptions
Requested user hdfs is not whitelisted and has id 981,which is below the minimum allowed 1000
Fix: in the NodeManager's container-executor.cfg, set min.user.id=0
Requested user hdfs is banned
Fix: in the NodeManager's container-executor.cfg, remove the hdfs user from the banned-users entry, e.g. banned.users=root,bin (see the check sketch below)
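A quick check sketch for the two fixes above (the path is an assumption; on CDH the file is typically managed by Cloudera Manager, so change min.user.id and banned.users there rather than editing it by hand):
grep -E '^(min\.user\.id|banned\.users)' /etc/hadoop/conf/container-executor.cfg
# expected after the changes:
#   min.user.id=0
#   banned.users=root,bin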
A Flink job fails after running for a while: Unable to set the Hadoop login user, Checksum failed
Shutting YarnJobClusterEntrypoint down with application status FAILED. Diagnostics org.apache.flink.runtime.security.modules.SecurityModule$Security
hdfs@HADOOP.COM from keytab /yarn/nm/usercache/hdfs/appcache/application_1712457844733_0001/container_1712457844733_0001_03_000001/krb5.keytab javax
Caused by: javax.security.auth.login.LoginException: Checksum failed
Fix:
This indicates that the cluster's Kerberos credentials may have expired or been changed; regenerate the credentials in CDH (Cloudera Manager) and then restart the cluster.