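27-Deploying an OpenShift Origin Cluster with Ansible
1.Overview
This post walks through deploying an OpenShift Origin container cloud cluster (OKD) with Ansible. Using a simple test cluster (v3.11) as the example, it records in detail the problems encountered during installation and how they were resolved.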
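2.Planning
Plan the cluster along two dimensions: scale and high availability.
Scale: how many pods the cluster is expected to run;
High availability: whether this is a production or a test environment;
Hostname | Base components installed |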
---|---|
master.example.com | Master, etcd, and node |
node-1.example.com | Node |
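For clusters of other sizes, see the official documentation.
3.Resources
The cluster size determines the server resources required; see the official documentation for details.
4.Installation
CentOS 7 is used as the underlying operating system.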
4.1.SELinux requirements
Enable SELinux.
The /etc/selinux/config file is as follows:
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
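To confirm the setting before running the installer, the configured mode can be read back from the file. The sketch below is a minimal check; the function name selinux_configured_mode is illustrative and not part of any tool.

```shell
# Print the SELINUX= value from an SELinux config file.
# Defaults to /etc/selinux/config when no path is given.
selinux_configured_mode() {
  awk -F= '$1 == "SELINUX" { print $2 }' "${1:-/etc/selinux/config}"
}

# Usage on a node (getenforce reports the mode currently in effect):
#   selinux_configured_mode
#   getenforce
```

The configured value and the output of getenforce can differ until the next reboot, so it is worth checking both.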
4.2.DNS Requirements
Ideally, set up a DNS server inside the cluster environment. For testing only, entries in /etc/hosts are sufficient.
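For a test setup, the /etc/hosts approach might look like the sketch below. The IP addresses are placeholders from the 192.0.2.0/24 documentation range and the function name is illustrative; substitute the real addresses of the two hosts.

```shell
# Emit /etc/hosts lines for the two cluster hosts.
# The addresses are placeholders (assumption), not real values.
cluster_hosts_entries() {
  cat <<'EOF'
192.0.2.10 master.example.com
192.0.2.11 node-1.example.com
EOF
}

# On every node, as root:
#   cluster_hosts_entries >> /etc/hosts
```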
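4.3.Firewall
The cluster uses the iptables firewall by default, but on CentOS 7 firewalld is recommended. Configure it in the Ansible inventory (hosts) file as follows.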
[OSEv3:vars]
os_firewall_use_firewalld=True
Open the following ports.
Port | Protocol |
---|---|
4789 | UDP |
53/8053 | TCP/UDP |
443/8443 | TCP |
10250 | TCP |
10010 | TCP |
2049 | TCP/UDP |
2379 | TCP |
2380 | TCP |
9000 | TCP |
8444 | TCP |
9200 | TCP |
9300 | TCP |
1936 | TCP |
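With firewalld, the ports in the table above could be opened roughly as follows. This is a sketch: the PORTS list is transcribed from the table (rows such as 53/8053 TCP/UDP are expanded into one entry per port and protocol), and the run helper and DRY_RUN variable are illustrative additions for previewing the commands.

```shell
# Port/protocol pairs transcribed from the table above.
PORTS="4789/udp 53/tcp 53/udp 8053/tcp 8053/udp 443/tcp 8443/tcp 10250/tcp 10010/tcp 2049/tcp 2049/udp 2379/tcp 2380/tcp 9000/tcp 8444/tcp 9200/tcp 9300/tcp 1936/tcp"

# Run a command, or only print it when DRY_RUN is set (illustrative helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

# Add each port to the permanent firewalld config, then reload to apply.
open_cluster_ports() {
  for p in $PORTS; do
    run firewall-cmd --permanent --add-port="$p"
  done
  run firewall-cmd --reload
}

# Preview:  DRY_RUN=1; open_cluster_ports
# Apply (as root, on each host):  unset DRY_RUN; open_cluster_ports
```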
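Alternatively, turn the firewall off.
For a test cluster, turning the firewall off is recommended.
4.4.Configure SSH
Make sure every node can be reached over SSH without a password by copying the public key to each node.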
$ for host in master.example.com \
node-1.example.com; \
do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; \
done
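Once the keys are copied, it helps to confirm that each host really accepts key-based login without prompting. The sketch below uses the ssh option BatchMode=yes, which makes ssh fail instead of asking for a password. The function name check_ssh and the SSH override variable are illustrative.

```shell
# ssh client to use; overridable for testing (illustrative variable).
SSH="${SSH:-ssh}"

# Try a no-op command on each host without allowing password prompts.
# Returns non-zero if any host could not be reached with key-based auth.
check_ssh() {
  rc=0
  for host in "$@"; do
    if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      rc=1
    fi
  done
  return $rc
}

# Usage:
#   check_ssh master.example.com node-1.example.com
```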
4.5.Install dependencies
Install the following dependencies on every node.
$ yum install wget git net-tools bind-utils yum-utils bridge-utils bash-completion kexec-tools sos psacct
Update the system.
$ yum update -y
$ reboot
Install Ansible on the master node.
$ yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sed -i -e "s/^enabled=1/enabled=0/" /etc/yum.repos.d/epel.repo
$ yum -y --enablerepo=epel install ansible pyOpenSSL
Clone openshift-ansible on the master.
$ cd /opt
$ git clone https://github.com/openshift/openshift-ansible
$ cd openshift-ansible
$ git checkout release-3.11
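The openshift-ansible release must be compatible with the Docker version in use.
4.6.Install Docker
Using 1.13.1 as the example, install Docker on both the master and the node hosts.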
$ yum install docker-1.13.1 -y
$ rpm -V docker-1.13.1
$ docker version
Configure Docker storage.
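The storage configuration used for this cluster is not recorded here. One common approach on CentOS 7 is the docker-storage-setup helper shipped with the docker package, which reads /etc/sysconfig/docker-storage-setup. The sketch below assumes a spare block device is available; /dev/vdc and the volume group name docker-vg are placeholders, and the function name is illustrative.

```shell
# Print a docker-storage-setup config for a dedicated block device.
# $1: block device (placeholder default /dev/vdc), $2: volume group name.
docker_storage_conf() {
  printf 'DEVS=%s\nVG=%s\n' "${1:-/dev/vdc}" "${2:-docker-vg}"
}

# On each host, as root, before starting Docker:
#   docker_storage_conf /dev/vdc docker-vg > /etc/sysconfig/docker-storage-setup
#   docker-storage-setup
#   systemctl enable docker && systemctl start docker
```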
4.7.Configure the hosts file
The inventory is as follows.
# Create an OSEv3 group that contains the masters, nodes, and etcd groups
[OSEv3:children]
masters
nodes
etcd
# Set variables common for all OSEv3 hosts
[OSEv3:vars]
os_firewall_use_firewalld=True
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root
# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true
openshift_deployment_type=origin
openshift_release=3.11
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_version,package_availability
# uncomment the following to enable htpasswd authentication; defaults to AllowAllPasswordIdentityProvider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
# host group for masters
[masters]
master.example.com
# host group for etcd
[etcd]
master.example.com
# host group for nodes, includes region info
[nodes]
master.example.com openshift_node_group_name='node-config-master-infra'
node-1.example.com openshift_node_group_name='node-config-compute'
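Every host in the [nodes] group needs an openshift_node_group_name, and a missing one leads to the console failure described under Error-3. A minimal pre-flight check for the inventory above might look like this; the function name nodes_missing_group is illustrative.

```shell
# Print hosts in the [nodes] section of an inventory file that do not
# set openshift_node_group_name. Prints nothing when all hosts are fine.
nodes_missing_group() {
  awk '
    /^\[/ { in_nodes = ($0 == "[nodes]"); next }
    in_nodes && NF && $0 !~ /^#/ && $0 !~ /openshift_node_group_name=/ { print $1 }
  ' "$1"
}

# Usage (assuming the inventory is the default /etc/ansible/hosts):
#   nodes_missing_group /etc/ansible/hosts
# Connectivity to all hosts can then be checked with:
#   ansible OSEv3 -m ping
```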
4.8.Install
$ ansible-playbook openshift-ansible/playbooks/prerequisites.yml -v
Wait for the playbook to finish. On completion the output looks like this.
PLAY RECAP **************************************************************************************
localhost : ok=11 changed=0 unreachable=0 failed=0
master.example.com : ok=88 changed=6 unreachable=0 failed=0
node-1.example.com : ok=59 changed=6 unreachable=0 failed=0
INSTALLER STATUS ********************************************************************************
Initialization : Complete (0:00:42)
If it reports an error, the command can be run again.
Deploy the cluster.
$ ansible-playbook openshift-ansible/playbooks/deploy_cluster.yml -v
Wait for the installation to finish. On completion the output looks like this.
......
PLAY RECAP ****************************************************************************************************
localhost : ok=11 changed=0 unreachable=0 failed=0
master.example.com : ok=690 changed=166 unreachable=0 failed=0
node-1.example.com : ok=114 changed=17 unreachable=0 failed=0
INSTALLER STATUS **********************************************************************************************
Initialization : Complete (0:00:42)
Health Check : Complete (0:00:01)
Node Bootstrap Preparation : Complete (0:01:43)
etcd Install : Complete (0:00:44)
Master Install : Complete (0:03:57)
Master Additional Install : Complete (0:00:44)
Node Join : Complete (0:00:31)
Hosted Install : Complete (0:00:58)
Cluster Monitoring Operator : Complete (0:00:18)
Web Console Install : Complete (0:00:55)
Console Install : Complete (0:00:39)
metrics-server Install : Complete (0:00:02)
Service Catalog Install : Complete (0:03:40)
If it reports an error, the command can be run again.
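The two playbook runs can also be chained so that the deployment only starts once the prerequisites succeed. The sketch below passes the inventory explicitly with -i. It assumes the inventory lives at the default /etc/ansible/hosts and that the function is called from /opt, the parent directory of the clone. INVENTORY, PLAYBOOK_CMD, and install_cluster are illustrative names.

```shell
# Inventory path (assumption: the default Ansible location).
INVENTORY="${INVENTORY:-/etc/ansible/hosts}"
# Command used to run playbooks; overridable for a dry run.
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"

# Run prerequisites.yml, then deploy_cluster.yml; stop at the first failure.
install_cluster() {
  for pb in prerequisites deploy_cluster; do
    $PLAYBOOK_CMD -i "$INVENTORY" "openshift-ansible/playbooks/$pb.yml" -v || {
      echo "playbook $pb failed; fix the cause and run again" >&2
      return 1
    }
  done
}

# Usage:
#   cd /opt && install_cluster
```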
5.Verification
After installation, check the node list and status.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
master.example.com Ready infra,master 23h v1.11.0+d4cacc0
node-1.example.com Ready compute,infra 23h v1.11.0+d4cacc0
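For scripted checks, the STATUS column can be filtered so that only problem nodes are reported. A minimal sketch, with not_ready_nodes as an illustrative name:

```shell
# Read `oc get nodes --no-headers` output on stdin and print the names
# of nodes whose STATUS column is anything other than exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage:
#   oc get nodes --no-headers | not_ready_nodes
```

Because the comparison is exact, a status such as Ready,SchedulingDisabled is also reported, which is usually what a post-install check wants.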
Open the console at https://master.example.com:8443.
Note: master.example.com must resolve correctly to the master node.
6.Problems encountered
1)Error-1
fatal: [rhel7-5-a]: FAILED! => {"msg": "last_checked_host: rhel7-5-a, last_checked_var: openshift_master_manage_htpasswd;openshift_master_identity_providers contains a provider of kind==HTPasswdPasswordIdentityProvider and filename is set. Please migrate your htpasswd files to /etc/origin/master/htpasswd and update your existing master configs, and remove the filename keybefore proceeding."}
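Cause: newer versions no longer accept the "filename" key in openshift_master_identity_providers.
Fix: delete the "filename" key-value pair.
Note: after the cluster is deployed successfully, an empty htpasswd file is created under /etc/origin/master. Change into that directory and run htpasswd -b htpasswd username password to create accounts for logging in to the cluster (-b takes the password on the command line; -c would recreate the file and discard existing entries).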
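If an older inventory still carries the key, the fix can be previewed with sed before editing the file. This is a sketch that assumes the key appears in the form , 'filename': '<path>' inside the provider definition; strip_filename_key is an illustrative name.

```shell
# Print the inventory with the 'filename' key removed from the
# identity provider definition. The file itself is not modified.
strip_filename_key() {
  sed "s/, *'filename': *'[^']*'//" "$1"
}

# Usage (assuming the inventory is /etc/ansible/hosts):
#   strip_filename_key /etc/ansible/hosts | grep identity_providers
```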
2)Error-2
Failed : Retrying : Wait for ServiceMonitor CRD to be created
Cause: ifcfg-eth0 contains NM_CONTROLLED=no.
Fix: change it to NM_CONTROLLED=yes and reboot the system.
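The change can be scripted with sed. The sketch below assumes the interface file is at the usual CentOS 7 location, /etc/sysconfig/network-scripts/ifcfg-eth0, and that an NM_CONTROLLED line is already present; set_nm_controlled is an illustrative name.

```shell
# Set NM_CONTROLLED=yes in an ifcfg file, replacing any existing value.
# Has no effect if the file contains no NM_CONTROLLED line.
set_nm_controlled() {
  sed -i 's/^NM_CONTROLLED=.*/NM_CONTROLLED=yes/' "$1"
}

# Usage, as root on each affected host:
#   set_nm_controlled /etc/sysconfig/network-scripts/ifcfg-eth0
#   reboot
```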
3)Error-3
Console install failed
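Cause: before v3.9 the console ran as a service on the host; from v3.9 onward it runs as a pod. By default the pod is scheduled onto a node labelled master=true, and if no such node exists, console creation fails.
Fix: in the [nodes] entry, add "openshift_node_group_name='node-config-master'" or "openshift_node_group_name='node-config-master-infra'" for the master host.
References: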
https://www.jianshu.com/p/2d1e5ce931c8
Official site: https://docs.okd.io/latest/install/index.html
Authentication: https://docs.openshift.com/container-platform/3.7/install_config/configuring_authentication.html#LookupMappingMethod
https://blog.csdn.net/sun_qiangwei/article/details/80443943
Error-1: https://github.com/openshift/openshift-docs/pull/9138
Error-2: https://github.com/gshipley/installcentos/issues/112
Error-3: https://github.com/openshift/openshift-ansible/issues/6912, https://bugzilla.redhat.com/show_bug.cgi?id=1576088