Setting up a k8s cluster with metrics-server and Prometheus
Preparation
Prepare three Linux servers.
Role | Hostname | IP address |
---|---|---|
master | k8s | 192.168.5.159 |
node1 | k8s-node | 192.168.5.160 |
node2 | k8s-node2 | 192.168.5.161 |
Add the hostname mappings on all three servers:
vim /etc/hosts
192.168.5.159 master
192.168.5.160 node1
192.168.5.161 node2
Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Synchronize the system time on all three servers
# install ntp
yum install -y ntp
# sync the time
ntpdate cn.pool.ntp.org
Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
Disable swap (Kubernetes does not support swap partitions)
Run swapoff -a to disable swap immediately, then edit /etc/fstab and comment out or delete the swap line so it stays off after a reboot:
vim /etc/fstab
#/dev/mapper/centos-swap swap swap defaults 0 0
Pass bridged IPv4 traffic to iptables chains
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
Install Docker, kubeadm, and kubelet
Perform the following steps on every node.
# wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
# yum -y install docker-ce-18.06.1.ce-3.el7
# systemctl enable docker && systemctl start docker
# docker --version
Docker version 18.06.1-ce, build e68fc7a
Add the Aliyun yum repository
# cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install kubeadm, kubelet, and kubectl
yum install -y kubelet-1.17.3 kubeadm-1.17.3 kubectl-1.17.3
systemctl enable kubelet
Deploy the master node
The default image registry k8s.gcr.io is unreachable from mainland China, so specify the Aliyun mirror (registry.aliyuncs.com/google_containers). The official recommendation is at least 2 CPUs and 2 GB of RAM per server.
kubeadm init \
--apiserver-advertise-address=192.168.5.159 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.17.3 \
--service-cidr=10.1.0.0/16 \
--pod-network-cidr=10.244.0.0/16
Set up the kubeconfig
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# kubectl get nodes
Deploy the Pod network plugin
This must be done on all nodes. Because quay.io is blocked, pull a mirrored image and re-tag it as a workaround:
docker pull lizhenliang/flannel:v0.11.0-amd64
docker tag lizhenliang/flannel:v0.11.0-amd64 quay.io/coreos/flannel:v0.12.0-amd64
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Check the status
All pods should be Running; any other status (e.g. Pending or ImagePullBackOff) means the Pod is not ready.
kubectl get pod --all-namespaces
If a Pod is not Running, inspect it for the specific error. For example, to look at the kube-flannel-ds-amd64-xpd82 pod:
kubectl describe pod kube-flannel-ds-amd64-xpd82 -n kube-system
Join the nodes to the master
kubeadm join 192.168.5.159:6443 --token 1l64hh.7z7xgdjp4bu58720 --discovery-token-ca-cert-hash sha256:8c4bafd2aa326a7c45754f982132a38a8b4f651ca6d052dc4294424e93fe7129
If you have lost the master's token, list the current tokens with:
kubeadm token list
Tokens are valid for 24 hours by default; once a token expires it no longer appears in the list, so create a new one:
kubeadm token create
Get the SHA-256 hash of the CA certificate:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
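This pipeline extracts the CA's public key, DER-encodes it, and hashes it with SHA-256; the result is exactly the value passed to --discovery-token-ca-cert-hash. A self-contained sketch against a throwaway certificate (file names are illustrative):

```shell
# Generate a throwaway CA certificate, then compute the discovery hash the
# way kubeadm expects: sha256 over the DER-encoded public key of the CA cert.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
  -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt -days 1 2>/dev/null
hash=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //')
echo "sha256:${hash}"   # 64 hex characters
```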
Deploy metrics-server
Download the deployment manifests
for file in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml ; do wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/$file;done
Edit metrics-server-deployment.yaml
Because the default registry is blocked, change the image addresses to a domestic mirror.
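One way to script the swap is with sed. The snippet below runs against a stand-in copy of the manifest; both the k8s.gcr.io source and the Aliyun mirror path are assumptions, so substitute whatever registry your manifest actually references and a mirror you can reach:

```shell
# Stand-in for the image line of the real metrics-server-deployment.yaml
# (registry paths are assumptions, not taken from the actual manifest).
mkdir -p /tmp/metrics-demo
cat > /tmp/metrics-demo/metrics-server-deployment.yaml <<'EOF'
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
EOF
# Rewrite the registry prefix to a reachable mirror.
sed -i 's#k8s.gcr.io#registry.aliyuncs.com/google_containers#' \
  /tmp/metrics-demo/metrics-server-deployment.yaml
grep image: /tmp/metrics-demo/metrics-server-deployment.yaml
```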
Edit resource-reader.yaml
Add nodes/stats to the resources list.
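The substance of the change: the ClusterRole in resource-reader.yaml must be allowed to read the nodes/stats subresource. Roughly, its rules should end up like this (modelled on the upstream addon manifest, so the neighbouring entries may differ in your copy):

```shell
# Print the intended shape of the resource-reader ClusterRole rules;
# the line being added is nodes/stats.
cat <<'EOF'
rules:
- apiGroups: [""]
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs: ["get", "list", "watch"]
EOF
```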
kubectl apply -f .
Test
Run kubectl top nodes and kubectl top pods; if usage figures come back, metrics-server is working.
Using the metrics-server API:
The available metrics-server API endpoints are:
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes/<node-name>
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/pods
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
Kubernetes deprecated the insecure 8080 port after v1.10, so access these APIs through kubectl proxy or with proper authentication:
$ kubectl proxy
$ curl http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "node2",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node2",
        "creationTimestamp": "2020-08-28T02:24:07Z"
      },
      "timestamp": "2020-08-28T02:23:42Z",
      "window": "30s",
      "usage": {
        "cpu": "37549321n",
        "memory": "302864Ki"
      }
    },
    {
      "metadata": {
        "name": "master",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/master",
        "creationTimestamp": "2020-08-28T02:24:07Z"
      },
      "timestamp": "2020-08-28T02:24:30Z",
      "window": "30s",
      "usage": {
        "cpu": "174668532n",
        "memory": "1105964Ki"
      }
    },
    {
      "metadata": {
        "name": "node1",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node1",
        "creationTimestamp": "2020-08-28T02:24:07Z"
      },
      "timestamp": "2020-08-28T02:23:43Z",
      "window": "30s",
      "usage": {
        "cpu": "22156105n",
        "memory": "362676Ki"
      }
    }
  ]
}
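The usage fields are Kubernetes resource quantities: the n suffix means nanocores and Ki means kibibytes. A quick conversion of the node2 sample above:

```shell
# Convert node2's sample usage: "37549321n" CPU and "302864Ki" memory.
awk 'BEGIN {
  cpu_nanocores = 37549321      # 1 core = 1e9 nanocores
  mem_kib       = 302864        # 1 MiB  = 1024 KiB
  printf "cpu:    %.3f cores\n", cpu_nanocores / 1e9
  printf "memory: %.1f MiB\n",   mem_kib / 1024
}'
# prints:
# cpu:    0.038 cores
# memory: 295.8 MiB
```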
You can also query these APIs directly with kubectl, for example:
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/<node-name>
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
Setting up Prometheus
git clone https://github.com/iKubernetes/k8s-prom.git
cd k8s-prom
# create the prom namespace
[root@master k8s-prom]# kubectl apply -f namespace.yaml
namespace/prom created
# deploy node_exporter:
[root@master k8s-prom]# cd node_exporter/
[root@master node_exporter]# ls
node-exporter-ds.yaml node-exporter-svc.yaml
[root@master node_exporter]# kubectl apply -f .
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created
[root@master node_exporter]# kubectl get pods -n prom
NAME READY STATUS RESTARTS AGE
prometheus-node-exporter-dmmjj 1/1 Running 0 7m
prometheus-node-exporter-ghz2l 1/1 Running 0 7m
prometheus-node-exporter-zt2lw 1/1 Running 0 7m
# deploy Prometheus
[root@master k8s-prom]# cd prometheus/
[root@master prometheus]# ls
prometheus-cfg.yaml prometheus-deploy.yaml prometheus-rbac.yaml prometheus-svc.yaml
[root@master prometheus]# kubectl apply -f .
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
Check the prom namespace: the prometheus-server pod fails to schedule because no node has enough memory:
[root@master prometheus]# kubectl describe pod prometheus-server-556b8896d6-dfqkp -n prom
Warning  FailedScheduling  2m52s (x2 over 2m52s)  default-scheduler  0/3 nodes are available: 3 Insufficient memory.
Edit prometheus-deploy.yaml and delete the memory limit (these three lines):
resources:
limits:
memory: 2Gi
Re-apply:
[root@master prometheus]# kubectl apply -f prometheus-deploy.yaml
[root@master prometheus]# kubectl get all -n prom
NAME READY STATUS RESTARTS AGE
pod/prometheus-node-exporter-dmmjj 1/1 Running 0 10m
pod/prometheus-node-exporter-ghz2l 1/1 Running 0 10m
pod/prometheus-node-exporter-zt2lw 1/1 Running 0 10m
pod/prometheus-server-65f5d59585-6l8m8 1/1 Running 0 55s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus NodePort 10.111.127.64 <none> 9090:30090/TCP 56s
service/prometheus-node-exporter ClusterIP None <none> 9100/TCP 10m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prometheus-node-exporter 3 3 3 3 3 <none> 10m
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-server 1 1 1 1 56s
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-server-65f5d59585 1 1 1 56s
As seen above, the NodePort service maps container port 9090 to port 30090 on every node, so the Prometheus UI can be reached at http://<node-ip>:30090.
Deploy kube-state-metrics, which aggregates cluster state data
[root@master k8s-prom]# cd kube-state-metrics/
[root@master kube-state-metrics]# ls
kube-state-metrics-deploy.yaml kube-state-metrics-rbac.yaml kube-state-metrics-svc.yaml
[root@master kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
If the image cannot be pulled because the registry is blocked, pull it from an accessible source and re-tag it:
docker pull quay.io/coreos/kube-state-metrics:v1.3.1
docker tag quay.io/coreos/kube-state-metrics:v1.3.1 gcr.io/google_containers/kube-state-metrics-amd64:v1.3.1
Deploy k8s-prometheus-adapter, which requires a self-signed certificate:
[root@master k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
[root@master pki]# (umask 077; openssl genrsa -out serving.key 2048)
Generating RSA private key, 2048 bit long modulus
...........................................................................................+++
...............+++
e is 65537 (0x10001)
Create a certificate signing request:
[root@master pki]# openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
Sign the certificate:
[root@master pki]# openssl x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
Signature ok
subject=/CN=serving
Getting CA Private Key
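Before wrapping the files into a secret, it can be worth sanity-checking that the certificate and private key actually belong together. A sketch against throwaway files (names are illustrative; point the same two commands at serving.crt and serving.key to check the real pair):

```shell
# Generate a throwaway cert/key pair, then compare the public-key digests of
# the certificate and the private key; matching digests mean they pair up.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=serving-demo" \
  -keyout /tmp/serving-demo.key -out /tmp/serving-demo.crt -days 1 2>/dev/null
a=$(openssl x509 -noout -pubkey -in /tmp/serving-demo.crt | openssl md5)
b=$(openssl pkey -in /tmp/serving-demo.key -pubout | openssl md5)
[ "$a" = "$b" ] && echo "cert and key match"
```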
Create the secret containing the certificate:
Note: cm-adapter-serving-certs is the secret name referenced in custom-metrics-apiserver-deployment.yaml.
[root@master pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key -n prom
secret/cm-adapter-serving-certs created
[root@master pki]# kubectl get secrets -n prom
NAME TYPE DATA AGE
cm-adapter-serving-certs Opaque 2 51s
default-token-knsbg kubernetes.io/service-account-token 3 1h
kube-state-metrics-token-sccdf kubernetes.io/service-account-token 3 1h
prometheus-token-nqzbz kubernetes.io/service-account-token 3 1h
部署k8s-prometheus-adapter:
[root@master k8s-prom]# cd k8s-prometheus-adapter/
[root@master k8s-prometheus-adapter]# ls
custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml custom-metrics-apiserver-service.yaml
custom-metrics-apiserver-auth-reader-role-binding.yaml custom-metrics-apiservice.yaml
custom-metrics-apiserver-deployment.yaml custom-metrics-cluster-role.yaml
custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml custom-metrics-resource-reader-cluster-role.yaml
custom-metrics-apiserver-service-account.yaml hpa-custom-metrics-cluster-role-binding.yaml
You may hit a compatibility problem here. To resolve it, download the latest custom-metrics-apiserver-deployment.yaml from https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests and change its namespace to prom; also download custom-metrics-config-map.yaml and change its namespace to prom.
[root@master k8s-prometheus-adapter]# kubectl apply -f .
Check that everything is in the Running state:
[root@master k8s-prometheus-adapter]# kubectl get all -n prom
Verify that the custom.metrics.k8s.io/v1beta1 API is registered:
[root@master k8s-prometheus-adapter]# kubectl api-versions
custom.metrics.k8s.io/v1beta1
Start a proxy and test:
[root@master k8s-prometheus-adapter]# kubectl proxy --port=8080
[root@master pki]# curl http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "namespaces/kube_endpoint_info",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/kube_hpa_status_desired_replicas",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/kube_pod_container_status_waiting",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/kube_hpa_labels",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/kube_hpa_spec_min_replicas",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
Troubleshooting:
1. On a worker node, kubectl reports: The connection to the server localhost:8080 was refused - did you specify the right host or port?
Copy the master's /etc/kubernetes/admin.conf to the worker node's $HOME/.kube directory and rename it to config.
2. "Certificate already exists" errors: delete the corresponding directory and re-run the command.
3. kubelet and kubeadm version mismatch: reinstall matching versions.
yum remove kubelet
yum install -y kubelet-1.17.3 kubeadm-1.17.3 kubectl-1.17.3