OpenShift 3.11 Prometheus Operator
2018-11-06
ragpo
Overview of how an Operator is deployed, runs, and is used
- A Deployment creates the Prometheus Operator;
- The Operator then automatically registers CustomResourceDefinitions (CRDs); a CRD makes a new resource type available to users, just like pod, deployment, or rc;
- Running oc get customresourcedefinitions shows which CRDs exist;
- Running oc get against a CRD-defined resource (just as you would run oc get pod) lists the running instances of that custom resource;
- Because the services the Operator manages (Prometheus, Alertmanager) are stateful, it runs them as StatefulSets; the Operator itself runs as a Deployment.
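To make the flow above concrete: once the Prometheus CRD is registered, a user-facing object can be declared like any built-in resource. A minimal sketch (the name and field values here are illustrative, not taken from the cluster below):

```yaml
# Hypothetical minimal Prometheus custom resource; once the CRD exists,
# oc create -f accepts it just like a pod or deployment manifest.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus            # this kind comes from the CRD's spec.names.kind
metadata:
  name: example
  namespace: openshift-monitoring
spec:
  replicas: 2               # the Operator reconciles this into a StatefulSet
```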
The following uses the Prometheus Operator shipped with OCP 3.11 as a concrete example.
- Switch to the openshift-monitoring project and list its deployments; the one of interest is prometheus-operator:
[root@master ~]# oc project
Using project "openshift-monitoring" on server "https://openshift-cluster.test2.com:8443".
[root@master ~]# oc get deployment
NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
cluster-monitoring-operator   1         1         1            1           10d
grafana                       1         1         1            1           10d
kube-state-metrics            1         1         1            1           10d
prometheus-operator           1         1         1            1           10d
- Inspect the prometheus-operator deployment; it runs an image called ose-prometheus-operator:
[root@master ~]# oc get deployment prometheus-operator -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2018-10-21T03:17:27Z
  generation: 229
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: openshift-monitoring
  resourceVersion: "104636"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-monitoring/deployments/prometheus-operator
  uid: d69fae8c-d4df-11e8-9b31-000c29c33fc6
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=172.16.37.12:5000/openshift3/ose-configmap-reloader:v3.11.16
        - --prometheus-config-reloader=172.16.37.12:5000/openshift3/ose-prometheus-config-reloader:v3.11.16
        - --namespace=openshift-monitoring
        image: 172.16.37.12:5000/openshift3/ose-prometheus-operator:v3.11.16
        imagePullPolicy: IfNotPresent
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/infra: "true"
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: prometheus-operator
      serviceAccountName: prometheus-operator
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-10-21T03:17:27Z
    lastUpdateTime: 2018-10-21T03:17:31Z
    message: ReplicaSet "prometheus-operator-56c79d89b9" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2018-10-31T02:33:04Z
    lastUpdateTime: 2018-10-31T02:33:04Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 229
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
- List the CustomResourceDefinitions. These were generated automatically once the prometheus-operator deployment came up (note their identical creation timestamps):
[root@master ~]# oc get customresourcedefinitions
NAME                                    CREATED AT
alertmanagers.monitoring.coreos.com     2018-10-21T03:17:30Z
prometheuses.monitoring.coreos.com      2018-10-21T03:17:30Z
prometheusrules.monitoring.coreos.com   2018-10-21T03:17:30Z
servicemonitors.monitoring.coreos.com   2018-10-21T03:17:30Z
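To see which resource kind each of these CRDs exposes without dumping the full YAML, the CRD spec can be queried directly. A sketch (column names here are chosen for illustration):

```shell
# Show each monitoring CRD alongside the kind it registers
oc get customresourcedefinitions \
  -o custom-columns=NAME:.metadata.name,KIND:.spec.names.kind \
  | grep monitoring.coreos.com
```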
- Inspect the prometheuses.monitoring.coreos.com CRD in detail. The key part is names: kind: Prometheus, which defines the resource kind this CRD exposes; it is what gives the cluster a Prometheus object that can be handled like built-in objects such as pod:
[root@master ~]# oc get customresourcedefinition prometheuses.monitoring.coreos.com -o yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: 2018-10-21T03:17:30Z
  generation: 1
  name: prometheuses.monitoring.coreos.com
  resourceVersion: "2627"
  selfLink: /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/prometheuses.monitoring.coreos.com
  uid: d80c1379-d4df-11e8-9b31-000c29c33fc6
spec:
  additionalPrinterColumns:
  - JSONPath: .metadata.creationTimestamp
    description: |-
      CreationTimestamp is a timestamp representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. Clients may not set this value. It is represented in RFC3339 form and is in UTC.
      Populated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
    name: Age
    type: date
  group: monitoring.coreos.com
  names:
    kind: Prometheus
    listKind: PrometheusList
    plural: prometheuses
    singular: prometheus
  scope: Namespaced
  validation:
    openAPIV3Schema:
- List the running instances of prometheuses.monitoring.coreos.com (prometheuses.monitoring and prometheuses are equivalent names for the same resource; prometheuses is the shortest form):
[root@master ~]# oc get prometheuses.monitoring.coreos.com
NAME      AGE
k8s       10d
- Dump the YAML of the running prometheuses instance k8s. Its kind is Prometheus, and spec.replicas requests two running replicas:
[root@master ~]# oc get prometheuses k8s -o yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: 2018-10-21T03:18:03Z
  generation: 1
  labels:
    prometheus: k8s
  name: k8s
  namespace: openshift-monitoring
  resourceVersion: "2842"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheuses/k8s
  uid: ebf61283-d4df-11e8-9b31-000c29c33fc6
spec:
  alerting:
    alertmanagers:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      name: alertmanager-main
      namespace: openshift-monitoring
      port: web
      scheme: https
      tlsConfig:
        caFile: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
        serverName: alertmanager-main.openshift-monitoring.svc
  baseImage: 172.16.37.12:5000/openshift3/prometheus
  containers:
  - args:
    - -provider=openshift
    - -https-address=:9091
    - -http-address=
    - -email-domain=*
    - -upstream=http://localhost:9090
    - -htpasswd-file=/etc/proxy/htpasswd/auth
    - -openshift-service-account=prometheus-k8s
    - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
    - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
    - -tls-cert=/etc/tls/private/tls.crt
    - -tls-key=/etc/tls/private/tls.key
    - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
    - -cookie-secret-file=/etc/proxy/secrets/session_secret
    - -openshift-ca=/etc/pki/tls/cert.pem
    - -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - -skip-auth-regex=^/metrics
    image: 172.16.37.12:5000/openshift3/oauth-proxy:v3.11.16
    name: prometheus-proxy
    ports:
    - containerPort: 9091
      name: web
    resources: {}
    volumeMounts:
    - mountPath: /etc/tls/private
      name: secret-prometheus-k8s-tls
    - mountPath: /etc/proxy/secrets
      name: secret-prometheus-k8s-proxy
    - mountPath: /etc/proxy/htpasswd
      name: secret-prometheus-k8s-htpasswd
  externalLabels:
    cluster: openshift-cluster.test2.com
  externalUrl: https://prometheus-k8s-openshift-monitoring.test2.com/
  listenLocal: true
  nodeSelector:
    node-role.kubernetes.io/infra: "true"
  replicas: 2
  resources: {}
  retention: 15d
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  secrets:
  - prometheus-k8s-tls
  - prometheus-k8s-proxy
  - prometheus-k8s-htpasswd
  securityContext: {}
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector:
    matchExpressions:
    - key: openshift.io/cluster-monitoring
      operator: Exists
  serviceMonitorSelector:
    matchExpressions:
    - key: k8s-app
      operator: Exists
  tag: v3.11.16
  version: v2.3.2
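The serviceMonitorSelector in this spec means the Prometheus instance scrapes any ServiceMonitor that carries a k8s-app label, in namespaces labelled openshift.io/cluster-monitoring. A hypothetical ServiceMonitor that would be picked up might look like this (names and port are assumptions for illustration, not objects from the cluster above):

```yaml
# Illustrative ServiceMonitor matched by the selectors above
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: openshift-monitoring
  labels:
    k8s-app: my-app          # satisfies serviceMonitorSelector (key k8s-app Exists)
spec:
  selector:
    matchLabels:
      app: my-app            # selects the Service whose endpoints get scraped
  endpoints:
  - port: metrics            # named port on that Service
    interval: 30s
```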
- List the StatefulSets: prometheus-k8s also shows a count of 2, as requested by the custom resource:
[root@master ~]# oc get statefulset
NAME                DESIRED   CURRENT   AGE
alertmanager-main   3         3         10d
prometheus-k8s      2         2         10d
- List the pods: there are 2 prometheus-k8s pods:
[root@master ~]# oc get pod
NAME                                           READY     STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3       Running   12         2d
alertmanager-main-1                            3/3       Running   12         1d
alertmanager-main-2                            3/3       Running   12         1d
cluster-monitoring-operator-6b5cdf65c5-47xtm   1/1       Running   5          2d
grafana-f47c66565-q8ddc                        2/2       Running   8          2d
kube-state-metrics-5564bb7b47-m786t            3/3       Running   13         2d
node-exporter-6v8xl                            2/2       Running   40         10d
node-exporter-vz5sv                            2/2       Running   30         10d
prometheus-k8s-0                               4/4       Running   21         1d
prometheus-k8s-1                               4/4       Running   17         1d
prometheus-operator-56c79d89b9-swb2v           1/1       Running   6          2d
- Scaling pods up or down is just a matter of editing the replica count on the Prometheus or Alertmanager resource object. Here spec.replicas of prometheus k8s is lowered to 1, and then alertmanager main is edited the same way:
[root@master ~]# oc edit prometheus k8s
prometheus.monitoring.coreos.com/k8s edited
[root@master ~]# oc get statefulset
NAME                DESIRED   CURRENT   AGE
alertmanager-main   3         3         10d
prometheus-k8s      1         1         10d
[root@master ~]# oc get pod
NAME                                           READY     STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3       Running   12         2d
alertmanager-main-1                            3/3       Running   12         1d
alertmanager-main-2                            3/3       Running   12         1d
cluster-monitoring-operator-6b5cdf65c5-47xtm   1/1       Running   5          2d
grafana-f47c66565-q8ddc                        2/2       Running   8          2d
kube-state-metrics-5564bb7b47-m786t            3/3       Running   13         2d
node-exporter-6v8xl                            2/2       Running   40         10d
node-exporter-vz5sv                            2/2       Running   30         10d
prometheus-k8s-0                               4/4       Running   21         1d
prometheus-operator-56c79d89b9-swb2v           1/1       Running   6          2d
[root@master ~]# oc get alertmanager
NAME      AGE
main      10d
[root@master ~]# oc edit alertmanager main
alertmanager.monitoring.coreos.com/main edited
[root@master ~]# oc get pod
NAME                                           READY     STATUS              RESTARTS   AGE
alertmanager-main-0                            0/3       ContainerCreating   0          2s
cluster-monitoring-operator-6b5cdf65c5-47xtm   1/1       Running             5          2d
grafana-f47c66565-q8ddc                        2/2       Running             8          2d
kube-state-metrics-5564bb7b47-m786t            3/3       Running             13         2d
node-exporter-6v8xl                            2/2       Running             40         10d
node-exporter-vz5sv                            2/2       Running             30         10d
prometheus-k8s-0                               4/4       Running             21         1d
prometheus-operator-56c79d89b9-swb2v           1/1       Running             6          2d
[root@master ~]# oc get pod
NAME                                           READY     STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3       Running   0          8s
cluster-monitoring-operator-6b5cdf65c5-47xtm   1/1       Running   5          2d
grafana-f47c66565-q8ddc                        2/2       Running   8          2d
kube-state-metrics-5564bb7b47-m786t            3/3       Running   13         2d
node-exporter-6v8xl                            2/2       Running   40         10d
node-exporter-vz5sv                            2/2       Running   30         10d
prometheus-k8s-0                               4/4       Running   21         1d
prometheus-operator-56c79d89b9-swb2v           1/1       Running   6          2d
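Instead of the interactive oc edit sessions above, the same replica changes can be scripted with a merge patch. A sketch, assuming the same object names and namespace as in this cluster:

```shell
# Non-interactive equivalent of the oc edit steps above:
# scale the Prometheus instance down to one replica...
oc patch prometheus k8s -n openshift-monitoring --type merge -p '{"spec":{"replicas":1}}'
# ...and the Alertmanager instance as well
oc patch alertmanager main -n openshift-monitoring --type merge -p '{"spec":{"replicas":1}}'
# the Operator notices the change and resizes the underlying StatefulSets
oc get statefulset -n openshift-monitoring
```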