Prometheus Operator 高级配置(服务自动发现、
自动发现配置
1、定义配置jod(prometheus-additional.yaml)
$ cd /k8s-cmp/yaml/prometheus_Operator/kube-prometheus/manifests
$ vim prometheus-additional.yaml
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
2、创建jod的secret对象
$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created
3、声明 prometheus 的资源对象文件中添加上这个额外的配置:(prometheus-prometheus.yaml)
添加:
additionalScrapeConfigs:
name: additional-configs
key: prometheus-additional.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
labels:
prometheus: k8s
name: k8s
namespace: monitoring
spec:
alerting:
alertmanagers:
- name: alertmanager-main
namespace: monitoring
port: web
additionalScrapeConfigs:
name: additional-configs
key: prometheus-additional.yaml
baseImage: quay.io/prometheus/prometheus
nodeSelector:
kubernetes.io/os: linux
podMonitorSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: v2.11.0
4、添加完成后,直接更新 prometheus 这个 CRD 资源对象:
$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com "k8s" configured
5、隔一小会儿,可以前往 Prometheus 的 Dashboard 中查看配置是否生效:
config在 Prometheus Dashboard 的配置页面下面我们可以看到已经有了对应的的配置信息了,但是我们切换到 targets 页面下面却并没有发现对应的监控任务,查看 Prometheus 的 Pod 日志:
$ kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
level=error ts=2018-12-20T15:14:06.772903214Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list pods at the cluster scope"
level=error ts=2018-12-20T15:14:06.773096875Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list services at the cluster scope"
level=error ts=2018-12-20T15:14:06.773212629Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list endpoints at the cluster scope"
......
可以看到有很多错误日志出现,都是xxx is forbidden,这说明是 RBAC 权限的问题,通过 prometheus 资源对象的配置可以知道 Prometheus 绑定了一个名为 prometheus-k8s 的 ServiceAccount 对象,而这个对象绑定的是一个名为 prometheus-k8s 的 ClusterRole。
6、修改 prometheus-k8s 的 ClusterRole(prometheus-clusterRole.yaml)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus-k8s
rules:
- apiGroups:
- ""
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
更新上面的 ClusterRole 这个资源对象,然后重建下 Prometheus 的所有 Pod,正常就可以看到 targets 页面下面有 kubernetes-service-endpoints 这个监控任务了:
$ kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
···
prometheus-k8s-0 3/3 Running 1 87m
prometheus-k8s-1 3/3 Running 1 87m
···
$ kubectl delete pod prometheus-k8s-0 -n monitoring
$ kubectl delete pod prometheus-k8s-1 -n monitoring
image.png
这里自动监控了两个 Service,第一个是之前创建的 Redis 的服务,在 Redis Service 中有两个特殊的 annotations:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9121"
所以被自动发现了,当然也可以用同样的方式去配置 Pod、Ingress 这些资源对象的自动发现。
数据持久化
在修改完权限的时候,重启了 Prometheus 的 Pod,会发现之前采集的数据已经没有了,这是因为通过 prometheus 这个 CRD 创建的 Prometheus 并没有做数据的持久化,查看生成的 Prometheus Pod 的挂载情况就清楚了:
$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
volumeMounts:
- mountPath: /etc/prometheus/config_out
name: config-out
readOnly: true
- mountPath: /prometheus
name: prometheus-k8s-db
......
volumes:
......
- emptyDir: {}
name: prometheus-k8s-db
......
可以看到 Prometheus 的数据目录 /prometheus 实际上是通过 emptyDir 进行挂载的, emptyDir 挂载的数据的生命周期和 Pod 生命周期一致的,所以如果 Pod 挂掉了,数据也就丢失了,这也就是为什么重建 Pod 后之前的数据就没有了的原因,对应线上的监控数据肯定需要做数据的持久化的,同样的 prometheus 这个 CRD 资源提供了数据持久化的配置方法,Prometheus 最终是通过 Statefulset 控制器进行部署的,所以需要通过 storageclass 来做数据持久化。
1、创建一个 StorageClass 对象( prometheus-storageclass.yaml)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: prometheus-data-db
provisioner: fuseim.pri/ifs
2、在 prometheus 的 CRD 资源对象中添加如下配置:(prometheus-prometheus.yaml)
添加
storage:
volumeClaimTemplate:
spec:
storageClassName: prometheus-data-db
resources:
requests:
storage: 10Gi
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
labels:
prometheus: k8s
name: k8s
namespace: monitoring
spec:
alerting:
alertmanagers:
- name: alertmanager-main
namespace: monitoring
port: web
additionalScrapeConfigs:
name: additional-configs
key: prometheus-additional.yaml
storage:
volumeClaimTemplate:
spec:
storageClassName: prometheus-data-db
resources:
requests:
storage: 16Gi
baseImage: quay.io/prometheus/prometheus
nodeSelector:
kubernetes.io/os: linux
podMonitorSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: v2.11.0
注意这里的 storageClassName 名字为上面创建的 StorageClass 对象名称,然后更新 prometheus 这个 CRD 资源。更新完成后会自动生成两个 PVC 和 PV 资源对象:
$ kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-0cc03d41-047a-11e9-a777-525400db4df7 10Gi RWO prometheus-data-db 8m
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-1938de6b-047b-11e9-a777-525400db4df7 10Gi RWO prometheus-data-db 1m
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-0cc03d41-047a-11e9-a777-525400db4df7 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 prometheus-data-db 2m
pvc-1938de6b-047b-11e9-a777-525400db4df7 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 prometheus-data-db 1m
3、查看 Prometheus Pod 的数据目录就可以看到是关联到一个 PVC 对象上了。
$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
volumeMounts:
- mountPath: /etc/prometheus/config_out
name: config-out
readOnly: true
- mountPath: /prometheus
name: prometheus-k8s-db
......
volumes:
......
- name: prometheus-k8s-db
persistentVolumeClaim:
claimName: prometheus-k8s-db-prometheus-k8s-0
......