Deploying prometheus-operator on AKS
2019-12-31
leeehao
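Deploying prometheus-operator on Azure Kubernetes Service
Issues
1. Monitoring managed components
On managed platforms such as AKS, EKS, and GKE, the provider runs the control plane, so enable monitoring of etcd and related components selectively, following the platform's documentation.
2. helm deployment fails with Error: release xxxx failed: context canceled
Deploy the CRDs manually first, then run helm install.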
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/alertmanager.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml
Run the script
## https://github.com/helm/helm/issues/6130
kubectl apply -f ./prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f ./prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f ./prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f ./prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f ./prometheus-operator-crd/podmonitor.crd.yaml
helm install --name po --namespace monitoring -f values.yaml stable/prometheus-operator --version 8.3.3 --set prometheusOperator.createCustomResource=false
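The five CRD files differ only in their name, so the download and apply steps above can be sketched as a loop. This is a minimal sketch, not the author's script: the function name `crd_urls` and the `APPLY` switch are hypothetical, and it relies on `kubectl apply -f` accepting a URL. By default it only prints the URLs so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: build the CRD URLs for one prometheus-operator version.
VERSION="v0.34.0"
BASE="https://raw.githubusercontent.com/coreos/prometheus-operator/${VERSION}/example/prometheus-operator-crd"

# Hypothetical helper: print one URL per CRD used in this post.
crd_urls() {
  for crd in alertmanager prometheus prometheusrule servicemonitor podmonitor; do
    echo "${BASE}/${crd}.crd.yaml"
  done
}

# APPLY is an assumed switch: APPLY=1 applies the CRDs, anything else only prints.
if [ "${APPLY:-0}" = "1" ]; then
  crd_urls | while read -r url; do
    kubectl apply -f "$url"
  done
else
  crd_urls
fi
```

Run it once without `APPLY` to check the list, then with `APPLY=1` before the helm install command.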
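3. TooManyPods rule fires
- Delete the existing prometheus-operator helm release
- Clean up the kubelet endpoints left behind by earlier prometheus-operator releases. List them first; the names below come from one cluster and will differ in yours.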
kubectl get endpoints -n kube-system -l k8s-app=kubelet
kubectl delete ep -n kube-system dapper-bird-prometheus-ope-kubelet
kubectl delete ep -n kube-system prometheus-operator-kubelet
kubectl delete ep -n kube-system prometheus-prometheus-oper-kubelet
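Because the leftover endpoint names depend on earlier release names, one cautious approach is to generate the delete commands from the list output and review them before running anything. The helper below is a hypothetical sketch (the name `delete_commands` is not from any tool); it only prints commands and never calls kubectl itself.

```shell
#!/bin/sh
# Hypothetical helper: read resource names such as "endpoints/foo" on stdin
# (the format printed by `kubectl get ... -o name`) and print one delete
# command per name, for manual review.
delete_commands() {
  while read -r name; do
    if [ -n "$name" ]; then
      echo "kubectl delete -n kube-system $name"
    fi
  done
}
```

A possible usage is `kubectl get endpoints -n kube-system -l k8s-app=kubelet -o name | delete_commands`, then running only the printed lines that refer to endpoints from removed releases.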
4. kubernetes.default.svc endpoint ip 172.31.x.x does not respond
Solution
Copy the following into values.yaml as-is:
kubeApiServer:
  relabelings:
    - sourceLabels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      action: keep
      regex: default;kubernetes;https
    - targetLabel: __address__
      replacement: kubernetes.default.svc:443
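To see why the keep rule selects only the API server endpoint: Prometheus joins the values of the source labels with `;` (the default separator) and keeps the target only if the whole joined string matches the regex. The shell function below imitates that check for illustration; `keep_target` is a hypothetical name and this is not how Prometheus itself is implemented.

```shell
#!/bin/sh
# Hypothetical illustration of the "keep" relabeling above.
# Arguments: namespace, service name, endpoint port name.
keep_target() {
  joined="$1;$2;$3"
  # -x forces a whole-line match, mirroring Prometheus' fully anchored regex.
  printf '%s\n' "$joined" | grep -q -x -E 'default;kubernetes;https'
}

# Example calls:
keep_target default kubernetes https && echo "kept: default/kubernetes/https"
keep_target kube-system kubelet https-metrics || echo "dropped: kube-system/kubelet/https-metrics"
```

The second relabeling then overwrites `__address__`, so the kept target is scraped through kubernetes.default.svc:443 instead of the unreachable endpoint IP.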
5. kubelet Unhealthy
Solution
Set the following in values.yaml as-is. For background, see:
https://github.com/coreos/prometheus-operator/issues/926
kubelet:
  serviceMonitor:
    https: false