Deploying prometheus-operator on AKS

2019-12-31  leeehao

Deploying prometheus-operator on Azure Kubernetes Service

Problems

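1. Monitoring managed components

On managed platforms such as AKS, EKS, and GKE, etcd and the related control-plane components are run by the provider. Enable monitoring for them selectively, following the platform's documentation.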
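As a rough illustration of enabling these monitors selectively, the sketch below writes a small override file that switches off control-plane monitors that a managed platform typically does not expose. The file name values-managed.yaml is hypothetical, and the keys kubeEtcd, kubeControllerManager, and kubeScheduler are assumed to match the stable/prometheus-operator chart version in use; check them against that chart's values.yaml, and disable only what your platform actually hides.

```shell
# Assumption: these top-level keys exist in the chart's values.yaml.
# values-managed.yaml is a hypothetical file name used for illustration.
cat > values-managed.yaml <<'EOF'
kubeEtcd:
  enabled: false
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
EOF
```

The file can then be passed to helm install as an additional -f values-managed.yaml argument after the main values.yaml.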

2. helm install fails with Error: release xxxx failed: context canceled

Install the CRDs manually first, then run helm install with CRD creation disabled.

# Download the CRD manifests into ./prometheus-operator-crd, the path the kubectl apply commands read from
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/alertmanager.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheus.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/prometheusrule.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/servicemonitor.crd.yaml
wget -P ./prometheus-operator-crd https://raw.githubusercontent.com/coreos/prometheus-operator/v0.34.0/example/prometheus-operator-crd/podmonitor.crd.yaml

Run the script:

## https://github.com/helm/helm/issues/6130
kubectl apply -f ./prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f ./prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f ./prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f ./prometheus-operator-crd/servicemonitor.crd.yaml
kubectl apply -f ./prometheus-operator-crd/podmonitor.crd.yaml

helm install --name po --namespace monitoring -f values.yaml stable/prometheus-operator --version 8.3.3 --set prometheusOperator.createCustomResource=false
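Because the chart does not create the CRDs in this setup, it may help to confirm that the API server has registered them before helm install runs. A minimal sketch, assuming the five CRDs live in the monitoring.coreos.com group as in prometheus-operator v0.34.0; crd_names is a hypothetical helper, and the kubectl step is skipped when kubectl is not on the PATH.

```shell
# Hypothetical helper: print the full names of the five CRDs applied above.
# Assumption: they all belong to the monitoring.coreos.com API group.
crd_names() {
  for c in alertmanagers prometheuses prometheusrules servicemonitors podmonitors; do
    echo "${c}.monitoring.coreos.com"
  done
}

# Wait until each CRD reports the Established condition (up to 60s each).
if command -v kubectl >/dev/null 2>&1; then
  for name in $(crd_names); do
    kubectl wait --for condition=established --timeout=60s "crd/${name}"
  done
fi
```

Placing this between the kubectl apply commands and helm install keeps the install from racing against CRD registration.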

3. The TooManyPods rule reports errors

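  1. Delete the existing prometheus-operator helm release.
  2. Clean up the kubelet endpoints left behind by prometheus-operator. List them first; the names in the delete commands are the ones found in the author's cluster, so substitute whatever the first command returns: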
kubectl get endpoints -n kube-system -l k8s-app=kubelet
kubectl delete ep -n kube-system dapper-bird-prometheus-ope-kubelet
kubectl delete ep -n kube-system prometheus-operator-kubelet
kubectl delete ep -n kube-system prometheus-prometheus-oper-kubelet
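The cleanup in step 2 can be sketched as a dry run that prints the delete commands instead of executing them. This is only an illustration: stale_kubelet_endpoints is a hypothetical helper, and it assumes every endpoint labelled k8s-app=kubelet other than the one you name is a leftover.

```shell
# Hypothetical helper: reads endpoint names on stdin, prints all except "$1".
stale_kubelet_endpoints() {
  keep="$1"
  while read -r name; do
    if [ -n "$name" ] && [ "$name" != "$keep" ]; then
      echo "$name"
    fi
  done
}

# Dry run: print the delete commands rather than running them.
# Pass the endpoint name to keep, or "" to list every kubelet endpoint.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get endpoints -n kube-system -l k8s-app=kubelet \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
    stale_kubelet_endpoints "" |
    while read -r ep; do
      echo "kubectl delete ep -n kube-system $ep"
    done
fi
```

Review the printed commands before running any of them, since the label selector alone does not prove an endpoint is unused.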

4. kubernetes.default.svc endpoint ip 172.31.x.x does not respond

Solution: copy the following into values.yaml as-is.

kubeApiServer:
  relabelings:
   - sourceLabels:
     - __meta_kubernetes_namespace
     - __meta_kubernetes_service_name
     - __meta_kubernetes_endpoint_port_name
     action: keep
     regex: default;kubernetes;https
   - targetLabel: __address__
     replacement: kubernetes.default.svc:443
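To see why the first rule keeps only the API server endpoint: Prometheus joins the sourceLabels values with ";" (its default separator) and matches the result against the regex, which is anchored at both ends. The second rule then rewrites the scrape address to kubernetes.default.svc:443, so the endpoint IP is no longer contacted directly. A minimal shell sketch of the keep step, where relabel_keep is a hypothetical helper and grep -x stands in for Prometheus's anchored matching:

```shell
# Hypothetical helper mimicking a "keep" relabel rule:
# join namespace, service name, and port name with ';' and require a full match.
relabel_keep() {
  joined="$1;$2;$3"
  printf '%s\n' "$joined" | grep -Eqx 'default;kubernetes;https'
}

# The kubernetes service in the default namespace matches and is kept.
relabel_keep default kubernetes https && echo "kept: default/kubernetes/https"

# Any other combination fails the match and would be dropped.
relabel_keep kube-system kubelet https-metrics || echo "dropped: kube-system/kubelet/https-metrics"
```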

5. kubelet Unhealthy

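Solution: apply the setting below in values.yaml as-is, so that Prometheus scrapes the kubelet over plain HTTP instead of HTTPS. Background:
https://github.com/coreos/prometheus-operator/issues/926

kubelet:
  serviceMonitor:
    https: false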