cnsre运维博客

kube-eventer事件监控

2021-10-09  本文已影响0人  SRE运维博客

文章链接

下载deployment

我这里保存成kube-event.yaml

# cat kube-event.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: kube-eventer
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.2.0-484d9cd-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            - "--source=kubernetes:https://kubernetes.default"
            ## .e.g,dingtalk sink demo
            #- --sink=dingtalk:[your_webhook_url]&label=[your_cluster_id]&level=[Normal or Warning(default)]
            - --sink=dingtalk:https://oapi.dingtalk.com/robot/send?access_token=355cf0156xxxxxxxxxxxxxxxxxx&level=Warning
          env:
          # If TZ is assigned, set the TZ value as the time zone
          - name: TZ
            value: "Asia/Shanghai"
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - events
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: kube-eventer
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-eventer
  namespace: kube-system

钉钉群里创建自定义webhook

设置--智能群助手--添加机器人--选择WeebHook。定义机器人名称和安全设置

安全设置这里我定义了关键字,Waring。创建后复制webhook地址。然后更改上面deployment中的sink处。

我把上面的label删掉了,只留下了level=Waring,刚好对应了我关键字的Waring。只有带有关键字的才会触发告警。

测试告警

然后创建一个测试的Tomcat的deployment,故意把image镜像的tag写错,让他无法拉取镜像

[root@master allenjol]# kubectl apply -f deploy-tomcat-test.yaml
deployment.apps/tomcat-deployment-allenjol created
service/tomcat-service-allenjol created

[root@master allenjol]# kubectl get po
NAME                                        READY   STATUS             RESTARTS   AGE
tomcat-deployment-allenjol-b6687f99-l5vj9   0/1     ImagePullBackOff   0          45s

部署kube-event.yaml并查看日志。可以看到隔30s去检测一次

]# kubectl apply -f kube-event.yaml
]# kubectl get po -n kube-system | grep kube-event


[root@master allenjol]# kubectl logs -f kube-eventer-648f64c985-zfkkg -n kube-system
I0708 09:26:36.409034       1 eventer.go:67] /kube-eventer --source=kubernetes:https://kubernetes.default --sink=dingtalk:https://oapi.dingtalk.com/robot/send?access_token=355cf01569aef206dc6c05681aaf3ed0ea19ed3597db4c26c565dbeb69ce1303&level=Warning
I0708 09:26:36.409191       1 eventer.go:68] kube-eventer version: v1.2.0 commit: 484d9cd
I0708 09:26:36.411557       1 eventer.go:94] Starting with DingTalkSink sink
I0708 09:26:36.411596       1 eventer.go:108] Starting eventer
I0708 09:26:36.411678       1 eventer.go:116] Starting eventer http service
I0708 09:27:00.000163       1 manager.go:102] Exporting 5 events
I0708 09:27:30.000130       1 manager.go:102] Exporting 9 events
I0708 09:28:00.000147       1 manager.go:102] Exporting 1 events
I0708 09:28:30.000150       1 manager.go:102] Exporting 4 events
I0708 09:29:00.000138       1 manager.go:102] Exporting 1 events
...

可以看到这里已经看到了钉钉的webhook地址了,并且还收集到了events。
查看钉钉群,就会看到已经出现了告警了。


cnsre运维博客|Linux系统运维|自动化运维|云计算|运维监控

其实这个告警当前还存在点问题。个人认为不应该这么频繁发送,应该像prometheus一样可以配置抑制和静默。然后监控时间可以更改。当然熟悉go语言可以自己改源码然后构建成镜像。
文章链接

上一篇下一篇

猜你喜欢

热点阅读