Deploying Alertmanager with WeChat Work (企业微信) Alerts Using Docker

2021-06-09 Iris_Yzy

I. Alertmanager

1.1 Create the Alertmanager configuration file

vim /root/alertmanager/config.yml

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'
  opsgenie_api_url: 'https://api.opsgenie.com/'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_corp_id: wxe11111111111ca # corp ID (企业ID)
  victorops_api_url: 'https://alert.victorops.com/integrations/generic/20131114/alert/'
route:
  receiver: zhangsan # matches a name under receivers below
  group_by:
  - groupLabel # grouping label; customizable, matches a label set in the alerting rules
  continue: false
  group_wait: 30s
  group_interval: 3m
  repeat_interval: 3m
receivers:
- name: zhangsan
  wechat_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    api_secret: <secret> # secret generated when the WeChat Work app is created
    corp_id: wxe11111111111ca
    message: '{{ template "wechat.default.message" . }}'
    api_url: https://qyapi.weixin.qq.com/cgi-bin/
    to_user: zhangsan # send to a single user; use '@all' to send to every member
    to_party: '{{ template "wechat.default.to_party" . }}'
    to_tag: '{{ template "wechat.default.to_tag" . }}'
    agent_id: "1000296" # agent ID of the WeChat Work app
    message_type: text
templates:
- /apps/srv/alertmanager/templates/*.tmpl # path to the alert templates
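
Before starting the container, the file can be checked for syntax errors. The following is a minimal sketch, assuming Docker is available and that the prom/alertmanager image ships the amtool binary on its PATH. The function name check_alertmanager_config is hypothetical and not part of the original setup:

```shell
# Hypothetical helper: validate an Alertmanager config file with amtool.
# Assumption: amtool is available inside the prom/alertmanager image.
check_alertmanager_config() {
  # Host path of the config file; defaults to the path used in this article.
  config_path="${1:-/root/alertmanager/config.yml}"
  docker run --rm \
    -v "${config_path}:/etc/alertmanager/config.yml:ro" \
    --entrypoint amtool \
    docker.io/prom/alertmanager:latest \
    check-config /etc/alertmanager/config.yml
}
```

Calling check_alertmanager_config with no argument checks /root/alertmanager/config.yml; a non-zero exit status indicates that amtool rejected the file.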

1.2 Alert template

A sample test template; customize it as needed.

{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
@Alert [{{ len .Alerts.Firing }}]
{{ range .Alerts.Firing }}
<pre>
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Time: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
</pre>
{{ end }}
{{ end -}}
{{ end }}
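
The value 28800e9 passed to Add in the template is a duration in nanoseconds. The arithmetic below shows that it equals 8 hours, which shifts the timestamp to UTC+8 (this assumes that StartsAt arrives in UTC). The variable names are illustrative only:

```shell
# 28800e9 written out as an integer number of nanoseconds.
offset_ns=28800000000000

# Convert nanoseconds to seconds, then seconds to hours.
offset_seconds=$(( offset_ns / 1000000000 ))
offset_hours=$(( offset_seconds / 3600 ))

echo "offset: ${offset_seconds}s = ${offset_hours}h"   # prints "offset: 28800s = 8h"
```

For a different time zone, the same calculation run backwards gives the constant to use: hours * 3600 seconds, followed by e9.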

1.3 Build and run Alertmanager

$ docker search alertmanager
$ docker pull docker.io/prom/alertmanager:latest
$ docker run -d -p 9093:9093 -v /root/alertmanager/config.yml:/etc/alertmanager/config.yml docker.io/prom/alertmanager:latest --config.file=/etc/alertmanager/config.yml

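Once the container is up, open ip:port/#/alerts. If you see the page shown below, Alertmanager has been set up successfully. Note that the docker run command above mounts only config.yml; the path listed under templates is resolved inside the container, so the template directory also has to be mounted (an additional -v) for the custom template from 1.2 to take effect.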

Figure: Alertmanager alerts page (alertmananger.png)
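
With Alertmanager running, the WeChat Work delivery path can be exercised without waiting for Prometheus by posting a hand-made alert to the Alertmanager v2 API. This is a sketch under assumptions: the URL default http://localhost:9093, the alert name ManualTestAlert, and the function names are all hypothetical, and curl is assumed to be installed:

```shell
# Assumed Alertmanager address; override via the environment if needed.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

# Print the JSON body for POST /api/v2/alerts: a list holding one alert.
# The groupLabel label matches the group_by field in config.yml.
test_alert_payload() {
  cat <<'EOF'
[
  {
    "labels": {
      "alertname": "ManualTestAlert",
      "groupLabel": "passportHttpCode"
    },
    "annotations": {
      "summary": "manual test alert",
      "description": "sent with curl to verify WeChat Work delivery"
    }
  }
]
EOF
}

# Send the payload to Alertmanager (not invoked automatically).
send_test_alert() {
  test_alert_payload | curl -sS -X POST \
    -H 'Content-Type: application/json' \
    --data @- \
    "${ALERTMANAGER_URL}/api/v2/alerts"
}
```

After running send_test_alert, the alert should show up on the #/alerts page, and the WeChat Work message should follow once group_wait (30s in the configuration above) has elapsed.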

II. Prometheus configuration changes

2.1 Edit the Prometheus configuration file

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 172.21.135.17:9093
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml"
 
 
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'alertmanager'    # scrape job that monitors Alertmanager
    static_configs:
    - targets:
       - ip:9093 # ip:port of the machine running Alertmanager

2.2 Add the alerting rules configuration

test_rules.yml

groups:
  - name: passportHttpCode
    rules:
    - alert: XX service HTTP response code
      expr: code{service="1"} != 200
      for: 1m
      labels:
        type: httpCode
        object: overall
        title: XX service HTTP response code
        groupLabel: passportHttpCode
      annotations:
        summary: XX service HTTP response code is abnormal
        description: "Current abnormal response code value: {{ printf \"%.2f\" $value }}\nTrend: http://XXX"
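
A rule file can be checked before Prometheus loads it. The sketch below assumes that promtool (distributed together with Prometheus) is on the PATH and that the file is stored as rules/test_rules.yml so that it matches the rule_files pattern; both the default path and the function name are assumptions:

```shell
# Hypothetical helper: validate a Prometheus rule file with promtool.
check_prometheus_rules() {
  # Default path is an assumption based on rule_files: "rules/*.yml".
  rules_file="${1:-rules/test_rules.yml}"
  promtool check rules "${rules_file}"
}
```

A non-zero exit status from check_prometheus_rules means promtool found a syntax or expression error in the file.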

Restart Prometheus to apply the configuration, then open ip:9090/graph. As shown below, the configured rule has taken effect and is listed on the Rules page.

Figure: alerting rules in Prometheus (告警rules.png)
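
As an alternative to a full restart, Prometheus can reload its configuration over HTTP, and the loaded rules can be listed through its API. This is only a sketch: the reload endpoint works only if Prometheus was started with --web.enable-lifecycle, and the URL default and function names below are assumptions:

```shell
# Assumed Prometheus address; override via the environment if needed.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# Ask Prometheus to re-read its configuration and rule files.
# Requires the --web.enable-lifecycle startup flag.
reload_prometheus() {
  curl -sS -X POST "${PROMETHEUS_URL}/-/reload"
}

# Print the currently loaded rule groups as JSON.
list_loaded_rules() {
  curl -sS "${PROMETHEUS_URL}/api/v1/rules"
}
```

Running reload_prometheus followed by list_loaded_rules should return a JSON document that includes the passportHttpCode group if the rule file was picked up.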