Learn Prometheus AlertManager in One Article

2021-06-30  sknfie

Overview

Prometheus itself does not send alert notifications; that job is handled by a separate component, Alertmanager. Alertmanager receives the alerts sent by Prometheus, runs them through a processing pipeline (deduplication, grouping, routing, silencing, inhibition) and delivers notifications to the configured receivers.
The alerting rules themselves are configured on the Prometheus server, not in Alertmanager.
Prometheus supports two types of rules: recording rules and alerting rules.
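To make the distinction concrete, here is a minimal sketch of a rule file containing one rule of each type (the group name, alert name and recorded metric name are made up for this illustration):

groups:
- name: example
  rules:
  # Recording rule: precompute the per-job average idle CPU rate so it can be reused cheaply
  - record: job:node_cpu_idle:avg_rate5m
    expr: avg by (job) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  # Alerting rule: fire after any target has been unreachable for 2 minutes
  - alert: InstanceDown
    expr: up == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} is down"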

1. Installation

1) Install Alertmanager

[root@localhost opt]# ll alertmanager-0.20.0.linux-amd64.tar.gz 
-rw-r--r--. 1 root root 23928771 May 21 20:02 alertmanager-0.20.0.linux-amd64.tar.gz
[root@localhost opt]# tar -zxvf alertmanager-0.20.0.linux-amd64.tar.gz
[root@localhost opt]# cp -r alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager

2) Register alertmanager as a systemd service that starts at boot

[root@localhost ~]# vi /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alertmanager Service daemon
After=network.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager \
    --config.file=/usr/local/alertmanager/alertmanager.yml \
    --storage.path=/usr/local/alertmanager/data/ \
    --data.retention=120h \
    --web.external-url=http://192.168.1.10:9093 \
    --web.listen-address=:9093
Restart=on-failure

[Install]
WantedBy=multi-user.target

# Alertmanager option notes
# ExecStart=/usr/local/alertmanager/alertmanager  path of the alertmanager binary to run
# --config.file=/usr/local/alertmanager/alertmanager.yml  path of the Alertmanager configuration file
# --storage.path=/usr/local/alertmanager/data/  data storage path
# --data.retention=120h  maximum retention time for alert data; the default is 120 hours
# --web.external-url  the external URL used to generate relative and absolute links back to Alertmanager, so the links embedded in alert notifications can be clicked to open the Alertmanager web UI. Format: http://{ip or domain}:9093
# --web.listen-address  address and port on which the web UI and API listen

[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl enable alertmanager.service
[root@localhost ~]# systemctl restart alertmanager.service
[root@localhost ~]# systemctl status alertmanager.service

3) Test web access

Open the following example address in a browser: http://192.168.2.136:9093/#/status
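If no browser is available on the host, the same check can be done from the command line; Alertmanager exposes a simple health endpoint (substitute your own address):

[root@localhost ~]# curl http://192.168.2.136:9093/-/healthy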

Installing with Docker

1) Pull the alertmanager image

[root@localhost ~]# docker pull prom/alertmanager

2) Verify the image was pulled

[root@localhost ~]# docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
docker.io/prom/alertmanager   latest              0881eb8f169f        5 months ago        52.1 MB

3) Run the alertmanager image

[root@localhost ~]# docker run -d -p 9093:9093 -v /usr/local/alertmanager/simple.yml:/etc/alertmanager/config.yml --name alertmanager prom/alertmanager
[root@localhost ~]# docker ps 
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
121610a9f7ee        prom/alertmanager   "/bin/alertmanager..."   17 seconds ago      Up 16 seconds       0.0.0.0:9093->9093/tcp   alertmanager
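If the container exits right away (for example because the mounted configuration file is malformed), the container logs usually show the reason:

[root@localhost ~]# docker logs alertmanager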

Alert Configuration and Monitoring

1. Configuration

Open the prometheus.yml configuration file, uncomment the alerting section, and modify it as follows:

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.2.136:9093
......
  - job_name: 'AlertManager'
    static_configs:
    - targets: ['192.168.2.136:9093']

2. Monitoring

Monitor whether node_exporter is up; if it is not, an alert will fire. First point Prometheus at the rule files in prometheus.yml:

rule_files:
  - "/usr/local/prometheus/rules/*_rules.yml"

Create the rule file:

[root@centos7_9-mod prometheus]#mkdir rules
[root@centos7_9-mod prometheus]#cd rules
[root@centos7_9-mod rules]# vi node_rules.yml
groups:
- name: test
  rules:
  - alert: prometheus
    expr: up{job="node_exporter"} == 0
    for: 3m
    labels:
      severity: critical
    annotations:
      summary: "node down"
      description: "Node has been down for more than 3 minutes."

Validate the rules and restart Prometheus:

[root@centos7_9-mod prometheus]# ./promtool check rules rules/node_rules.yml
Checking rules/node_rules.yml
  SUCCESS: 1 rules found

systemctl restart prometheus.service

Simulate a failure by stopping node_exporter:

systemctl stop node_exporter.service
(Screenshots: the alert appears on the Alerts page, can be queried, and turns red, i.e. firing, once the 3-minute "for" duration has elapsed.)
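The alert state can also be inspected from the Prometheus expression browser: every pending or firing alert is exported as a sample of the built-in ALERTS metric, so a query on the alert name defined above shows its current state:

# one series per active alert; alertstate is "pending" during the 3m "for" window, then "firing"
ALERTS{alertname="prometheus"}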

3. Templated annotations

vi rules/node_rules.yml
groups:
- name: test
  rules:
  - alert: prometheus
    expr: up{job="node_exporter"} == 0
    for: 3m
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.instance }} down.up=={{ $value }}"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 3 minutes."
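For reference, assuming the down instance carries the labels instance="192.168.1.6:9100" and job="node_exporter" (hypothetical values for this example), the two annotations would render roughly as:

summary: "192.168.1.6:9100 down.up==0"
description: "192.168.1.6:9100 of job node_exporter has been down for more than 3 minutes."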

[root@centos7_9-mod prometheus]# systemctl restart node_exporter.service 
[root@centos7_9-mod prometheus]# systemctl restart prometheus.service 
The target comes back up and the alert resolves.

Stop node_exporter again:

[root@centos7_9-mod prometheus]# systemctl stop node_exporter.service
(Screenshot: the alert fires again, now with the templated summary and description rendered.)

Email Alerts

1) Modify the default Alertmanager configuration file

[root@localhost alertmanager]# cat alertmanager.yml 
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.sknfie.com:465'  # SMTP server address and port
  smtp_from: 'sknfie@163.com'    # sender address
  smtp_auth_username: 'sknfie@163.com'  # mailbox user name
  smtp_auth_password: 'rkmdpoviehcvddde'   # mailbox authorization password
  smtp_require_tls: false
route:
  group_by: ['alertname']    # alerts with the same alertname are grouped into one notification
  group_wait: 10s            # how long to wait for more alerts of a new group before the first notification
  group_interval: 10s        # how long to wait before notifying about new alerts added to an already-notified group
  repeat_interval: 1h        # how often to re-send a notification for alerts that are still firing
  receiver: 'email'          # default receiver
receivers:
- name: 'email'
  email_configs:
  - to: 'sknfie@163.com'
    headers: { Subject: "WARNING - Alert email" }   # email subject line
    send_resolved: true    # also send a notification when the alert recovers
inhibit_rules:   # suppress 'warning' alerts while a matching 'critical' alert is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

2) Check the configuration file and restart the service

[root@localhost alertmanager]# ./amtool check-config alertmanager.yml 
Checking 'alertmanager.yml'  SUCCESS
[root@localhost alertmanager]# systemctl restart alertmanager
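To verify mail delivery without taking a real target down, a synthetic alert can be pushed into Alertmanager with amtool (the alert name and label below are made up for this test):

[root@localhost alertmanager]# ./amtool alert add TestEmailAlert severity=critical --alertmanager.url=http://127.0.0.1:9093

Once group_wait (10s here) has elapsed, a notification for TestEmailAlert should arrive at the configured mailbox.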

3) Configure the Prometheus configuration file

[root@localhost prometheus]# cat prometheus.yml 
global:
  scrape_interval:     15s
  evaluation_interval: 15s 
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.2.136:9093
rule_files:
   - "/usr/local/prometheus/rules/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
    - targets: ['192.168.1.6:9100']

  - job_name: 'Alertmanager'
    static_configs:
    - targets: ['192.168.1.10:9093']
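Before restarting, the whole configuration, including every rule file matched by rule_files, can be validated in one go; the output should report SUCCESS for prometheus.yml and for each rule file it references:

[root@localhost prometheus]# ./promtool check config prometheus.yml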

4) Configure the alerting rule file

[root@localhost prometheus]# cat rules/up_rules.yml 
groups:
- name: UP
  rules:
  - alert: node
    expr: up{job="node"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      description: " {{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
      summary: "{{ $labels.instance }} down,up=={{ $value }}"

5) Restart the Prometheus service

[root@localhost prometheus]# systemctl restart prometheus

6) Test

Stop node_exporter:

[root@localhost ~]# systemctl stop node_exporter
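After roughly a minute the alert should show up in the Alertmanager UI and an email should arrive at the configured mailbox. The active alert can also be confirmed through the Alertmanager API (adjust the address to your deployment):

[root@localhost ~]# curl http://192.168.1.10:9093/api/v2/alerts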