[Alerting] [AlertManager] From Beginner to Mastery

2018-12-25  炼狱腾蛇Eric

1. Introduction:

2. Links:

2.1. References

3. Architecture diagram:

image.png

4. Deployment:

4.1. RPM installation (choose either 4.1 or 4.2):

[prometheus]
name=prometheus
baseurl=https://packagecloud.io/prometheus-rpm/release/el/7/$basearch
repo_gpgcheck=1
enabled=1
gpgkey=https://packagecloud.io/prometheus-rpm/release/gpgkey
       https://raw.githubusercontent.com/lest/prometheus-rpm/master/RPM-GPG-KEY-prometheus-rpm
gpgcheck=1
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300
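
Files installed by the RPM package:
/etc/default/alertmanager # environment file read by the systemd unit
/etc/prometheus/alertmanager.yml # main Alertmanager configuration file
/usr/bin/alertmanager # Alertmanager executable
/usr/bin/amtool # command-line tool for viewing alerts
/usr/lib/systemd/system/alertmanager.service # systemd unit file
/var/lib/prometheus # data directory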

4.2. Binary tarball installation (choose either 4.1 or 4.2):

~]# ll /opt/alertmanager-0.15.3.linux-amd64/
total 31200
-rwxr-xr-x 1 3434 3434 19998160 Nov  9 16:41 alertmanager # Alertmanager executable
-rw-r--r-- 1 3434 3434      380 Nov  9 17:00 alertmanager.yml # main Alertmanager configuration file
-rwxr-xr-x 1 3434 3434 11923635 Nov  9 16:41 amtool # command-line tool for viewing alerts
-rw-r--r-- 1 3434 3434    11357 Nov  9 17:00 LICENSE
-rw-r--r-- 1 3434 3434      457 Nov  9 17:00 NOTICE

5. Configuration files

5.1. /usr/lib/systemd/system/alertmanager.service

# -*- mode: conf -*-
[Unit]
Description=Prometheus Alertmanager.
Documentation=https://github.com/prometheus/alertmanager
After=network.target
[Service]
EnvironmentFile=-/etc/default/alertmanager
User=prometheus
ExecStart=/usr/bin/alertmanager \
          --config.file=/etc/prometheus/alertmanager.yml \
          --storage.path=/var/lib/prometheus/alertmanager \
          $ALERTMANAGER_OPTS
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
[Install]
WantedBy=multi-user.target

5.2. /etc/default/alertmanager

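Note: always set the --web.external-url option. Alertmanager puts this address in the links of the alert emails it sends; if the option is omitted, the links default to the machine's hostname, which recipients may not be able to resolve.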

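# systemd environment files only allow comments on lines of their own, so the flags are explained here.
# --web.external-url: address used for external access; 10.41.91.91 is this host, change it on every other server
# --cluster.listen-address: address this node listens on for cluster traffic
# --cluster.peer: addresses of the cluster members to connect to (the list includes this host)
ALERTMANAGER_OPTS='--web.external-url=http://10.41.91.91:9093 --cluster.listen-address=10.41.91.91:9094 --cluster.peer=10.41.91.91:9094 --cluster.peer=10.210.149.26:9094 --cluster.peer=10.210.149.27:9094'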
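
Because --web.external-url and --cluster.listen-address differ on every server while the --cluster.peer list stays the same, the value can be generated instead of edited by hand on each host. The following is only a minimal sketch, assuming the three node addresses used in this article and the ports 9093 and 9094; the helper name alertmanager_opts is hypothetical and not part of Alertmanager.

```python
# Sketch: build the ALERTMANAGER_OPTS line for each cluster member.
# The helper name and structure are illustrative, not part of Alertmanager.

PEERS = ["10.41.91.91", "10.210.149.26", "10.210.149.27"]  # addresses from this article


def alertmanager_opts(local_ip, peers, web_port=9093, cluster_port=9094):
    """Return the ALERTMANAGER_OPTS assignment for the node at local_ip."""
    flags = [
        f"--web.external-url=http://{local_ip}:{web_port}",
        f"--cluster.listen-address={local_ip}:{cluster_port}",
    ]
    # Every node gets the same peer list; only the first two flags change.
    flags += [f"--cluster.peer={ip}:{cluster_port}" for ip in peers]
    return "ALERTMANAGER_OPTS='" + " ".join(flags) + "'"


if __name__ == "__main__":
    for ip in PEERS:
        print(f"# /etc/default/alertmanager on {ip}")
        print(alertmanager_opts(ip, PEERS))
```

Running the script prints one assignment per server, which can be copied into that server's /etc/default/alertmanager.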

5.3. /etc/prometheus/alertmanager.yml

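global: # global settings
  resolve_timeout: 5m # time after which an alert is declared resolved if it has not been updated
route: # routing rules
  group_by: ['alertname'] # alerts with the same alertname go into one notification
  group_wait: 10s # wait before sending the first notification for a new group
  group_interval: 10s # wait before notifying about new alerts added to an existing group
  repeat_interval: 1h # wait before re-sending a notification that was already sent
  receiver: 'web.hook' # default receiver
receivers: # receivers, e.g. email, WeChat, or a webhook endpoint
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules: # inhibition rules
  # mute warning alerts while a critical alert with the same alertname, dev and instance is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']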
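
The 'web.hook' receiver posts a JSON document to http://127.0.0.1:5001/, so something has to listen there for notifications to be delivered. The sketch below is one possible stand-in for testing, built on Python's standard library only. The names summarize, WebhookHandler and serve are hypothetical, and the payload fields it reads (alerts, status, labels) are the ones Alertmanager includes in its webhook body.

```python
# Sketch: a tiny listener for the 'web.hook' receiver configured above.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Return one line per alert in an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(
            f"[{alert.get('status', 'unknown')}] "
            f"{labels.get('alertname', '?')} on {labels.get('instance', '?')}"
        )
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        for line in summarize(payload):
            print(line)
        self.send_response(200)  # a 2xx response marks the delivery as successful
        self.end_headers()


def serve(host="127.0.0.1", port=5001):
    """Listen on the address used in alertmanager.yml until interrupted."""
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener; each notification is then printed as one line per alert, for example "[firing] RootfsUsage on 10.210.54.227:9100".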

6. Management tools

6.1. amtool

amtool alert --alertmanager.url=http://localhost:9093
Alertname    Starts At                Summary
RootfsUsage  2019-01-10 07:13:32 CET  Not enough space for root fs on 10.210.54.227:9100
RootfsUsage  2019-01-11 14:36:17 CET  Not enough space for root fs on 10.210.54.226:9100
MemoryUsage  2019-01-17 00:44:17 CET  Memory of instance 150.132.195.26:9100 is not enough
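
To check that amtool and the notification pipeline work before a real alert fires, a hand-made alert can be pushed to Alertmanager over HTTP. This is a rough sketch under stated assumptions: the function names are hypothetical, the label and annotation values are made up for the test, and the endpoint path /api/v1/alerts is assumed to match the 0.15.x release used here (newer releases expose /api/v2/alerts instead).

```python
# Sketch: push a hand-made alert into Alertmanager so it shows up in amtool.
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname, instance, summary, minutes=5):
    """Return the JSON-ready list that the alerts endpoint expects."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "instance": instance, "severity": "warning"},
        "annotations": {"summary": summary},
        "startsAt": now.isoformat(),
        # Without further updates the alert counts as resolved after `minutes`.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alerts(alerts, base_url="http://localhost:9093", api_path="/api/v1/alerts"):
    """POST alerts to Alertmanager and return the HTTP status code.

    api_path is an assumption: adjust it to the API version of your release.
    """
    request = urllib.request.Request(
        base_url + api_path,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

For example, send_alerts(build_test_alert("TestAlert", "localhost:9100", "manual test")) submits one alert; if the request is accepted, the amtool command above should list it.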

6.2. web UI

http://<your-server-IP>:9093


image.png