Prometheus Alerting Configuration

2020-07-14  skiler

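I. The Prometheus alerting mechanism

Alerting is implemented jointly by two components: Prometheus and Alertmanager. The Prometheus server evaluates the alerting rules and sends any firing alerts to Alertmanager. Alertmanager then processes those alerts, which includes silencing, inhibition, and aggregation (grouping), and sends notifications by email or other channels.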

II. Configuring and triggering alerts in three steps

1. Configure alerting rules in Prometheus

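# Prometheus 2.x rule files are YAML; the older "ALERT ... IF ..." syntax belongs to 1.x
alert: <alert name>             # name of the alerting rule
expr: <expression>              # PromQL expression that defines the alert condition
[ for: <duration> ]             # how long the condition must hold before the alert fires
[ labels: <label set> ]         # extra labels to attach to the alert
[ annotations: <label set> ]    # descriptive information; templates may be used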

Sample rule file:

groups:
  - name: rules-group
    rules:
      - alert: NodeMemoryHigh
        # Percentage of memory in use, not counting free, buffer, and cache memory
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 50
        for: 1m
        labels:
          team: node
        annotations:
          summary: "{{ $labels.instance }}: High memory usage detected"
          description: "{{ $labels.instance }}: Memory usage is above 50% (current value is: {{ $value }})"

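You can check a rule file for errors with promtool, a command-line tool written in Go (it is also included in the Prometheus release archive):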
go get github.com/prometheus/prometheus/cmd/promtool
promtool check rules /path/to/example.rules.yml

2. Start and configure Alertmanager

For the full set of Alertmanager configuration options, see the official documentation.

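global:
  resolve_timeout: 5m                      # treat an alert as resolved if it is not updated within this time
  smtp_from: '123456789@qq.com'            # sender address
  smtp_smarthost: 'smtp.qq.com:465'        # SMTP server as host:port
  smtp_auth_username: '123456789@qq.com'
  smtp_auth_password: 'odggumkcwskybfgb'
  smtp_require_tls: false                  # do not require STARTTLS
  smtp_hello: 'qq.com'                     # hostname announced to the SMTP server
route:
  group_by: ['alertname']                  # alerts sharing these labels are grouped into one notification
  group_wait: 5s                           # wait before sending the first notification for a new group
  group_interval: 5s                       # wait before notifying about new alerts added to an existing group
  repeat_interval: 5m                      # wait before re-sending a notification that was already sent
  receiver: 'email'                        # default receiver
receivers:
- name: 'email'
  email_configs:
  - to: '123456789@qq.com'
    send_resolved: true                    # also notify when an alert is resolved
inhibit_rules:
  # Mute 'warning' alerts while a 'critical' alert with the same
  # alertname, dev, and instance labels is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']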
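
The inhibit rule above only has an effect when alerts carry a `severity` label, which the sample rule file in step 1 does not set. As a rough sketch, a rule group along the following lines would produce a matching pair of alerts. The group name and the 90% threshold are assumptions made for illustration and are not part of the original setup.

```yaml
groups:
  - name: memory-severity-example   # hypothetical group name
    rules:
      # Warning level: same condition as the sample rule in step 1
      - alert: NodeMemoryHigh
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 50
        for: 1m
        labels:
          severity: warning
      # Critical level: same alert name, higher (assumed) threshold
      - alert: NodeMemoryHigh
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 90
        for: 1m
        labels:
          severity: critical
```

Both rules share the alert name because the inhibit rule lists `alertname` under `equal`. While the critical alert fires for an instance, the warning notification for that same instance is suppressed.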

3. Configure communication between Prometheus and Alertmanager
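
Prometheus needs to know which rule files to load and where Alertmanager is listening. A minimal sketch of the relevant part of prometheus.yml might look like the following. The target address assumes Alertmanager runs on the same host on its default port 9093, the rule path reuses the placeholder from the promtool command in step 1, and the evaluation interval is an arbitrary choice.

```yaml
# prometheus.yml (fragment)
global:
  evaluation_interval: 15s           # how often alerting rules are evaluated (assumed value)

rule_files:
  - /path/to/example.rules.yml       # placeholder path to the rule file from step 1

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093         # assumed Alertmanager address (default port)
```

After Prometheus is restarted or reloaded with this configuration, firing alerts should show up in the Alertmanager web UI and be routed to the configured receiver.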
