Building an Enterprise-Grade Prometheus Monitoring System on k8s (1)
2021-04-14
小王同学123321
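1. In short, the setup uses Prometheus's federation architecture. The basic architecture is shown in the figure below:
Basic architecture diagram
Why there are four prometheus instances: the department has 2000+ machines in total. With fewer instances, each one hits performance problems and graphs in grafana show gaps.
Why there are two prometheus_master instances: 1. performance; 2. running two instances side by side adds redundancy and improves stability.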
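To make the federation idea concrete, here is a minimal sketch of what a prometheus_master scrape config could look like. It is not the config from my environment: the directory `./prometheus_master`, the edge hostnames, the 60s interval and the catch-all `match[]` selector are all assumptions for illustration. Only the `/federate` endpoint, `honor_labels` and the `match[]` parameter come from Prometheus itself, and port 8004 matches the listen address used for the edge nodes in this article.

```shell
#!/bin/sh
set -eu

# Hypothetical working directory; the real master layout is not shown in this article.
MASTER_DIR="${MASTER_DIR:-./prometheus_master}"
mkdir -p "$MASTER_DIR"

# Write a minimal federation config for the master.
cat > "$MASTER_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 60s            # assumption: the master pulls less often than the edges scrape
scrape_configs:
  - job_name: federate
    honor_labels: true            # keep the job/instance labels set by the edge nodes
    metrics_path: /federate       # federation endpoint exposed by every Prometheus server
    params:
      'match[]':
        - '{job=~".+"}'           # assumption: pull every job; narrow this in practice
    static_configs:
      - targets:
          - 'edge-1.example.internal:8004'   # hypothetical edge node addresses
          - 'edge-2.example.internal:8004'
EOF

echo "wrote $MASTER_DIR/prometheus.yml"
```

Pulling every series through `{job=~".+"}` is the simplest thing to write down, but with 2000+ machines it is probably worth restricting the selector to the series the master really needs.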
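2. Download the Prometheus release that matches your platform from the official website.
Official website link
The version I chose:
I run it on Linux. Once the download finishes, just extract the archive.
Then start it: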
/data1/prometheus/prometheus --config.file=/data1/prometheus/prometheus.yml --web.listen-address=:8004 --web.max-connections=100000 --storage.tsdb.path=/data1/prometheus/data --storage.tsdb.retention.time=30d
#--config.file=/data1/prometheus/prometheus.yml Prometheus configuration file path
#--web.listen-address=:8004 Address to listen on for UI, API, and telemetry
#--web.max-connections=100000 Maximum number of simultaneous connections
#--storage.tsdb.path=/data1/prometheus/data Base path for metrics storage
#--storage.tsdb.retention.time=30d How long to retain samples in storage. When this flag is set it overrides "storage.tsdb.retention". If neither this flag nor "storage.tsdb.retention" nor "storage.tsdb.retention.size" is set, the retention time defaults to 15d. Units Supported: y, w, d, h, m, s, ms
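Before and after starting the process it can help to check the config file and the readiness endpoint. The sketch below assumes that `promtool` (shipped in the same release archive) was extracted next to the `prometheus` binary under /data1/prometheus; the function name `probe_ready` and the environment variables are my own names, not part of Prometheus.

```shell
#!/bin/sh

# Assumed install location and port, taken from the start command above.
PROM_HOME="${PROM_HOME:-/data1/prometheus}"
PROM_PORT="${PROM_PORT:-8004}"

# Validate the config file if promtool is available.
if [ -x "$PROM_HOME/promtool" ]; then
    "$PROM_HOME/promtool" check config "$PROM_HOME/prometheus.yml"
else
    echo "promtool not found under $PROM_HOME, skipping config check"
fi

# probe_ready PORT: query the /-/ready endpoint, which answers HTTP 200
# once Prometheus is able to serve traffic.
probe_ready() {
    if curl -fsS --max-time 3 "http://localhost:$1/-/ready" >/dev/null 2>&1; then
        echo "prometheus on :$1 is ready"
    else
        echo "prometheus on :$1 is not ready"
        return 1
    fi
}

probe_ready "$PROM_PORT" || true
```

Right after startup the probe may report "not ready" for a while, because Prometheus replays its write-ahead log before it serves traffic.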
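3. The edge node configuration file:
Program directory structure
Alerting is handled on the prometheus_master nodes, so the Alertmanager configuration section in the file below is commented out.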
# my global config
global:
  scrape_interval: 10s # Scrape targets every 10 seconds. The default is every 1 minute.
  evaluation_interval: 60s # Evaluate rules every 60 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
#alerting:
#  alertmanagers:
#  - static_configs:
#    - targets: ['10.182.12.179:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
#rule_files:
#  - "rules/*.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #- job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    #static_configs:
    #- targets: ['localhost:9090']
  # Scrape monitoring data from the exporters (targets are loaded from the files under nodes/)
  - job_name: mongo_exporter
    file_sd_configs:
      - files:
          - nodes/mongo-*.json
        refresh_interval: 1m
  - job_name: memcache_exporter
    file_sd_configs:
      - files:
          - nodes/memcache-*.json
        refresh_interval: 1m
  - job_name: dproxy_mobile_dbl
    file_sd_configs:
      - refresh_interval: 1m
        files:
          - nodes/dproxy_mobile_dbl.json
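The jobs above read their targets from JSON files matched by patterns such as nodes/mongo-*.json. The article does not show one of these files, so the following is only a sketch of the file-based service discovery format: a JSON list of target groups, each with `targets` and optional `labels`. The file name `mongo-example.json`, the IP addresses, the port and the `cluster` label are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical demo directory; on an edge node this would be the nodes/
# directory that the file_sd_configs patterns point at.
NODES_DIR="${NODES_DIR:-./nodes}"
mkdir -p "$NODES_DIR"

# One target group for the mongo_exporter job. The name matches the
# pattern nodes/mongo-*.json used in the config above.
cat > "$NODES_DIR/mongo-example.json" <<'EOF'
[
  {
    "targets": ["10.0.0.11:9216", "10.0.0.12:9216"],
    "labels": {
      "cluster": "mongo-example"
    }
  }
]
EOF

echo "wrote $NODES_DIR/mongo-example.json"
```

Prometheus watches these files and also re-reads them every `refresh_interval` (1m here), so adding or removing machines only means editing the JSON, with no restart of the edge node.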