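2020-05-29 Building a Docker Container Monitoring System
1. Why monitor
- Monitor the system continuously and in real time
- Report the current state of the system in real time
- Keep the business running without interruption
2. What to monitor
Figure: monitoring targets
3. Prometheus overview
Prometheus is a monitoring system originally built at SoundCloud. Since its inception in 2012 it has attracted a very active developer and user community. It is now a standalone open-source project maintained independently of any company; to emphasize this, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016 as the second hosted project after Kubernetes.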
https://prometheus.io
https://github.com/prometheus
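Prometheus features:
• Multi-dimensional data model: time series identified by a metric name and key/value pairs (labels)
• PromQL: a flexible query language that uses this dimensionality to express complex queries
• No dependency on distributed storage; a single server node works on its own
• Time series are collected with a pull model over HTTP
• Pushing time series is supported through the Pushgateway component
• Targets are discovered through service discovery or static configuration
• Multiple modes of graphing and dashboard support (Grafana)
Prometheus components:
• Prometheus Server: scrapes metrics, stores them as time series, and provides a query interface
• Client Library: libraries for instrumenting application code
• Pushgateway: temporarily holds metrics pushed by short-lived jobs until Prometheus scrapes them
• Exporters: collect metrics from existing third-party services and expose them as metrics endpoints
• Alertmanager: handles alerts
• Web UI: a simple web console
Instance: an endpoint that can be scraped is called an instance.
Job: a collection of instances with the same purpose is called a job.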
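To make the data model concrete, here is a small sketch of how a metric name plus labels identifies a series, and how the job and instance labels group series. The series and values below are made up for illustration, written in the usual name{label="value"} value notation; instances_of_job is a hypothetical helper, not a Prometheus command.

```shell
# Made-up series as Prometheus would store them after scraping:
# metric name, then labels in braces, then the sample value.
samples='http_requests_total{job="api",instance="10.0.0.1:8080",code="200"} 1027
http_requests_total{job="api",instance="10.0.0.2:8080",code="200"} 998
http_requests_total{job="api",instance="10.0.0.1:8080",code="500"} 3
node_load1{job="node",instance="10.0.0.1:9100"} 0.42'

# List the distinct instances that belong to a given job.
instances_of_job() {
  printf '%s\n' "$samples" |
    grep "job=\"$1\"" |
    sed -n 's/.*instance="\([^"]*\)".*/\1/p' |
    sort -u
}

instances_of_job api
```

The api job here has two instances, and each combination of metric name and label values is its own time series.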
4. Deploying Prometheus
Binary deployment: https://prometheus.io/docs/prometheus/latest/getting_started/
Docker deployment: https://prometheus.io/docs/prometheus/latest/installation/
Web access: http://localhost:9090
Configure Prometheus to monitor itself:
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
Example: deploying Prometheus in a container
[root@docker ~]# docker pull prom/prometheus    # pull the Prometheus image
[root@docker ~]# cd /opt
[root@docker opt]# docker run -p 9090:9090 prom/prometheus    # run in the foreground; the startup log shows where prometheus.yml is located
level=info ts=2020-05-24T04:04:41.585Z caller=main.go:302 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-05-24T04:04:41.585Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.18.1, branch=HEAD, revision=ecee9c8abfd118f139014cb1b174b08db3f342cf)"
level=info ts=2020-05-24T04:04:41.585Z caller=main.go:338 build_context="(go=go1.14.2, user=root@2117a9e64a7e, date=20200507-16:51:47)"
level=info ts=2020-05-24T04:04:41.585Z caller=main.go:339 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 6684d2eca5b7 (none))"
level=info ts=2020-05-24T04:04:41.585Z caller=main.go:340 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-24T04:04:41.585Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-05-24T04:04:41.586Z caller=main.go:678 msg="Starting TSDB ..."
level=info ts=2020-05-24T04:04:41.591Z caller=head.go:575 component=tsdb msg="Replaying WAL, this may take awhile"
level=info ts=2020-05-24T04:04:41.592Z caller=web.go:523 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-05-24T04:04:41.597Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-05-24T04:04:41.597Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=5.550783ms
level=info ts=2020-05-24T04:04:41.598Z caller=main.go:694 fs_type=XFS_SUPER_MAGIC
level=info ts=2020-05-24T04:04:41.598Z caller=main.go:695 msg="TSDB started"
level=info ts=2020-05-24T04:04:41.598Z caller=main.go:799 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-05-24T04:04:41.606Z caller=main.go:827 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-05-24T04:04:41.606Z caller=main.go:646 msg="Server is ready to receive web requests."
[root@docker opt]# docker run -d -p 9090:9090 prom/prometheus    # after exiting the foreground run, start the container in the background
2650114c3999e420827c321ff9b67a491c707b324bad56f481b31fe5e90e56a8
[root@docker opt]# docker cp 2650:/etc/prometheus/prometheus.yml ./    # copy the config file out of the container into the current directory
[root@docker opt]# cat prometheus.yml
# my global config
global:    # global settings
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:    # alerting settings
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:    # rule files
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:    # targets to scrape
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
[root@docker opt]# docker rm -f 2650    # the container can be removed once the file has been copied
2650
# Start the container in the background with the host's config file mounted over the default one
[root@docker opt]# docker run -d \
> -p 9090:9090 \
> -v /opt/prometheus.yml:/etc/prometheus/prometheus.yml \
> prom/prometheus
4efdfce65a02d291e230d2808d254a5f9c1676e41f2542dbb6bc9801185e87bb
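One thing worth guarding against with this bind mount: if /opt/prometheus.yml does not exist on the host, Docker's -v flag creates a directory at that path instead of reporting an error, and the container will then typically fail to start. Below is a minimal sketch of a pre-flight check. The function name config_ready is made up for illustration, and the docker run line is left commented out so the snippet is harmless to run anywhere.

```shell
# Succeeds only if the path is a regular, non-empty file.
config_ready() {
  [ -f "$1" ] && [ -s "$1" ]
}

cfg=/opt/prometheus.yml   # host path used in this post

if config_ready "$cfg"; then
  echo "config found: $cfg"
  # Uncomment to actually start Prometheus:
  # docker run -d -p 9090:9090 -v "$cfg":/etc/prometheus/prometheus.yml prom/prometheus
else
  echo "config missing or empty: $cfg" >&2
fi
```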
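Figure: global configuration in prometheus.yml
Figure: Prometheus main page
cAdvisor (Container Advisor) collects resource usage and performance information for running containers. It only sees the host it runs on, so one instance must be installed on every host to be monitored.
Grafana is an open-source metrics analytics and visualization system.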
https://github.com/google/cadvisor
https://grafana.com/grafana/download
https://grafana.com/dashboards/193
Install cAdvisor
[root@docker ~]# sudo docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
--privileged \
google/cadvisor:latest
Figure: cAdvisor web UI
The /metrics endpoint exposes the containers' performance metrics.
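To get a feel for what that endpoint returns, the sketch below parses a small, made-up sample shaped like cAdvisor's output (real output carries more labels and many more metrics). container_memory_usage_bytes and container_cpu_usage_seconds_total are real cAdvisor metric names; the values, the container IDs, and the memory_by_container helper are illustrative assumptions.

```shell
# Made-up sample in the shape of cAdvisor's /metrics output.
# Against a live instance you could fetch it with: curl -s http://localhost:8080/metrics
metrics='# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/docker/aaa",image="grafana/grafana",name="grafana"} 52428800
container_memory_usage_bytes{id="/docker/bbb",image="prom/prometheus",name="prometheus"} 104857600
container_cpu_usage_seconds_total{id="/docker/aaa",image="grafana/grafana",name="grafana"} 12.5'

# Print "<container name> <memory in MiB>" for every memory sample.
memory_by_container() {
  printf '%s\n' "$metrics" |
    grep '^container_memory_usage_bytes{' |
    sed 's/.*name="\([^"]*\)".*} \([0-9]*\)$/\1 \2/' |
    awk '{ printf "%s %d\n", $1, $2 / 1048576 }'
}

memory_by_container
```

Prometheus does this kind of parsing itself when it scrapes the endpoint; the snippet is only meant to show what the raw data looks like.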
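Install Grafana
[root@docker ~]# docker run -d --name=grafana -p 3000:3000 grafana/grafana
Log in to Grafana
The default username and password are both admin.
Enter a new password when prompted.
Figure: Grafana home page
Add a data source
Select Prometheus
Configure the Prometheus URL
Save
Figure: data source configured successfully
If the configuration is wrong, Grafana reports a problem.
In that case, check connectivity between the containers. Inside the Grafana container, localhost refers to the Grafana container itself, so the URL must use an address where Prometheus is actually reachable, such as the host's IP.
Configure monitoring:
Edit the configuration file to add the containers to monitoring: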
[root@aliyun opt]# vi prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  # ------------------------- add the following lines -------------------------
  - job_name: 'docker'
    static_configs:
    - targets: ['47.107.112.134:8080']    # the target is the cAdvisor endpoint that exposes the container data
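If you would rather script this edit than make it in vi, something like the following could work. It is a sketch under two assumptions: scrape_configs is the last top-level section of the file (as it is in the default config), and add_job is a hypothetical helper named here for illustration. It writes to a temporary file so the real /opt/prometheus.yml is not touched.

```shell
# Start from a trimmed-down copy of the default scrape section.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF

# Append one more scrape job. usage: add_job <file> <job_name> <host:port>
add_job() {
  cat >> "$1" <<EOF
  - job_name: '$2'
    static_configs:
      - targets: ['$3']
EOF
}

add_job "$cfg" docker 47.107.112.134:8080
cat "$cfg"
```

The indentation inside the here-document matters, since YAML uses it to decide that the new job belongs to scrape_configs.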
Restart Prometheus
[root@aliyun opt]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
09443f69b057 grafana/grafana "/run.sh" 2 days ago Up 2 days 0.0.0.0:3000->3000/tcp grafana
bf61d366dceb google/cadvisor:latest "/usr/bin/cadvisor -…" 2 days ago Up 2 days 0.0.0.0:8080->8080/tcp cadvisor
ba3b58b59d15 prom/prometheus "/bin/prometheus --c…" 2 days ago Up 2 days 0.0.0.0:9090->9090/tcp suspicious_bartik
517d13c3f563 prom/prometheus "/bin/prometheus --c…" 2 days ago Exited (0) 2 days ago sad_elbakyan
[root@aliyun opt]# docker restart ba3b58b59d15
ba3b58b59d15
In the Prometheus expression browser, type container and many container metrics appear in the suggestions.
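These metrics come from cAdvisor, and PromQL can combine them rather than graphing them one at a time. A rough sketch follows, assuming the Prometheus server from this post is listening on localhost:9090. The helper name prom_query and the two expressions are illustrative, not part of the original setup; /api/v1/query is Prometheus's HTTP query endpoint.

```shell
# Per-container CPU usage in cores, averaged over the last 5 minutes.
cpu_expr='sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))'
# Current memory usage of the container named grafana.
mem_expr='container_memory_usage_bytes{name="grafana"}'

# Send an instant query to Prometheus. usage: prom_query <base_url> <promql>
prom_query() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# With the server running, a call would look like:
#   prom_query http://localhost:9090 "$cpu_expr"
printf '%s\n' "$cpu_expr" "$mem_expr"
```

The same expressions can be pasted directly into the expression browser or a Grafana panel.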
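Pick any metric to graph.
Add a dashboard in Grafana.
Create a new dashboard and configure its panels as needed.
Alternatively, download a ready-made template from https://grafana.com and import it.
Templates with many downloads are generally a good choice.
Note the template's ID.
Import the template.
Load the template.
Select the data source added earlier.
Monitoring is now working.
Next, import another template, this time one with graphical panels.
The import succeeds.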