Integrating Spring Boot with Prometheus and Grafana
Preparing the Spring Boot service
Add the dependency to pom.xml
<!-- Integrate Micrometer so that monitoring data can be collected by Prometheus -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Configure application.properties
spring.application.name=mall-tiny-grafana
management.endpoints.web.exposure.include=prometheus
management.metrics.tags.application=${spring.application.name}
Package the service as an image and push it to the local registry
If you are not sure how, see: https://www.cnblogs.com/liufei96/p/16727629.html
Installing Prometheus and Grafana with docker-compose
Prepare the docker-compose.yml file
version: '3' # https://blog.51cto.com/msiyuetian/2369130
networks:
  monitor:
    driver: bridge
  my_mysql_default:
    external: true
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /mydata/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /mydata/prometheus/node_down.yml:/etc/prometheus/node_down.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /mydata/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    hostname: grafana
    restart: always
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    hostname: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8888:8080"
    networks:
      - monitor
  mall-tiny-grafana:
    image: 192.168.245.132:5000/mall-tiny/mall-tiny-grafana:1.0
    container_name: mall-tiny-grafana
    hostname: mall-tiny-grafana
    restart: always
    external_links:
      - mysql:db # the MySQL service can be reached under the hostname db
    environment:
      - spring.profiles.active=qa
      - TZ=Asia/Shanghai # no quotes: in list syntax they would become part of the value
    volumes:
      - /etc/localtime:/etc/localtime
      - /mydata/app/mall-tiny-grafana/logs:/var/logs
    ports:
      - "8088:8088"
    networks:
      - monitor
      - my_mysql_default
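Notes:
My mall-tiny-grafana service uses MySQL, and MySQL was already installed separately. To use that external MySQL, I added the external_links setting.
My MySQL container is on the my_mysql_default network, which is not the network mall-tiny-grafana runs on, so I also declared
networks:
  my_mysql_default:
    external: true
and added my_mysql_default to the networks of the mall-tiny-grafana service.
Contents of prometheus.yml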
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'mall-tiny-grafana'
    metrics_path: '/mall/actuator/prometheus' # metrics path; adjust to your own service's path
    static_configs: # address of the service to scrape
      - targets: ['mall-tiny-grafana:8088']
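The prometheus.yml above scrapes two targets, but it does not reference node_down.yml or Alertmanager, even though the compose file mounts the rule file and starts an alertmanager container. The fragment below is a minimal sketch of the top-level sections that would usually be added to prometheus.yml to wire them in. It is an assumption, not part of the original configuration: it reuses the mount path /etc/prometheus/node_down.yml and the container hostname alertmanager from the compose file, and the cadvisor job (container port 8080) is likewise only a suggestion.

```yaml
# Sketch only: sections assumed to be missing from the prometheus.yml shown above.
rule_files:
  - /etc/prometheus/node_down.yml      # path inside the container, as mounted by docker-compose.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # container hostname on the monitor network

# Optional extra entry for the existing scrape_configs list (assumption):
# scrape_configs:
#   - job_name: 'cadvisor'
#     static_configs:
#       - targets: ['cadvisor:8080']       # container port, not the published 8888
```

Without rule_files the InstanceDown rule is never loaded, and without the alerting section firing alerts are not forwarded to Alertmanager.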
Contents of node_down.yml
groups:
  - name: node_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: test
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
Contents of alertmanager.yml
global:
  smtp_smarthost: 'smtp.163.com:25' # 163 mail server
  smtp_from: '18855993840@163.com' # sender address
  smtp_auth_username: '18855993840@163.com' # sender's username, i.e. your mailbox address
  smtp_auth_password: 'xxxx' # sender's mailbox password
  smtp_require_tls: false # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: '1583409404@qq.com' # recipient address
Start everything with docker-compose up -d
Once it is running, check the Docker containers
[root@localhost prometheus]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
14e0ced0ee83 192.168.245.132:5000/mall-tiny/mall-tiny-grafana:1.0 "java -jar /mall-tin…" 55 minutes ago Up 55 minutes 0.0.0.0:8088->8088/tcp, :::8088->8088/tcp mall-tiny-grafana
fd06fd425a63 grafana/grafana:latest "/run.sh" 18 hours ago Up About an hour 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp grafana
5c8d6fc3de5b prom/alertmanager:latest "/bin/alertmanager -…" 18 hours ago Up 7 hours 0.0.0.0:9093->9093/tcp, :::9093->9093/tcp alertmanager
ca07f5ef64ae prom/prometheus:latest "/bin/prometheus --c…" 18 hours ago Up 22 minutes 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp prometheus
94ab1295f69e google/cadvisor:latest "/usr/bin/cadvisor -…" 18 hours ago Up 7 hours 0.0.0.0:8888->8080/tcp, :::8888->8080/tcp cadvisor
6bbed35c17b6 prom/node-exporter:latest "/bin/node_exporter" 18 hours ago Up 7 hours 0.0.0.0:9100->9100/tcp, :::9100->9100/tcp
Open the Prometheus targets page. If the configured targets are listed there, the configuration succeeded
http://192.168.245.132:9090/targets
Configuring Grafana
Open http://192.168.245.132:3000/ and log in with the username and password admin/admin
Add Prometheus as a data source
When the data source is configured, click the Save & test button to verify it.
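The same data source can also be declared in a file instead of through the UI. The fragment below is a hedged sketch of a Grafana data source provisioning file. It assumes the file would be mounted into the grafana container under /etc/grafana/provisioning/datasources/ (the compose file above does not mount anything into grafana), and that Grafana reaches Prometheus by its container hostname on the monitor network. The data source name is arbitrary.

```yaml
# Sketch only: e.g. /etc/grafana/provisioning/datasources/prometheus.yml (hypothetical file name)
apiVersion: 1
datasources:
  - name: Prometheus                 # display name, chosen for illustration
    type: prometheus
    access: proxy                    # Grafana's backend queries Prometheus
    url: http://prometheus:9090      # container hostname from docker-compose.yml
    isDefault: true
```

When configuring through the UI instead, the same URL (or http://192.168.245.132:9090 from outside the Docker network) goes into the data source's URL field.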
Download a dashboard template (JSON) from the official Grafana site
https://grafana.com/grafana/dashboards/?search=JVM
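Because I am monitoring the JVM metrics of a Spring Boot service, I downloaded a JVM dashboard template. Pick whichever template fits your needs.
Once the dashboard is set up, the monitoring data is visible.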
Errors encountered during installation
At first I had added scrape_timeout: 10s to my prometheus.yml, and Prometheus kept failing on startup
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'mall-tiny-grafana'
    scrape_interval: 5s
    scrape_timeout: 10s
    metrics_path: '/mall/actuator/prometheus' # metrics path
    static_configs: # address of the service to scrape
      - targets: ['mall-tiny-grafana:8088']
This was the error
ts=2022-10-06T08:16:44.584Z caller=main.go:437 level=error msg="Error loading config (--config.file=/etc/prometheus/prometheus.yml)" err="parsing YAML file /etc/prometheus/prometheus.yml: scrape timeout greater than scrape interval for scrape config with job name \"mall-tiny-grafana\""
The message says that parsing the YAML file failed, so at first I suspected the file was not valid YAML. I ran it through an online YAML validator, which found nothing wrong
I then moved the setting to a different place
global:
  scrape_interval: 5s
  scrape_timeout: 10s
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'mall-tiny-grafana'
    metrics_path: '/mall/actuator/prometheus' # metrics path
    static_configs: # address of the service to scrape
      - targets: ['mall-tiny-grafana:8088']
That produced the following error
ts=2022-10-06T08:44:29.945Z caller=main.go:437 level=error msg="Error loading config (--config.file=/etc/prometheus/prometheus.yml)" err="parsing YAML file /etc/prometheus/prometheus.yml: global scrape timeout greater than scrape interval"
The error actually says that scrape_timeout must not be greater than scrape_interval: a scrape has to time out before the next one is due
In the end I changed the values to scrape_interval: 15s and scrape_timeout: 10s
After restarting Prometheus, it started normally
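For reference, a sketch of what the working prometheus.yml would look like with those values. It assumes both settings stay in the global block, as in the second attempt. The original text only gives the two values, not the full final file.

```yaml
# Sketch only: final values from the text, placement in global is an assumption.
global:
  scrape_interval: 15s   # how often each target is scraped
  scrape_timeout: 10s    # must be less than or equal to scrape_interval
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'mall-tiny-grafana'
    metrics_path: '/mall/actuator/prometheus'
    static_configs:
      - targets: ['mall-tiny-grafana:8088']
```

The same constraint applies per job: a job-level scrape_timeout may not exceed that job's effective scrape_interval, which is why the first attempt (5s interval, 10s timeout) was rejected as well.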
Source code repository
https://github.com/liufei96/mall-learning/tree/main/mall-tiny-grafana