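Learn Prometheus Exporters in One Article
Overview
Broadly speaking, any program that provides monitoring data to Prometheus can be called an exporter, and a running instance of an exporter is called a target. Exporters come from two main sources: those provided by the community and those written by users.
I. Linux Host Monitoring with node_exporter
Once node_exporter is installed on a host, it exposes an HTTP endpoint from which the current monitoring samples can be fetched. A program of this kind is called an exporter, and each running instance is a target. Prometheus collects monitoring data by polling these targets at regular intervals.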
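To make the "HTTP endpoint that exposes samples" idea concrete, here is a minimal sketch that parses a small piece of text in the Prometheus exposition format, which is what an exporter returns. The sample text, its values, and the helper name parse_metrics are illustrative assumptions; timestamps and escaping rules are deliberately not handled.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse a tiny subset of the Prometheus text exposition format.

    Keys are the full series name including labels, values are floats.
    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field (no timestamp support here).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical excerpt of what an exporter might return.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

Against a live node_exporter, the same function could be fed the body of http://<host>:9100/metrics (9100 is the default node_exporter port), fetched for example with urllib.request.urlopen.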
1. Upload the package
[root@localhost ~]# ll /opt/node_exporter-0.18.1.linux-amd64.tar.gz
-rw-r--r--. 1 root root 8083296 May 12 20:21 /opt/node_exporter-0.18.1.linux-amd64.tar.gz
2. Extract the archive
[root@localhost opt]# tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz
[root@localhost opt]# cp -rf node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
3. Start node_exporter
[root@localhost opt]# cd /usr/local/node_exporter/
[root@localhost node_exporter]# ./node_exporter
4. Create a systemd unit file
[root@localhost ~]# cat /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network-online.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
systemctl commands:
systemctl daemon-reload
systemctl stop node_exporter
systemctl start node_exporter
systemctl restart node_exporter
5. Common metric queries
CPU collection:
avg without(cpu, mode) (rate(node_cpu_seconds_total{mode="idle"}[1m]))
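CPU usage query (the expression above returns the idle fraction averaged over all cores; CPU usage is 1 minus that value)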
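The arithmetic behind that query can be sketched in a few lines: rate() is roughly the per-second increase of a counter between two samples, and usage is 1 minus the average idle rate. The sample values and function names are hypothetical, and counter resets are not handled the way Prometheus handles them.

```python
from typing import Sequence


def counter_rate(v_start: float, v_end: float, seconds: float) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (v_end - v_start) / seconds


def cpu_usage_percent(idle_rates: Sequence[float]) -> float:
    """CPU usage in percent, given per-core idle rates (each between 0 and 1)."""
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100.0 * (1.0 - avg_idle)


# Hypothetical idle-seconds counters of two cores, sampled 60 s apart.
core0_idle = counter_rate(1000.0, 1054.0, 60.0)  # 54 idle seconds out of 60
core1_idle = counter_rate(2000.0, 2048.0, 60.0)  # 48 idle seconds out of 60
usage = cpu_usage_percent([core0_idle, core1_idle])

# One PromQL expression that follows the same idea:
#   100 * (1 - avg without(cpu, mode) (rate(node_cpu_seconds_total{mode="idle"}[1m])))
if __name__ == "__main__":
    print(f"CPU usage: {usage:.1f}%")
```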
Memory collection:
node_memory_MemAvailable_bytes
Memory usage query
[root@centos7_9-mod prometheus]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7821         615        4200          48        3005        6868
Swap:          5119           0        5119
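As a worked example, memory usage can be derived from the free -m figures above as (total - available) / total. This is a small sketch with a hypothetical helper name; it mirrors what a query built on MemAvailable and MemTotal would compute.

```python
def memory_usage_percent(total_mb: float, available_mb: float) -> float:
    """Share of memory that is not available to new workloads, in percent."""
    return 100.0 * (total_mb - available_mb) / total_mb


# Values taken from the "Mem:" row of the free -m output: total=7821, available=6868.
usage = memory_usage_percent(7821, 6868)

if __name__ == "__main__":
    print(f"Memory usage: {usage:.1f}%")
```

With these numbers the result is roughly 12%, which should be close to what the Prometheus memory usage query reports for the same host at the same moment.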
Disk collection:
node_disk_read_time_seconds_total
Disk usage query
Filesystem collection:
node_filesystem_size_bytes
Filesystem usage query
Network collection:
node_network_receive_bytes_total
Network usage query
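The metric names above are raw series, so here is a hedged sketch of what the corresponding usage queries could look like, wrapped as instant-query URLs for the Prometheus HTTP API (/api/v1/query). The Prometheus address, the dictionary keys, and the helper name are assumptions for illustration; the PromQL expressions are one common way to write these queries, not the only one.

```python
from urllib.parse import urlencode

# Assumed Prometheus address; replace with your own server.
PROMETHEUS_URL = "http://192.168.2.136:9090"

# Illustrative usage queries built on node_exporter metrics.
USAGE_QUERIES = {
    "memory_used_percent":
        "100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)",
    "filesystem_used_percent":
        "100 * (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes)",
    "disk_read_time_rate":
        "rate(node_disk_read_time_seconds_total[1m])",
    "network_receive_bytes_per_second":
        "rate(node_network_receive_bytes_total[1m])",
}


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    for name, expr in USAGE_QUERIES.items():
        print(name, instant_query_url(expr))
```

The same expressions can be pasted directly into the expression box of the Prometheus web UI.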
II. Database Monitoring
1) Install the database:
[root@localhost ~]# yum -y install mariadb-server
[root@localhost ~]# systemctl restart mariadb
2) Create a database user and grant privileges
[root@localhost ~]# mysql
MariaDB [(none)]> create user 'mysqld_exporter'@'localhost' identified by '123456';
MariaDB [(none)]> grant process, replication client, select on *.* to 'mysqld_exporter'@'localhost';
MariaDB [(none)]> flush privileges;
MariaDB [(none)]> select Host,User from mysql.user;
+-----------------------+-----------------+
| Host                  | User            |
+-----------------------+-----------------+
| 127.0.0.1             | root            |
| ::1                   | root            |
| localhost             |                 |
| localhost             | mysqld_exporter |
| localhost             | root            |
| localhost.localdomain |                 |
| localhost.localdomain | root            |
+-----------------------+-----------------+
3) Install mysqld_exporter
[root@localhost opt]# ll
total 6956
-rw-r--r--. 1 root root 7121565 May 18 2020 mysqld_exporter-0.12.1.linux-amd64.tar.gz
[root@localhost opt]# tar -zxvf mysqld_exporter-0.12.1.linux-amd64.tar.gz
[root@localhost opt]# cp -r mysqld_exporter-0.12.1.linux-amd64 /usr/local/mysqld_exporter
4) Configure database credentials
[root@localhost ~]# cd /usr/local/mysqld_exporter/
[root@localhost mysqld_exporter]# vi .mysqld_exporter.cnf
[client]
user=mysqld_exporter
password=123456
5) Start mysqld_exporter
[root@localhost mysqld_exporter]# ./mysqld_exporter --config.my-cnf=".mysqld_exporter.cnf"
[root@localhost ~]# netstat -tlunp|grep 9104
tcp6 0 0 :::9104 :::* LISTEN 12070/./mysqld_expo
6) Create the systemd unit file mysqld_exporter.service
[root@localhost ~]# vi /usr/lib/systemd/system/mysqld_exporter.service
[Unit]
Description=Prometheus MySQL daemon
After=network.target
[Service]
User=root
Group=root
Type=simple
Restart=always
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/usr/local/mysqld_exporter/.mysqld_exporter.cnf \
--collect.global_status \
--collect.auto_increment.columns \
--collect.info_schema.processlist \
--collect.binlog_size \
--collect.info_schema.tablestats \
--collect.global_variables \
--collect.info_schema.innodb_metrics \
--collect.info_schema.query_response_time \
--collect.info_schema.userstats \
--collect.info_schema.tables \
--collect.perf_schema.tablelocks \
--collect.perf_schema.file_events \
--collect.perf_schema.eventswaits \
--collect.perf_schema.indexiowaits \
--collect.perf_schema.tableiowaits \
--collect.slave_status \
--web.listen-address=0.0.0.0:9104
[Install]
WantedBy=multi-user.target
Commands:
[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl start mysqld_exporter
7) Integrate with Prometheus
[root@localhost ~]# vi /usr/local/prometheus/prometheus.yml
  - job_name: 'mysqld_exporter'
    scrape_interval: 10s
    static_configs:
      - targets: ['192.168.2.136:9104']
# The job above is added under scrape_configs
[root@localhost ~]# systemctl restart prometheus
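8) Web test
Visit: http://192.168.2.136:9090
1. Query throughput
When monitoring any system, our main concern is making sure that the system does its work effectively. A database handles a large number of queries while it runs, so the first monitoring priority is to make sure that MySQL executes queries as expected. MySQL has an internal counter named Questions (a "server status variable" in MySQL terminology). The counter is incremented for every statement sent by client applications.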
MariaDB [(none)]> show global status like "Questions";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
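| Questions     | 513   |
+---------------+-------+
Prometheus expression: mysql_global_status_questions
2. Query execution performance
For query execution performance, MySQL provides the Slow_queries counter, which is incremented every time a query takes longer than the number of seconds set by the long_query_time parameter. The default threshold is 10 seconds.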
MariaDB [(none)]> show global status like "Slow_queries";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
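| Slow_queries  | 0     |
+---------------+-------+
Prometheus expression: mysql_global_status_slow_queries
3. Connections
To keep the MySQL server from being overloaded, the database administrator should estimate the load ahead of time based on business needs and limit the number of client connections to MySQL accordingly. The maximum number of connections can be set in my.cnf, for example max_connections=512.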
MariaDB [(none)]> show variables like "max_connections";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 512   |
+-----------------+-------+
Prometheus expression: mysql_global_variables_max_connections
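The configured limit is most useful when compared with the number of connections currently open. A minimal sketch of that ratio follows; the value 384 is a made-up reading of the Threads_connected status variable, and the helper name is hypothetical.

```python
def connection_usage_percent(threads_connected: int, max_connections: int) -> float:
    """Percentage of the configured connection limit that is currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * threads_connected / max_connections


# 512 is the max_connections value shown above; 384 is a hypothetical current reading.
usage = connection_usage_percent(384, 512)

# A PromQL expression following the same idea (metric names as exposed by mysqld_exporter):
#   100 * mysql_global_status_threads_connected / mysql_global_variables_max_connections
if __name__ == "__main__":
    print(f"Connection usage: {usage:.1f}%")
```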
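4. Buffer pool usage
When InnoDB is the default MySQL storage engine, a buffer pool is used to cache table and index data. Its size can be set in my.cnf, for example innodb_buffer_pool_size=128M. This is the most important InnoDB parameter: it determines how much memory is available for caching InnoDB table data, indexes, and buffered inserts. The default value is 128M. The Innodb_buffer_pool_reads status variable counts logical reads that could not be served from the buffer pool and had to go to disk.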
MariaDB [(none)]> show global status like "Innodb_buffer_pool_reads";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Innodb_buffer_pool_reads | 144   |
+--------------------------+-------+
Prometheus expression: mysql_global_status_innodb_buffer_pool_reads
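A common way to interpret this counter is the buffer pool hit ratio: the share of logical read requests that did not have to go to disk. The sketch below assumes a second status variable, Innodb_buffer_pool_read_requests, and uses a made-up value for it; only the 144 comes from the output above.

```python
def buffer_pool_hit_percent(disk_reads: int, read_requests: int) -> float:
    """Percentage of logical reads served from the InnoDB buffer pool."""
    if read_requests <= 0:
        raise ValueError("read_requests must be positive")
    return 100.0 * (1.0 - disk_reads / read_requests)


# Innodb_buffer_pool_reads = 144 (from the output above);
# Innodb_buffer_pool_read_requests = 14400 is a hypothetical value.
hit_ratio = buffer_pool_hit_percent(144, 14400)

if __name__ == "__main__":
    print(f"Buffer pool hit ratio: {hit_ratio:.2f}%")
```

A ratio that stays low over time can be a hint that innodb_buffer_pool_size is too small for the working set.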
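III. Black-box Monitoring with blackbox_exporter
blackbox_exporter is one of the official Prometheus exporters. It collects monitoring data by probing endpoints over HTTP, DNS, TCP, and ICMP.
blackbox_exporter download page: https://prometheus.io/download/
1) Install blackbox_exporter
Upload the package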
[root@localhost opt]# ll blackbox_exporter-0.16.0.linux-amd64.tar.gz
-rw-r--r--. 1 root root 8314959 May 18 20:40 blackbox_exporter-0.16.0.linux-amd64.tar.gz
[root@localhost opt]# tar -zxvf blackbox_exporter-0.16.0.linux-amd64.tar.gz
[root@localhost opt]# cp -r blackbox_exporter-0.16.0.linux-amd64 /usr/local/blackbox_exporter
2) Register blackbox_exporter as a system service by creating the unit file blackbox_exporter.service
[root@localhost ~]# vi /usr/lib/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter \
--config.file=/usr/local/blackbox_exporter/blackbox.yml \
--web.listen-address=:9115
Restart=on-failure
[Install]
WantedBy=multi-user.target
Restart:
[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl restart blackbox_exporter
[root@localhost ~]# netstat -tlunp | grep blackbox_expo
tcp6 0 0 :::9115 :::* LISTEN 11925/blackbox_expo
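- ICMP monitoring: checking whether hosts are alive
By collecting ICMP metrics we can tell whether the network path to a remote host has a problem, which is an important part of monitoring. To find out which of the routes from around the country to our data center are faulty, we considered two approaches:
Collect ping and access data from nodes all over the country. Carriers and similar providers offer this kind of service, but it costs money;
The approach I currently use: find ping test nodes in various regions and ping them from our data center to see which route is failing.
Add the corresponding job to Prometheus; Blackbox can simply be started with its default configuration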
[root@localhost ~]# cat /usr/local/prometheus/prometheus.yml
  - job_name: "icmp_ping"
    metrics_path: /probe
    params:
      module: [icmp]  # use the icmp module
    file_sd_configs:
      - refresh_interval: 10s
        files:
          - "/qq/ping_status*.yml"  # files that list the targets
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*?)(:80)?  # non-greedy, so that a trailing :80 is actually stripped
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        target_label: instance
      - source_labels: [__param_target]
        regex: (.*)
        target_label: ping
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 127.0.0.1:9115
Target file:
[root@localhost ~]# cat /qq/ping_status.yml
- targets: ['220.181.38.150','14.215.177.39','180.101.49.12','14.215.177.39','180.101.49.11','14.215.177.38','14.215.177.38']
  labels:
    group: 'Tier-1 cities - China Telecom network monitoring'
- targets: ['112.80.248.75','163.177.151.109','61.135.169.125','163.177.151.110','180.101.49.11','61.135.169.121','180.101.49.11']
  labels:
    group: 'Tier-1 cities - China Unicom network monitoring'
- targets: ['183.232.231.172','36.152.44.95','182.61.200.6','36.152.44.96','220.181.38.149']
  labels:
    group: 'Tier-1 cities - China Mobile network monitoring'
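The relabeling rules in the icmp_ping job are easier to follow when traced for one target. The sketch below imitates their effect for a bare IP address taken from the target file: the target moves into the target URL parameter, is copied to the instance and ping labels, and the scrape address is replaced with the blackbox_exporter address. It is a simplified simulation with hypothetical function names, not how Prometheus implements relabeling.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

BLACKBOX_ADDRESS = "127.0.0.1:9115"  # replacement value from the last relabel rule


def probe_request(target: str, module: str = "icmp") -> Tuple[str, Dict[str, str]]:
    """Trace the relabel rules for one target and return (scrape URL, labels)."""
    labels = {"__address__": target}
    # Rule 1: for a bare IP, the regex capture equals the whole address.
    labels["__param_target"] = labels["__address__"]
    # Rules 2 and 3: expose the probed target as the instance and ping labels.
    labels["instance"] = labels["__param_target"]
    labels["ping"] = labels["__param_target"]
    # Rule 4: scrape blackbox_exporter instead of the target itself.
    labels["__address__"] = BLACKBOX_ADDRESS
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}", labels


if __name__ == "__main__":
    url, labels = probe_request("220.181.38.150")
    print(url)
    print(labels["instance"], labels["ping"])
```

Opening the resulting URL in a browser or with curl is a quick way to check a probe by hand; the response includes a probe_success metric.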
- Monitoring host port liveness
[root@localhost ~]# cat /usr/local/prometheus/prometheus.yml
  - job_name: 'prometheus_port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['10.165.94.31:8765']
        labels:
          instance: 'port_status'
          group: 'tcp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.19.155.133:9115
10.165.94.31 is the IP of the monitored host, and 172.19.155.133 is the host running blackbox_exporter
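Conceptually, a TCP connect probe answers one question: can a TCP connection to host:port be opened within a timeout? The sketch below shows that idea with the Python standard library. It is only an illustration of the concept with a hypothetical function name, not blackbox_exporter's implementation, and the 2-second timeout is an arbitrary assumption.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and port of the monitored host from the job above.
    print(tcp_port_open("10.165.94.31", 8765))
```

Running such a check by hand from the blackbox_exporter host can help tell a real outage apart from a firewall rule between the exporter and the target.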
- Monitoring website status
Add the corresponding job to Prometheus; Blackbox can simply be started with its default configuration
[root@localhost ~]# cat /usr/local/prometheus/prometheus.yml
  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx]  # use the http module
    file_sd_configs:
      - refresh_interval: 1m
        files:
          - "/qq/blackbox*.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
Target file:
[root@localhost ~]# cat /qq/blackbox-dis.yml
- targets:
    - https://www.zhibo8.cc
    - https://www.baidu.com