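Prometheus Configuration Explained
This article summarizes the Prometheus configuration syntax and the features it provides, based on the official documentation.
1. Prometheus configuration file overview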
# Global settings such as the scrape interval and the scrape timeout.
global:
  # How often to scrape targets.
  [ scrape_interval: <duration> | default = 1m ]
  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]
  # How often to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]
  # Labels added to time series and alerts when communicating with external systems.
  external_labels:
    [ <labelname>: <labelvalue> ... ]
  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]
# Rule files (recording and alerting rules). Prometheus evaluates the alerting rules
# and sends the resulting alerts to Alertmanager.
rule_files:
  [ - <filepath_glob> ... ]
# Scrape configurations. This is where data collection is set up.
scrape_configs:
  [ - <scrape_config> ... ]
# Alerting settings, mainly the Alertmanager instances that Prometheus sends alerts to.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]
# Write API endpoints of the remote storage backend.
remote_write:
  [ - <remote_write> ... ]
# Read API endpoints of the remote storage backend.
remote_read:
  [ - <remote_read> ... ]
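To make the schema above more concrete, here is a minimal sketch of a complete configuration file. The interval values, the external label, the rule file path, and the Alertmanager address are all assumptions chosen for illustration, not values from the official documentation.

```yaml
# Hypothetical prometheus.yml; paths, labels, and addresses are placeholders.
global:
  scrape_interval: 15s        # overrides the 1m default
  scrape_timeout: 10s         # must not exceed scrape_interval
  evaluation_interval: 15s
  external_labels:
    cluster: demo             # assumed label

rule_files:
  - rules/*.yml               # assumed location of the rule files

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093  # assumed Alertmanager address
```

remote_write and remote_read are omitted here because they are only needed when an external storage backend is in use.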
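2. scrape_configs in detail
A scrape_config section specifies a set of targets and the parameters that describe how to scrape them. Targets are the instances (endpoints) to collect from. The format is as follows: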
# The job name assigned to scraped metrics by default.
job_name: <job_name>
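# Scrape interval for this job; inherits the global value by default.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]
# Scrape timeout for this job; inherits the global value by default.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]
# HTTP path to fetch metrics from; defaults to /metrics.
[ metrics_path: <path> | default = /metrics ]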
# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
#
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]
# honor_timestamps controls whether Prometheus respects the timestamps present
# in scraped data.
#
# If honor_timestamps is set to "true", the timestamps of the metrics exposed
# by the target will be used.
#
# If honor_timestamps is set to "false", the timestamps of the metrics exposed
# by the target will be ignored.
[ honor_timestamps: <boolean> | default = true ]
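# Protocol used for scraping: http or https.
[ scheme: <scheme> | default = http ]
# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]
# Basic authentication credentials sent with each scrape request.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]
# Bearer token used to authenticate scrape requests.
[ bearer_token: <secret> ]
# File from which the bearer token is read; used to authenticate scrape requests.
[ bearer_token_file: /path/to/bearer/token/file ]
# TLS settings (certificates) used when fetching metrics.
tls_config:
  [ <tls_config> ]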
# Optional proxy URL.
[ proxy_url: <string> ]
# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]
# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]
# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]
# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]
# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]
# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]
# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]
# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]
# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]
# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]
# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]
# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]
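# Statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]
# Target relabeling: rewrites a target's labels before the scrape and can
# keep or drop targets or remove unneeded labels.
relabel_configs:
  [ - <relabel_config> ... ]
# Metric relabeling: adds, edits, or removes labels on scraped samples.
metric_relabel_configs:
  [ - <relabel_config> ... ]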
# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabelling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]
Because my deployment runs on Kubernetes, I only cover service discovery through kubernetes_sd_configs and statically configured targets through static_configs.
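As an illustration of how several of the scrape_config fields combine, the following sketch overrides a few defaults for a single job. The job name, path, URL parameter, credentials, file locations, and target address are all hypothetical.

```yaml
scrape_configs:
  - job_name: app                        # hypothetical job name
    scrape_interval: 30s                 # overrides global.scrape_interval
    scrape_timeout: 10s
    metrics_path: /actuator/prometheus   # assumed non-default path
    scheme: https
    params:
      format: [prometheus]               # sent as ?format=prometheus
    basic_auth:
      username: scraper                  # placeholder credentials
      password_file: /etc/prometheus/secrets/app-password
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
    sample_limit: 10000                  # fail the scrape above 10000 samples
    static_configs:
      - targets:
          - app.example.com:8443
```

With these settings Prometheus would request https://app.example.com:8443/actuator/prometheus?format=prometheus every 30 seconds.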
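2.1 relabel_configs
relabel_configs is a powerful tool. Before Prometheus scrapes a target, relabeling can dynamically rewrite label values based on the target's metadata. It can also use that metadata to decide whether the target is scraped or ignored.
The relabel_configs format is as follows: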
# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]
# Separator placed between the concatenated source label values.
[ separator: <string> | default = ; ]
# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]
# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]
# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]
# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]
# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]
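The main actions are:
replace: the default. Matches regex against the concatenated source_labels and, on a match, sets target_label to replacement, with capture-group references (${1}, ${2}, ...) replaced by their values. If regex does not match, nothing is changed.
keep: drops targets for which regex does not match the concatenated source_labels.
drop: drops targets for which regex matches the concatenated source_labels.
labeldrop: matches regex against all label names and removes every label that matches.
labelkeep: matches regex against all label names and removes every label that does not match.
hashmod: sets target_label to the modulus of a hash of the concatenated source_labels.
labelmap: matches regex against all label names, then copies the values of the matching labels to the label names given by replacement, with capture-group references (${1}, ${2}, ...) replaced by their values.
Labels in Prometheus are key/value pairs. replace, keep, and drop operate on label values, while labelmap, labeldrop, and labelkeep operate on label names (keys).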
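replace usage
replace is the default action. It matches regex against the values of source_labels and writes replacement, which can reference the capture groups, to target_label.
- action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
  target_label: __address__
In the example above, __address__ is set to $1:$2. The two source label values are joined with the default separator ; before matching. $1 is the host part captured from __address__ by ([^:]+), with any existing port matched by (?::\d+)? and discarded. $2 is the port captured by (\d+) from __meta_kubernetes_service_annotation_prometheus_io_port. The resulting __address__ is, for example, 192.168.1.1:9100.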
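The replace rule can be traced step by step with concrete values. In the sketch below the input label values are assumptions made up for illustration; the rule itself is the one shown above, with the default separator written out explicitly.

```yaml
# Hypothetical target labels before relabeling:
#   __address__: "192.168.1.1:8080"
#   __meta_kubernetes_service_annotation_prometheus_io_port: "9100"
# Joined with the separator ";" this gives "192.168.1.1:8080;9100".
relabel_configs:
  - source_labels:
      - __address__
      - __meta_kubernetes_service_annotation_prometheus_io_port
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)   # $1 = 192.168.1.1, $2 = 9100
    target_label: __address__
    replacement: $1:$2
    action: replace
# After relabeling:
#   __address__: "192.168.1.1:9100"
```

If the port annotation is missing, the joined value ends in an empty string after the separator, the regex does not match, and __address__ is left unchanged.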
keep usage
relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
In the example above, targets whose __meta_kubernetes_service_annotation_prometheus_io_probe label equals true are kept. Conversely, targets whose source_labels value does not match regex are dropped.
drop usage
drop is the exact opposite of keep. Reusing the keep example with the action changed to drop:
relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: drop
    regex: true
In the example above, targets whose __meta_kubernetes_service_annotation_prometheus_io_probe label equals true are dropped. Conversely, targets whose value is not true are kept.
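labelmap usage
labelmap differs from replace, keep, and drop: it matches label names, whereas replace, keep, and drop match label values.
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
In the example above, every label whose name matches the regular expression __meta_kubernetes_service_label_(.+) is copied to a new label named after the part captured by (.+). The effect is:
Original label: __meta_kubernetes_service_label_test=111
After relabeling: test=111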
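labelmap also honors replacement, which makes it possible to prefix the copied label names. The following is a small sketch; the svc_ prefix is an assumption chosen only to show the mechanism.

```yaml
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
    replacement: svc_$1        # assumed prefix, purely illustrative
# __meta_kubernetes_service_label_test=111  ->  svc_test=111
```

A prefix like this can help avoid collisions between Kubernetes labels and labels already used by the scraped metrics.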
hashmod usage
To be continued.
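A common use of hashmod is to pair it with keep so that targets are split across several Prometheus servers. The sketch below assumes three servers, with this one acting as shard 0. The temporary label name __tmp_hash is a conventional choice rather than something this article specifies.

```yaml
# Hypothetical sharding setup: this server is shard 0 of 3.
relabel_configs:
  # Hash the target address and store hash % 3 in a temporary label.
  - source_labels: [__address__]
    modulus: 3
    target_label: __tmp_hash
    action: hashmod
  # Keep only the targets that fall into shard 0.
  - source_labels: [__tmp_hash]
    regex: "0"
    action: keep
```

The other two servers would use the same rules with regex set to "1" and "2". Labels starting with __ are removed after target relabeling, so the temporary label does not end up on the stored series.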
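labeldrop usage
labeldrop filters the labels of a target and removes those that match the filter, for example:
relabel_configs:
  - action: labeldrop
    regex: __meta_kubernetes_service_label_(.+)
This configuration matches regex against all label names of the current target, removes the labels that match, and keeps those that do not.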
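labelkeep usage
labelkeep filters the labels of a target and keeps only those that match the filter, for example:
relabel_configs:
  - action: labelkeep
    regex: __meta_kubernetes_service_label_(.+)
This configuration matches regex against all label names of the current target, keeps the labels that match, and removes those that do not.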
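Labels whose names start with __ are removed automatically once target relabeling has finished, so in practice labeldrop and labelkeep are often applied to ordinary labels instead. A minimal sketch under that assumption, with a hypothetical job and label name:

```yaml
scrape_configs:
  - job_name: example                # hypothetical job
    static_configs:
      - targets:
          - localhost:9100
    metric_relabel_configs:
      # Remove one label from every scraped series.
      - action: labeldrop
        regex: pod_template_hash     # assumed label name
```

When removing labels this way, the remaining labels must still identify each series uniquely.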
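2.2 metric_relabel_configs
As described above, relabel_configs rewrites labels before metrics are scraped. metric_relabel_configs, in contrast, operates on labels after the metrics have been scraped. It determines which metrics are stored, which are dropped, and what the stored series look like.
metric_relabel_configs uses the same configuration format as relabel_configs; for the available parameters see section 2.1 relabel_configs.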
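One typical use of metric_relabel_configs is dropping metrics that are not needed before they are stored. The sketch below matches on the special __name__ label, which holds the metric name. The target address and the go_gc_ prefix are assumptions for illustration.

```yaml
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - localhost:9100           # assumed node_exporter address
    metric_relabel_configs:
      # Drop every series whose metric name starts with go_gc_.
      - source_labels: [__name__]
        regex: go_gc_.*
        action: drop
```

Here drop removes individual samples rather than whole targets, because metric relabeling runs on the scraped data.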
2.3 static_configs
static_configs lists the targets from which Prometheus fetches metrics, for example Prometheus itself or exporters for MySQL, nginx, and so on.
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
This job scrapes Prometheus's own metrics. The targets list holds the address and port that Prometheus fetches metrics from. Because metrics_path is not set, the default /metrics is used.
Put simply, Prometheus fetches its monitoring data from http://localhost:9090/metrics
Exporter addresses can be configured the same way, for example to collect node_exporter data:
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - 10.40.58.153:9100
          - 10.40.61.116:9100
          - 10.40.58.154:9100
Put simply, Prometheus fetches metrics from http://10.40.58.153:9100/metrics, http://10.40.61.116:9100/metrics, and http://10.40.58.154:9100/metrics
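A static_config can also attach labels to every target in its group. In the sketch below the env label and its values are assumptions, and the node_exporter addresses from the example above are reused purely for illustration.

```yaml
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - 10.40.58.153:9100
          - 10.40.58.154:9100
        labels:
          env: prod        # assumed label, added to all series from these targets
      - targets:
          - 10.40.61.116:9100
        labels:
          env: staging     # assumed label
```

Queries can then filter on the added label, for example up{job="node", env="prod"}.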
2.4 kubernetes_sd_configs
Kubernetes service discovery can discover targets for the following roles:
- node
- service
- pod
- endpoints
- ingress
With the role of a kubernetes_sd_config set to endpoints, Prometheus automatically discovers all endpoints in Kubernetes and uses them as the targets of the current job, as shown below:
kubernetes_sd_configs:
  - role: endpoints
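Discovery can be limited to specific namespaces with the namespaces field of kubernetes_sd_config. A minimal sketch, in which the job name and the namespace are hypothetical:

```yaml
scrape_configs:
  - job_name: kubernetes-pods-monitoring   # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - monitoring                   # assumed namespace
```

Without the namespaces field, targets from all namespaces are discovered.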
Configuration example 1
This configuration uses Kubernetes service discovery to find the kube-apiservers.
scrape_configs:
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
The scrape configuration above defines the following:
- The job name is kubernetes-apiservers (job_name: kubernetes-apiservers)
- Endpoints are discovered from Kubernetes (role: endpoints)
- Metrics are fetched over https (scheme: https)
- A target is kept only if it is in the default namespace, its service is named kubernetes, and its port is named https:
  - __meta_kubernetes_namespace=~default
  - __meta_kubernetes_service_name=~kubernetes
  - __meta_kubernetes_endpoint_port_name=~https
Configuration example 2
This configuration automatically discovers the endpoints in Kubernetes.
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name
    - source_labels: [__meta_kubernetes_pod_node_name]
      action: replace
      target_label: kubernetes_node
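relabel_configs contains many rules here. In detail:
- The job name is kubernetes-service-endpoints (job_name: kubernetes-service-endpoints)
- Endpoints are discovered from Kubernetes (role: endpoints)
- Metrics are fetched over http (scheme is not configured, so the default http applies unless the prometheus.io/scheme annotation sets __scheme__)
- relabel rules:
  - The service annotations must contain prometheus.io/scrape: "true" for the target to be discovered by Prometheus
  - __scheme__ is set to the value of __meta_kubernetes_service_annotation_prometheus_io_scheme, which must match the regular expression (https?)
  - __metrics_path__ is set to the value of __meta_kubernetes_service_annotation_prometheus_io_path, which must match the regular expression (.+)
  - __address__ is rewritten to the form IP:port
  - Labels named __meta_kubernetes_service_label_<name> are copied to <name> (labelmap)
  - kubernetes_namespace is set to the value of __meta_kubernetes_namespace
  - kubernetes_name is set to the value of __meta_kubernetes_service_name
  - kubernetes_node is set to the value of __meta_kubernetes_pod_node_name
The scraped metrics look like this: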
up{app="prometheus",app_kubernetes_io_managed_by="Helm",chart="prometheus-11.3.0",component="node-exporter",heritage="Helm",instance="10.40.61.116:9100",job="kubernetes-service-endpoints",kubernetes_name="prometheus-node-exporter",kubernetes_namespace="devops",kubernetes_node="py-modelo2o08cn-p005.pek3.example.com",release="prometheus"}
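For a Service to be picked up by the kubernetes-service-endpoints job, it needs the annotations that the relabel rules read. The manifest below is a sketch of such a Service; its name, namespace, label, and port are placeholders and do not come from the cluster shown in the output above.

```yaml
# Hypothetical Service; names, namespace, and port are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app                        # copied to the label app="my-app" by labelmap
  annotations:
    prometheus.io/scrape: "true"       # required by the keep rule
    prometheus.io/scheme: "http"       # optional, sets __scheme__
    prometheus.io/path: "/metrics"     # optional, sets __metrics_path__
    prometheus.io/port: "9100"         # optional, rewrites the port in __address__
spec:
  selector:
    app: my-app
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
```

Kubernetes service discovery turns each annotation key into a meta label by replacing unsupported characters with underscores, so prometheus.io/scrape becomes __meta_kubernetes_service_annotation_prometheus_io_scrape.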