Parsing the prometheus.yml Configuration File
Posted by 都市侠客行
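Overview of the configuration sections
global: global settings (a value set inside an individual job overrides the global one).
alerting: Alertmanager integration. This is where the Alertmanager instances that receive alerts are configured.
rule_files: alerting rules. The listed files are loaded and hold the custom alerting rules; notification channels and routing are handled by Alertmanager.
scrape_configs: scrape configuration. Defines the data sources, organized into job_name groups with their concrete targets, configured either statically or through service discovery.
Contents of the default configuration file: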
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
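1. The global section:
# my global config
global:
  scrape_interval: 15s # interval between scrapes for all jobs; set to 15s here (the default is 1m)
  evaluation_interval: 15s # interval between rule evaluations; set to 15s here (the default is 1m)
  scrape_timeout: 5s # per-scrape timeout (the default is 10s)
  external_labels: # labels attached to time series and alerts when talking to external systems (federation, remote storage, Alertmanager); they are not stored with the local metrics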
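As an illustration of external_labels, the sketch below attaches two labels to every time series and alert that leaves this server. The label names and values (cluster, replica) are assumptions chosen for the example and do not come from the original configuration.

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 5s          # must not exceed scrape_interval
  external_labels:
    cluster: cn-hz            # hypothetical label
    replica: prometheus-01    # hypothetical label
```

Labels like these are mainly useful when several Prometheus servers send data to the same Alertmanager or remote storage and the receiving side needs to tell them apart.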
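2. The alerting section
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
This section defines the Alertmanager instance integrated with Prometheus for monitoring alerts. Setting up Alertmanager itself, its configuration options, notification channels, and routing rules will be covered in separate notes.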
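3. The rule_files section
This section sets the alerting rules, that is, which metric conditions raise an alert (similar to a trigger). Prometheus evaluates the loaded rules at the interval given by the global evaluation_interval. Rule files are read at startup and re-read when the configuration is reloaded (for example on SIGHUP); edits to a rule file are not picked up until then. Notification channels and routing are handled by Alertmanager.
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"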
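To make the role of a rule file concrete, here is a minimal sketch of what a file such as first_rules.yml might contain. The group name, alert name, threshold duration, and severity label are assumptions for illustration; only the `up` metric, which Prometheus records for every scraped target, is a given.

```yaml
# first_rules.yml (hypothetical contents)
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0            # `up` is 0 when the last scrape of a target failed
        for: 5m                  # condition must hold this long before the alert fires
        labels:
          severity: critical     # hypothetical label, usable for routing in Alertmanager
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

The expression is checked once per evaluation_interval; delivery of the resulting alert is left entirely to Alertmanager.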
4. The scrape_configs section, default settings:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
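Supported settings:
job_name: name of the scrape job. It can be thought of as a group; each group contains its concrete targets as members.
scrape_interval: 5s # when set inside a job, this overrides the global value; here targets are scraped every 5s
metrics_path # URL path of the metrics endpoint, e.g. https://prometheus.21yunwei.com/metrics (reverse-proxied from a front-end web server to the back end)
targets: Endpoint # address of the monitored target
Note: the above is a static configuration with no service discovery. Adding a host means editing the configuration by hand and reloading the corresponding task through supervisor. That is also its drawback: every change to the static configuration requires restarting the Prometheus service, which works against operations automation.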
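The settings listed above can be combined in a single job. The following is a sketch only: the job name is hypothetical, and the target is derived from the URL mentioned above on the assumption that the proxy serves HTTPS on port 443.

```yaml
scrape_configs:
  - job_name: 'example-app'         # hypothetical job name
    scrape_interval: 5s             # overrides global.scrape_interval for this job
    scrape_timeout: 5s              # must not exceed this job's scrape_interval
    metrics_path: /metrics          # the default, written out for clarity
    scheme: https                   # the default is http
    static_configs:
      - targets: ['prometheus.21yunwei.com:443']
```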
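Prometheus supports service discovery
① File-based service discovery
File-based service discovery does not depend on any other platform or third-party service. You only add the new target information, in YAML or JSON format, to a target file; Prometheus periodically reads the target information from the specified files and updates its targets.
Benefits:
(1) Targets no longer have to be added to the main configuration file one by one by hand; it is enough to drop a JSON or YAML file into the directory that is loaded;
(2) It is easy to maintain, and the Prometheus server does not need to be restarted each time.
Example: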
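Since target files may be written in YAML as well as JSON, here is a sketch of a YAML target file. The file name file_config/21yunwei/host.yml and the label values are assumptions for illustration.

```yaml
# file_config/21yunwei/host.yml (hypothetical target file)
- targets:
    - '1.1.1.1:9010'
  labels:
    group: 21yunwei
    app: web
```

A job that loads such files could look like the following. The refresh_interval of 1m is an assumed value; when omitted, Prometheus re-reads the files every 5 minutes, in addition to watching them for changes.

```yaml
scrape_configs:
  - job_name: 'cn-hz-21yunwei-other'
    file_sd_configs:
      - files:
          - file_config/21yunwei/*.yml   # glob in the last path segment
        refresh_interval: 1m             # assumed value; the default is 5m
```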
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'cn-hz-21yunwei-devops'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # static configuration
    static_configs:
      - targets: ['localhost:9090']
  # get targets from files; the 21yunwei project is used as the example here
  - job_name: 'cn-hz-21yunwei-other'
    file_sd_configs:
      - files:
          - file_config/21yunwei/host.json
Contents of the JSON file:
[
  {
    "targets": [
      "1.1.1.1:9010"
    ],
    "labels": {
      "group": "21yunwei",
      "app": "web",
      "hostname": "cn-hz-21yunwei-web"
    }
  },
  {
    "targets": [
      "2.2.2.2:9010"
    ],
    "labels": {
      "group": "21yunwei",
      "app": "devops",
      "hostname": "cn-hz-21yunwei-devops"
    }
  }
]
Below is a complete prometheus.yml found online. It is simple but practical:
(1) Set the global parameters (scrape interval and rule evaluation interval);
(2) Integrate Alertmanager for later alerting;
(3) Set the directory from which alerting rules are loaded;
(4) Define the scrape targets, using both static configuration and service discovery. (With service discovery, later target changes only require editing the target files; the Prometheus daemon does not have to be restarted.)
(5) Define functional checks. Three checks are defined here, icmp, tcp_port, and url, each implemented by calling blackbox_exporter.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration: alerting setup, integrates Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rule/*.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'cn-hz-21yunwei-devops'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['cn-hz-21yunwei-devops:9100']

  # get targets from files; this job covers the 21yunwei web hosts
  - job_name: 'cn-hz-21yunwei-other'
    file_sd_configs:
      - files:
          - file_config/21yunwei/host.json
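  # for alerting, query probe_success
  ## TCP port check
  - job_name: "tcp_port_check"
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files:
          - check/port/*_port.json
    relabel_configs:
      # pass the discovered address to blackbox_exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # keep the probed address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # scrape blackbox_exporter itself (address masked in the original)
      - target_label: __address__
        replacement: ******:9115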
  ## to check the status code, query probe_http_status_code
  ## HTTP endpoint check
  - job_name: 'http_url_check'
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /probe
    params:
      module: [http_2xx] # Look for a HTTP 200 response.
    file_sd_configs:
      - files:
          - check/url/*_url.json
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: *******:9115
  ### ICMP check
  - job_name: 'icmp_check'
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files:
          - check/icmp/*_icmp.json
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ******:9115
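The three probe jobs above pass a module name (tcp_connect, http_2xx, icmp) to blackbox_exporter, so those modules have to be defined in the exporter's own configuration file. A minimal sketch of such a file follows; the timeout values are assumptions, and the original article does not show this file.

```yaml
# blackbox.yml (configuration of blackbox_exporter, not of Prometheus)
modules:
  http_2xx:
    prober: http
    timeout: 5s        # assumed value
    http:
      method: GET      # with no valid_status_codes set, any 2xx response counts as success
  tcp_connect:
    prober: tcp
    timeout: 5s        # assumed value
  icmp:
    prober: icmp
    timeout: 5s        # assumed value
```

Each probe timeout should stay below the scrape_timeout of the corresponding Prometheus job (15s above), so that the exporter can still report a failed probe before the scrape itself times out.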