Helm / kube-prometheus-stack: Can I create rules for exporters in values.yaml?
Posted: 2021-12-10 14:10:39

Question:

I'd like to specify all of my rules for prometheus-blackbox-exporter, so I have added them to rules-mine.yaml and deployed with:
helm upgrade --install -n monitoring blackbox -f values.yaml -f rules-mine.yaml .
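I can't see any of the rules listed at http://localhost:9090/rules, and nothing seems to be evaluated, since no alerts fire. I need to do everything as IaC and deploy it automatically through Terraform.

Is it possible to add rules to an exporter this way? If so, can anyone see a problem with the file below? If not, how can I efficiently add rules to many exporters? rules-mine.yaml contains: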
prometheusRule:
  enabled: true
  namespace: monitoring
  additionalLabels:
    team: foxtrot_blackbox
    environment: production
    cluster: cluster
    namespace: namespace_x
  namespace: "monitoring"
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed (instance {{ $labels.instance }})
        description: "Probe failed\n VALUE = {{ $value }}"
    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe (instance {{ $labels.instance }})
        description: "Blackbox probe took more than 1s to complete\n VALUE = {{ $value }}"
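If the chart accepts these values, the end result should be a PrometheusRule custom resource, which the Prometheus Operator then loads into Prometheus. The sketch below shows roughly what that object looks like for the first rule. You can compare it against the output of `helm template` or `kubectl get prometheusrule -n monitoring -o yaml`. It is an outline built on assumptions, not the chart's exact output: the metadata name and group name are placeholders, and it assumes the chart copies `additionalLabels` onto the object's metadata.

```yaml
# Hedged sketch of the PrometheusRule object the blackbox values above are
# meant to produce. Values in <angle brackets> are placeholders, not what
# the chart actually generates.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: <release-and-chart-derived-name>
  namespace: monitoring
  labels:
    # assumed: copied from prometheusRule.additionalLabels
    team: foxtrot_blackbox
    environment: production
spec:
  groups:
    - name: <group-name-chosen-by-the-chart>
      rules:
        - alert: BlackboxProbeFailed
          expr: probe_success == 0
          for: 0m
          labels:
            severity: critical
          annotations:
            # Prometheus, not Helm, has to be the one that expands this template
            summary: Blackbox probe failed (instance {{ $labels.instance }})
```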
Thanks for any help.
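Answer 1:

Are you sure you don't have a typo in the label name "environment"? That certainly won't match what you expect unless you actually labelled your sources that way.

Best,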
Comments:
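I don't think that typo has anything to do with the problem.

Answer 2:

The best approach I've found seems to be adding the exporter rules to the kube-prometheus-stack values.yaml file (I actually created a separate rules.yaml file) and passing it to helm: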
helm upgrade --install -n monitoring prometheus --create-namespace -f values-mine.yaml -f rules-mine.yaml prometheus-community/kube-prometheus-stack
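That then picks up all the rules as I wanted, and seems like a decent solution. I would still prefer to keep the rules grouped with their exporters, though; if I find a way to do that, I'll post again.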
additionalPrometheusRulesMap:
  prometheus.rules:
    groups:
      - name: company.prometheus.rules
        rules:
          - alert: PrometheusNotificationsBacklog
            expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0
            for: 0m
            labels:
              severity: warning
            annotations:
              summary: Prometheus notifications backlog (instance {{ $labels.instance }})
              description: "The Prometheus notification queue has not been empty for 10 minutes\nVALUE = {{ $value }}"
              dashboard_url: $grafana_url/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: $wiki_url/{{ $labels.alertname }}
  company.blackbox.rules:
    groups:
      - name: company.blackbox.rules
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: Blackbox probe failed (instance {{ $labels.instance }})
              description: "Probe failed\nVALUE = {{ $value }}"
              dashboard_url: $grafana_url/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: $wiki_url/{{ $labels.alertname }}
          - alert: BlackboxSlowProbe
            expr: avg_over_time(probe_duration_seconds[1m]) > 1
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: Blackbox slow probe (instance {{ $labels.instance }})
              description: "Blackbox probe took more than 1s to complete\nVALUE = {{ $value }}"
              dashboard_url: $grafana_url/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: $wiki_url/{{ $labels.alertname }}
          # etc.
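On keeping rules grouped per exporter: Helm deep-merges maps from multiple `-f` files. One option is therefore a separate values file per exporter, with each file contributing its own key under `additionalPrometheusRulesMap`. This is a sketch and was not tested in this thread. The file names `rules-prometheus.yaml` and `rules-blackbox.yaml` are hypothetical, and the rules are the ones above with annotations trimmed for brevity.

```yaml
# rules-prometheus.yaml (hypothetical file name): rules for Prometheus itself
additionalPrometheusRulesMap:
  prometheus.rules:
    groups:
      - name: company.prometheus.rules
        rules:
          - alert: PrometheusNotificationsBacklog
            expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0
            for: 0m
            labels:
              severity: warning
---
# rules-blackbox.yaml (hypothetical file name): rules for the blackbox exporter
additionalPrometheusRulesMap:
  company.blackbox.rules:
    groups:
      - name: company.blackbox.rules
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical
```

Both files would then be passed in order, for example `helm upgrade --install -n monitoring prometheus --create-namespace -f values-mine.yaml -f rules-prometheus.yaml -f rules-blackbox.yaml prometheus-community/kube-prometheus-stack`. The two files use different map keys, so neither overwrites the other. A later file only replaces a key, or a list such as `groups`, that an earlier file also defines.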
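Answer 3:

A colleague found that this is entirely possible. The problem seems to be related to the quoting used in the original implementation. The following is in use now, so I'm posting it here in the hope that it's useful to others.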
In summary:

{{ $labels.instance }} == bad
{{`{{ $labels.instance }}`}} == good
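Why the quoting matters, assuming the blackbox chart passes `prometheusRule.rules` through Helm's `tpl` function (which is what this fix implies): Helm evaluates Go template actions in those values before Prometheus ever sees them. Helm therefore tries to evaluate a bare `{{ $labels.instance }}` itself, but `$labels` is not defined there. Writing `{{`...`}}` instead wraps the text in a Go raw-string literal, so Helm simply prints the inner text and Prometheus receives the template it expects. Assuming that behaviour, the two documents below show a values entry and the corresponding annotation in the rendered PrometheusRule.

```yaml
# Input: values for the blackbox exporter chart (evaluated by Helm's tpl)
prometheusRule:
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      annotations:
        # Helm prints the contents of the raw-string literal `...` verbatim
        summary: Blackbox probe failed for {{`{{ $labels.instance }}`}}
---
# Output: the same rule in the rendered PrometheusRule (what Prometheus reads)
rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    annotations:
      # Now a Prometheus template, expanded per alert at evaluation time
      summary: Blackbox probe failed for {{ $labels.instance }}
```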
prometheusRule:
  enabled: true
  additionalLabels:
    client: $client_id
    cluster: $cluster
    environment: $environment
    grafana: $grafana_url
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed for {{`{{ $labels.instance }}`}}
        description: Probe failed VALUE = {{`{{ $value }}`}}
        dashboard_url: https://$grafana_url/d/blackbox/blackbox-exporter?var-instance={{`{{ $labels.instance }}`}}
        runbook_url: $wiki_url/BlackboxProbeFailed
    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe for {{`{{ $labels.instance }}`}}
        description: Blackbox probe took more than 1s to complete VALUE = {{`{{ $value | humanizeDuration }}`}}
        dashboard_url: https://$grafana_url/d/blackbox/blackbox-exporter?var-instance={{`{{ $labels.instance }}`}}
        runbook_url: $wiki_url/BlackboxSlowProbe
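Go templates also accept a second escape, which prints the delimiters as ordinary quoted string constants. The sketch below writes the first annotation from the rules above in that style. Assuming the chart runs the rules through `tpl`, both spellings should render to the same `{{ $labels.instance }}` for Prometheus, so the choice is mostly a matter of readability.

```yaml
annotations:
  # Same annotation as above, but with "{{" and "}}" emitted as quoted string
  # constants instead of a raw-string literal; Helm should print
  # "Blackbox probe failed for {{ $labels.instance }}" for either spelling
  summary: Blackbox probe failed for {{ "{{" }} $labels.instance {{ "}}" }}
```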
Please ignore any missing variables etc.