Prometheus and Grafana: automating setup with provisioning files
Posted by shark_西瓜甜
Configuring a data source with a provisioning file
Provisioning directory:
/etc/grafana/provisioning/datasources/
File: datasource.yml
Contents:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Access mode - proxy (server in the UI) or direct (browser in the UI).
    access: proxy
    url: http://<Prometheus IP>:9090
    #url: http://prometheus:9091
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        # Field with internal link pointing to data source in Grafana.
        # datasourceUid can be any value, as long as it is globally unique. The dashboards below refer to this value.
        - datasourceUid: PBFA97CFB590B2093
          name: traceID
        # Field with external link.
        - name: traceID
          url: 'http://localhost:3000/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22Jaeger%22,%7B%22query%22:%22$$__value.raw%22%7D%5D'
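Configuring dashboards with provisioning files
Provisioning directory:
/etc/grafana/provisioning/dashboards/
In each dashboard JSON file downloaded from the official site (the file name is arbitrary), change the datasource entry under annotations from
"annotations": {
  "list": [
    {
      ...
      "datasource": "-- Grafana --",
to
"annotations": {
  "list": [
    {
      ...
      "datasource": {
        "type": "datasource",
        "uid": "grafana"
      },
The same edit with GNU sed, here applied to nodeExporter.json:
sed -ri 's/"-- Grafana --",/{\n "type": "datasource",\n "uid": "grafana"\n },/g' nodeExporter.json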
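Text substitution works only while the JSON is formatted exactly as the pattern expects. As an alternative, here is a minimal Python sketch that parses the dashboard and rewrites only the entries under annotations.list. The function names are hypothetical, and unlike the sed command it leaves any "-- Grafana --" reference outside the annotations block untouched.

```python
import json

OLD_REF = "-- Grafana --"
NEW_REF = {"type": "datasource", "uid": "grafana"}


def upgrade_annotation_datasources(dashboard: dict) -> int:
    """Replace the legacy string reference in annotations.list.

    Modifies the dashboard in place and returns the number of entries changed.
    """
    changed = 0
    for annotation in dashboard.get("annotations", {}).get("list", []):
        if annotation.get("datasource") == OLD_REF:
            annotation["datasource"] = dict(NEW_REF)
            changed += 1
    return changed


def rewrite_file(path: str) -> int:
    """Load a dashboard JSON file, upgrade it, and write it back."""
    with open(path, encoding="utf-8") as fh:
        dashboard = json.load(fh)
    changed = upgrade_annotation_datasources(dashboard)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dashboard, fh, indent=2, ensure_ascii=False)
    return changed
```

Calling rewrite_file("nodeExporter.json") would then play the role of the sed command for that file.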
Next, replace every occurrence of
${DS_TEST-PROMETHEUS}
with the uid configured for the data source, assumed here to be PBFA97CFB590B2093.
Alternatively, replace "${DS_PROMETHEUS}", with
{
  "type": "prometheus",
  "uid": "PBFA97CFB590B2093"
},
for example in blackbox.json:
sed -i 's#"\${DS_PROMETHEUS}",#{\n "type": "prometheus",\n "uid": "PBFA97CFB590B2093"\n },#g' blackbox.json
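The placeholder can appear at any depth of the dashboard (panels, targets, template variables), so a recursive walk is one way to catch every occurrence. The sketch below is a hedged illustration: the function name is hypothetical, and the placeholder string and uid are passed in, so substitute whatever your exported dashboard actually contains.

```python
import json


def replace_datasource_placeholder(node, placeholder, uid, ds_type="prometheus"):
    """Return a copy of node in which every "datasource": placeholder
    becomes an object reference {"type": ds_type, "uid": uid}."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key == "datasource" and value == placeholder:
                result[key] = {"type": ds_type, "uid": uid}
            else:
                result[key] = replace_datasource_placeholder(value, placeholder, uid, ds_type)
        return result
    if isinstance(node, list):
        return [replace_datasource_placeholder(item, placeholder, uid, ds_type) for item in node]
    return node  # strings, numbers, booleans, None are returned unchanged


# Example with a tiny stand-in for a downloaded dashboard.
sample = {
    "panels": [
        {"datasource": "${DS_PROMETHEUS}", "targets": [{"datasource": "${DS_PROMETHEUS}"}]}
    ]
}
fixed = replace_datasource_placeholder(sample, "${DS_PROMETHEUS}", "PBFA97CFB590B2093")
print(json.dumps(fixed, indent=2))
```

Because the function returns a new structure, the original dashboard stays available for comparison before anything is written back to disk.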
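Creating and managing alerting resources with provisioning files
Official documentation: https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/
Alert rules
/etc/grafana/provisioning/alerting/
Create a YAML file in this directory. An example file: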
# config file version
apiVersion: 1
# List of rule groups to import or update
groups:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the rule group
    name: my_rule_group
    # <string, required> name of the folder the rule group will be stored in
    folder: my_first_folder
    # <duration, required> interval at which the rule group is evaluated
    interval: 60s
    # <list, required> list of rules that belong to the rule group
    rules:
      # <string, required> unique identifier for the rule
      - uid: my_id_1
        # <string, required> title of the rule as displayed in the UI
        title: my_first_rule
        # <string, required> which query should be used for the condition
        condition: A
        # <list, required> list of query objects executed on each evaluation;
        # these should be obtained through the API
        data:
          - refId: A
            # datasourceUid: UID of the data source
            datasourceUid: 'PBFA97CFB590B2093'
            model:
              # conditions
              conditions:
                - evaluator:
                    params:
                      - 3
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - A
                  reducer:
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: '-100'
              expression: 1==0
              intervalMs: 1000
              maxDataPoints: 43200
              refId: A
              type: math
        # <string> UID of the dashboard the alert rule should be linked to
        dashboardUid: my_dashboard
        # <int> ID of the panel the alert rule should be linked to
        panelId: 123
        # <string> state of the alert rule when no data is returned
        # possible values: "NoData", "Alerting", "OK", default = NoData
        noDataState: Alerting
        # <string> state of the alert rule when query execution fails
        # possible values: "Error", "Alerting", "OK", default = Alerting
        execErrState: Alerting
        # <duration, required> how long the condition must hold before the alert fires
        for: 60s
        # <map<string, string>> annotations: arbitrary key: value data
        annotations:
          some_key: some_value
        # <map<string, string>> labels that can be used to filter and route alerts
        labels:
          team: sre_team_1
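To make the conditions block in the rule above easier to read, here is a simplified Python sketch of how a condition with reducer "last" and evaluator "gt" with params [3] can be understood: reduce the series to one number, then compare it with the threshold. This is an illustration under assumptions, not Grafana's implementation; the function and table names are hypothetical and only a few reducers and evaluators are modelled.

```python
# Reducers collapse a series of values into a single number.
REDUCERS = {
    "last": lambda values: values[-1],
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
}

# Evaluators compare the reduced value with the configured params.
EVALUATORS = {
    "gt": lambda value, params: value > params[0],
    "lt": lambda value, params: value < params[0],
}


def condition_fires(values, reducer="last", evaluator="gt", params=(3,)):
    """Return True when the reduced value satisfies the evaluator."""
    if not values:
        # With no data points the rule's noDataState decides the outcome instead.
        raise ValueError("no data points to evaluate")
    reduced = REDUCERS[reducer](values)
    return EVALUATORS[evaluator](reduced, params)


print(condition_fires([1, 2, 5]))  # last value is 5, compared with 3
print(condition_fires([5, 4, 3]))  # last value is 3, compared with 3
```

With for: 60s and interval: 60s, the condition has to stay true across consecutive evaluations for that duration before a notification goes out; the sketch covers a single evaluation only.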
Contact point: DingTalk
/etc/grafana/provisioning/alerting/
File: dingding.yml
# config file version
apiVersion: 1
# List of contact points to import or update
contactPoints:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the contact point
    name: dingding
    receivers:
      # <string, required> unique identifier for the receiver
      - uid: dingding
        type: dingding
        settings:
          # <string, required>
          url: https://oapi.dingtalk.com/robot/send?access_token=xxx
          # <string> options: link, actionCard
          # msgType: link
          msgType: actionCard
          # <string>
          message: |
            {{ template "default.message" . }}
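The msgType setting chooses between two message shapes. The sketch below shows roughly what each looks like as a JSON body for a DingTalk custom robot webhook. It is an assumption-laden illustration: the field names follow DingTalk's custom robot message format as I understand it, the function name, button title, and default link URL are hypothetical, and the exact body Grafana sends may differ.

```python
import json


def build_dingtalk_payload(title, text, msg_type="actionCard",
                           link_url="http://localhost:3000"):
    """Build a webhook body for the two msgType options in the config."""
    if msg_type == "actionCard":
        return {
            "msgtype": "actionCard",
            "actionCard": {
                "title": title,
                "text": text,
                "singleTitle": "More",   # hypothetical button label
                "singleURL": link_url,
            },
        }
    if msg_type == "link":
        return {
            "msgtype": "link",
            "link": {"title": title, "text": text, "messageUrl": link_url},
        }
    raise ValueError(f"unsupported msgType: {msg_type}")


body = build_dingtalk_payload("[FIRING:1] my_first_rule", "team=sre_team_1")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Posting such a body by hand to the webhook URL is a quick way to confirm that the access_token works before Grafana is involved.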
Notification policies
/etc/grafana/provisioning/alerting/
File: notifiers.yml
# config file version
apiVersion: 1
# List of notification policies
policies:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string> name of the contact point that should be used for this route
    receiver: dingding
    # <list> The labels by which incoming alerts are grouped together. For example,
    #        multiple alerts coming in for cluster=A and alertname=LatencyHigh would
    #        be batched into a single group.
    #
    #        To aggregate by all possible labels use the special value '...' as
    #        the sole label name, for example:
    #        group_by: ['...']
    #        This effectively disables aggregation entirely, passing through all
    #        alerts as-is. This is unlikely to be what you want, unless you have
    #        a very low alert volume or your upstream notification system performs
    #        its own grouping.
    group_by: ['...']
    # <list> a list of matchers that an alert has to fulfill to match the node
    matchers:
      - alertname = Watchdog
      - severity =~ "warning|critical"
    # <list> Times when the route should be muted. These must match the name of a
    #        mute time interval.
    #        Additionally, the root node cannot have any mute times.
    #        When a route is muted it will not send any notifications, but
    #        otherwise acts normally (including ending the route-matching process
    #        if the `continue` option is not set)
    mute_time_intervals:
      - abc
    # <duration> How long to initially wait to send a notification for a group
    #            of alerts. Allows to collect more initial alerts for the same group.
    #            (Usually ~0s to few minutes), default = 30s
    group_wait: 30s
    # <duration> How long to wait before sending a notification about new alerts that
    #            are added to a group of alerts for which an initial notification has
    #            already been sent. (Usually ~5m or more), default = 5m
    group_interval: 5m
    # <duration> How long to wait before sending a notification again if it has already
    #            been sent successfully for an alert. (Usually ~3h or more), default = 4h
    repeat_interval: 4h
    # <list> Zero or more child routes
    # routes:
    # ...
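The matchers list decides which alerts a route accepts: every matcher must hold, and regular-expression matchers are anchored to the whole label value. The following Python sketch illustrates that logic for the two matchers in the policy above. The parser is deliberately simplified and all function names are hypothetical.

```python
import re

# name, operator (=, !=, =~, !~), optionally quoted value
MATCHER_RE = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"?(.*?)"?\s*$')


def parse_matcher(text):
    """Split a matcher such as 'severity =~ "warning|critical"' into its parts."""
    found = MATCHER_RE.match(text)
    if not found:
        raise ValueError(f"cannot parse matcher: {text!r}")
    return found.group(1), found.group(2), found.group(3)


def matcher_holds(labels, matcher_text):
    """Check one matcher against an alert's labels (missing labels count as '')."""
    name, op, value = parse_matcher(matcher_text)
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    is_match = re.fullmatch(value, actual) is not None  # anchored regex match
    return is_match if op == "=~" else not is_match


def route_matches(labels, matchers):
    """A route accepts an alert only if all of its matchers hold."""
    return all(matcher_holds(labels, m) for m in matchers)


matchers = ["alertname = Watchdog", 'severity =~ "warning|critical"']
print(route_matches({"alertname": "Watchdog", "severity": "warning"}, matchers))
print(route_matches({"alertname": "Watchdog", "severity": "info"}, matchers))
```

An alert that fails any matcher falls through to the next sibling route, or to the parent's receiver if no child route accepts it.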
Templates
# config file version
apiVersion: 1
# List of templates that should be deleted
deleteTemplates:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the template, must be unique
    name: my_first_template
Mute timings
# config file version
apiVersion: 1
# List of mute time intervals to import or update
muteTimes:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the mute time interval, must be unique
    name: mti_1
    # <list> time intervals that should trigger the muting
    #        refer to https://prometheus.io/docs/alerting/latest/configuration/#time_interval-0
    time_intervals:
      - times:
          - start_time: '06:00'
            end_time: '23:59'
        weekdays: ['monday:wednesday', 'saturday', 'sunday']
        months: ['1:3', 'may:august', 'december']
        years: ['2020:2022', '2030']
        days_of_month: ['1:5', '-3:-1']
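The fields of a time interval combine with AND, while the ranges inside one field combine with OR. The Python sketch below checks a timestamp against the interval above to illustrate this. It is a simplified model with hypothetical names: it ignores time zones, treats end_time as exclusive, and counts negative days_of_month from the end of the month.

```python
import calendar
from datetime import datetime

WEEKDAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def _in_ranges(value, ranges, to_number):
    """True if value falls in any 'start:end' range (or equals a single item)."""
    for item in ranges:
        start, _, end = item.partition(":")
        low = to_number(start)
        high = to_number(end) if end else low
        if low <= value <= high:
            return True
    return False


def _minutes(hhmm):
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def is_muted(now: datetime, interval: dict) -> bool:
    """Return True when every field present in the interval matches now."""
    current = now.hour * 60 + now.minute
    times = interval.get("times")
    if times and not any(_minutes(t["start_time"]) <= current < _minutes(t["end_time"])
                         for t in times):
        return False
    weekday = (now.weekday() + 1) % 7  # convert Python's monday=0 to sunday=0
    if "weekdays" in interval and not _in_ranges(
            weekday, interval["weekdays"], lambda s: WEEKDAYS.index(s.lower())):
        return False
    if "months" in interval and not _in_ranges(
            now.month, interval["months"],
            lambda s: int(s) if s.isdigit() else MONTHS.index(s.lower()) + 1):
        return False
    if "years" in interval and not _in_ranges(now.year, interval["years"], int):
        return False
    last_day = calendar.monthrange(now.year, now.month)[1]
    if "days_of_month" in interval and not _in_ranges(
            now.day, interval["days_of_month"],
            lambda s: int(s) if int(s) > 0 else last_day + int(s) + 1):
        return False
    return True


interval = {
    "times": [{"start_time": "06:00", "end_time": "23:59"}],
    "weekdays": ["monday:wednesday", "saturday", "sunday"],
    "months": ["1:3", "may:august", "december"],
    "years": ["2020:2022", "2030"],
    "days_of_month": ["1:5", "-3:-1"],
}
print(is_muted(datetime(2021, 1, 4, 7, 0), interval))  # a Monday morning in January 2021
```

Running a few timestamps through such a check before deploying a mute timing helps catch intervals that are narrower than intended, since every additional field restricts the match further.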