Prometheus and Grafana Integration
Posted by shark_西瓜甜
Configuring the data source with a provisioning file
Provisioning directory:
/etc/grafana/provisioning/datasources/
File: datasource.yml
Contents:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Access mode - proxy (server in the UI) or direct (browser in the UI).
    access: proxy
    # Replace <Prometheus IP> with the address of the Prometheus server.
    url: http://<Prometheus IP>:9090
    #url: http://prometheus:9091
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        # Field with internal link pointing to data source in Grafana.
        # The value of datasourceUid can be anything, but it must be globally unique.
        # It is also the value referenced in the dashboards.
        - datasourceUid: PBFA97CFB590B2093
          name: traceID
        # Field with external link.
        - name: traceID
          url: 'http://localhost:3000/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22Jaeger%22,%7B%22query%22:%22$${__value.raw}%22%7D%5D'
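The external link in the file above carries a percent-encoded JSON array in its `left` query parameter, which is hard to read or edit by hand. The following is a minimal Python sketch of how such a URL could be built and decoded. It assumes the placeholder is written `$${__value.raw}` (in provisioning files `$$` escapes a literal `$`); the function names `build_explore_url` and `decode_left` are hypothetical helpers, not part of Grafana.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

# Placeholder as it is assumed to appear in the provisioning file.
PLACEHOLDER = "$${__value.raw}"


def build_explore_url(base, datasource, time_from="now-1h", time_to="now"):
    """Build an Explore URL whose `left` parameter is an encoded JSON array."""
    token = "VALUEPLACEHOLDER"  # stand-in so the placeholder is not percent-encoded
    left = json.dumps(
        [time_from, time_to, datasource, {"query": token}],
        separators=(",", ":"),
    )
    encoded = quote(left, safe=",:")  # keep commas and colons readable
    return f"{base}/explore?orgId=1&left={encoded.replace(token, PLACEHOLDER)}"


def decode_left(url):
    """Return the JSON array carried in the `left` query parameter."""
    raw_left = parse_qs(urlsplit(url).query)["left"][0]  # parse_qs percent-decodes
    return json.loads(raw_left)


if __name__ == "__main__":
    url = build_explore_url("http://localhost:3000", "Jaeger")
    print(url)
    print(decode_left(url))
```

Decoding the URL this way shows that the link opens Explore on the data source named Jaeger for the last hour, with the exemplar's trace ID as the query.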
Configuring dashboards with provisioning files
Provisioning directory:
/etc/grafana/provisioning/dashboards/
In the JSON file downloaded from the official site, find the annotations block:
<any file name>.json
"annotations": {
  "list": [
    {
      ...
      "datasource": "-- Grafana --",
Change it to:
"annotations": {
  "list": [
    {
      ...
      "datasource": {
        "type": "datasource",
        "uid": "grafana"
      },
The same change can be made with GNU sed:
sed -ri 's/"-- Grafana --",/{\n "type": "datasource",\n "uid": "grafana"\n },/g' nodeExporter.json
Next, replace every occurrence of
${DS_TEST-PROMETHEUS}
with the uid configured for the data source, assumed here to be PBFA97CFB590B2093.
Alternatively, change "${DS_PROMETHEUS}",
to
{
  "type": "prometheus",
  "uid": "PBFA97CFB590B2093"
},
sed -i 's#"\${DS_PROMETHEUS}",#{\n "type": "prometheus",\n "uid": "PBFA97CFB590B2093"\n },#g' blackbox.json
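Text substitution with sed depends on the exact spacing and trailing commas in the JSON file. As an alternative, here is a minimal Python sketch that applies both rewrites on the parsed JSON instead. It assumes that every `${DS_...}` placeholder in the dashboard stands for the Prometheus data source and that its uid is PBFA97CFB590B2093, as in the text above; `rewrite_datasources` and `rewrite_file` are hypothetical helper names.

```python
import json
import re

# Matches import placeholders such as ${DS_PROMETHEUS} or ${DS_TEST-PROMETHEUS}.
DS_VAR = re.compile(r"^\$\{DS_[A-Za-z0-9_-]+\}$")


def rewrite_datasources(node, prometheus_uid):
    """Recursively replace string data source references with uid objects."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "datasource" and isinstance(value, str):
                if value == "-- Grafana --":
                    node[key] = {"type": "datasource", "uid": "grafana"}
                elif DS_VAR.match(value):
                    # Assumption: every DS_ placeholder is the Prometheus source.
                    node[key] = {"type": "prometheus", "uid": prometheus_uid}
            else:
                rewrite_datasources(value, prometheus_uid)
    elif isinstance(node, list):
        for item in node:
            rewrite_datasources(item, prometheus_uid)
    return node


def rewrite_file(path, prometheus_uid):
    """Load a dashboard JSON file, rewrite it, and save it in place."""
    with open(path, encoding="utf-8") as fh:
        dashboard = json.load(fh)
    rewrite_datasources(dashboard, prometheus_uid)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dashboard, fh, indent=2, ensure_ascii=False)
```

For example, `rewrite_file("blackbox.json", "PBFA97CFB590B2093")` would rewrite the file in place; keeping a backup copy first is advisable.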
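Creating and managing alerting resources with provisioning files
Official documentation: https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/
Alert rules
/etc/grafana/provisioning/alerting/
Create a YAML file in this directory. A sample file: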
# config file version
apiVersion: 1
# List of rule groups to import or update
groups:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the rule group
    name: my_rule_group
    # <string, required> name of the folder the rule group will be stored in
    folder: my_first_folder
    # <duration, required> interval that the rule group should be evaluated at
    interval: 60s
    # <list, required> list of rules that are part of the rule group
    rules:
      # <string, required> unique identifier for the rule
      - uid: my_id_1
        # <string, required> title of the rule that will be displayed in the UI
        title: my_first_rule
        # <string, required> which query should be used for the condition
        condition: A
        # <list, required> list of query objects that should be executed on each
        #                  evaluation - should be obtained through the API
        data:
          - refId: A
            datasourceUid: '-100'
            model:
              conditions:
                - evaluator:
                    params:
                      - 3
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - A
                  reducer:
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: '-100'
              expression: 1==0
              intervalMs: 1000
              maxDataPoints: 43200
              refId: A
              type: math
        # <string> UID of a dashboard that the alert rule should be linked to
        dashboardUid: my_dashboard
        # <int> ID of the panel that the alert rule should be linked to
        panelId: 123
        # <string> the state the alert rule will have when no data is returned
        #          possible values: "NoData", "Alerting", "OK", default = NoData
        noDataState: Alerting
        # <string> the state the alert rule will have when the query execution
        #          failed - possible values: "Error", "Alerting", "OK"
        #          default = Alerting
        executionErrorState: Alerting
        # <duration, required> for how long should the alert fire before alerting
        for: 60s
        # <map<string, string>> a map of strings to pass around any data
        annotations:
          some_key: some_value
        # <map<string, string>> a map of strings that can be used to filter and
        #                       route alerts
        labels:
          team: sre_team_1
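Because a single mistake in a rule file is easy to overlook, a quick pre-flight check can help before the file is placed in the provisioning directory. The sketch below is a hypothetical helper, not a Grafana tool: it only checks the fields that the comments in the sample mark as required, that rule uids are unique, and that `condition` names one of the `refId` values in `data`. It works on the already-parsed file (a plain dict).

```python
# Fields that the comments in the sample file mark as required.
REQUIRED_GROUP_KEYS = ("name", "folder", "interval", "rules")
REQUIRED_RULE_KEYS = ("uid", "title", "condition", "data", "for")


def check_rule_groups(config):
    """Return a list of human-readable problems found in a parsed rule file."""
    problems = []
    seen_uids = set()
    for group in config.get("groups", []):
        group_name = group.get("name", "<unnamed>")
        for key in REQUIRED_GROUP_KEYS:
            if key not in group:
                problems.append(f"group {group_name}: missing '{key}'")
        for rule in group.get("rules", []):
            rule_id = rule.get("uid", "<no uid>")
            for key in REQUIRED_RULE_KEYS:
                if key not in rule:
                    problems.append(f"rule {rule_id}: missing '{key}'")
            if rule_id in seen_uids:
                problems.append(f"rule {rule_id}: duplicate uid")
            seen_uids.add(rule_id)
            # The condition must point at one of the queries listed in data.
            ref_ids = {query.get("refId") for query in rule.get("data", [])}
            if "condition" in rule and rule["condition"] not in ref_ids:
                problems.append(
                    f"rule {rule_id}: condition '{rule['condition']}' "
                    "does not match any refId in data"
                )
    return problems
```

The dict can come from any YAML parser; with the third-party PyYAML package, for instance, `yaml.safe_load(open(path))` returns a suitable structure.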
Contact point: DingTalk
/etc/grafana/provisioning/alerting/
dingding.yml
# config file version
apiVersion: 1
# List of contact points to import or update
contactPoints:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the contact point
    name: dingding
    receivers:
      # <string, required> unique identifier for the receiver
      - uid: dingding
        type: dingding
        settings:
          # <string, required>
          url: https://oapi.dingtalk.com/robot/send?access_token=xxx
          # <string> options: link, actionCard
          # msgType: link
          msgType: actionCard
          # <string>
          message: |
            {{ template "default.message" . }}
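The constraints stated in the comments of this file (a required name, uid and url, and a `msgType` of either link or actionCard) can be checked mechanically. The following Python sketch is a hypothetical helper that works on the parsed file as a dict; it encodes only what those comments state and is not Grafana's own validation.

```python
# msgType options listed in the comments of the sample file.
ALLOWED_MSG_TYPES = {"link", "actionCard"}


def check_contact_points(config):
    """Return a list of problems found in a parsed contact point file."""
    problems = []
    for contact_point in config.get("contactPoints", []):
        cp_name = contact_point.get("name")
        if not cp_name:
            problems.append("contact point without a name")
            cp_name = "<unnamed>"
        for receiver in contact_point.get("receivers", []):
            if not receiver.get("uid"):
                problems.append(f"{cp_name}: receiver without a uid")
            if receiver.get("type") != "dingding":
                continue  # only DingTalk receivers are checked in this sketch
            settings = receiver.get("settings", {})
            if not settings.get("url"):
                problems.append(f"{cp_name}: dingding receiver without a url")
            msg_type = settings.get("msgType")
            if msg_type is not None and msg_type not in ALLOWED_MSG_TYPES:
                problems.append(f"{cp_name}: unknown msgType '{msg_type}'")
    return problems
```

An empty list means the file passed these basic checks; it says nothing about whether the access token in the URL is valid.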
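Notification policy
/etc/grafana/provisioning/notifiers/
notifiers.yml
(The file-provisioning documentation linked above loads alerting resources, notification policies included, from /etc/grafana/provisioning/alerting/. If the policy is not picked up from the notifiers directory, place the file there instead.)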
# config file version
apiVersion: 1
# List of notification policies
policies:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string> name of the contact point that should be used for this route
    receiver: dingding
    # <list> The labels by which incoming alerts are grouped together. For example,
    #        multiple alerts coming in for cluster=A and alertname=LatencyHigh would
    #        be batched into a single group.
    #
    #        To aggregate by all possible labels use the special value '...' as
    #        the sole label name, for example:
    #        group_by: ['...']
    #        This effectively disables aggregation entirely, passing through all
    #        alerts as-is. This is unlikely to be what you want, unless you have
    #        a very low alert volume or your upstream notification system performs
    #        its own grouping.
    group_by: ['...']
    # <list> a list of matchers that an alert has to fulfill to match the node
    matchers:
      - alertname = Watchdog
      - severity =~ "warning|critical"
    # <list> Times when the route should be muted. These must match the name of a
    #        mute time interval.
    #        Additionally, the root node cannot have any mute times.
    #        When a route is muted it will not send any notifications, but
    #        otherwise acts normally (including ending the route-matching process
    #        if the `continue` option is not set)
    mute_time_intervals:
      - abc
    # <duration> How long to initially wait to send a notification for a group
    #            of alerts. Allows to collect more initial alerts for the same group.
    #            (Usually ~0s to few minutes), default = 30s
    group_wait: 30s
    # <duration> How long to wait before sending a notification about new alerts that
    #            are added to a group of alerts for which an initial notification has
    #            already been sent. (Usually ~5m or more), default = 5m
    group_interval: 5m
    # <duration> How long to wait before sending a notification again if it has already
    #            been sent successfully for an alert. (Usually ~3h or more), default = 4h
    repeat_interval: 4h
    # <list> Zero or more child routes
    # routes:
    # ...
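To make the `matchers` entries concrete, here is a minimal Python sketch of how such label matchers are commonly evaluated in Alertmanager-style routing: every matcher must hold, a missing label counts as an empty string, and regular expressions must match the whole value. This is an illustration under those assumptions, not Grafana's implementation; `parse_matcher` and `matches` are hypothetical names, and Python's `re` syntax differs in places from the regex engine Grafana uses.

```python
import re

MATCHER = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*(.*?)\s*$")


def parse_matcher(text):
    """Split a matcher such as 'severity =~ "a|b"' into (label, operator, value)."""
    found = MATCHER.match(text)
    if not found:
        raise ValueError(f"cannot parse matcher: {text!r}")
    name, operator, value = found.groups()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]  # drop surrounding double quotes
    return name, operator, value


def matches(labels, matchers):
    """Return True when the alert labels satisfy every matcher."""
    for text in matchers:
        name, operator, value = parse_matcher(text)
        actual = labels.get(name, "")  # a missing label is treated as empty
        if operator == "=":
            ok = actual == value
        elif operator == "!=":
            ok = actual != value
        elif operator == "=~":
            ok = re.fullmatch(value, actual) is not None
        else:  # "!~"
            ok = re.fullmatch(value, actual) is None
        if not ok:
            return False
    return True


if __name__ == "__main__":
    policy_matchers = ["alertname = Watchdog", 'severity =~ "warning|critical"']
    print(matches({"alertname": "Watchdog", "severity": "critical"}, policy_matchers))
```

With the matchers from the policy above, an alert labelled alertname=Watchdog and severity=critical is routed to the dingding receiver, while the same alert with severity=info is not.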