Prometheus Grafana Integration

Posted by shark_西瓜甜


Configuring the datasource with a provisioning file

Provisioning directory:
/etc/grafana/provisioning/datasources/

Create a file named datasource.yml with the following content:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Access mode - proxy (server in the UI) or direct (browser in the UI).
    access: proxy
    url: http://<prometheus-ip>:9090
    #url: http://prometheus:9091
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        # Field with internal link pointing to data source in Grafana.
        # datasourceUid can be any value, but it must be globally unique; dashboards reference this value.
        - datasourceUid: PBFA97CFB590B2093
          name: traceID

        # Field with external link.
        - name: traceID
          url: 'http://localhost:3000/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22Jaeger%22,%7B%22query%22:%22$$__value.raw%22%7D%5D'
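When Grafana runs in a container, the provisioning directory is typically bind-mounted in from the host. A minimal docker-compose sketch (the local directory layout is an assumption):

```yaml
services:
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      # Local ./provisioning/ holds datasources/, dashboards/, alerting/, etc.
      - ./provisioning:/etc/grafana/provisioning
```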

Configuring dashboards with a provisioning file

Provisioning path:
/etc/grafana/provisioning/dashboards/
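This directory also needs a provider file telling Grafana where to load the dashboard JSON files from. A minimal sketch (the provider name and dashboards path are assumptions):

```yaml
apiVersion: 1
providers:
  - name: default
    orgId: 1
    folder: ''
    type: file
    options:
      # Directory the dashboard .json files are copied into
      path: /var/lib/grafana/dashboards
```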
In a dashboard JSON file downloaded from the official site (any filename ending in .json), find the annotations block:

  "annotations": {
    "list": [
      {
        ...
        "datasource": "-- Grafana --",

and change it to

  "annotations": {
    "list": [
      {
        ...
        "datasource": {
          "type": "datasource",
          "uid": "grafana"
        },
sed -ri 's/"-- Grafana --",/{\n          "type": "datasource",\n          "uid": "grafana"\n        },/g' nodeExporter.json
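The substitution can be sanity-checked on a throwaway sample before touching a real dashboard. A sketch (the temp file name is arbitrary; GNU sed is assumed, since `\n` in the replacement is a GNU extension):

```shell
# Create a one-line sample mimicking the original annotation entry
cat > /tmp/sample.json <<'EOF'
        "datasource": "-- Grafana --",
EOF

# Same substitution as above, applied to the sample
sed -ri 's/"-- Grafana --",/{\n          "type": "datasource",\n          "uid": "grafana"\n        },/g' /tmp/sample.json

# The entry is now expanded to the object form
cat /tmp/sample.json
```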

Then replace every occurrence of

$DS_TEST-PROMETHEUS

with the uid configured for the datasource (assumed here to be PBFA97CFB590B2093).

Alternatively, change every "$DS_PROMETHEUS",

to

        {
          "type": "prometheus",
          "uid": "PBFA97CFB590B2093"
        },
sed -i 's#"\$DS_PROMETHEUS",#{\n          "type": "prometheus",\n          "uid": "PBFA97CFB590B2093"\n        },#g' blackbox.json
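After the substitutions it is worth verifying that no unresolved $DS_ placeholders remain in the dashboard JSON. A small helper sketch (the function name is an assumption):

```shell
# Print "ok" and return 0 when the given file contains no $DS_ placeholders
check_placeholders() {
  if grep -q '\$DS_' "$1"; then
    echo "unresolved datasource placeholders in $1"
    return 1
  fi
  echo "ok"
}
```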

Creating and managing alerting resources with file provisioning

Official documentation: https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/

Alert rules

/etc/grafana/provisioning/alerting/

Create a YAML file in this directory; an example follows.

# config file version
apiVersion: 1

# List of rule groups to import or update
groups:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the rule group
    name: my_rule_group
    # <string, required> name of the folder the rule group will be stored in
    folder: my_first_folder
    # <duration, required> interval at which the rule group should be evaluated
    interval: 60s
    # <list, required> list of rules that are part of the rule group
    rules:
      # <string, required> unique identifier for the rule
      - uid: my_id_1
        # <string, required> title of the rule that will be displayed in the UI
        title: my_first_rule
        # <string, required> which query should be used for the condition
        condition: A
        # <list, required> list of query objects that should be executed on each
        #                  evaluation - should be obtained through the API
        data:
          - refId: A
            datasourceUid: '-100'
            model:
              conditions:
                - evaluator:
                    params:
                      - 3
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - A
                  reducer:
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: '-100'
              expression: 1==0
              intervalMs: 1000
              maxDataPoints: 43200
              refId: A
              type: math
        # <string> UID of a dashboard that the alert rule should be linked to
        dashboardUid: my_dashboard
        # <int> ID of the panel that the alert rule should be linked to
        panelId: 123
        # <string> the state the alert rule will have when no data is returned
        #          possible values: "NoData", "Alerting", "OK", default = NoData
        noDataState: Alerting
        # <string> the state the alert rule will have when the query execution
        #          failed - possible values: "Error", "Alerting", "OK"
        #          default = Alerting
        execErrState: Alerting
        # <duration, required> for how long should the alert fire before alerting
        for: 60s
        # <map<string, string>> a map of strings to pass around any data
        annotations:
          some_key: some_value
        # <map<string, string> a map of strings that can be used to filter and
        #                      route alerts
        labels:
          team: sre_team_1
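The same file format can also remove provisioned rules; per the official file-provisioning docs, a deleteRules section is sketched below (the uid refers to the example rule above):

```yaml
# List of alert rules that should be deleted
deleteRules:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> unique identifier of the rule to delete
    uid: my_id_1
```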

Contact point: DingTalk

/etc/grafana/provisioning/alerting/

dingding.yml

# config file version
apiVersion: 1

# List of contact points to import or update
contactPoints:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the contact point
    name: dingding
    receivers:
      # <string, required> unique identifier for the receiver
      - uid: dingding
        type: dingding
        settings:
          # <string, required>
          url: https://oapi.dingtalk.com/robot/send?access_token=xxx
          # <string> options: link, actionCard
          # msgType: link
          msgType: actionCard
          # <string>
          message: |
            {{ template "default.message" . }}
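The default.message template call renders Grafana's built-in default message. A custom message template can be provisioned from the same alerting directory; a sketch (the template name and body are assumptions):

```yaml
# config file version
apiVersion: 1

# List of notification templates to import or update
templates:
  - orgId: 1
    name: my_dingding_template
    template: |
      {{ define "my_dingding_template" }}
      Alerts firing: {{ len .Alerts.Firing }}
      {{ end }}
```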

Notification policies

/etc/grafana/provisioning/alerting/ (with unified alerting, notification policies are provisioned from the alerting directory; the notifiers directory belongs to the legacy notification channels)

notifiers.yml

# config file version
apiVersion: 1

# List of notification policies
policies:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string> name of the contact point that should be used for this route
    receiver: dingding
    # <list> The labels by which incoming alerts are grouped together. For example,
    #        multiple alerts coming in for cluster=A and alertname=LatencyHigh would
    #        be batched into a single group.
    #
    #        To aggregate by all possible labels use the special value '...' as
    #        the sole label name, for example:
    #        group_by: ['...']
    #        This effectively disables aggregation entirely, passing through all
    #        alerts as-is. This is unlikely to be what you want, unless you have
    #        a very low alert volume or your upstream notification system performs
    #        its own grouping.
    group_by: ['...']
    # <list> a list of matchers that an alert has to fulfill to match the node
    matchers:
      - alertname = Watchdog
      - severity =~ "warning|critical"
    # <list> Times when the route should be muted. These must match the name of a
    #        mute time interval.
    #        Additionally, the root node cannot have any mute times.
    #        When a route is muted it will not send any notifications, but
    #        otherwise acts normally (including ending the route-matching process
    #        if the `continue` option is not set)
    mute_time_intervals:
      - abc
    # <duration> How long to initially wait to send a notification for a group
    #            of alerts. Allows to collect more initial alerts for the same group.
    #            (Usually ~0s to few minutes), default = 30s
    group_wait: 30s
    # <duration> How long to wait before sending a notification about new alerts that
    #            are added to a group of alerts for which an initial notification has
    #            already been sent. (Usually ~5m or more), default = 5m
    group_interval: 5m
    # <duration>  How long to wait before sending a notification again if it has already
    #             been sent successfully for an alert. (Usually ~3h or more), default = 4h
    repeat_interval: 4h
    # <list> Zero or more child routes
    # routes:
    # ...
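The mute_time_intervals entry above references a mute timing named abc, which must itself exist. It can be provisioned from the same alerting directory; a sketch (the time window is an assumption):

```yaml
# config file version
apiVersion: 1

# List of mute time intervals to import or update
muteTimes:
  - orgId: 1
    # <string, required> name of the mute time interval, must be unique
    name: abc
    time_intervals:
      - times:
          - start_time: '00:00'
            end_time: '06:00'
```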
