Prometheus + Grafana: Automation Through File-Based Provisioning

Posted by shark_西瓜甜


Configuring a data source with a provisioning file

Provisioning directory:
/etc/grafana/provisioning/datasources/

Contents of datasource.yml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Access mode - proxy (server in the UI) or direct (browser in the UI).
    access: proxy
    url: http://<prometheus-ip>:9090
    #url: http://prometheus:9091
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        # Field with internal link pointing to data source in Grafana.
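        # datasourceUid is the UID of the data source that the trace link opens. A UID can be any value
        # but must be globally unique; dashboards refer to data sources by UID.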
        - datasourceUid: PBFA97CFB590B2093
          name: traceID

        # Field with external link.
        - name: traceID
          url: 'http://localhost:3000/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22Jaeger%22,%7B%22query%22:%22$$__value.raw%22%7D%5D'
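
When the Prometheus address differs between environments, the file can be generated instead of edited by hand. The following is a minimal Python sketch of that idea, not something Grafana provides: the helper names `render_datasource` and `write_datasource` and the sample address 10.0.0.5 are assumptions made for illustration, and the exemplar settings are left out for brevity.

```python
from pathlib import Path

# Template mirroring the core of datasource.yml above (exemplar links omitted).
DATASOURCE_TEMPLATE = """\
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://{host}:{port}
    jsonData:
      httpMethod: POST
"""


def render_datasource(host: str, port: int = 9090) -> str:
    """Return the provisioning YAML text for one Prometheus data source."""
    return DATASOURCE_TEMPLATE.format(host=host, port=port)


def write_datasource(directory: str, host: str, port: int = 9090) -> Path:
    """Write datasource.yml into the given provisioning directory."""
    target = Path(directory) / "datasource.yml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_datasource(host, port), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical address; print instead of writing to /etc.
    print(render_datasource("10.0.0.5"))
```

Calling `write_datasource("/etc/grafana/provisioning/datasources", "10.0.0.5")` would place the file where Grafana looks for it; Grafana reads provisioning files at startup, so a restart is the simplest way to pick up the change.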

Configuring dashboards with provisioning files

Provisioning directory:
/etc/grafana/provisioning/dashboards/

In the JSON file downloaded from the official site, locate the annotations section.

any-file-name.json

  "annotations": {
    "list": [
      {
        ...
        "datasource": "-- Grafana --",

Change it to:

  "annotations": {
    "list": [
      {
        ...
        "datasource": {
          "type": "datasource",
          "uid": "grafana"
        },

The same change can be made with sed:

sed -ri 's/"-- Grafana --",/{\n          "type": "datasource",\n          "uid": "grafana"\n        },/g' nodeExporter.json

Next, replace every occurrence of

${DS_TEST-PROMETHEUS}

with the uid configured for the data source, assumed here to be PBFA97CFB590B2093.

Alternatively, replace "${DS_PROMETHEUS}", with

        {
          "type": "prometheus",
          "uid": "PBFA97CFB590B2093"
        },

sed -i 's#"\${DS_PROMETHEUS}",#{\n          "type": "prometheus",\n          "uid": "PBFA97CFB590B2093"\n        },#g' blackbox.json
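
Text substitution with sed depends on the exact spacing and trailing commas in the exported file. As an alternative, the rewrite can be done on the parsed JSON. The sketch below is one possible approach, under the assumption that every `${DS_...}` placeholder in the dashboard refers to the Prometheus data source; the function name `rewrite_datasources` is hypothetical.

```python
import json
from typing import Any

# Reference to Grafana's built-in data source, as used for annotations above.
GRAFANA_DS = {"type": "datasource", "uid": "grafana"}


def rewrite_datasources(node: Any, prometheus_uid: str) -> Any:
    """Return a copy of a dashboard tree with data source placeholders replaced."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "datasource" and value == "-- Grafana --":
                out[key] = dict(GRAFANA_DS)
            elif (key == "datasource" and isinstance(value, str)
                  and value.startswith("${DS_")):
                # Assumption: all export placeholders point at Prometheus.
                out[key] = {"type": "prometheus", "uid": prometheus_uid}
            else:
                out[key] = rewrite_datasources(value, prometheus_uid)
        return out
    if isinstance(node, list):
        return [rewrite_datasources(item, prometheus_uid) for item in node]
    return node


if __name__ == "__main__":
    sample = json.loads("""
    {
      "annotations": {"list": [{"datasource": "-- Grafana --"}]},
      "panels": [{"datasource": "${DS_PROMETHEUS}", "title": "CPU"}]
    }
    """)
    fixed = rewrite_datasources(sample, "PBFA97CFB590B2093")
    print(json.dumps(fixed, indent=2))
```

To process a real file, load it with `json.load`, pass the result through `rewrite_datasources`, and write it back with `json.dump`; the input tree is left unmodified.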

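Creating and managing alerting resources with provisioning files

Official documentation: https://grafana.com/docs/grafana/latest/alerting/set-up/provision-alerting-resources/file-provisioning/

Alert rules

/etc/grafana/provisioning/alerting/

Create a YAML file in this directory. An example file: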

# config file version
apiVersion: 1

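# List of rule groups to import or update
groups:
  # <int> organization ID, default = 1
  - orgId: 1

    # <string, required> name of the rule group
    name: my_rule_group

    # <string, required> name of the folder the rule group will be stored in
    folder: my_first_folder

    # <duration, required> interval at which the rule group is evaluated
    interval: 60s

    # <list, required> list of rules that belong to the rule group
    rules:
      # <string, required> unique identifier for the rule
      - uid: my_id_1

        # <string, required> title of the rule that will be displayed in the UI
        title: my_first_rule

        # <string, required> which query should be used for the condition
        condition: A

        # <list, required> list of query objects executed on each evaluation; best fetched via the API
        data:
          - refId: A
            # datasourceUid: UID of the data source
            datasourceUid: 'PBFA97CFB590B2093'
            model:
              # conditions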
              conditions:
                - evaluator:
                    params:
                      - 3
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - A
                  reducer:
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: '-100'
              expression: 1==0
              intervalMs: 1000
              maxDataPoints: 43200
              refId: A
              type: math
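        # <string> UID of the dashboard the alert rule should be linked to
        dashboardUid: my_dashboard
        # <int> ID of the panel the alert rule should be linked to
        panelId: 123
        # <string> state of the alert rule when no data is returned
        #          possible values: "NoData", "Alerting", "OK", default = NoData
        noDataState: Alerting
        # <string> state of the alert rule when the query fails to execute
        #          possible values: "Error", "Alerting", "OK", default = Alerting
        execErrState: Alerting

        # <duration, required> how long the condition must hold before the alert fires
        for: 60s

        # <map<string, string>> descriptive information as arbitrary key: value pairs
        annotations:
          some_key: some_value

        # <map<string, string>> map of strings that can be used to filter and route alerts
        labels:
          team: sre_team_1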
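
To make the interplay of `interval`, the classic condition (reducer `last`, evaluator `gt 3`) and `for` more concrete, here is a simplified Python model. It is a sketch under stated assumptions, not Grafana's implementation: it ignores the NoData and Error states, models only the `conditions` block (not the math `expression`), and the function names are made up for illustration.

```python
from typing import List, Optional


def classic_condition(series: List[float], threshold: float = 3.0) -> bool:
    """Reducer 'last' plus evaluator 'gt': is the latest sample above the threshold?"""
    return bool(series) and series[-1] > threshold


def simulate_states(results: List[bool], interval_s: int = 60,
                    for_s: int = 60) -> List[str]:
    """Map one condition result per evaluation to a simplified alert state.

    A breached condition first yields Pending and becomes Alerting once it
    has held continuously for at least `for_s` seconds.
    """
    states: List[str] = []
    pending_since: Optional[int] = None
    for i, breached in enumerate(results):
        now = i * interval_s
        if not breached:
            pending_since = None
            states.append("Normal")
            continue
        if pending_since is None:
            pending_since = now
        states.append("Alerting" if now - pending_since >= for_s else "Pending")
    return states


if __name__ == "__main__":
    samples = [[1.0], [2.0, 4.0], [5.0], [2.0], [9.0]]
    results = [classic_condition(s) for s in samples]
    print(simulate_states(results))
    # ['Normal', 'Pending', 'Alerting', 'Normal', 'Pending']
```

With `interval: 60s` and `for: 60s` as in the example, a breach therefore needs two consecutive evaluations before a notification is sent in this model.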

Contact point: DingTalk

/etc/grafana/provisioning/alerting/

dingding.yml

# config file version
apiVersion: 1

# List of contact points to import or update
contactPoints:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the contact point
    name: dingding
    receivers:
      # <string, required> unique identifier for the receiver
      - uid: dingding
        type: dingding
        settings:
          # <string, required>
          url: https://oapi.dingtalk.com/robot/send?access_token=xxx
          # <string> options: link, actionCard
          # msgType: link
          msgType: actionCard
          # <string>
          message: |
            {{ template "default.message" . }}

Notification policies

/etc/grafana/provisioning/alerting/

notifiers.yml

# config file version
apiVersion: 1

# List of notification policies
policies:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string> name of the contact point that should be used for this route
    receiver: dingding
    # <list> The labels by which incoming alerts are grouped together. For example,
    #        multiple alerts coming in for cluster=A and alertname=LatencyHigh would
    #        be batched into a single group.
    #
    #        To aggregate by all possible labels use the special value '...' as
    #        the sole label name, for example:
    #        group_by: ['...']
    #        This effectively disables aggregation entirely, passing through all
    #        alerts as-is. This is unlikely to be what you want, unless you have
    #        a very low alert volume or your upstream notification system performs
    #        its own grouping.
    group_by: ['...']
    # <list> a list of matchers that an alert has to fulfill to match the node
    matchers:
      - alertname = Watchdog
      - severity =~ "warning|critical"
    # <list> Times when the route should be muted. These must match the name of a
    #        mute time interval.
    #        Additionally, the root node cannot have any mute times.
    #        When a route is muted it will not send any notifications, but
    #        otherwise acts normally (including ending the route-matching process
    #        if the `continue` option is not set)
    mute_time_intervals:
      - abc
    # <duration> How long to initially wait to send a notification for a group
    #            of alerts. Allows to collect more initial alerts for the same group.
    #            (Usually ~0s to few minutes), default = 30s
    group_wait: 30s
    # <duration> How long to wait before sending a notification about new alerts that
    #            are added to a group of alerts for which an initial notification has
    #            already been sent. (Usually ~5m or more), default = 5m
    group_interval: 5m
    # <duration>  How long to wait before sending a notification again if it has already
    #             been sent successfully for an alert. (Usually ~3h or more), default = 4h
    repeat_interval: 4h
    # <list> Zero or more child routes
    # routes:
    # ...
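
The `matchers` list decides which alerts a route applies to: every matcher must hold for the alert's labels. The Python sketch below illustrates that rule for the two matchers in the example. It is an approximation, assuming regex matchers are fully anchored and that a missing label is treated as an empty string (the Prometheus matcher convention); Python's `re` module stands in for the RE2 syntax, and the names `Matcher` and `route_matches` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Matcher = Tuple[str, str, str]  # (label name, operator, value)

# The two matchers from the policy above.
POLICY_MATCHERS: List[Matcher] = [
    ("alertname", "=", "Watchdog"),
    ("severity", "=~", "warning|critical"),
]


def route_matches(labels: Dict[str, str], matchers: List[Matcher]) -> bool:
    """Return True only if the labels satisfy every matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    alert = {"alertname": "Watchdog", "severity": "critical"}
    print(route_matches(alert, POLICY_MATCHERS))  # True
```

Because the regex is anchored, a label such as `severity="warnings"` does not match `warning|critical` in this sketch.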

Configuring templates

# config file version
apiVersion: 1

# List of templates that should be deleted
deleteTemplates:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the template, must be unique
    name: my_first_template

Configuring mute timings

# config file version
apiVersion: 1

# List of mute time intervals to import or update
muteTimes:
  # <int> organization ID, default = 1
  - orgId: 1
    # <string, required> name of the mute time interval, must be unique
    name: mti_1
    # <list> time intervals that should trigger the muting
    #        refer to https://prometheus.io/docs/alerting/latest/configuration/#time_interval-0
    time_intervals:
      - times:
          - start_time: '06:00'
            end_time: '23:59'
        weekdays: ['monday:wednesday', 'saturday', 'sunday']
        months: ['1:3', 'may:august', 'december']
        years: ['2020:2022', '2030']
        days_of_month: ['1:5', '-3:-1']
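
How the fields of a time interval combine is easy to misread, so here is a rough Python sketch that checks whether a timestamp falls inside `mti_1`. It assumes the semantics described in the Prometheus `time_interval` documentation linked above: every listed field must match, `end_time` is exclusive, negative days count back from the end of the month, and times are in UTC. The helper names are hypothetical.

```python
import calendar
from datetime import datetime
from typing import Dict, List, Optional, Tuple

WEEKDAYS = {name: i for i, name in enumerate(
    ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"])}
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# The mti_1 interval from the YAML above.
MTI_1 = {
    "times": [("06:00", "23:59")],
    "weekdays": ["monday:wednesday", "saturday", "sunday"],
    "months": ["1:3", "may:august", "december"],
    "years": ["2020:2022", "2030"],
    "days_of_month": ["1:5", "-3:-1"],
}


def parse_range(spec: str, names: Optional[Dict[str, int]] = None) -> Tuple[int, int]:
    """Parse 'a:b' or a single value into an inclusive (low, high) pair."""
    tokens = [t.strip().lower() for t in spec.split(":")]
    values = [names[t] if names and t in names else int(t) for t in tokens]
    return values[0], values[-1]


def in_ranges(value: int, specs: List[str],
              names: Optional[Dict[str, int]] = None) -> bool:
    return any(lo <= value <= hi
               for lo, hi in (parse_range(s, names) for s in specs))


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def is_muted(now: datetime, interval: dict = MTI_1) -> bool:
    """Return True if `now` (treated as UTC) lies inside the interval."""
    minute_of_day = now.hour * 60 + now.minute
    time_ok = any(to_minutes(start) <= minute_of_day < to_minutes(end)
                  for start, end in interval["times"])
    last_day = calendar.monthrange(now.year, now.month)[1]

    def day_ok(spec: str) -> bool:
        lo, hi = parse_range(spec)
        # Negative values count from the end of the month: -1 is the last day.
        lo = lo if lo > 0 else last_day + 1 + lo
        hi = hi if hi > 0 else last_day + 1 + hi
        return lo <= now.day <= hi

    return (time_ok
            and in_ranges(now.weekday(), interval["weekdays"], WEEKDAYS)
            and in_ranges(now.month, interval["months"], MONTHS)
            and in_ranges(now.year, interval["years"])
            and any(day_ok(s) for s in interval["days_of_month"]))


if __name__ == "__main__":
    print(is_muted(datetime(2021, 1, 4, 12, 0)))   # Monday 4 Jan 2021, noon: True
    print(is_muted(datetime(2021, 1, 12, 12, 0)))  # day 12 is outside both ranges: False
```

The second call shows why such intervals can be narrower than they look: the weekday, month, year and time all match, but the day of the month does not, so the alert is not muted.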
