Use Prometheus operator with DB volume for k8s

Posted 2023-02-15

Asked: 2019-08-02 01:11:07

Question:

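We are trying to monitor K8S with Grafana and the Prometheus Operator. Most of the metrics work as expected and I can see the dashboards with the right values. Our system has 10 nodes with 500 pods in total. Now, when I restart Prometheus, all the data is deleted. I want it to be stored for two weeks.

My question is: how can I define a Prometheus volume that keeps the data for two weeks, or up to 100GB of DB?

I found the following (we use the Prometheus Operator):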

https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/storage.md

This is the config of the Prometheus Operator

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
        image: quay.io/coreos/prometheus-operator:v0.29.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http

This is the config of Prometheus

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
      namespace: monitoring
      labels: 
        prometheus: prometheus
    spec:
      replicas: 2
      serviceAccountName: prometheus
      serviceMonitorNamespaceSelector: 
      serviceMonitorSelector:
        matchLabels:
          role: observeable
      tolerations:
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoSchedule"
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoExecute"
      resources:
        limits:
          cpu: 8000m
          memory: 24000Mi
        requests:
          cpu: 6000m
          memory: 6000Mi
      storage:
        volumeClaimTemplate:
          spec:
            selector:
              matchLabels:
                app: prometheus
            resources:
              requests:
                storage: 100Gi

https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/storage.md

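We have a file system (NFS), and the storage config above doesn't work. My questions are:

How should I configure the volume, server and path under the nfs section, and what should a path such as /path/to/prom/db point to?

We have NFS configured in our system.

How do I combine it with Prometheus?

As I don't have deep knowledge of PVC and PV, I created the following (I'm not sure about these values: what is my server, and which path should I provide?) ...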

server: myServer
path: "/path/to/prom/db"

What should I put there, and how do I make my Prometheus (i.e. the config I provided in the question) use it?

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    prometheus: prometheus
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce # required
  nfs:
    server: myServer
    path: "/path/to/prom/db"

Is there any persistent volume other than NFS that I can use for my use case? Please advise how.

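Comments on the question:

Does the query work directly from Prometheus? I mean when you query directly from the Prometheus UI. Also, do you have audit logging enabled? If so, can you see whether API requests are going from the prometheus service account/user to the API server?

@JasonStanley - thanks for the suggestion. How should I use it in the Prometheus query UI with pod=~"^$Pod$")? I want to run a query to get the data for all pods in the cluster ... (all node pods)

In the Prometheus UI, just run the query kube_pod_container_resource_limits_cpu_cores. This should return a long list of metrics for all pods. If this result comes back, the Prometheus config is fine and something needs tuning on the Grafana side. If you get no response to the query, the problem is in your Prometheus config.

Yes, your query should be just kube_pod_container_resource_limits_cpu_cores

Answer 1: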

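I recently started working with the operator chart, and managed to add persistence without defining a PV and PVC.

In the new chart configuration, adding persistence is much easier than what you describe. Just edit the file /helm/vector-chart/prometheus-operator-chart/values.yaml under prometheus.prometheusSpec: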

storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: prometheus
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
    selector: {}

And add this /helm/vector-chart/prometheus-operator-chart/templates/prometheus/storageClass.yaml:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
parameters:
  type: gp2
  zones: "ap-southeast-2a, ap-southeast-2b, ap-southeast-2c"
  encrypted: "true"

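This will automatically create both a PV and a PVC for you, which in turn creates an EBS volume in AWS that stores all your data.

Comments:

This is the answer I was looking for, thanks. I didn't need to create the storage class, though. I'm using AKS, which has 2 by default: default | managed-premium. You can see them with kubectl get storageclass.

Answer 2: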

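You have to use a persistent volume and a volume claim (PV and PVC) to get persistent data. You can refer to https://kubernetes.io/docs/concepts/storage/persistent-volumes/ and make sure to read carefully about provisioning, reclaim policy, access modes and storage class at that URL.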
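
For the NFS case in the question, the PV and the claim could be wired together roughly as follows. This is only a sketch under assumptions: the PV name prometheus-nfs-0 is hypothetical, myServer and /path/to/prom/db are the placeholders from the question (the NFS host and a directory it exports), and an empty storageClassName is assumed so that the claim binds statically instead of going through a provisioner. Note also that the Prometheus documentation lists NFS as unsupported for its local storage, so this illustrates the binding mechanics rather than a recommended backend.

```yaml
# PersistentVolume backed by NFS. PVs are cluster-scoped, so no namespace is set.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-nfs-0            # hypothetical name
  labels:
    app: prometheus                 # matched by the claim selector below
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data if the claim is deleted
  storageClassName: ""                    # assumption: static binding, no provisioner
  nfs:
    server: myServer                      # placeholder: NFS host name or IP
    path: "/path/to/prom/db"              # placeholder: directory exported by that host
---
# Fragment of the Prometheus custom resource; the operator turns
# volumeClaimTemplate into one PVC per replica.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1                       # assumption: each replica needs its own PV
  serviceAccountName: prometheus
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: ""        # must match the PV above
        accessModes: ["ReadWriteOnce"]
        selector:
          matchLabels:
            app: prometheus
        resources:
          requests:
            storage: 100Gi
```

With more than one replica, one matching PV per replica would be needed, since each generated claim binds to a different volume.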

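Comments:

Well, I know that :) The problem is that I can't figure it out for Prometheus. It would be great if you could provide an example in my context.

I usually install Prometheus and Grafana with helm install --name prometheus stable/prometheus, using the default helm repository. Option one here is to go through the whole helm chart, or to run the command above and then describe all of the chart's components. You will surely find it.

Answer 3:

To determine when to remove old data, use the switch --storage.tsdb.retention

For example --storage.tsdb.retention='7d' (by default, Prometheus keeps data for 15 days).

To delete the data completely, use this API call (the TSDB admin API has to be enabled with --web.enable-admin-api):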

$ curl -X POST -g 'http://<your_host>:9090/api/v1/admin/tsdb/<your_index>'

Edit

Kubernetes snippet example

...
    spec:
      containers:
      - name: prometheus
        image: docker.io/prom/prometheus:v2.0.0
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.retention=7d'
        ports:
        - name: web
          containerPort: 9090
...
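
Because the question uses the Prometheus Operator, the Prometheus container is generated by the operator and its args are not edited directly. Retention is instead usually set on the Prometheus custom resource. A minimal sketch, assuming the name and namespace from the question and an operator release recent enough to support the retentionSize field (the 90GB value is an illustrative choice that leaves headroom on a 100Gi volume):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  retention: 14d        # time-based limit: drop data older than two weeks
  retentionSize: 90GB   # size-based limit (assumption: supported by the installed operator)
```

Retention only controls when old data is dropped. Surviving a pod restart still requires a persistent volume under spec.storage.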

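Comments:

Thanks, where in the YAML should I put this parameter? Can you provide an example? If I don't provide a volume, where will it be saved?

Thanks, please see my update. We are using the Prometheus Operator, and I put the config files in the question (there are two files: 1. the operator, 2. the Prometheus CRD). How should I update them, since the container comes from the operator and not directly from the Prometheus CRD? ... Can you update?

In the operator config, under spec.template.spec.containers.args --------- Please read up on the persistent volume concept in Docker. By default the data is stored in the container until it restarts (so it might be 5 minutes or several weeks). The important point is that containers are ephemeral (short-lived).

OK, I'm not sure about the args. Could you update the answer to make clear which file I should pass them in? Currently I have args in the second file (the operator one) and not in the CRD, which is a bit confusing because I'm not using the Prometheus Docker config. We are using the operator config, so it would be great if you could update the second file from the question in your answer.

Answer 4:

Refer to the code below. Define storage-retention as 7d, or the required number of retention days, in a ConfigMap and load it into the container as an env variable, as shown below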

      containers:
      - name: prometheus
        image: prom/prometheus:latest
        args:
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention=$(STORAGE_RETENTION)'
          - '--web.enable-lifecycle'
          - '--storage.tsdb.no-lockfile'
          - '--config.file=/etc/prometheus/prometheus.yml'
        ports:
        - name: web
          containerPort: 9090
        env:
        - name: STORAGE_RETENTION
          valueFrom:
            configMapKeyRef:
              name: prometheus.cfg
              key: storage-retention
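
The ConfigMap that configMapKeyRef points to is not shown in the answer. A minimal sketch of what it could look like, assuming the monitoring namespace from the question and a two-week retention value:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus.cfg        # must match configMapKeyRef.name in the container spec
  namespace: monitoring       # assumption: same namespace as the Prometheus pod
data:
  storage-retention: 14d      # read into the STORAGE_RETENTION env variable
```

Kubernetes expands $(STORAGE_RETENTION) in args from the container's env, so the flag resolves to --storage.tsdb.retention=14d at start-up.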

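You may need to adjust these settings in the Prometheus operator file

Comments:

Thanks, I'm using the Prometheus Operator. Please see my files in the question and provide this example in that context, since there are some differences between the operator and plain Prometheus... 2. I need a volume, because if the pod is killed the retention period won't help...

I need to define the volume. In the question I have the config of Prometheus and of the operator; what I'm missing is the NFS config, server and path ... how can I add/configure them, that's all...

Create an NFS PV, then bind it with a PVC. Map the PVC to the Prometheus data in the deployment YAML.

Answer 5:

Sharing what I have gathered since I just started setting up the kube-prometheus operator and ran into storage issues with the default settings.

Create a custom values.yaml from the defaults using the helm show values command below.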

helm show values prometheus-com/kube-prometheus-stack -n monitoring > custom-values.yaml

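Then start updating the prometheus, alertmanager and grafana sections to override the default settings, add custom names, and so on...

Regarding storage options, I see in the documentation that you can define a custom storage class or a PV/PVC (if there is no default SC, or for other reasons).

Also, here is a good example of using a storageclass for all 3 pods.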
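
As an illustration of such overrides, the storage-related part of custom-values.yaml might look like the fragment below. It is a sketch, not the linked example: the storage class name my-storage-class is hypothetical, and the sizes and retention values are assumptions based on the two weeks / 100GB mentioned in the question.

```yaml
prometheus:
  prometheusSpec:
    retention: 14d               # keep two weeks of data
    retentionSize: 90GB          # assumption: leave headroom below the volume size
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: my-storage-class   # hypothetical storage class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: my-storage-class   # hypothetical storage class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

grafana:
  persistence:
    enabled: true
    storageClassName: my-storage-class         # hypothetical storage class
    size: 10Gi
```

The file is then passed to helm install or helm upgrade with -f custom-values.yaml.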
