Choosing a deployment approach for Prometheus on K8S, with YAML for a manual deployment

Posted 2023-05-26 ttropsstack


Choosing a Prometheus deployment approach

In earlier posts I covered managing Prometheus with Prometheus Operator. That raised a question: is it better to deploy Prometheus to Kubernetes by hand, or to deploy it with Prometheus Operator?

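Technology choices are rarely dictated in advance. They depend on business requirements, the operations scenario, how well you know the technology in question, and other considerations: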

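If you are not very familiar with automating the deployment, management and configuration of Prometheus on Kubernetes, or you need a Prometheus cluster with high availability, Prometheus Operator is the better choice.
Prometheus Operator simplifies running Prometheus on Kubernetes and automates many tedious tasks, such as deploying Prometheus and Alertmanager and creating scrape targets and rules. This considerably reduces the effort of deploying and maintaining Prometheus, and improves its reliability and availability on Kubernetes.
If you have solid experience with both Kubernetes and Prometheus and need finer customization and control, deploying Prometheus to Kubernetes by hand is also a good option.
Manual deployment is more involved, but it lets you use the full flexibility of Kubernetes, for example custom Services and Endpoints, or finer-grained management of data storage and backups. It accommodates more specific requirements and deepens your understanding of Prometheus itself.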

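So the choice between manual deployment and Prometheus Operator should be made by weighing the concrete scenario and requirements, so that it serves both the business and operations.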
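
For comparison, with Prometheus Operator a scrape target is usually declared through a ServiceMonitor custom resource instead of an edit to prometheus.yml. The following is a minimal sketch only: the resource name, the labels and the port name are hypothetical, and the Prometheus resource has to be configured to select this ServiceMonitor.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app        # hypothetical name
  namespace: monitor
  labels:
    team: example          # hypothetical; must match the Prometheus resource's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-app     # hypothetical label carried by the target Service
  namespaceSelector:
    matchNames:
      - default            # namespace in which the target Service lives
  endpoints:
    - port: http           # name of the Service port to scrape
      path: /metrics
      interval: 15s
```

With a manual deployment, the same target has to be written into scrape_configs by hand, which is the trade-off described above.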

Deploying Prometheus to K8S by hand (for reference)

Below is the YAML for deploying Prometheus to Kubernetes by hand. For deployment with Prometheus Operator, see my earlier posts or the official documentation.

Tip: in this example the Prometheus data directory is backed by rook-ceph. Change it to whatever storage backend you already have, such as plain Ceph or NFS.

apiVersion: v1
kind: Namespace
metadata:
  name: monitor
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      initContainers:
      - name: "change-prometheus-data-dir-perm"
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "chown -R 65534:65534 /prometheus"]
        securityContext:
          privileged: true
        volumeMounts:
          - name: prometheus-storage
            mountPath: /prometheus
      containers:
      - image: prom/prometheus:latest
        name: prometheus
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - name: prometheus-storage
          mountPath: /prometheus
        - name: prometheus-config
          mountPath: /etc/prometheus
          readOnly: true
      volumes:
        - name: prometheus-config
          configMap: 
            name: prometheus-config
        - name: prometheus-storage
          persistentVolumeClaim:
            claimName: prometheus-pvc
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitor
spec:
  ports:
  - name: http-port
    nodePort: 30090
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  type: NodePort

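Note: in the YAML above, the initContainers entry makes sure that /prometheus and its subdirectories have the right ownership. The Prometheus process runs as an unprivileged user (UID 65534, nobody, in the prom/prometheus image), so it cannot write to a freshly provisioned volume that is owned by root. The init container runs as root, here in privileged mode, and changes the ownership of the mounted volume before Prometheus starts, which avoids permission errors at runtime.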
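
As an alternative to the privileged init container, a pod-level securityContext with fsGroup can often achieve the same result. This is a minimal sketch, assuming the storage driver honors fsGroup (block-backed CSI volumes typically do, shared file systems such as NFS frequently do not). The fragment would replace the initContainers section of the Deployment above.

```yaml
# Fragment of the Deployment; only the fields relevant to volume permissions are shown.
# Assumption: the volume plugin applies fsGroup ownership when the volume is mounted.
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534     # "nobody", the user the prom/prometheus image runs as
        runAsGroup: 65534
        fsGroup: 65534       # kubelet makes the mounted volume writable for this group
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        volumeMounts:
        - name: prometheus-storage
          mountPath: /prometheus
```

If the backend ignores fsGroup, the init container approach from the original manifest remains the fallback.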

This article is reposted from the WeChat official account 不背锅运维: https://mp.weixin.qq.com/s/JlCgx1mkHqcF2e_ZqkafEw

Deployed prometheus with Django and Kubernetes, how to make it scrape the Django app?

【Posted】: 2021-12-26 17:01:00 【Question】:

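I have a Django project deployed in Kubernetes and I am trying to deploy Prometheus as a monitoring tool. I have completed all the steps needed to include django_prometheus in the project, and locally I can go to localhost:9090 and try out metric queries.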

I have also deployed Prometheus to my Kubernetes cluster, and after running kubectl port-forward ... on the Prometheus pod I can see some metrics for my Kubernetes resources.

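What confuses me is how to make the deployed Django app's metrics available on the Prometheus dashboard like the others. I deployed my app in the default namespace and Prometheus in a dedicated monitoring namespace. I am wondering what I am missing here. Do I need to expose ports 8000 to 8005 on the Service and Deployment, according to the number of workers or something like that?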

My Django app runs with gunicorn under supervisord, like so:

[program:gunicorn]
command=gunicorn --reload --timeout 200000 --workers=5 --limit-request-line 0 --limit-request-fields 32768 --limit-request-field_size 0 --chdir /code/ my_app.wsgi

The my_app Service:

apiVersion: v1
kind: Service
metadata:
  name: my_app
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: my-app
  sessionAffinity: None
  type: ClusterIP

A trimmed-down version of deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-app
  name: my-app-deployment
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: my-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - image: ...
        imagePullPolicy: IfNotPresent
        name: my-app
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: regcred
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30

The Prometheus ConfigMap:

apiVersion: v1
data:
  prometheus.rules: |-
    ... some rules
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/prometheus.rules
    scrape_configs:
      - job_name: prometheus
        static_configs:
        - targets:
          - localhost:9090

      - job_name: my-app
        metrics_path: /metrics
        static_configs:
          - targets:
            - localhost:8000

      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'node-exporter'
          action: keep

kind: ConfigMap
metadata:
  labels:
    name: prometheus-config
  name: prometheus-config
  namespace: monitoring

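【Comments】:

Did any of the provided solutions help you?

I haven't had the chance to properly apply both suggestions yet. I'll post an update when I do.

【Answer 1】: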

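If Prometheus is installed on the same cluster as your app, you do not have to expose the service. You can reach the application across namespaces through Kubernetes DNS resolution, following the rule: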

SERVICENAME.NAMESPACE.svc.cluster.local

So one option is to change your Prometheus job target to something like:

  - job_name: speedtest-ookla
    metrics_path: /metrics
    static_configs:
      - targets:
          - 'my_app.default.svc.cluster.local:9000'

This is the "manual" way. A better approach is to use Prometheus kubernetes_sd_config. It automatically discovers your services and tries to scrape them.

Reference: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
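
Applied to the Service from the question, which exposes port 80, the two options could look like the sketch below. The Service name my-app is an assumption (Kubernetes Service names cannot contain underscores, so my_app is treated as a placeholder), and so is the default cluster domain cluster.local.

```yaml
scrape_configs:
  # Option 1: static target addressed through cluster DNS.
  - job_name: my-app
    metrics_path: /metrics
    static_configs:
      - targets:
          - 'my-app.default.svc.cluster.local:80'

  # Option 2: discover the endpoints behind the same Service.
  - job_name: my-app-sd
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
    relabel_configs:
      # Keep only the endpoints that belong to the my-app Service.
      - source_labels: [__meta_kubernetes_service_name]
        regex: my-app
        action: keep
```

Only one of the two jobs is needed. The static form is easier to reason about, while the discovered form follows the pods as they are rescheduled.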

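【Discussion】:

I tried this solution, but I still get the error Get "http://my_app.default.svc.cluster.local:8000/metrics": context deadline exceeded in the Prometheus dashboard. It does work for the prometheus job itself, though.

@everspader Change the port to 80 instead of 8000. Your Deployment is listening on port 80, the same as the Service. Also have a look at @Marco's answer. It scrapes your Deployments and Services automatically.

【Answer 2】: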

There is no need to expose the application outside the cluster.

Use Kubernetes service discovery and add jobs that scrape Services, Pods, or both:

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
    regex: (.+)
  - regex: __meta_kubernetes_service_label_(.+)
    action: labelmap
  - regex: 'app_kubernetes_io_(.+)'
    action: labeldrop
  - regex: 'helm_sh_(.+)'
    action: labeldrop

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
    regex: (.+)
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: host
    regex: (.+)
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
    regex: (.+)
  - regex: __meta_kubernetes_pod_label_(.+)
    action: labelmap
  - regex: 'app_kubernetes_io_(.+)'
    action: labeldrop
  - regex: 'helm_sh_(.+)'
    action: labeldrop
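
Service discovery only works if Prometheus is allowed to query the Kubernetes API, a part that none of the manifests in this article show. Here is a minimal sketch of the usual RBAC objects. The ServiceAccount name prometheus is an assumption, and the Prometheus Deployment would also need serviceAccountName: prometheus in its pod spec.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus            # assumed name
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  # Read-only access to the objects the endpoints and pod roles discover.
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```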

Then annotate the Service with the following:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "80"
    prometheus.io/path: "/metrics"

And the Deployment (the annotations go on the Pod template):

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "80"
        prometheus.io/path: "/metrics"
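
To make the __address__ rule in the jobs above concrete, the sketch below traces one discovered target through relabeling. The IP address and the discovered port are hypothetical, and the YAML only illustrates label values. It is not a file that Prometheus reads.

```yaml
# Labels attached by service discovery (illustrative values).
discovered:
  __address__: "10.42.0.17:8080"
  __meta_kubernetes_service_annotation_prometheus_io_scrape: "true"
  __meta_kubernetes_service_annotation_prometheus_io_port: "80"
  __meta_kubernetes_service_annotation_prometheus_io_path: "/metrics"
  __meta_kubernetes_namespace: "default"

# Step 1: the keep rule retains the target because the scrape annotation is "true".
# Step 2: the two source_labels are joined with ";" into "10.42.0.17:8080;80".
#         ([^:]+)(?::\d+)?;(\d+) captures $1 = 10.42.0.17 and $2 = 80.
# Step 3: the replacement $1:$2 is written to __address__.
after_relabeling:
  __address__: "10.42.0.17:80"
  __metrics_path__: "/metrics"
  namespace: "default"
```

The annotation port therefore overrides whatever port discovery reported, which is why prometheus.io/port has to match the port the application actually listens on.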
