Unable to add a K8s service as a Prometheus target
Posted: 2021-09-07 21:14:46

Question:

I want my Prometheus server to scrape metrics from a pod.
I followed these steps:

1. Created a pod with a deployment: kubectl apply -f sample-app.deploy.yaml
2. Exposed it with a service: kubectl apply -f sample-app.service.yaml (a sketch of both manifests follows this list)
3. Deployed the Prometheus server: helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
4. Created a ServiceMonitor to add a target to Prometheus: kubectl apply -f service-monitor.yaml
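The sample-app manifests are not included in the post. A minimal sketch of what they might look like, inferred from the ServiceMonitor selector and the sample-app:8080 static target further down (the image is a placeholder, everything else is an assumption):

# sample-app.deploy.yaml (hypothetical reconstruction)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: prom
  labels:
    app: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: sample-app
        image: example/sample-app:latest  # placeholder image
        ports:
        - name: http             # must match the ServiceMonitor endpoint port name
          containerPort: 8080

# sample-app.service.yaml (hypothetical reconstruction)
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: prom
  labels:
    app: sample-app            # the ServiceMonitor selector matches these labels
    release: prometheus
spec:
  selector:
    app: sample-app
  ports:
  - name: http                 # named port referenced by the ServiceMonitor
    port: 8080
    targetPort: 8080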
All the pods are running, but when I open the Prometheus dashboard I do not see the sample-app service as a Prometheus target under Status > Targets in the dashboard UI.
I have verified the following:

- When I run kubectl get servicemonitors, I can see sample-app.
- sample-app exposes metrics in Prometheus format at /metrics (a port-forward sketch of this check follows).
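That second check can be reproduced with a port-forward; a sketch, assuming the prom namespace used by the manifests below:

kubectl port-forward svc/sample-app 8080:8080 -n prom
# in another shell:
curl http://localhost:8080/metrics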
To debug further, I exec'd into the Prometheus pod with

kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh

and saw that the Prometheus config (/etc/config/prometheus.yml) did not have sample-app as one of the jobs, so I edited the ConfigMap with

kubectl edit cm prometheus-server -o yaml

and added:
- job_name: sample-app
  static_configs:
    - targets:
        - sample-app:8080
All other fields, such as scrape_interval and scrape_timeout, were left at their defaults.
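Worth noting: since the server runs with --web.enable-lifecycle (see the Deployment below), a configuration reload can also be forced by hand after editing the ConfigMap. A sketch, reusing the pod name from above:

kubectl port-forward pod/prometheus-server-65b759cb95-dxmkm 9090:9090 -n prom
curl -X POST http://localhost:9090/-/reload

The prometheus-server-configmap-reload sidecar in the Deployment calls the same /-/reload webhook automatically, but only after the kubelet refreshes the mounted ConfigMap volume, which can take a minute or more.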
I can see the change reflected in /etc/config/prometheus.yml, but the Prometheus dashboard still does not show sample-app as a target under Status > Targets.

Below are the YAMLs for prometheus-server and the ServiceMonitor.

The prometheus-server Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server"}]},"modified":true}'
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: prom
  creationTimestamp: "2021-06-24T10:42:31Z"
  generation: 1
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.2.1
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: prom
  resourceVersion: "6983855"
  selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
  uid: <some-uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-14.2.1
        component: server
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: quay.io/prometheus/prometheus:v2.26.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 300
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-06-24T10:43:25Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-06-24T10:42:31Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
The ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
  creationTimestamp: "2021-06-24T07:55:58Z"
  generation: 2
  labels:
    app: sample-app
    release: prometheus
  name: sample-app
  namespace: prom
  resourceVersion: "6904642"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
  uid: <some-uid>
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: sample-app
      release: prometheus
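For this ServiceMonitor to select anything, the sample-app Service must carry both labels (app: sample-app and release: prometheus) and expose a port named http. A sketch of how to check, assuming the prom namespace:

kubectl get svc sample-app -n prom --show-labels
kubectl get svc sample-app -n prom -o jsonpath='{.spec.ports[*].name}'
kubectl get endpoints sample-app -n prom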
Comments:

- Have you tried port-forwarding your sample app and fetching the /metrics endpoint that Prometheus needs to scrape? Is your /metrics endpoint available and working?
- Yes. The pod is serving metrics on the /metrics endpoint in Prometheus format; verified with a port-forward.
- Does your service have endpoints? Try kubectl get endpoints and check the output.
- @meaningqo Yes, the service has endpoints. I can curl --request GET --url 'http://my_endpoint_ip:8080/metrics'
- If you are running the Prometheus Operator with ServiceMonitors, you do not need to edit the config map manually.
Answer 1:
You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus Operator, in order to have Prometheus's configuration updated automatically based on ServiceMonitor resources.

The prometheus-community/prometheus chart you are using does not include the Prometheus Operator, the component that watches ServiceMonitor resources in the Kubernetes API and updates the Prometheus server's ConfigMap accordingly.

It looks like the necessary CustomResourceDefinitions (CRDs) are installed in your cluster, otherwise you would not have been able to create a ServiceMonitor resource. They are not included in the prometheus-community/prometheus chart, so they were probably added to your cluster at some earlier point.
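A sketch of switching to the operator-based chart, reusing the release name and namespace from the question (any chart values are omitted here, since the post's values file is not shown):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade -i prometheus prometheus-community/kube-prometheus-stack -n prom

By default the operator installed by that chart only picks up ServiceMonitors whose labels match its serviceMonitorSelector, which is typically the Helm release label; the release: prometheus label already present on the ServiceMonitor above is what that selector matches.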
Comments:
- I am running these workloads on a GKE Autopilot cluster, and deploying prometheus-community/kube-prometheus-stack fails with a "mutatingwebhookconfigurations access denied" error. That looks like a GKE Autopilot limitation. Let me try a standard cluster.
- I tried your suggestion on a standard cluster and it works.