无法在 Prometheus Operator 中获取 Spring Boot 应用程序的指标

Posted

技术标签:

【中文标题】无法在 Prometheus Operator 中获取 Spring Boot 应用程序的指标【英文标题】:Unable the get the metrics of spring boot application in Prometheus Operator 【发布时间】:2021-06-23 00:29:54 【问题描述】:

我正在尝试从我的 prometheus 运算符中的 spring boot 应用程序中获取指标: eks:版本。 1.18 kube-prometheus 堆栈: 版本:12.12.1 应用版本:0.44.0

我检查了一下,应用程序确实通过端点提取了指标:

http://myloadbalancer/internal-gateway/actuator/prometheus

# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.013852972596312008
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage 0.0
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_countaction="end of major GC",cause="Allocation Failure", 4.0
jvm_gc_pause_seconds_sumaction="end of major GC",cause="Allocation Failure", 0.922
jvm_gc_pause_seconds_countaction="end of minor GC",cause="Allocation Failure", 235.0
jvm_gc_pause_seconds_sumaction="end of minor GC",cause="Allocation Failure", 2.584
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_maxaction="end of major GC",cause="Allocation Failure", 0.0
jvm_gc_pause_seconds_maxaction="end of minor GC",cause="Allocation Failure", 0.0
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total 8.888016704E9
# HELP tomcat_sessions_active_current_sessions  
# TYPE tomcat_sessions_active_current_sessions gauge
tomcat_sessions_active_current_sessions 0.0
# HELP tomcat_sessions_alive_max_seconds  
# TYPE tomcat_sessions_alive_max_seconds gauge
tomcat_sessions_alive_max_seconds 0.0
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
# TYPE jvm_gc_memory_promoted_bytes_total counter
jvm_gc_memory_promoted_bytes_total 1.13497864E8
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytesid="mapped", 0.0
jvm_buffer_memory_used_bytesid="direct", 509649.0
# HELP system_cpu_count The number of processors available to the Java virtual machine
# TYPE system_cpu_count gauge
system_cpu_count 1.0
# HELP tomcat_sessions_created_sessions_total  
# TYPE tomcat_sessions_created_sessions_total counter
tomcat_sessions_created_sessions_total 0.0
# HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes 8.5375192E7
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total 199.0
# HELP tomcat_sessions_active_max_sessions  
# TYPE tomcat_sessions_active_max_sessions gauge
tomcat_sessions_active_max_sessions 0.0
# HELP process_files_open_files The open file descriptor count
# TYPE process_files_open_files gauge
process_files_open_files 66.0
# HELP logback_events_total Number of error level events that made it to the logs
# TYPE logback_events_total counter
logback_events_totallevel="warn", 2.0
logback_events_totallevel="debug", 0.0
logback_events_totallevel="error", 0.0
logback_events_totallevel="trace", 0.0
logback_events_totallevel="info", 443.0
# HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes 5.36870912E8
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffersid="mapped", 0.0
jvm_buffer_count_buffersid="direct", 18.0
# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytesid="mapped", 0.0
jvm_buffer_total_capacity_bytesid="direct", 509649.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytesarea="heap",id="Tenured Gen", 1.4229504E8
jvm_memory_committed_bytesarea="nonheap",id="CodeHeap 'profiled nmethods'", 2.9229056E7
jvm_memory_committed_bytesarea="heap",id="Eden Space", 5.7081856E7
jvm_memory_committed_bytesarea="nonheap",id="Metaspace", 1.01359616E8
jvm_memory_committed_bytesarea="nonheap",id="CodeHeap 'non-nmethods'", 2555904.0
jvm_memory_committed_bytesarea="heap",id="Survivor Space", 7077888.0
jvm_memory_committed_bytesarea="nonheap",id="Compressed Class Space", 1.31072E7
jvm_memory_committed_bytesarea="nonheap",id="CodeHeap 'non-profiled nmethods'", 1.1599872E7
# HELP spring_kafka_listener_seconds_max Kafka Listener Timer
# TYPE spring_kafka_listener_seconds_max gauge
spring_kafka_listener_seconds_maxexception="ListenerExecutionFailedException",name="fgMessageConsumer-0",result="failure", 0.0
spring_kafka_listener_seconds_maxexception="none",name="fgMessageConsumer-0",result="success", 0.0
# HELP spring_kafka_listener_seconds Kafka Listener Timer
# TYPE spring_kafka_listener_seconds summary
spring_kafka_listener_seconds_countexception="ListenerExecutionFailedException",name="fgMessageConsumer-0",result="failure", 0.0
spring_kafka_listener_seconds_sumexception="ListenerExecutionFailedException",name="fgMessageConsumer-0",result="failure", 0.0
spring_kafka_listener_seconds_countexception="none",name="fgMessageConsumer-0",result="success", 9.0
spring_kafka_listener_seconds_sumexception="none",name="fgMessageConsumer-0",result="success", 16.017111464
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytesarea="heap",id="Tenured Gen", 5.36870912E8
jvm_memory_max_bytesarea="nonheap",id="CodeHeap 'profiled nmethods'", 1.22912768E8
jvm_memory_max_bytesarea="heap",id="Eden Space", 2.14827008E8
jvm_memory_max_bytesarea="nonheap",id="Metaspace", -1.0
jvm_memory_max_bytesarea="nonheap",id="CodeHeap 'non-nmethods'", 5828608.0
jvm_memory_max_bytesarea="heap",id="Survivor Space", 2.6804224E7
jvm_memory_max_bytesarea="nonheap",id="Compressed Class Space", 1.073741824E9
jvm_memory_max_bytesarea="nonheap",id="CodeHeap 'non-profiled nmethods'", 1.22916864E8
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytesarea="heap",id="Tenured Gen", 8.6654784E7
jvm_memory_used_bytesarea="nonheap",id="CodeHeap 'profiled nmethods'", 2.382144E7
jvm_memory_used_bytesarea="heap",id="Eden Space", 7444976.0
jvm_memory_used_bytesarea="nonheap",id="Metaspace", 9.7431448E7
jvm_memory_used_bytesarea="nonheap",id="CodeHeap 'non-nmethods'", 1346432.0
jvm_memory_used_bytesarea="heap",id="Survivor Space", 571600.0
jvm_memory_used_bytesarea="nonheap",id="Compressed Class Space", 1.1687056E7
jvm_memory_used_bytesarea="nonheap",id="CodeHeap 'non-profiled nmethods'", 1.1500544E7
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes 16917.0
# HELP tomcat_sessions_rejected_sessions_total  
# TYPE tomcat_sessions_rejected_sessions_total counter
tomcat_sessions_rejected_sessions_total 0.0
# HELP process_start_time_seconds Start time of the process since unix epoch.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.616689221264E9
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads 37.0
# HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads 36.0
# HELP system_load_average_1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
# TYPE system_load_average_1m gauge
system_load_average_1m 0.0
# HELP jvm_threads_daemon_threads The current number of live daemon threads
# TYPE jvm_threads_daemon_threads gauge
jvm_threads_daemon_threads 30.0
# HELP tomcat_sessions_expired_sessions_total  
# TYPE tomcat_sessions_expired_sessions_total counter
tomcat_sessions_expired_sessions_total 0.0
# HELP jvm_threads_states_threads The current number of threads having NEW state
# TYPE jvm_threads_states_threads gauge
jvm_threads_states_threadsstate="runnable", 10.0
jvm_threads_states_threadsstate="blocked", 0.0
jvm_threads_states_threadsstate="waiting", 17.0
jvm_threads_states_threadsstate="timed-waiting", 9.0
jvm_threads_states_threadsstate="new", 0.0
jvm_threads_states_threadsstate="terminated", 0.0
# HELP process_uptime_seconds The uptime of the Java virtual machine
# TYPE process_uptime_seconds gauge
process_uptime_seconds 45380.981
# HELP http_server_requests_seconds  
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_countexception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/health", 6032.0
http_server_requests_seconds_sumexception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/health", 5.492759869
# HELP http_server_requests_seconds_max  
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_maxexception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/health", 7.97605E-4
# HELP process_files_max_files The maximum file descriptor count
# TYPE process_files_max_files gauge
process_files_max_files 1048576.0

所以这一切都很好。

这是我的 ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: internal-gateway-service-monitor
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: internal-gateway
  endpoints:
  - port: http
    path: '/actuator/prometheus'
    interval: 10s
    honorLabels: true

这是我的服务:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: perf4-backend
    meta.helm.sh/release-namespace: perf4
  creationTimestamp: "2021-03-23T13:00:47Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: 
          f:meta.helm.sh/release-name: 
          f:meta.helm.sh/release-namespace: 
        f:labels:
          .: 
          f:app.kubernetes.io/managed-by: 
      f:spec:
        f:externalTrafficPolicy: 
        f:ports:
          .: 
          k:"port":80,"protocol":"TCP":
            .: 
            f:name: 
            f:port: 
            f:protocol: 
            f:targetPort: 
        f:selector:
          .: 
          f:app: 
        f:sessionAffinity: 
        f:type: 
    manager: Go-http-client
    operation: Update
    time: "2021-03-23T13:00:47Z"
  name: internal-gateway
  namespace: perf4
  resourceVersion: "18659"
  selfLink: /api/v1/namespaces/perf4/services/internal-gateway
  uid: 75f89f23-d76e-4701-80f9-a029ce0f1153
spec:
  clusterIP: 172.20.105.66
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 31500
    port: 80
    protocol: TCP
    targetPort: 8070
  selector:
    app: internal-gateway
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: 

这是我的 pod yaml: (删除了不必要的字段)

apiVersion: v1
kind: Pod
metadata:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    kubernetes.io/psp: eks.privileged
  generateName: fg-internal-gateway-deployment-76cd98ccd8-
  labels:
    app: internal-gateway
    pod-template-hash: 76cd98ccd8
    version: "92095"
  
  name: fg-internal-gateway-deployment-76cd98ccd8-ksmgt
  namespace: perf4
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: fg-internal-gateway-deployment-76cd98ccd8
    uid: 69301225-d013-47e4-a126-b525f39ce608
  resourceVersion: "801092"
  selfLink: /api/v1/namespaces/perf4/pods/fg-internal-gateway-deployment-76cd98ccd8-ksmgt
  uid: 5fedee50-b572-4949-8055-9e58a7053b6a
    image: 
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8070
        scheme: HTTP
      initialDelaySeconds: 140
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 1
    name: internal-gateway
    ports:
    - containerPort: 8070
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8070
        scheme: HTTP
      initialDelaySeconds: 140
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "1"
        memory: 3Gi
      requests:
        cpu: "1"
        memory: 3Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-vcnjm
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: 
  nodeSelector:
    role: fgworkers
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: 
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: gated
    operator: Equal
    value: "true"
  - key: preprod
    operator: Equal
    value: "true"
  - key: staging
    operator: Equal
    value: "true"
  - key: fgworkers
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-vcnjm
    secret:
      defaultMode: 420
      secretName: default-token-vcnjm
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-03-25T14:42:35Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-03-25T14:45:14Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-03-25T14:45:14Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-03-25T14:42:35Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: 
    image: 
    imageID: 
    lastState: 
    name: internal-gateway
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-03-25T14:42:41Z"
  hostIP: 
  phase: Running
  podIP:
  podIPs:
  - ip: 
  qosClass: Guaranteed
  startTime: "2021-03-25T14:42:35Z"

我使用了标签 app: internal-gateway 与我的 pod 规格相同。

这就是我在普罗米修斯中得到的:

可能是什么问题?

【问题讨论】:

请删除图片并替换为文字。将文本放在反引号 ` 之间,并确保它在其中左对齐,否则会看起来很糟糕。 好的,谢谢,我会这样做的 好多了,谢谢。我已经在最后粘贴了图片——我猜你还没有足够的权限来做这个。 【参考方案1】:

问题是servicemonitor找不到你的服务

问题是您在 servicemonitor 定义中的选择器没有选择服务的标签

解决方案: 将服务定义的标签更改为与您的 servicemonitor 的 matchLabeles 定义相同 像这样:

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: perf4-backend
    meta.helm.sh/release-namespace: perf4
  creationTimestamp: "2021-03-23T13:00:47Z"
  labels:
    app: internal-gateway

【讨论】:

【参考方案2】:

确保检查您在 service 和 serviceMonitor 中定义的 端口名称 是否相同,我也遇到了同样的问题,所以我使用了相同的名称,并且它开始以正确的应用标签显示 p>

【讨论】:

以上是关于无法在 Prometheus Operator 中获取 Spring Boot 应用程序的指标的主要内容,如果未能解决你的问题,请参考以下文章

stable/prometheus-operator 持久 grafana 组织名称

prometheus operator - 启用对所有命名空间中所有内容的监控

prometheus-operator 配置企业微信报警

如何在使用 grafana 和 prometheus-operator 时配置电子邮件警报

Operator部署Prometheus

聊聊 Prometheus Operator