Containerized Prometheus fails to scrape JMX-exporter

Posted: 2021-08-26 15:24:09

Question:

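I want to add metrics to my Spark application, and I am using JMX-exporter to expose them to Prometheus. As a first step, I would like to see Prometheus connect to JMX-exporter and scrape some of the existing Spark metrics. I followed this answer and ran the following command: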

spark-shell --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.10.jar=8888:.../spark.yml"

I found a spark.yml file here.

When I open http://localhost:8888/metrics, I see a lot of metrics. Here is part of the output:

# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 57.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 50.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 58.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 60.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.018020101
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="11.0.9+11",vendor="Oracle Corporation",} 1.0
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.83810352E8
jvm_memory_bytes_used{area="nonheap",} 1.324068E8
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 5.36870912E8
jvm_memory_bytes_committed{area="nonheap",} 1.39730944E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 1.073741824E9
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="CodeHeap 'non-nmethods'",} 1330816.0
jvm_memory_pool_bytes_used{pool="Metaspace",} 9.090232E7
jvm_memory_pool_bytes_used{pool="CodeHeap 'profiled nmethods'",} 2.3704192E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 1.1603552E7
jvm_memory_pool_bytes_used{pool="G1 Eden Space",} 7.2351744E7
jvm_memory_pool_bytes_used{pool="G1 Old Gen",} 9.3632816E7
jvm_memory_pool_bytes_used{pool="G1 Survivor Space",} 1.7825792E7
jvm_memory_pool_bytes_used{pool="CodeHeap 'non-profiled nmethods'",} 4865920.0
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_pool_bytes_committed{pool="Metaspace",} 9.490432E7
jvm_memory_pool_bytes_committed{pool="CodeHeap 'profiled nmethods'",} 2.3724032E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.3631488E7
jvm_memory_pool_bytes_committed{pool="G1 Eden Space",} 2.71581184E8
jvm_memory_pool_bytes_committed{pool="G1 Old Gen",} 2.47463936E8
jvm_memory_pool_bytes_committed{pool="G1 Survivor Space",} 1.7825792E7
jvm_memory_pool_bytes_committed{pool="CodeHeap 'non-profiled nmethods'",} 4915200.0
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="CodeHeap 'non-nmethods'",} 5836800.0
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="CodeHeap 'profiled nmethods'",} 1.22908672E8
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="G1 Eden Space",} -1.0
jvm_memory_pool_bytes_max{pool="G1 Old Gen",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="G1 Survivor Space",} -1.0
jvm_memory_pool_bytes_max{pool="CodeHeap 'non-profiled nmethods'",} 1.22912768E8
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 10829.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 10829.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 23.438644
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.623251436259E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 412.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 10240.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 10.0
jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 0.257
jvm_gc_collection_seconds_count{gc="G1 Old Generation",} 0.0
jvm_gc_collection_seconds_sum{gc="G1 Old Generation",} 0.0

My prometheus.yml contains the following:

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
- job_name: prometheus
  static_configs:
  - targets: ['localhost:9090']

- job_name: "spark_streaming_app"
  scrape_interval: "5s"
  static_configs:
  - targets: ['localhost:8888']

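When I open the Prometheus UI at localhost:9090/targets, I can see that the prometheus target is UP while spark_streaming_app is DOWN. It looks to me as though the metrics are exposed successfully at localhost:8888, but Prometheus fails to scrape them.

Any idea what I am doing wrong?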

Answer 1:

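Prometheus is containerized, and inside a container localhost refers to the container itself, not to the host machine. Nothing is listening on port 8888 inside the Prometheus container, so Prometheus cannot scrape the metrics there.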
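One way to confirm this is to request the exporter's endpoint from inside the Prometheus container. The helper below is only a sketch: the function name probe_from_container is made up for illustration, the container name is whatever docker ps reports, and it assumes the image ships a busybox-style wget (which the stock prom/prometheus image is believed to do).

```shell
# Try to fetch the exporter's metrics from inside the Prometheus container.
# $1: container name (hypothetical; use the name shown by 'docker ps')
# $2: host:port to probe; defaults to the target from the question
probe_from_container() {
  container="$1"
  target="${2:-localhost:8888}"
  # Runs wget inside the container, so "localhost" here is the container.
  docker exec "$container" wget -qO- "http://${target}/metrics"
}
```

Calling probe_from_container prometheus would be expected to fail with a connection error, while the same probe against the address that actually reaches the host should print the metrics.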

If you are using Docker Desktop (macOS/Windows), use host.docker.internal instead of localhost in prometheus.yml for any target that runs on the host.
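As a sketch of that change, the snippet below rewrites the question's prometheus.yml with the Spark target pointed at host.docker.internal. Only the target of the spark_streaming_app job differs from the original config. Writing the file through a heredoc is just a convenience assumed here, and it overwrites any prometheus.yml in the current directory.

```shell
# Write a prometheus.yml whose Spark target points at the Docker host
# instead of the container's own loopback interface.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
- job_name: prometheus
  static_configs:
  - targets: ['localhost:9090']   # Prometheus itself, inside the container

- job_name: "spark_streaming_app"
  scrape_interval: "5s"
  static_configs:
  - targets: ['host.docker.internal:8888']   # JMX-exporter on the host
EOF
```

The prometheus job can keep localhost:9090, because that target really does live inside the container. Prometheus needs to be restarted or reloaded to pick up the new file.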

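On Linux, run the Prometheus container in host network mode instead; the container then shares the host's network stack, so localhost:8888 works and the configuration does not need to change.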
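A host-network launch might look like the sketch below. The function name, the container name, the prom/prometheus image, and the location of prometheus.yml in the current directory are all assumptions made for illustration; adjust them to match your setup.

```shell
# Start Prometheus on the host's network stack (Linux only), so that
# 'localhost:8888' inside the container is the host's port 8888.
# Assumed: image prom/prometheus, container name "prometheus", and a
# prometheus.yml in the current directory.
start_prometheus_host_network() {
  docker run -d \
    --name prometheus \
    --network host \
    -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
    prom/prometheus
}
```

With host networking there is no port mapping to declare: after calling start_prometheus_host_network, the UI should be reachable on the host at localhost:9090, and the spark_streaming_app target can stay as localhost:8888.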
