Kubelet metrics not appearing in Prometheus
【Posted】: 2021-09-13 09:45:25
【Question】: I have to set up a monitoring environment for my EKS cluster. Prometheus is running on a node outside the cluster, and I am trying to collect metrics with a node-exporter DaemonSet. But when I look at the targets in Prometheus, I don't see any targets other than localhost.
The kubernetes_sd_config block:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: default;kubernetes;https
      - target_label: __address__
        replacement: kubernetes_api_server_addr

  - job_name: 'kubernetes-kube-state'
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/token
    kubernetes_sd_configs:
      - role: pod
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    scheme: https
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
        regex: .*true.*
        action: keep
      - target_label: __address__
        replacement: kubernetes_api_server_addr
      - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
        regex: 'node-exporter;(.*)'
        action: replace
        target_label: nodename
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name]
        regex: (.+);(.+)
        target_label: __metrics_path__
        replacement: /api/v1/namespaces/$1/pods/$2/proxy/metrics

  ###################################################################################
  # Scrape config for nodes (kubelet).
  #
  # Rather than connecting directly to the node, the scrape is proxied though the
  # Kubernetes apiserver. This means it will work if Prometheus is running out of
  # cluster, or can't connect to nodes for some other reason (e.g. because of
  # firewalling).
  ###################################################################################
  - job_name: 'kubernetes-kubelet'
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes_api_server_addr
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics

  - job_name: 'kubernetes-cadvisor'
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes_api_server_addr
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor

  ###################################################################################
  # Example scrape config for service endpoints.
  #
  # The relabeling allows the actual service scrape endpoint to be configured
  # for all or only some endpoints.
  ###################################################################################
  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

  #########################################################################################
  # Example scrape config for probing services via the Blackbox Exporter.
  #
  # The relabeling allows the actual service scrape endpoint to be configured
  # for all or only some services.
  #########################################################################################
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - target_label: __address__
        replacement: kubernetes_api_server_addr
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: (.+);(.+)
        target_label: __metrics_path__
        replacement: /api/v1/namespaces/$1/services/$2/proxy/metrics

  ##################################################################################
  # Example scrape config for pods
  #
  # The relabeling allows the actual pod scrape to be configured
  # for all the declared ports (or port-free target if none is declared)
  # or only some ports.
  ##################################################################################
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    relabel_configs:
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_example_io_scrape_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pods

  - job_name: 'kubernetes-service-endpoints-e'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://kubernetes_api_server_addr
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/token
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (\d+)
        target_label: __meta_kubernetes_pod_container_port_number
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        regex: ()
        target_label: __meta_kubernetes_service_annotation_prometheus_io_path
        replacement: /metrics
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_pod_container_port_number, __meta_kubernetes_service_annotation_prometheus_io_path]
        target_label: __metrics_path__
        regex: (.+);(.+);(.+);(.+)
        replacement: /api/v1/namespaces/$1/services/$2:$3/proxy$4
      - target_label: __address__
        replacement: kubernetes_api_server_addr
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: instance
This is the prometheus.yml file on my Prometheus instance.
Prometheus instance logs, from /var/log/messages:
Jul 1 15:18:53 ip-XXXXXXXXXXX prometheus: ts=2021-07-01T15:18:53.655Z caller=log.go:124 component=k8s_client_runtime level=debug func=Verbose.Infof msg="Listing and watching *v1.Endpoints from pkg/mod/k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167"
Jul 1 15:18:53 ip-XXXXXXXXXXX prometheus: ts=2021-07-01T15:18:53.676Z caller=log.go:124 component=k8s_client_runtime level=debug func=Infof msg="GET https://XXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api/v1/endpoints?limit=500&resourceVersion=0 in 20 milliseconds"
Jul 1 15:18:53 ip-XXXXXXXXXXX prometheus: ts=2021-07-01T15:18:53.676Z caller=log.go:124 component=k8s_client_runtime level=error func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get \"https://XXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api/v1/endpoints?limit=500&resourceVersion=0\": x509: certificate signed by unknown authority"
Jul 1 15:18:56 ip-XXXXXXXXXXX prometheus: ts=2021-07-01T15:18:56.445Z caller=log.go:124 component=k8s_client_runtime level=debug func=Verbose.Infof msg="Listing and watching *v1.Pod from pkg/mod/k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167"
Jul 1 15:18:56 ip-XXXXXXXXXXX prometheus: ts=2021-07-01T15:18:56.445Z caller=log.go:124 component=k8s_client_runtime level=debug func=Infof msg="GET https://XXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api/v1/pods?limit=500&resourceVersion=0 in 0 milliseconds"
Jul 1 15:18:56 ip-XXXXXXXXXXX prometheus: ts=2021-07-01T15:18:56.445Z caller=log.go:124 component=k8s_client_runtime level=error func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://XXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api/v1/pods?limit=500&resourceVersion=0\": unable to read authorization credentials file /etc/prometheus/token: open /etc/prometheus/token: no such file or directory"
【Comments】:
Please share the kubernetes_sd_config block from your Prometheus configuration file.
I have uploaded the config file... I am new to Kubernetes and Prometheus.
When you look at service discovery, do you see targets from the Kubernetes cluster being dropped, or do you see nothing at all?
I can only see one target, 'localhost:9100'.
Then your Prometheus probably cannot authenticate against the Kubernetes cluster. What logs do you see in Prometheus? Can you raise its log level?
【Answer 1】: The logs you shared point to the problem:
... unable to read authorization credentials file /etc/prometheus/token: open /etc/prometheus/token: no such file or directory"
For workloads running inside the cluster, the token file is mounted by default at /var/run/secrets/kubernetes.io/serviceaccount/token, but since you mention that Prometheus is running "on an external node" (I am not sure exactly what you mean by that), this default may or may not apply to you and may need to be changed.
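For an out-of-cluster Prometheus that token has to be provisioned by hand. Below is a minimal sketch of one way to do it; the monitoring namespace and prometheus service account names are assumptions for this example, not something from the question, and the secret lookup relies on pre-1.24 behaviour where a token secret is auto-created for the service account (on newer clusters you would mint a token explicitly, for example with kubectl create token). The same secret also contains the cluster CA certificate, which lets you replace insecure_skip_verify and should also clear the "x509: certificate signed by unknown authority" error.

# Run with cluster-admin access to the EKS cluster.
# The monitoring/prometheus names are placeholders for this sketch.
kubectl create namespace monitoring
kubectl -n monitoring create serviceaccount prometheus

# On pre-1.24 clusters the service account gets an auto-generated token secret.
SECRET=$(kubectl -n monitoring get serviceaccount prometheus -o jsonpath='{.secrets[0].name}')

# Extract the bearer token and the cluster CA certificate.
kubectl -n monitoring get secret "$SECRET" -o jsonpath='{.data.token}' | base64 -d > token
kubectl -n monitoring get secret "$SECRET" -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt

# Copy both files to the Prometheus EC2 instance, e.g. as
# /etc/prometheus/token and /etc/prometheus/ca.crt.

With the files in place, every bearer_token_file: /etc/prometheus/token reference in the config becomes readable, and the tls_config sections can use ca_file: /etc/prometheus/ca.crt instead of insecure_skip_verify: true.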
【Discussion】:
"On an external node" means that I have set up Prometheus on an EC2 instance, while the APIs I want to monitor with Prometheus are deployed on the EKS cluster. Prometheus is kept on a separate instance so that if the cluster goes down in the future, my monitoring infrastructure is still up and running. Also, I have already got this working with kube-state-metrics; I just want to know why I cannot do the same with node-exporter.
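As a side note on the node-exporter question above: beyond the token and CA, the service account that the token belongs to also needs RBAC permission to list and watch the discovered objects and to use the API-server proxy paths that the kubelet, cAdvisor and pod scrape jobs rely on. A rough sketch with kubectl, reusing the hypothetical monitoring/prometheus names from the earlier example (the exact resource list is an assumption derived from the scrape jobs in the question, not an official recipe):

# Read access for discovery plus the proxy subresources used by the
# /api/v1/nodes/<node>/proxy/... and /api/v1/namespaces/<ns>/pods/<pod>/proxy/... scrapes.
kubectl create clusterrole prometheus-scraper \
  --verb=get,list,watch \
  --resource=nodes,nodes/metrics,nodes/proxy,services,services/proxy,endpoints,pods,pods/proxy \
  --non-resource-url=/metrics

kubectl create clusterrolebinding prometheus-scraper \
  --clusterrole=prometheus-scraper \
  --serviceaccount=monitoring:prometheus

Once discovery can authenticate and the proxy scrapes are authorized, the kubernetes-kubelet and kubernetes-cadvisor jobs should list the nodes as targets, and the node-exporter pods can be picked up the same way kube-state-metrics already is.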