How to calculate containers' CPU usage in Kubernetes with Prometheus as monitoring?

Posted 2023-02-15


【Posted】: 2017-03-12 15:27:04 【Question】:

I want to calculate the CPU usage of all pods in a Kubernetes cluster. I found two Prometheus metrics that might be useful:

container_cpu_usage_seconds_total: Cumulative cpu time consumed per cpu in seconds.
process_cpu_seconds_total: Total user and system CPU time spent in seconds.

CPU usage of all pods = increment per second of sum(container_cpu_usage_seconds_total{id="/"}) / increment per second of sum(process_cpu_seconds_total)

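However, I found that the per-second increment of container_cpu_usage_seconds_total{id="/"} is larger than the per-second increment of sum(process_cpu_seconds_total), so the usage can come out greater than 1...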
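The "increment per second" in the formula above can be sketched in Python as the difference between two counter samples divided by the window length. This is a simplification of PromQL's `rate()` that ignores counter resets and extrapolation, and the sample values are made up. Note that Prometheus client libraries export `process_cpu_seconds_total` for the instrumented process itself (for example Prometheus or an exporter), not for the whole machine, which likely explains why the ratio is not bounded by 1.

```python
def per_second_increase(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter between two samples.

    Simplified stand-in for PromQL rate(): assumes no counter reset
    and no extrapolation to the edges of the window.
    """
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return (last - first) / window_seconds


# Hypothetical samples taken 60 s apart (illustrative values only).
root_cgroup_cpu = per_second_increase(first=1000.0, last=1120.0, window_seconds=60)
process_cpu = per_second_increase(first=50.0, last=56.0, window_seconds=60)

# The root cgroup (id="/") covers every process on the node, while
# process_cpu_seconds_total only covers processes that export it,
# so this ratio can easily exceed 1.
ratio = root_cgroup_cpu / process_cpu
print(f"{root_cgroup_cpu:.2f} cores vs {process_cpu:.2f} cores -> ratio {ratio:.1f}")
```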

【Question comments】:

【Answer 1】:

What I use to get cluster-level CPU usage:

sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
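The numerator of this query is the number of cores busy across the cluster (CPU-seconds consumed per second); dividing by the total core count and multiplying by 100 turns it into a percentage of cluster capacity. A minimal Python sketch of that arithmetic, with hypothetical per-node numbers standing in for the query results:

```python
from typing import Dict


def cluster_cpu_percent(root_rates: Dict[str, float], cores: Dict[str, float]) -> float:
    """sum(rate(root cgroup CPU)) / sum(machine_cpu_cores) * 100."""
    total_cores = sum(cores.values())
    if total_cores <= 0:
        raise ValueError("no CPU cores reported")
    return sum(root_rates.values()) / total_cores * 100


# Hypothetical values: busy cores per node (the rate of the id="/" series)
# and the number of cores per node (machine_cpu_cores).
root_rates = {"node-a": 1.5, "node-b": 0.5}
cores = {"node-a": 4, "node-b": 4}
print(cluster_cpu_percent(root_rates, cores))  # 25.0
```

Dropping the division and the `* 100` leaves the number of cores in use rather than a percentage.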

I also track the CPU usage of each pod:

sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)
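The per-pod query filters out series with an empty `image` label (typically the pod-level cgroup, which would otherwise double-count its containers; treat this as an assumption about the cAdvisor series) and sums the remaining rates per `pod_name`. The result is in cores, not percent, so a busy pod can show values above 1. A Python sketch of the grouping, with hypothetical series:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cpu_by_pod(series: Iterable[Tuple[Dict[str, str], float]]) -> Dict[str, float]:
    """Mimic sum(rate(...{image!=""}[1m])) by (pod_name).

    Each item is (labels, per-second rate in cores). Series whose image
    label is empty or missing are dropped; the rest are summed per pod.
    """
    totals: Dict[str, float] = defaultdict(float)
    for labels, rate in series:
        if labels.get("image", "") == "":
            continue  # the image!="" matcher
        totals[labels.get("pod_name", "")] += rate
    return dict(totals)


# Hypothetical per-series rates (cores).
series = [
    ({"pod_name": "web-1", "container_name": "nginx", "image": "nginx"}, 0.25),
    ({"pod_name": "web-1", "container_name": "sidecar", "image": "envoy"}, 0.5),
    ({"pod_name": "web-1", "image": ""}, 0.75),  # pod-level cgroup: excluded
    ({"pod_name": "db-0", "container_name": "db", "image": "postgres"}, 1.5),
]
print(cpu_by_pod(series))  # {'web-1': 0.75, 'db-0': 1.5}
```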

I have a complete kubernetes-prometheus solution on GitHub that may help you get more metrics: https://github.com/camilb/prometheus-kubernetes

【Comments】:

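Can I confirm whether sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100 represents the percentage of CPU usage, or the number of cores consumed by containers?

I'm getting some strange results using sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100 across all containers: I get a number between 0 and 1, but for nginx-ingress-controller and fluentd-gcp I get values from 0 to 3...

How do I calculate a pod's memory usage with PromQL?

Which metric do you use to calculate the number of cores currently in use?

@Camil I'm looking for more metrics in your GitHub repo, but I couldn't find any... where are they?

【Answer 2】: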

You can also use the following query:

avg (rate (container_cpu_usage_seconds_total{id="/"}[1m]))
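What `avg()` returns depends on which series the selector matches. If the root cgroup is exported with one series per CPU core (some older cAdvisor versions added a `cpu` label such as `cpu00`; this is an assumption about your setup), then averaging the per-core rates equals total busy cores divided by the number of cores, i.e. a 0 to 1 utilisation fraction. If there is one series per node instead, `avg()` gives the average number of busy cores per node. A Python check of the per-core case, with made-up rates:

```python
from statistics import mean

# Hypothetical per-core rates (CPU-seconds per second) for the root cgroup,
# assuming one series per core.
per_core_rates = [0.5, 0.25, 0.75, 0.5]

avg_fraction = mean(per_core_rates)                          # avg(rate(...))
sum_over_cores = sum(per_core_rates) / len(per_core_rates)  # sum(rate(...)) / core count

print(avg_fraction, sum_over_cores)  # 0.5 0.5
```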

【Comments】:

【Answer 3】:

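I created my own Prometheus exporter (https://github.com/google-cloud-tools/kube-eagle), mainly to get a better overview of resource utilization per node. It also offers a more intuitive way to monitor your CPU and RAM resources. The query to get cluster-wide CPU usage looks like this: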

sum(eagle_pod_container_resource_usage_cpu_cores)

But you can also easily get CPU usage by namespace, node, or node pool.
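Summing per-container core usage by a label, and converting the result to the millicores Kubernetes uses for requests and limits (1 core = 1000m), can be sketched as below. The label name `namespace` and the sample values are hypothetical illustrations, not taken from kube-eagle's documentation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cores_by_label(samples: List[Tuple[Dict[str, str], float]], label: str) -> Dict[str, float]:
    """Like sum(metric) by (label): add up core usage per label value."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, cores in samples:
        totals[labels.get(label, "")] += cores
    return dict(totals)


def to_millicores(cores: float) -> int:
    """Kubernetes CPU quantities: 1 core == 1000m."""
    return round(cores * 1000)


# Hypothetical samples: (labels, CPU cores in use) per container.
samples = [
    ({"namespace": "default", "pod": "web-1"}, 0.25),
    ({"namespace": "default", "pod": "web-2"}, 0.125),
    ({"namespace": "kube-system", "pod": "dns-0"}, 0.05),
]
for ns, cores in sorted(cores_by_label(samples, "namespace").items()):
    print(ns, f"{to_millicores(cores)}m")  # default 375m, kube-system 50m
```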

【Comments】:

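This answer is underrated, great tool. One big problem with Prometheus is the lack of standardization. Kubernetes resource limits and requests are expressed in millicpu, so it makes no sense that Prometheus metrics are not standardized on millicpu. I know Prometheus doesn't run only on Kubernetes, but couldn't you export both metric styles for standardization, or even apply a logical conversion such as [classic cpu % used] * 100 / 1000 to get millicpu?

【Answer 4】: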

Following the doc, I prefer to use this metric:

sum(rate(container_cpu_usage_seconds_total{name!~".*prometheus.*", image!="", container_name!="POD"}[5m])) by (pod_name, container_name) /
sum(container_spec_cpu_quota{name!~".*prometheus.*", image!="", container_name!="POD"}/container_spec_cpu_period{name!~".*prometheus.*", image!="", container_name!="POD"}) by (pod_name, container_name)
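This query divides each container's CPU rate by its CPU limit expressed in cores, `container_spec_cpu_quota / container_spec_cpu_period` (the CFS quota and period, both in microseconds), so 1.0 means the container is using its whole limit. A Python sketch of that arithmetic follows. Under cgroup v1, a `cpu.cfs_quota_us` of -1 means "no limit"; returning `None` for a non-positive quota instead of dividing is an assumption of this sketch, not something the query itself does.

```python
from typing import Optional


def usage_of_limit(cpu_rate: float, quota_us: float, period_us: float) -> Optional[float]:
    """CPU usage as a fraction of the CFS limit: rate / (quota / period).

    Returns None when quota or period is not positive (assumed here to
    mean the container has no CPU limit).
    """
    if quota_us <= 0 or period_us <= 0:
        return None
    limit_cores = quota_us / period_us  # e.g. 50000 / 100000 = 0.5 cores
    return cpu_rate / limit_cores


print(usage_of_limit(0.25, 50_000, 100_000))  # 0.5: half of a 0.5-core limit
print(usage_of_limit(0.25, -1, 100_000))      # None: no limit configured
```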

【Comments】:

This doesn't seem to work well in all cases; it shows negative numbers that shouldn't exist.
