Prometheus: monitor container memory [duplicate]
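【Posted】: 2022-01-13 03:58:29
【Question】: While monitoring the real memory used by containers, I found that the total across all containers is greater than the total real memory of all physical nodes, which is strange.
However, I noticed that some of the monitored series have no container_name label. Only when I exclude those series does the containers' actual memory look reasonable.
Why does this happen? (PS: the "POD" container is already excluded via container_name!="POD".)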
sum(sum(container_memory_rss{container_name!="POD",container_name=~"[a-z].*"}) by (container_name))/1024^4
sum(sum(container_memory_rss{container_name!="POD"}) by (container_name))/1024^4
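【Answer 1】: This is the query we use for container memory metrics:
sum by (container, pod, namespace, node, job) (container_memory_rss{container != "POD", image != "", container != ""})
To answer your specific question: why is the value higher? Because the sum also includes the memory of the node itself.
The kubelet (cAdvisor) reports memory metrics for several cgroups, not only for containers. For example, id="/" is the root cgroup, i.e. the metric for the entire node.
In my setup, for instance, the series with the following labels is the node's memory: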
{endpoint="https-metrics", id="/", instance="10.0.84.2:10250", job="kubelet", metrics_path="/metrics/cadvisor", node="ip-10-xx-x-x.us-west-2.compute.internal", service="kube-prometheus-stack-kubelet"}
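The double counting can be illustrated with a small Python sketch. The sample series, cgroup paths, and byte values are invented for illustration; only the label names (id, container, image) follow the cAdvisor metrics discussed above. It assumes, as described, that a parent cgroup's value already includes its children.

```python
# Hypothetical cAdvisor-style samples of container_memory_rss for one node.
# All values and cgroup paths are made up for this illustration.
GIB = 1024 ** 3

samples = [
    # Root cgroup: the whole node (containers plus system processes).
    {"id": "/", "container": "", "image": "", "value": 6 * GIB},
    # Pod-level cgroup: already the sum of the containers in the pod.
    {"id": "/kubepods/pod-a", "container": "", "image": "", "value": 3 * GIB},
    # Pause container of the pod.
    {"id": "/kubepods/pod-a/pause", "container": "POD", "image": "pause", "value": 0},
    # The actual application containers.
    {"id": "/kubepods/pod-a/app", "container": "app", "image": "app:1", "value": 2 * GIB},
    {"id": "/kubepods/pod-a/sidecar", "container": "sidecar", "image": "sidecar:1", "value": 1 * GIB},
]


def is_real_container(sample):
    """Mirror the matchers container != "POD", image != "", container != ""."""
    return sample["container"] not in ("", "POD") and sample["image"] != ""


# Summing every series counts the same memory at several cgroup levels.
naive_total = sum(s["value"] for s in samples)

# Keeping only real containers counts each byte once.
filtered_total = sum(s["value"] for s in samples if is_real_container(s))

print(naive_total / GIB)     # 12.0, twice the node's 6 GiB
print(filtered_total / GIB)  # 3.0
```

In this made-up data the unfiltered sum comes out larger than the node itself, which matches the symptom in the question.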
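Also, at www.asserts.ai we take the maximum of the rss, working set, and usage metrics to arrive at the actual memory used by a container.
See our recording rules below for reference: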
- record: asserts:container_memory
  expr: sum by (container, pod, namespace, node, job, asserts_env, asserts_site) (container_memory_rss{container != "POD", image != "", container != ""})
  labels:
    source: rss
- record: asserts:container_memory
  expr: sum by (container, pod, namespace, node, job, asserts_env, asserts_site) (container_memory_working_set_bytes{container != "POD", image != "", container != ""})
  labels:
    source: working
- record: asserts:container_memory
  # why sum? multiple copies of the same container may be running in the same pod
  expr: sum by (container, pod, namespace, node, job, asserts_env, asserts_site)
    (
      container_memory_usage_bytes{container != "POD", image != "", container != ""} -
      container_memory_cache{container != "POD", image != "", container != ""} -
      container_memory_swap{container != "POD", image != "", container != ""}
    )
  labels:
    source: usage
# For KPI rollup purposes
- record: asserts:resource:usage
  expr: |-
    max without (source) (asserts:container_memory)
      * on (namespace, pod, asserts_env, asserts_site) group_left(workload) asserts:mixin_pod_workload
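As a rough illustration of what the rules above compute, here is a minimal Python sketch of the "max over sources" step. The keys and byte values are hypothetical, and the sketch leaves out the group_left join with asserts:mixin_pod_workload; it only shows how the three sources (rss, working set, usage minus cache minus swap) are combined per container.

```python
from collections import defaultdict

MIB = 1024 ** 2

# Hypothetical per-container samples keyed by (namespace, pod, container).
key = ("default", "web-0", "app")
rss = {key: 400 * MIB}
working_set = {key: 520 * MIB}
usage = {key: 900 * MIB}
cache = {key: 350 * MIB}
swap = {key: 0}


def container_memory(rss, working_set, usage, cache, swap):
    """Return the max of the three memory sources for each container key."""
    by_source = defaultdict(dict)
    for k, v in rss.items():
        by_source[k]["rss"] = v
    for k, v in working_set.items():
        by_source[k]["working"] = v
    for k, v in usage.items():
        # Like a PromQL binary operation, only keep keys present on all sides.
        if k in cache and k in swap:
            by_source[k]["usage"] = v - cache[k] - swap[k]
    # Equivalent in spirit to: max without (source) (...)
    return {k: max(sources.values()) for k, sources in by_source.items()}


result = container_memory(rss, working_set, usage, cache, swap)
print(result[key] / MIB)  # 550.0 (usage - cache - swap is the largest source)
```

Taking the maximum is a conservative choice: whichever source reports the most memory for a container is the value that gets used.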