K8S HPA - Cannot fetch metrics from External metrics API
Posted: 2021-07-12 17:25:32

Question:
I am trying to get Kafka topic lag into Prometheus, and from there into the API server, so that I can drive an HPA for my application from an external metric.
I am getting the error "no metrics returned from external metrics API":
70m   Warning   FailedGetExternalMetric        horizontalpodautoscaler/kafkademo-hpa   unable to get external metric default/kafka_lag_metric_sm0ke/&LabelSelector{MatchLabels:map[string]string{topic: prices,},MatchExpressions:[]LabelSelectorRequirement{},}: no metrics returned from external metrics API
66m   Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/kafkademo-hpa   invalid metrics (1 invalid out of 1), first error is: failed to get external metric kafka_lag_metric_sm0ke: unable to get external metric default/kafka_lag_metric_sm0ke/&LabelSelector{MatchLabels:map[string]string{topic: prices,},MatchExpressions:[]LabelSelectorRequirement{},}: no metrics returned from external metrics API
This happens even though I can see the following output when querying the external metrics API:
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "kafka_lag_metric_sm0ke",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
The setup is as follows:
Kafka: v2.7.0
Prometheus: v2.26.0
Prometheus adapter: v0.8.3

Prometheus adapter values
rules:
  external:
  - seriesQuery: 'kafka_consumergroup_group_lag{topic="prices"}'
    resources:
      template: <<.Resource>>
    name:
      as: "kafka_lag_metric_sm0ke"
    metricsQuery: 'avg by (topic) (round(avg_over_time(<<.Series>>{<<.LabelMatchers>>}[1m])))'
HPA
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: kafkademo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafkademo
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: External
    external:
      metricName: kafka_lag_metric_sm0ke
      metricSelector:
        matchLabels:
          topic: prices
      targetValue: 5
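To make the role of targetValue: 5 in this spec concrete, here is a minimal Python sketch of how an HPA turns an external metric into a replica count. It is a simplification based on the documented HPA algorithm, not the controller's actual code: the function name desired_replicas is hypothetical, the 0.1 tolerance is the usual default and is an assumption here, and stabilization windows and pod readiness are ignored.

```python
import math


def desired_replicas(current_replicas, metric_values, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA decision for an External metric with targetValue.

    Simplified sketch: ignores stabilization windows and pod readiness.
    """
    if not metric_values:
        # Mirrors the situation in this question: the API returned no items,
        # so the HPA has nothing to compute from and reports an error.
        raise ValueError("no metrics returned from external metrics API")

    # With targetValue, the values of all returned items are summed.
    usage_ratio = sum(metric_values) / target_value

    # ASSUMPTION: default tolerance of 10% around the target.
    if abs(usage_ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(usage_ratio * current_replicas)

    # Clamp to the minReplicas / maxReplicas bounds from the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Lag of 12 against a target of 5, starting from 3 replicas.
    print(desired_replicas(3, [12.0], 5, 3, 12))
    # Lag of 0 never scales below minReplicas.
    print(desired_replicas(3, [0.0], 5, 3, 12))
```

With an empty list of values the sketch raises an error instead of returning a number, which corresponds to the <unknown> current value an HPA shows when the external metrics API returns no items.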
HPA info
kubectl describe hpa kafkademo-hpa
Name:                                        kafkademo-hpa
Namespace:                                   default
Labels:                                      <none>
Annotations:                                 <none>
CreationTimestamp:                           Sat, 17 Apr 2021 20:01:29 +0300
Reference:                                   Deployment/kafkademo
Metrics:                                     ( current / target )
  "kafka_lag_metric_sm0ke" (target value):   <unknown> / 5
Min replicas:                                3
Max replicas:                                12
Deployment pods:                             3 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetExternalMetric  the HPA was unable to compute the replica count: unable to get external metric default/kafka_lag_metric_sm0ke/&LabelSelector{MatchLabels:map[string]string{topic: prices,},MatchExpressions:[]LabelSelectorRequirement{},}: no metrics returned from external metrics API
Events:
  Type     Reason                        Age                     From                       Message
  ----     ------                        ----                    ----                       -------
  Warning  FailedComputeMetricsReplicas  70m (x335 over 155m)    horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get external metric kafka_lag_metric_sm0ke: unable to get external metric default/kafka_lag_metric_sm0ke/&LabelSelector{MatchLabels:map[string]string{topic: prices,},MatchExpressions:[]LabelSelectorRequirement{},}: no metrics returned from external metrics API
  Warning  FailedGetExternalMetric       2m30s (x366 over 155m)  horizontal-pod-autoscaler  unable to get external metric default/kafka_lag_metric_sm0ke/&LabelSelector{MatchLabels:map[string]string{topic: prices,},MatchExpressions:[]LabelSelectorRequirement{},}: no metrics returned from external metrics API
-- Edit 1
When I query the default namespace, I get this:
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka_lag_metric_sm0ke | jq
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": []
}
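I can see that the "items" field is empty. What does this mean?
I don't seem to understand the chain of events happening behind the scenes.
AFAIK this is what happens. Is this correct?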
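- prometheus-adapter queries Prometheus, executes the seriesQuery, computes the metricsQuery and creates "kafka_lag_metric_sm0ke".
- It registers an endpoint with the API server for external metrics.
- The API server periodically updates its stats from that endpoint.
- The HPA reads "kafka_lag_metric_sm0ke" from the API server and scales according to the values it gets.

I also don't seem to understand the significance of namespaces in all this. I can see that the stat is namespaced. Does that mean there will be one stat per namespace? How does that make sense?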
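The last step of that chain, the read from the external metrics API, can be sketched in Python as follows. This is only an illustration: the helper names external_metric_path and metric_values are hypothetical, and the sketch assumes the request carries the HPA's metricSelector as a labelSelector query parameter, which the kubectl command in this question does not include. The namespace is part of the URL path because the external metrics API is namespaced, so each request is made on behalf of one namespace (here, the HPA's).

```python
import json
from urllib.parse import quote


def external_metric_path(namespace, metric_name, match_labels=None):
    """Build the request path for one external metric (hypothetical helper)."""
    path = ("/apis/external.metrics.k8s.io/v1beta1"
            f"/namespaces/{namespace}/{metric_name}")
    if match_labels:
        # ASSUMPTION: the selector is passed as a labelSelector query parameter.
        selector = ",".join(f"{k}={v}" for k, v in sorted(match_labels.items()))
        path += "?labelSelector=" + quote(selector, safe="")
    return path


def metric_values(response_text):
    """Return (name, labels, value) for each item in an ExternalMetricValueList."""
    body = json.loads(response_text)
    return [
        (item["metricName"], item.get("metricLabels", {}), item["value"])
        for item in body.get("items", [])
    ]


if __name__ == "__main__":
    print(external_metric_path("default", "kafka_lag_metric_sm0ke",
                               {"topic": "prices"}))

    # The response from this question: the metric is registered, but the
    # query behind it matched no series, so there is nothing to scale on.
    empty = ('{"kind": "ExternalMetricValueList", '
             '"apiVersion": "external.metrics.k8s.io/v1beta1", '
             '"metadata": {}, "items": []}')
    print(metric_values(empty))
```

An empty list here is what surfaces as "no metrics returned from external metrics API": the metric name is known to the API server, but the request produced no values.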
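Answer 1:
In the long tradition of answering my own questions after I ask them, here is what was wrong with the configuration above.
The error was in the prometheus-adapter yaml: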
rules:
  external:
  - seriesQuery: 'kafka_consumergroup_group_lag{topic="prices"}'
    resources:
      template: <<.Resource>>
    name:
      as: "kafka_lag_metric_sm0ke"
    metricsQuery: 'avg by (topic) (round(avg_over_time(<<.Series>>{<<.LabelMatchers>>}[1m])))'
I removed <<.LabelMatchers>> and now it works:
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka_lag_metric_sm0ke | jq
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "kafka_lag_metric_sm0ke",
      "metricLabels": {
        "topic": "prices"
      },
      "timestamp": "2021-04-21T16:55:18Z",
      "value": "0"
    }
  ]
}
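I am still not sure why it works. I know that in this case <<.LabelMatchers>> gets replaced with something that does not produce a valid query, but I don't know what it is.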
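One plausible explanation, which the thread does not confirm, is that for a namespaced external metric the adapter adds a namespace matcher to <<.LabelMatchers>> alongside the HPA's selector labels. The toy Python sketch below shows what the expanded query would look like under that assumption. The helper expand_metrics_query is hypothetical and only does string substitution; it is not the adapter's templating code, and it assumes the original rule wrapped the matchers in braces, as in <<.Series>>{<<.LabelMatchers>>}.

```python
def expand_metrics_query(template, series, selector_labels, namespace=None):
    """Toy stand-in for metricsQuery template expansion (hypothetical helper)."""
    matchers = [f'{k}="{v}"' for k, v in sorted(selector_labels.items())]
    if namespace:
        # ASSUMPTION: the requesting namespace is added as a label matcher.
        # The source does not state this; it is a guess at the cause.
        matchers.append(f'namespace="{namespace}"')
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", ",".join(matchers)))


TEMPLATE = ("avg by (topic) (round(avg_over_time("
            "<<.Series>>{<<.LabelMatchers>>}[1m])))")

if __name__ == "__main__":
    query = expand_metrics_query(
        TEMPLATE,
        "kafka_consumergroup_group_lag",
        {"topic": "prices"},
        namespace="default",
    )
    print(query)
```

If this is what happens, the expanded query is syntactically fine but only matches series that carry namespace="default". Kafka lag series without that label would match nothing, which would explain the empty "items" list and why dropping <<.LabelMatchers>> made values appear.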