水平 pod 自动缩放不起作用:`无法获取资源 cpu 的指标:没有从 heapster 返回的指标`
Posted
技术标签:
【中文标题】水平 pod 自动缩放不起作用:`无法获取资源 cpu 的指标:没有从 heapster 返回的指标`【英文标题】:Horizontal pod autoscaling not working: `unable to get metrics for resource cpu: no metrics returned from heapster` 【发布时间】:2017-10-13 14:08:01 【问题描述】:我正在尝试在使用 kubeadm 安装 Kubernetes 后创建一个水平 pod 自动缩放。
主要症状是kubectl get hpa
将TARGETS
列中的CPU指标返回为“未定义”:
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
fibonacci Deployment/fibonacci <unknown> / 50% 1 3 1 1h
经过进一步调查,hpa
似乎正在尝试从 Heapster 接收 CPU 指标 - 但在我的配置中,cAdvisor 正在提供 cpu 指标。
我根据kubectl describe hpa fibonacci
的输出做出这个假设:
Name: fibonacci
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Sun, 14 May 2017 18:08:53 +0000
Reference: Deployment/fibonacci
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 1
Max replicas: 3
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 3s 148 horizontal-pod-autoscaler Warning FailedGetResourceMetric unable to get metrics for resource cpu: no metrics returned from heapster
1h 3s 148 horizontal-pod-autoscaler Warning FailedComputeMetricsReplicas failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from heapster
为什么hpa
尝试从 heapster 而不是 cAdvisor 接收此指标?
我该如何解决这个问题?
请在下面找到我的部署,以及/var/log/container/kube-controller-manager.log
的内容以及kubectl get pods --namespace=kube-system
和kubectl describe pods
的输出
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: fibonacci
labels:
app: fibonacci
spec:
template:
metadata:
labels:
app: fibonacci
spec:
containers:
- name: fibonacci
image: oghma/fibonacci
ports:
- containerPort: 8088
resources:
requests:
memory: "64Mi"
cpu: "75m"
limits:
memory: "128Mi"
cpu: "100m"
---
kind: Service
apiVersion: v1
metadata:
name: fibonacci
spec:
selector:
app: fibonacci
ports:
- protocol: TCP
port: 8088
targetPort: 8088
externalIPs:
- 192.168.66.103
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: fibonacci
spec:
scaleTargetRef:
apiVersion: apps/v1beta1
kind: Deployment
name: fibonacci
minReplicas: 1
maxReplicas: 3
targetCPUUtilizationPercentage: 50
$ kubectl describe pods
Name: fibonacci-1503002127-3k755
Namespace: default
Node: kubernetesnode1/192.168.66.101
Start Time: Sun, 14 May 2017 17:47:08 +0000
Labels: app=fibonacci
pod-template-hash=1503002127
Annotations: kubernetes.io/created-by="kind":"SerializedReference","apiVersion":"v1","reference":"kind":"ReplicaSet","namespace":"default","name":"fibonacci-1503002127","uid":"59ea64bb-38cd-11e7-b345-fa163edb1ca...
Status: Running
IP: 192.168.202.1
Controllers: ReplicaSet/fibonacci-1503002127
Containers:
fibonacci:
Container ID: docker://315375c6a978fd689f4ba61919c15f15035deb9139982844cefcd46092fbec14
Image: oghma/fibonacci
Image ID: docker://sha256:26f9b6b2c0073c766b472ec476fbcd2599969b6e5e7f564c3c0a03f8355ba9f6
Port: 8088/TCP
State: Running
Started: Sun, 14 May 2017 17:47:16 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 75m
memory: 64Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-45kp8 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-45kp8:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-45kp8
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events: <none>
$ kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
calico-etcd-k1g53 1/1 Running 0 2h
calico-node-6n4gp 2/2 Running 1 2h
calico-node-nhmz7 2/2 Running 0 2h
calico-policy-controller-1324707180-65m78 1/1 Running 0 2h
etcd-kubernetesmaster 1/1 Running 0 2h
heapster-1428305041-zjzd1 1/1 Running 0 1h
kube-apiserver-kubernetesmaster 1/1 Running 0 2h
kube-controller-manager-kubernetesmaster 1/1 Running 0 2h
kube-dns-3913472980-gbg5h 3/3 Running 0 2h
kube-proxy-1dt3c 1/1 Running 0 2h
kube-proxy-tfhr9 1/1 Running 0 2h
kube-scheduler-kubernetesmaster 1/1 Running 0 2h
monitoring-grafana-3975459543-9q189 1/1 Running 0 1h
monitoring-influxdb-3480804314-7bvr3 1/1 Running 0 1h
$ cat /var/log/container/kube-controller-manager.log
"log":"I0514 17:47:08.631314 1 event.go:217] Event(v1.ObjectReferenceKind:\"Deployment\", Namespace:\"default\", Name:\"fibonacci\", UID:\"59e980d9-38cd-11e7-b345-fa163edb1ca6\", APIVersion:\"extensions\", ResourceVersion:\"1303\", FieldPath:\"\"): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set fibonacci-1503002127 to 1\n","stream":"stderr","time":"2017-05-14T17:47:08.63177467Z"
"log":"I0514 17:47:08.650662 1 event.go:217] Event(v1.ObjectReferenceKind:\"ReplicaSet\", Namespace:\"default\", Name:\"fibonacci-1503002127\", UID:\"59ea64bb-38cd-11e7-b345-fa163edb1ca6\", APIVersion:\"extensions\", ResourceVersion:\"1304\", FieldPath:\"\"): type: 'Normal' reason: 'SuccessfulCreate' Created pod: fibonacci-1503002127-3k755\n","stream":"stderr","time":"2017-05-14T17:47:08.650826398Z"
"log":"E0514 17:49:00.873703 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:49:00.874034952Z"
"log":"E0514 17:49:30.884078 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:49:30.884546461Z"
"log":"E0514 17:50:00.896563 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:50:00.89688734Z"
"log":"E0514 17:50:30.906293 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:50:30.906825794Z"
"log":"E0514 17:51:00.915996 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:51:00.916348218Z"
"log":"E0514 17:51:30.926043 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:51:30.926367623Z"
"log":"E0514 17:52:00.936574 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:52:00.936903072Z"
"log":"E0514 17:52:30.944724 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:52:30.945120508Z"
"log":"E0514 17:53:00.954785 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:53:00.955126309Z"
"log":"E0514 17:53:30.970454 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:53:30.972996568Z"
"log":"E0514 17:54:00.980735 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:54:00.981098832Z"
"log":"E0514 17:54:30.993176 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:54:30.993538841Z"
"log":"E0514 17:55:01.002941 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:55:01.003265908Z"
"log":"W0514 17:55:06.511756 1 reflector.go:323] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:192: watch of \u003cnil\u003e ended with: etcdserver: mvcc: required revision has been compacted\n","stream":"stderr","time":"2017-05-14T17:55:06.511957851Z"
"log":"E0514 17:55:31.013415 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:55:31.013776243Z"
"log":"E0514 17:56:01.024507 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:56:01.0248332Z"
"log":"E0514 17:56:31.036191 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:56:31.036606698Z"
"log":"E0514 17:57:01.049277 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:57:01.049616359Z"
"log":"E0514 17:57:31.064104 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:57:31.064489485Z"
"log":"E0514 17:58:01.073988 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:58:01.074339488Z"
"log":"E0514 17:58:31.084511 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:58:31.084839352Z"
"log":"E0514 17:59:01.096507 1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:59:01.096896254Z"
【问题讨论】:
什么是kubernetes版本?这是在 AWS 还是 Google Cloud 上? 您解决了这个问题吗? 【参考方案1】:有一个选项可以在集群池上启用自动缩放,请务必先打开它。
然后应用你的 hpa,别忘了在 k8s 控制器上设置 cpu、内存请求、限制
需要注意的一点是,如果您的 pod 上有多个容器,那么您应该为每个容器指定 cpu、内存请求和限制
【讨论】:
有一个选项可以在集群池上启用自动缩放确保首先打开它.....知道如何打开它吗?【参考方案2】:我在其他应用中也看到了这一点:HPA API 中似乎存在错误。
解决方案可以改用复制控制器 scaleref:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: fibonacci
namespace: ....
spec:
scaleRef:
kind: ReplicationController
name: fibonacci
subresource: scale
minReplicas: 1
maxReplicas: 3
targetCPUUtilizationPercentage: 50
未经测试,因此可能需要对scaleRef
进行一些编辑(您使用的是scaleTargetRef
)
【讨论】:
【参考方案3】:您可以从部署中删除限制并尝试一下。在我的部署中,我只使用了 REQUESTS 来获取资源并且它起作用了。如果您看到 Horizontal Pod Autoscaler (HPA) 正在工作,那么稍后您也可以使用 LIMITS。 This discussion 告诉您,仅使用 REQUESTS 就足以执行 HPA。
【讨论】:
【参考方案4】:如果在您的部署中拥有多个容器,请确保您已在所有容器中指定资源限制。
【讨论】:
【参考方案5】:Tl;dr:如果您使用 AWS EKS 并指定 .spec.templates.spec.containers.<resources|limits>
不起作用,则问题可能是您没有安装 Kubernetes Metrics Server。
我在使用 AWS EKS 时遇到了 Kubernetes HPA 的这个问题。在寻找解决方案时,我遇到了以下命令并决定运行它以查看是否安装了 Metrics Server:
kubectl 获取 pods -n kube-system
我没有安装它。事实证明,AWS 有 this doc 声明默认情况下 Metrics Server 没有安装在 EKS 集群上。所以我按照文档建议的步骤安装服务器:
- Deploy the Metrics Server with the following command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- Verify that the metrics-server deployment is running the desired number of pods with the following command.
kubectl get deployment metrics-server -n kube-system
Output
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1
这就是我的解决方案。一旦 Metric Server 在我的集群上,我就成功创建了能够获取有关其目标 pod/资源的使用信息的 HPA。
PS:你也可以再次运行kubectl get pods -n kube-system
确认安装。
【讨论】:
【参考方案6】:如果您使用的是 GKE 1.9.x
有一些错误,需要先禁用自动缩放,然后重新启用它。这将提供当前值来代替 unknown
尝试更新到可用的最新 GKE。
【讨论】:
以上是关于水平 pod 自动缩放不起作用:`无法获取资源 cpu 的指标:没有从 heapster 返回的指标`的主要内容,如果未能解决你的问题,请参考以下文章
使用容器中应用程序公开的 REST API 进行水平 Pod 自动缩放