k8s HPA can't get CPU information [closed]

Posted 2023-03-04

Original title: k8s hpa can't get the cpu information [closed]
Posted: 2020-04-03 08:56:48

Question:

I set up an HPA with this command:

sudo kubectl autoscale deployment e7-build-64 --cpu-percent=50 --min=1 --max=2 -n k8s-demo

sudo kubectl get hpa -n k8s-demo

NAME              REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
e7-build-64       Deployment/e7-build-64   <unknown>/50%   1         2         1          15m
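<unknown> in the TARGETS column means the HPA controller has no current value for the metric yet. The describe output below confirms this. When several HPAs exist, as happens later in this question, a small filter can list the affected ones. This is a minimal sketch in POSIX shell. It assumes the default table layout of kubectl get hpa and one metric per HPA. hpa_unknown_targets is a hypothetical helper name:

```shell
# List HPAs whose TARGETS column still starts with <unknown>.
# Reads the table printed by `kubectl get hpa` (with or without
# --all-namespaces) on stdin, so it works on saved output too.
hpa_unknown_targets() {
  awk '
    NR == 1 {
      # Locate NAME and TARGETS from the header row, because
      # --all-namespaces adds a leading NAMESPACE column.
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME") name_col = i
        if ($i == "TARGETS") tgt_col = i
      }
      next
    }
    $tgt_col ~ /^<unknown>\// { print $name_col }
  '
}
```

Usage:

sudo kubectl get hpa --all-namespaces | hpa_unknown_targets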

sudo kubectl describe hpa e7-build-64 -n k8s-demo

Name:                                                  e7-build-64
Namespace:                                             k8s-demo
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Tue, 10 Dec 2019 15:34:24 +0800
Reference:                                             Deployment/e7-build-64
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 50%
Min replicas:                                          1
Max replicas:                                          2
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Events:
  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Warning  FailedComputeMetricsReplicas  13m (x12 over 16m)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedGetResourceMetric       74s (x61 over 16m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
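The controller can only fill in a desired replica count once it has a metric value. The Kubernetes documentation gives the scaling rule as:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

The controller makes no change when that ratio is within a tolerance (0.1 by default), and it clamps the result to minReplicas and maxReplicas. Below is a simplified sketch of that rule for a utilization target. It assumes the current value is the average utilization across pods. It ignores missing or unready pods and the scale-down stabilization window. desired_replicas is a hypothetical name:

```shell
# Simplified HPA replica calculation for a utilization target.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
desired_replicas() {
  awk -v cur="$1" -v util="$2" -v target="$3" -v lo="$4" -v hi="$5" 'BEGIN {
    tolerance = 0.1                      # default controller tolerance
    ratio = util / target
    if (ratio >= 1 - tolerance && ratio <= 1 + tolerance) {
      d = cur                            # close enough to target: no change
    } else {
      d = cur * ratio
      if (d != int(d)) d = int(d) + 1    # ceiling for non-negative values
    }
    if (d < lo) d = lo                   # clamp to minReplicas
    if (d > hi) d = hi                   # clamp to maxReplicas
    print d
  }'
}
```

For example, desired_replicas 1 30 10 1 2 prints 2. Utilization at three times the 10% target would ask for 3 replicas, which maxReplicas caps at 2.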

In deployment.yaml I added resource requests and limits:

resources:
  limits:
    memory: "16Gi"
    cpu: "4000m"
  requests: 
    memory: "4Gi"
    cpu: "2000m"
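The describe output reports CPU "as a percentage of request": the HPA divides each pod's CPU usage by the CPU it requests. The requests above are therefore required, but the usage figure still has to come from the resource metrics API. With no sample there is nothing to divide, hence <unknown>.

Below is a minimal sketch of the arithmetic for a single container. It assumes CPU quantities are written in millicores (2000m) or in cores (2, 0.5). The helper names are hypothetical. For simplicity the result is truncated to a whole percent. The real controller sums requests over all containers in a pod and averages over pods:

```shell
# Convert a Kubernetes CPU quantity such as "2000m", "2" or "0.5" to millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;                                      # already millicores
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 }' ;;  # cores -> millicores
  esac
}

# CPU usage as a percentage of the CPU request, truncated to an integer.
cpu_utilization_pct() {
  usage_m=$(to_millicores "$1")
  request_m=$(to_millicores "$2")
  awk -v u="$usage_m" -v r="$request_m" 'BEGIN { printf "%d\n", u * 100 / r }'
}
```

With the 2000m request above, a pod using 1000m counts as 50%, exactly the target of the first HPA: cpu_utilization_pct 1000m 2000m prints 50.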

kubectl version

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Then I tried to set up the HPA with YAML:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-e7-build-64
  namespace: k8s-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: e7-build-64
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10

It still reports errors:

sudo kubectl describe hpa hpa-e7-build-64 -n k8s-demo

Name:                                                  hpa-e7-build-64
Namespace:                                             k8s-demo
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa-e7-build-64","namespace":"k8...
CreationTimestamp:                                     Tue, 10 Dec 2019 14:24:07 +0800
Reference:                                             Deployment/e7-build-64
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 10%
Min replicas:                                          1
Max replicas:                                          2
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedGetResourceMetric       59m (x141 over 94m)    horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
  Warning  FailedGetResourceMetric       54m (x2 over 54m)      horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
  Warning  FailedComputeMetricsReplicas  39m (x58 over 53m)     horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedGetResourceMetric       4m29s (x197 over 53m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API

I have already run the following commands:

git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server/deploy
sudo kubectl create -f 1.8+/
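The older events in the second describe output point at the aggregated metrics API rather than at the HPA. Both concern pods.metrics.k8s.io: first "the server could not find the requested resource", then "the server is currently unable to handle the request". Once metrics-server is installed, kubectl get apiservice v1beta1.metrics.k8s.io shows whether that API is registered and reachable. Its AVAILABLE column should read True.

Below is a sketch that turns that output into an exit status for scripts. It assumes the default table layout. metrics_api_available is a hypothetical name:

```shell
# Exit 0 if the v1beta1.metrics.k8s.io APIService is reported as available.
# Reads the table printed by `kubectl get apiservice` on stdin.
metrics_api_available() {
  awk '
    NR == 1 {
      # Find the AVAILABLE column from the header row.
      for (i = 1; i <= NF; i++) if ($i == "AVAILABLE") col = i
      next
    }
    $1 == "v1beta1.metrics.k8s.io" { found = 1; ok = (col && $col == "True") }
    END { if (found && ok) exit 0; exit 1 }
  '
}
```

Usage:

sudo kubectl get apiservice v1beta1.metrics.k8s.io | metrics_api_available && echo "metrics API is available"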

Does anyone know how to fix this?

Update:

sudo kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/k8s-demo/pods"
{"kind":"PodMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/namespaces/k8s-demo/pods"},"items":[]}
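The items list is empty, yet the metrics-server pod below is Running. That means the metrics API answers but metrics-server holds no pod samples. This usually points at metrics-server's scrape of the kubelets, so its logs are the next place to look.

Below is a check for scripts. It assumes the compact JSON that kubectl get --raw printed here. A plain fixed-string match stands in for a real JSON parser. pod_metrics_empty is a hypothetical name:

```shell
# Exit 0 when a PodMetricsList read from stdin contains no items.
# kubectl get --raw prints compact JSON, so an empty list appears literally as "items":[]
pod_metrics_empty() {
  grep -qF '"items":[]'
}
```

Usage:

sudo kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/k8s-demo/pods" | pod_metrics_empty && echo "no pod metrics yet"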

And the pod information:

sudo kubectl describe pod metrics-server-795b774c76-fs8hw -n kube-system
Name:         metrics-server-795b774c76-fs8hw
Namespace:    kube-system
Priority:     0
Node:         nandoc-95/192.168.33.225
Start Time:   Tue, 10 Dec 2019 15:04:14 +0800
Labels:       k8s-app=metrics-server
              pod-template-hash=795b774c76
Annotations:  cni.projectcalico.org/podIP: 10.0.229.135/32
Status:       Running
IP:           10.0.229.135
IPs:
  IP:           10.0.229.135
Controlled By:  ReplicaSet/metrics-server-795b774c76
Containers:
  metrics-server:
    Container ID:  docker://2c6dd8c50938bc9ab536c78b73773aa7a9eedd60a6974805beec58e8ee9fde3c
    Image:         k8s.gcr.io/metrics-server-amd64:v0.3.6
    Image ID:      docker-pullable://k8s.gcr.io/metrics-server-amd64@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
    State:          Running
      Started:      Tue, 10 Dec 2019 15:05:13 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-xjgpx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  metrics-server-token-xjgpx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-xjgpx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

sudo kubectl get pods --all-namespaces -o wide

NAMESPACE              NAME                                         READY   STATUS    RESTARTS   AGE    IP               NODE              NOMINATED NODE   READINESS GATES
k8s-demo               k8s-pod-e7-build-32-7bb5bc7c6-s2zsr          1/1     Running   0          32m    10.0.100.198     nandoc-94         <none>           <none>
k8s-demo               k8s-pod-e7-build-64-d5c659d6b-5hv6m          1/1     Running   0          31m    10.0.229.137     nandoc-95         <none>           <none>
kube-system            calico-kube-controllers-55754f75c-82np8      1/1     Running   0          5d     10.0.126.1       nandoc-93         <none>           <none>
kube-system            calico-node-2dxmp                            1/1     Running   0          2d5h   192.168.33.225   nandoc-95         <none>           <none>
kube-system            calico-node-7ms8t                            1/1     Running   0          28d    192.168.33.223   nandoc-93         <none>           <none>
kube-system            calico-node-hdw25                            1/1     Running   1          21d    192.168.33.224   nandoc-94         <none>           <none>
kube-system            calico-node-j4jv4                            0/1     Running   0          27d    192.168.37.173   cyuan-k8s-node1   <none>           <none>
kube-system            calicoctl                                    1/1     Running   0          6d     192.168.33.224   nandoc-94         <none>           <none>
kube-system            coredns-5644d7b6d9-n9z5m                     1/1     Running   0          5d     10.0.126.2       nandoc-93         <none>           <none>
kube-system            coredns-5644d7b6d9-txcm4                     1/1     Running   0          5d     10.0.100.194     nandoc-94         <none>           <none>
kube-system            etcd-nandoc-93                               1/1     Running   0          28d    192.168.33.223   nandoc-93         <none>           <none>
kube-system            kube-apiserver-nandoc-93                     1/1     Running   0          28d    192.168.33.223   nandoc-93         <none>           <none>
kube-system            kube-controller-manager-nandoc-93            1/1     Running   0          28d    192.168.33.223   nandoc-93         <none>           <none>
kube-system            kube-proxy-5jlfc                             1/1     Running   0          27d    192.168.37.173   cyuan-k8s-node1   <none>           <none>
kube-system            kube-proxy-7t7b7                             1/1     Running   0          28d    192.168.33.223   nandoc-93         <none>           <none>
kube-system            kube-proxy-j5b4c                             1/1     Running   0          2d5h   192.168.33.225   nandoc-95         <none>           <none>
kube-system            kube-proxy-jj256                             1/1     Running   1          21d    192.168.33.224   nandoc-94         <none>           <none>
kube-system            kube-scheduler-nandoc-93                     1/1     Running   0          28d    192.168.33.223   nandoc-93         <none>           <none>
kube-system            metrics-server-795b774c76-fs8hw              1/1     Running   0          24h    10.0.229.135     nandoc-95         <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-76585494d8-wqgks   1/1     Running   0          5d     10.0.126.3       nandoc-93         <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-b65488c4-qh95m          1/1     Running   0          5d     10.0.126.4       nandoc-93         <none>           <none>

sudo kubectl get hpa --all-namespaces -o wide

NAMESPACE   NAME                  REFERENCE                        TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
k8s-demo    hpa-e7-build-32       Deployment/k8s-pod-e7-build-32   <unknown>/10%   1         2         1          85s
k8s-demo    hpa-e7-build-64       Deployment/k8s-pod-e7-build-64   <unknown>/10%   1         2         1          79s
k8s-demo    k8s-pod-e7-build-64   Deployment/k8s-pod-e7-build-64   <unknown>/50%   1         2         1          16s

Today I renamed the pods with the prefix k8s-pod- and recreated the HPAs, so the output differs from before.

Comments:

What does kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/k8s-demo/pods" return? Check whether your metrics-server pod is running properly, and give us the output of kubectl describe pod.

@weibeld I have updated the question with the output.

@EAT_Py The pod is running: kube-system metrics-server-795b774c76-fs8hw 1/1 Running 0 18h

Which namespace is the pod running in? What is the output of kubectl get pods -n k8s-demo?

Answer 1:

For Kubernetes 1.18 and metrics-server v0.3.7, edit the metrics-server deployment so that it has the following args:

args:
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
  - --cert-dir=/tmp
  - --secure-port=4443
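You can also append these flags to the existing args list with a JSON Patch via kubectl patch --type=json, instead of editing the manifest by hand. The sketch below builds such a patch under these assumptions:

- metrics-server is the first container (index 0) of its deployment.
- That container already has an args array. This holds for the v0.3.6 deployment shown in the question, which sets --cert-dir and --secure-port.

build_args_patch is a hypothetical helper. It does no JSON escaping, so flag values must not contain double quotes or backslashes. Keep in mind that --kubelet-insecure-tls makes metrics-server skip verification of the kubelets' serving certificates.

```shell
# Print a JSON Patch (RFC 6902) that appends each argument to the args list
# of the first container in a Deployment's pod template.
build_args_patch() {
  sep=''
  printf '['
  for flag in "$@"; do
    printf '%s{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"%s"}' "$sep" "$flag"
    sep=','
  done
  printf ']\n'
}
```

Usage:

sudo kubectl -n kube-system patch deployment metrics-server --type=json \
  -p "$(build_args_patch --kubelet-insecure-tls --kubelet-preferred-address-types=InternalIP)"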

Comments:

--kubelet-insecure-tls=true; this workaround solved the problem.

Answer 2:

Thanks weibeld and EAT_Py. I have solved the problem. Debugging process:

sudo kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/k8s-demo/pods"
sudo kubectl -n kube-system logs metrics-server-795b774c76-t2rj7
sudo kubectl top node nandoc-94   -->can't get info
sudo kubectl top pod k8s-pod-e7-build-32-7bb5bc7c6-s2zsr   -->can't get info

The metrics-server logs contain some error messages:

kubelet_summary:nandoc-93: unable to fetch metrics from Kubelet nandoc-93 (nandoc-93): Get https://nandoc-93:10250/stats/summary?only_cpu_and_memory=true: x509: certificate signed by unknown authority]
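This error means metrics-server's HTTPS request to the kubelet on port 10250 failed certificate verification. The kubelet's certificate was signed by a CA that metrics-server does not trust, so no samples were collected from that node. This matches the empty items list above.

To see which nodes are affected, the sketch below extracts node names from such log lines. It assumes the exact message format shown above (metrics-server v0.3.6). x509_failed_nodes is a hypothetical name:

```shell
# Print, once each, the node names that metrics-server failed to scrape because
# of TLS certificate errors. Reads metrics-server log lines on stdin.
x509_failed_nodes() {
  grep 'x509:' |
    sed -n 's/.*unable to fetch metrics from Kubelet \([^ ]*\) .*/\1/p' |
    sort -u
}
```

Usage:

sudo kubectl -n kube-system logs metrics-server-795b774c76-t2rj7 | x509_failed_nodes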

Then, following https://github.com/kubernetes-sigs/metrics-server/issues/146, I edited metrics-server/deploy/1.8+/metrics-server-deployment.yaml and added a command:

  - name: metrics-server
    image: k8s.gcr.io/metrics-server-amd64:v0.3.6
    command:
    - /metrics-server
    - --kubelet-insecure-tls

kubectl apply -f metrics-server-deployment.yaml
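After the apply, the new metrics-server pod has to roll out and complete at least one scrape before kubectl top or the HPA can see data. The first attempts may therefore still fail. Below is a generic retry helper for scripting that wait. retry is a hypothetical name, and the attempt count and delay in the usage lines are arbitrary:

```shell
# Run a command repeatedly until it succeeds or the attempts are used up.
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    "$@" && return 0                       # success: stop retrying
    [ "$i" -ge "$attempts" ] && return 1   # out of attempts
    i=$((i + 1))
    sleep "$delay"
  done
}
```

Usage:

sudo kubectl -n kube-system rollout status deployment/metrics-server
retry 10 15 sudo kubectl top pod -n k8s-demo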

After that, kubectl top pod works, and the HPA works now too. Thanks again.

sudo kubectl get hpa --all-namespaces
NAMESPACE   NAME              REFERENCE                        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
k8s-demo    hpa-e7-build-32   Deployment/k8s-pod-e7-build-32   0%/10%    1         2         1          19h
k8s-demo    hpa-e7-build-64   Deployment/k8s-pod-e7-build-64   0%/10%    1         2         1          19h

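Comments:

I get this error:

# deployments "kube-state-metrics" was not valid:
# * : Invalid value: "The modified file failed validation": [couldn't get version/kind; json parse error: invalid character 'a' looking for beginning of value, [invalid character 'a' looking for beginning of value, invalid character 'a' looking for beginning of value]]

when I add:

spec:
  containers:
  - image: quay.io/coreos/kube-state-metrics:v1.9.5
    command:
    - /metrics-server
    - --kubelet-insecure-tls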
