k8s dashboard: Metric client health check failed
【Title】: k8s dashboard: Metric client health check failed
【Posted】: 2020-12-30 13:13:47
【Question】: I installed the k8s dashboard with the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.4/aio/deploy/recommended.yaml
Then I checked the logs of the dashboard pod:
$ kubectl -n kubernetes-dashboard logs -f kubernetes-dashboard-665f4c5ff-wcrj9
2020/09/12 04:19:10 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2020/09/12 04:19:43 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2020/09/12 04:20:17 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2020/09/12 04:20:50 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2020/09/12 04:21:23 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2020/09/12 04:21:56 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2020/09/12 04:22:29 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
kubeadm version: 1.19, kubectl version: 1.19
Can anyone help me?
【Comments】:
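【Answer 1】:
Some background: when you install the Kubernetes Dashboard, you get one Pod that serves the Dashboard and another Pod, the Dashboard Metrics Scraper, that is responsible for scraping metrics from the Kubernetes Metrics API. The Dashboard delegates metrics requests to the scraper and expects to reach it through its K8s Service: "dashboard-metrics-scraper".
In your case, this Service cannot be reached. Run "kubectl get service -n kubernetes-dashboard" to see whether the scraper Service has been deleted or renamed. If it has been deleted, re-apply the Dashboard installation YAMLs to recreate it.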
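The check-and-recreate step described above could be scripted roughly as follows. This is only a sketch: the function name `ensure_scraper_service` is a hypothetical helper, the namespace and Service name are taken from the log output in the question, and the manifest URL is the one the asker used, so adjust it if you installed a different version.

```shell
# Values taken from the question; change them if your install differs.
NS="kubernetes-dashboard"
SVC="dashboard-metrics-scraper"
MANIFEST="https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.4/aio/deploy/recommended.yaml"

# ensure_scraper_service (hypothetical helper): re-applies the Dashboard
# manifest only when the scraper Service cannot be fetched.
ensure_scraper_service() {
  if kubectl -n "$NS" get service "$SVC" >/dev/null 2>&1; then
    echo "service $SVC exists"
  else
    echo "service $SVC missing, re-applying manifest"
    kubectl apply -f "$MANIFEST"
  fi
}

# Usage: ensure_scraper_service
```

Note that `kubectl apply` re-applies every object in the manifest, not just the Service, which is normally harmless for an unmodified install.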
【Comments】:
The scraper Service has not been deleted or renamed. The logs of the dashboard-metrics-scraper pod show some errors: "level":"error","msg":"Error scraping node metrics: the server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2020-09-13T02:52:38Z"
Did you create the metrics server? @任
【Answer 2】:
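I could not reproduce your problem, but here are some steps you can try to debug it:
Metric client health check failed: ... Retrying in 30 seconds
This error appears in only one place in the dashboard's source code: where the health check fails.
The HealthCheck itself is a proxy request to the api-server.
Use the following command to test whether the proxy works.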
$ kubectl get --raw "/api/v1/namespaces/kubernetes-dashboard/services/dashboard-metrics-scraper/proxy/healthz"
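It should return: URL: /healthz. If it does not, there is most likely a problem with the dashboard-metrics-scraper Service or pod. Make sure the Service exists and the pod is running and ready.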
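As a rough sketch, the proxy test and the follow-up checks can be combined into one helper. The name `check_scraper_health` is hypothetical; the namespace and Service name are the ones from the command above.

```shell
NS="kubernetes-dashboard"
SVC="dashboard-metrics-scraper"

# check_scraper_health (hypothetical helper): calls the scraper's /healthz
# endpoint through the API server proxy. On failure it lists the Service,
# its Endpoints, and the pods in the namespace for closer inspection.
check_scraper_health() {
  if body="$(kubectl get --raw "/api/v1/namespaces/$NS/services/$SVC/proxy/healthz" 2>&1)"; then
    echo "proxy OK: $body"
  else
    echo "proxy FAILED: $body"
    kubectl -n "$NS" get service "$SVC"
    kubectl -n "$NS" get endpoints "$SVC"
    kubectl -n "$NS" get pods -o wide
    return 1
  fi
}

# Usage: check_scraper_health
```

An Endpoints object with no addresses usually means the Service selector matches no ready pod, which would be consistent with the proxy request failing.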
If it works for you (from the CLI) but still does not work for kubernetes-dashboard, check the RBAC permissions of kubernetes-dashboard. Make sure kubernetes-dashboard has permission to proxy.
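One possible way to check this from the CLI is to impersonate the dashboard's service account. The sketch below assumes the service account is `kubernetes-dashboard` in the `kubernetes-dashboard` namespace, and that your own user is allowed to impersonate it (cluster admins normally are). Both function names are hypothetical helpers.

```shell
NS="kubernetes-dashboard"
SVC="dashboard-metrics-scraper"
# Assumed service account identity of the dashboard pod.
SA="system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard"

# check_dashboard_proxy_rbac (hypothetical helper): asks the API server
# whether the service account may GET the Service's proxy subresource.
# kubectl prints "yes" or "no".
check_dashboard_proxy_rbac() {
  kubectl -n "$NS" auth can-i get "services/$SVC" --subresource=proxy --as="$SA"
}

# replay_healthz_as_dashboard (hypothetical helper): repeats the health
# check request while impersonating the service account.
replay_healthz_as_dashboard() {
  kubectl get --raw "/api/v1/namespaces/$NS/services/$SVC/proxy/healthz" --as="$SA"
}

# Usage: check_dashboard_proxy_rbac && replay_healthz_as_dashboard
```

If the first call prints "no", the Role and RoleBinding for the dashboard service account are the place to look.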
The second error you are seeing:
"level":"error","msg":"Error scraping node metrics: the server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2020-09-13T02:52:38Z"
means that no metrics server is deployed in your cluster. See the metrics-server GitHub repo for more information.
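To confirm whether the Metrics API is being served at all, you can query it directly. This is a minimal sketch; `metrics_api_available` is a hypothetical helper name, and `v1beta1` is assumed to be the Metrics API version your cluster would expose.

```shell
# metrics_api_available (hypothetical helper): succeeds only if the API
# server can serve node metrics from the metrics.k8s.io API group.
metrics_api_available() {
  if kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" >/dev/null 2>&1; then
    echo "metrics API is served"
  else
    echo "metrics API not found; metrics-server is probably not installed"
    return 1
  fi
}

# Usage: metrics_api_available
```

`kubectl top nodes` relies on the same API, so it should fail in the same situation.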
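【Comments】:
You can reproduce the problem by deleting the "dashboard-metrics-scraper" service and checking the logs of the "kubernetes-dashboard" pod.
Does kubectl get --raw... work? @FritzDuchardt
If the service is in place, yes.
Try the following parameter: --as "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" @FritzDuchardt
Sorry @FritzDuchardt. I thought you were the OP and only just realized you are not. In that case, this conversation is pointless.
【Answer 3】: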
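I am on Kubernetes 1.20.1-00, Ubuntu 20.04. I got the
"level":"error","msg":"Error scraping node metrics: the server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2020-09-13T02:52:38Z"
error because I deployed the Kubernetes dashboard with the metrics scraper before deploying the metrics server. After running in that configuration for a day, I was still getting "Error scraping node..." in my metrics scraper pod logs.
I resolved it by scaling the metrics scraper deployment to 0 (zero) and then scaling it back up to the desired number of pods (3 in my case).
As soon as the metrics scraper pods came up, the error messages disappeared from the logs.
I am not implying this is the proper fix, just an observation from seeing the same error. It may simply be caused by deploying the metrics server and the Kubernetes dashboard in the wrong order, as I did.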
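The scale-down and scale-up could look something like the sketch below. It assumes the scraper Deployment is named `dashboard-metrics-scraper` in the `kubernetes-dashboard` namespace (check with `kubectl -n kubernetes-dashboard get deployments`); `restart_scraper` is a hypothetical helper name.

```shell
NS="kubernetes-dashboard"
# Assumed Deployment name; verify it in your cluster before running.
DEPLOY="dashboard-metrics-scraper"

# restart_scraper (hypothetical helper): scales the scraper Deployment to
# zero, then back to the requested replica count (default 1), and waits
# for the rollout to finish.
restart_scraper() {
  replicas="${1:-1}"
  kubectl -n "$NS" scale deployment "$DEPLOY" --replicas=0
  kubectl -n "$NS" scale deployment "$DEPLOY" --replicas="$replicas"
  kubectl -n "$NS" rollout status deployment "$DEPLOY"
}

# Usage (3 replicas, as in this answer): restart_scraper 3
```

`kubectl rollout restart deployment` would also recreate the pods without touching the replica count, though that variant was not what was tried here.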
【Comments】: