GKE very slow to start with node pools - cluster and k8s / gcloud API unavailable

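We currently have a GKE cluster with 7 nodes and 9 microservices. By default we also add 2 node pools with 2 nodes in them. We use Istio for load balancing between the microservices.

Our CI environment creates everything with a script. The problem is that, with the node pools, the cluster can take several minutes to become usable.

My main question is: why is the API unavailable during this time?

There are also many errors in the kube-system logs. Here is a small excerpt: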

k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
github.com/GoogleCloudPlatform/k8s-stackdriver/event-exporter/watchers/watcher.go:55: Failed to list *v1.Event: Get https://10.0.0.1:443/api/v1/events?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:42.925080 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:42.873176 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
k8s.io/heapster/metrics/heapster.go:331: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/processors/namespace_based_enricher.go:90: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
Error while getting cluster status: Get https://10.0.0.1:443/api/v1/nodes: dial tcp 10.0.0.1:443: getsockopt: connection refused
k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: connect: connection refused
github.com/GoogleCloudPlatform/k8s-stackdriver/event-exporter/watchers/watcher.go:55: Failed to list *v1.Event: Get https://10.0.0.1:443/api/v1/events?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/heapster.go:254: Failed to list *v1.Pod: Get https://10.0.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/processors/namespace_based_enricher.go:85: Failed to list *v1.Namespace: Get https://10.0.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused
"ERROR: logging before flag.Parse: E1114 09:50:41.824128 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.0.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: getsockopt: connection refused "
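To make the pattern in the excerpt easier to see, the Python sketch below groups such lines by the resource that failed to list and by the address being dialed. The regular expression and the function name `summarize` are illustrative assumptions based only on the excerpt above, not part of any Kubernetes tooling, and the sample contains just three of the lines.

```python
import re
from collections import Counter

# Matches "Failed to list" errors of the shape seen in the excerpt, e.g.
#   ...dns.go:192: Failed to list *v1.Service: Get https://...: dial tcp 10.0.0.1:443: connect: connection refused
# The pattern is an assumption derived from that excerpt only.
FAILED_LIST = re.compile(
    r"Failed to list \*v1\.(?P<kind>\w+): .*?dial tcp (?P<addr>[\d.]+:\d+): .*?connection refused"
)


def summarize(log_text):
    """Count refused list calls per (resource kind, dialed address)."""
    counts = Counter()
    for match in FAILED_LIST.finditer(log_text):
        counts[(match.group("kind"), match.group("addr"))] += 1
    return counts


# Three entries copied from the excerpt, joined the way they appear in the raw log.
sample = (
    "k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get "
    "https://10.0.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.0.0.1:443: "
    "connect: connection refused "
    "k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get "
    "https://10.0.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.0.0.1:443: "
    "connect: connection refused "
    "k8s.io/heapster/metrics/util/util.go:32: Failed to list *v1.Node: Get "
    "https://10.0.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 10.0.0.1:443: "
    "getsockopt: connection refused"
)

if __name__ == "__main__":
    for (kind, addr), n in sorted(summarize(sample).items()):
        print(f"{kind:<10} {addr}  refused x{n}")
```

Every entry dials the same address, 10.0.0.1:443, which suggests the kube-system components are all failing against one unreachable API endpoint rather than each having a separate problem.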

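Answer

Creating GCE resources takes time. In any environment, provisioning one or more VMs usually takes a while. The endpoint is unavailable because the master is not ready yet. Once the cluster has been created, you can add the 2 extra node pools without disrupting the master.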
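One way a CI script could follow this advice is to wait until the cluster reports RUNNING and only then add the node pools, one at a time. The Python sketch below assumes the gcloud CLI is installed and authenticated. The helper names, the timeout and interval defaults, and any cluster, zone, or pool names passed in are hypothetical choices for illustration, not something taken from the question.

```python
import subprocess
import time


def gcloud_status(cluster, zone):
    """Return the cluster status string reported by gcloud (for example RUNNING)."""
    result = subprocess.run(
        ["gcloud", "container", "clusters", "describe", cluster,
         "--zone", zone, "--format", "value(status)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_running(get_status, timeout=900.0, interval=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns RUNNING, or raise on ERROR or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "RUNNING":
            return
        if status == "ERROR":
            raise RuntimeError("cluster entered ERROR state")
        if clock() >= deadline:
            raise TimeoutError(f"cluster still {status} after {timeout}s")
        # Statuses such as PROVISIONING or RECONCILING mean work is in progress.
        sleep(interval)


def node_pool_command(cluster, zone, pool, num_nodes):
    """Build the gcloud command that adds one node pool to an existing cluster."""
    return ["gcloud", "container", "node-pools", "create", pool,
            "--cluster", cluster, "--zone", zone, "--num-nodes", str(num_nodes)]


def add_node_pools(cluster, zone, pools, num_nodes=2, run=subprocess.run):
    """Add pools one at a time, waiting for RUNNING before each and after the last."""
    for pool in pools:
        wait_until_running(lambda: gcloud_status(cluster, zone))
        run(node_pool_command(cluster, zone, pool, num_nodes), check=True)
    wait_until_running(lambda: gcloud_status(cluster, zone))
```

In a CI script this might be called as `add_node_pools("ci-cluster", "europe-west1-b", ["pool-a", "pool-b"])` once the initial cluster creation has returned, with later steps such as deploying the microservices placed after it. The status callback, sleep function, and clock are injectable so the waiting logic can be exercised without touching a real cluster.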
