Fixing a k8s node that fails to rejoin the cluster

Posted on 江晓龙的技术博客 (Jiang Xiaolong's tech blog)


Preface: this article was compiled by the editor of 小常识网 (cha138.com). It covers fixing a k8s node that fails to rejoin the cluster; we hope it is useful to you.

Fixing a node that was deleted from a kubeadm-installed k8s cluster and then added back

1. Problem description

The k8s master node failed. After k8s-master was redeployed, the worker nodes could not join the cluster.

The error is as follows:

[root@k8s-node2 ~]# kubeadm join apiserver.demo:6443 --token ou7vjm.oceacziy0m2z69ak     --discovery-token-ca-cert-hash sha256:3c05e8f1d775a126e78a7643d134e2a1cb378907c160fb8d6ca2d24dc0c30f14
[preflight] Running pre-flight checks.
    [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.12.1-ce. Max validated version: 17.03
    [WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Some fatal errors occurred:
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
    [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
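The names inside [ERROR ...] are preflight check IDs. You can pull them out of a saved kubeadm join log to see at a glance what is blocking the join. Below is a minimal POSIX sh sketch that assumes the log format shown above. The function name preflight_errors is hypothetical. These IDs are also the values kubeadm join --ignore-preflight-errors accepts. Ignoring these particular errors would only hide the stale state, though, so resetting the node is the better fix.

```shell
# Print the ID of every failed preflight check found in kubeadm output.
# Reads the output on stdin; assumes lines of the form
#     [ERROR <CheckID>]: <message>
preflight_errors() {
  sed -n 's/^[[:space:]]*\[ERROR \([^]]*\)]:.*/\1/p'
}

# Usage (join.log is a hypothetical file name):
#   kubeadm join ... 2>&1 | tee join.log
#   preflight_errors < join.log
```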

2. Solution

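The errors above occur mainly because kubeadm had already set up this node before (it had joined the old cluster). Its configuration files already exist and the kubelet service is still running, so running kubeadm join again is bound to conflict with them. The fix is as follows: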

2.1. Reset the node configuration

Run kubeadm reset:

[root@k8s-node2 ~]# kubeadm reset
[preflight] Running pre-flight checks.
[reset] Stopping the kubelet service.
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers.
[reset] No etcd manifest found in "/etc/kubernetes/manifests/etcd.yaml". Assuming external etcd.
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
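Before rejoining, it may be worth confirming that reset removed the two files the preflight check complained about. The Port-10250 error should also go away, because reset stops the kubelet (ss -ltn | grep 10250 should then print nothing). Below is a minimal sketch. kubeadm_leftovers is a hypothetical helper, and its optional root-prefix argument exists only so it can be tried against a scratch directory.

```shell
# List which of the files reported by the failed preflight run still exist.
# $1 is an optional path prefix (default: empty, i.e. the real /).
kubeadm_leftovers() {
  root=${1:-}
  for f in /etc/kubernetes/pki/ca.crt /etc/kubernetes/kubelet.conf; do
    if [ -e "$root$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage on the node, after kubeadm reset:
#   kubeadm_leftovers        # no output: both files are gone
```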

2.2. Rejoin the cluster

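Run kubeadm join to add the node to the cluster. If the token has expired, run kubeadm token create on the master node to generate a new one. kubeadm token create --print-join-command prints the complete join command.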
kubeadm join --token xxxxx master_ip:6443 --discovery-token-ca-cert-hash sha256:xxxx
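Pasting the wrong value into the join command is an easy mistake. A bootstrap token has the form of 6 characters, a dot, then 16 characters, all from [a-z0-9]. The CA hash is sha256: followed by 64 hex digits. Both can therefore be sanity-checked before running kubeadm join. Below is a minimal sketch. valid_token and valid_ca_hash are hypothetical helper names, and the hash check assumes lowercase hex as printed by openssl.

```shell
# Bootstrap token: <6 chars>.<16 chars>, characters from [a-z0-9].
valid_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# CA hash: "sha256:" followed by 64 lowercase hex digits.
valid_ca_hash() {
  printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Usage with the token from this post:
#   valid_token ou7vjm.oceacziy0m2z69ak && echo "token format ok"
```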

[root@k8s-node2 ~]# kubeadm join apiserver.demo:6443 --token ou7vjm.oceacziy0m2z69ak     --discovery-token-ca-cert-hash sha256:3c05e8f1d775a126e78a7643d134e2a1cb378907c160fb8d6ca2d24dc0c30f14
[preflight] Running pre-flight checks.
    [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.12.1-ce. Max validated version: 17.03
    [WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "10.4.37.167:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.4.37.167:6443"
[discovery] Requesting info from "https://10.4.37.167:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.4.37.167:6443"
[discovery] Successfully established connection with API Server "10.4.37.167:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.

Run kubectl get nodes on the master to see this node join the cluster.

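PS: how to get the sha256 hash of the k8s cluster CA certificate /etc/kubernetes/pki/ca.crt (every node has this file). kubeadm join uses this hash to verify the cluster's CA when adding a node. Strictly speaking, it is the SHA-256 of the certificate's public key: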
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
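The same pipeline can be wrapped in a function that prints the value ready to paste after --discovery-token-ca-cert-hash. This sketch assumes an RSA CA key, which is the kubeadm default and is required by the openssl rsa step. ca_cert_hash is a hypothetical name. The -noout flag is added so that only the public key goes down the pipe.

```shell
# Print "sha256:<hex>", the SHA-256 of the CA's DER-encoded public key.
# $1: CA certificate path (default: /etc/kubernetes/pki/ca.crt).
# Assumes an RSA key, because of the "openssl rsa" step.
ca_cert_hash() {
  cert=${1:-/etc/kubernetes/pki/ca.crt}
  hex=$(openssl x509 -pubkey -noout -in "$cert" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //')
  printf 'sha256:%s\n' "$hex"
}

# Usage on any node of the cluster:
#   ca_cert_hash             # reads /etc/kubernetes/pki/ca.crt
```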

At this point the node has been added back to the cluster. Running kubectl commands on it directly may still fail with the following error:

[root@k8s-node2 ~]# kubectl get pod
The connection to the server localhost:8080 was refused - did you specify the right host or port?
You have mail in /var/spool/mail/root

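Cause and fix:
The message comes from kubectl, not the kubelet. kubectl looks for a kubeconfig in $KUBECONFIG or ~/.kube/config. When it finds none, it falls back to the default localhost:8080. kubeadm join only writes /etc/kubernetes/kubelet.conf, which holds the kubelet's own node credentials, so a freshly joined node has no kubeconfig for kubectl. The fix is simple: copy /etc/kubernetes/admin.conf from the master node to ~/.kube/config on this node, or set KUBECONFIG to its path. /etc/kubernetes/kubelet.conf itself may also go missing, for example after a server reboot. In that case the kubelet loses its connection details for the master as well. Regenerate the file by repeating kubeadm reset and kubeadm join rather than copying it from another node, since it identifies one specific node.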
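The following function shows which kubeconfig kubectl on this node would pick up. It is a simplified sketch: it ignores the --kubeconfig flag and treats KUBECONFIG as a single path, although it may be a colon-separated list. kubectl_config_source is a hypothetical name, and the scp line in the comments assumes the master is reachable as k8s-master.

```shell
# Report which kubeconfig kubectl would read on this node.
# Simplified: ignores --kubeconfig and treats KUBECONFIG as one path.
kubectl_config_source() {
  if [ -n "${KUBECONFIG:-}" ]; then
    printf 'KUBECONFIG=%s\n' "$KUBECONFIG"
  elif [ -f "$HOME/.kube/config" ]; then
    printf '%s\n' "$HOME/.kube/config"
  else
    echo "none (kubectl falls back to localhost:8080)"
  fi
}

# Fix on the node (assumes the master is reachable as k8s-master):
#   mkdir -p ~/.kube
#   scp root@k8s-master:/etc/kubernetes/admin.conf ~/.kube/config
#   kubectl get node
```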

2.3. Check on the master whether the nodes have joined

The cluster has recovered:

[root@k8s-master ~]# kubectl get node
NAME         STATUS   ROLES    AGE     VERSION
k8s-master   Ready    master   7m      v1.18.6
k8s-node1    Ready    <none>   5m55s   v1.18.6
k8s-node2    Ready    <none>   5m19s   v1.18.6
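With more nodes, the same check can be scripted by filtering the kubectl get node output for nodes whose STATUS is not Ready. Below is a minimal sketch. not_ready_nodes is a hypothetical helper, and it assumes the default column layout shown above (NAME first, STATUS second).

```shell
# Read "kubectl get node" output on stdin and print the name of every node
# whose STATUS is not Ready. A status like "Ready,SchedulingDisabled"
# still counts as Ready here, because only the part before the comma is compared.
not_ready_nodes() {
  awk 'NR > 1 { split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage on the master:
#   kubectl get node | not_ready_nodes     # no output: every node is Ready
```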
