部署k8s ssl集群实践4:部署etcd集群

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了部署k8s ssl集群实践4:部署etcd集群相关的知识,希望对你有一定的参考价值。

参考文档:
https://github.com/opsnull/follow-me-install-kubernetes-cluster
感谢作者的无私分享。
集群环境已搭建成功跑起来。
文章是部署过程中遇到的错误和详细操作步骤记录。如有需要对比参考,请按照顺序阅读和测试。

4.1
下载和分发二进制安装包

[[email protected] kubernetes]# wget https://github.com/coreos/etcd/releases/download/v3.3.7/etcd-v3.3.7-linux-amd64.tar.gz
[[email protected] kubernetes]# ls
etcd-v3.3.7-linux-amd64.tar.gz? kubernetes? kubernetes-client-linux-amd64.tar.gz? kubernetes-src.tar.gz
[[email protected] kubernetes]#
[[email protected] kubernetes]# tar zxvf etcd-v3.3.7-linux-amd64.tar.gz
[[email protected] kubernetes]# ls
etcd-v3.3.7-linux-amd64?

分发到所有节点

[[email protected] kubernetes]# cp etcd-v3.3.7-linux-amd64/etcd* /opt/k8s/bin
[[email protected] kubernetes]# scp etcd-v3.3.7-linux-amd64/etcd* [email protected]:/opt/k8s/bin
etcd? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100%?? 18MB? 91.6MB/s?? 00:00? ?
etcdctl? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? 100%?? 15MB? 96.1MB/s?? 00:00? ?
[[email protected] kubernetes]# scp etcd-v3.3.7-linux-amd64/etcd* [email protected]:/opt/k8s/bin
etcd? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100%?? 18MB? 92.2MB/s?? 00:00? ?
etcdctl? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? 100%?? 15MB? 92.3MB/s?? 00:00? ?
[[email protected] kubernetes]#

4.2
创建etcd证书和私钥

创建证书签名请求

[[email protected] etcd]# cat etcd-csr.json
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"192.168.1.92",
"192.168.1.93",
"192.168.1.95"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "SZ",
"L": "SZ",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
[[email protected] etcd]#

hosts 字段指定授权使用该证书的 etcd 节点 IP 或域名列表,这里将 etcd 集群的三
个节点 IP 都列在其中

生成证书和私钥

[[email protected] etcd]# cfssl gencert -ca=/etc/kubernetes/cert/ca.pem -ca-key=/etc/kubernetes/cert/ca-key.pem -config=/etc/kubernetes/cert/ca-config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
[[email protected] etcd]# ls
etcd.csr? etcd-csr.json? etcd-key.pem? etcd.pem
[[email protected] etcd]#

分发证书和私钥到节点

[[email protected] etcd]# cp etcd* /etc/etcd/cert/
[[email protected] etcd]# scp etcd* [email protected]:/etc/etcd/cert/
etcd.csr? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100% 1054? ?? 1.5MB/s?? 00:00? ?
etcd-csr.json? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? 100%? 213?? 350.8KB/s?? 00:00? ?
etcd-key.pem? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100% 1679? ?? 2.5MB/s?? 00:00? ?
etcd.pem? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100% 1415? ?? 2.3MB/s?? 00:00? ?
[[email protected] etcd]# scp etcd* [email protected]:/etc/etcd/cert/
etcd.csr? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100% 1054? ?? 1.2MB/s?? 00:00? ?
etcd-csr.json? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? 100%? 213?? 296.9KB/s?? 00:00? ?
etcd-key.pem? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100% 1679? ?? 2.6MB/s?? 00:00? ?
etcd.pem? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 100% 1415? ?? 2.5MB/s?? 00:00? ?
[[email protected] etcd]#

4.3
创建etcd的systemd unit模块文件
注意: 这个符号需改成?

[[email protected] etcd]# cat etcd.service.template
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
User=k8s
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/opt/k8s/bin/etcd --data-dir=/var/lib/etcd --name=##NODE_NAME## --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem --trusted-ca-file=/etc/kubernetes/cert/ca.pem --peer-cert-file=/etc/etcd/cert/etcd.pem --peer-key-file=/etc/etcd/cert/etcd-key.pem --peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem --peer-client-cert-auth --client-cert-auth --listen-peer-urls=https://##NODE_IP##:2380 --initial-advertise-peer-urls=https://##NODE_IP##:2380 --listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379
--advertise-client-urls=https://##NODE_IP##:2379 --initial-cluster-token=etcd-cluster-0 --initial-cluster=${ETCD_NODES} --initial-cluster-state=new
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[[email protected] etcd]#

User :指定以 k8s 账户运行;
WorkingDirectory 、 --data-dir :指定工作目录和数据目录为
/var/lib/etcd ,需在启动服务前创建这个目录;
--name :指定节点名称,当 --initial-cluster-state 值为 new 时, --
name 的参数值必须位于 --initial-cluster 列表中;
--cert-file 、 --key-file :etcd server 与 client 通信时使用的证书和私钥;
--trusted-ca-file :签名 client 证书的 CA 证书,用于验证 client 证书;
--peer-cert-file 、 --peer-key-file :etcd 与 peer 通信使用的证书和私
钥;
--peer-trusted-ca-file :签名 peer 证书的 CA 证书,用于验证 peer 证书;

分发生成的 systemd unit 文件,并修改好各节点配置文件里的##NODE_NAME##和##NODE_IP##

[[email protected] etcd]# cp etcd.service.template /etc/systemd/system/etcd.service
[[email protected] etcd]# scp etcd.service.template [email protected]:/etc/systemd/system/etcd.service
etcd.service.template? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? 100% 1038? ?? 1.1MB/s?? 00:00? ?
[[email protected] etcd]# scp etcd.service.template [email protected]:/etc/systemd/system/etcd.service
etcd.service.template? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? 100% 1038? ?? 1.2MB/s?? 00:00? ?
[[email protected] etcd]#

##各个节点修改下

4.4
启动etcd

[[email protected] ~]# systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd

启动报错

Aug 20 16:40:29 k8s-master systemd: etcd.service holdoff time over, scheduling restart.
Aug 20 16:40:29 k8s-master systemd: Starting Etcd Server...
Aug 20 16:40:29 k8s-master etcd: etcd Version: 3.3.7
Aug 20 16:40:29 k8s-master etcd: Git SHA: 56536de55
Aug 20 16:40:29 k8s-master etcd: Go Version: go1.9.6
Aug 20 16:40:29 k8s-master etcd: Go OS/Arch: linux/amd64
Aug 20 16:40:29 k8s-master etcd: setting maximum number of CPUs to 1, total number of available CPUs is 1
Aug 20 16:40:29 k8s-master etcd: peerTLS: cert = /etc/etcd/cert/etcd.pem, key = /etc/etcd/cert/etcd-key.pem, ca = , trusted-ca = /etc/kubernetes/cert/ca.pem, client-cert-auth = true, crl-file =
Aug 20 16:40:29 k8s-master etcd: open /etc/etcd/cert/etcd-key.pem: permission denied
Aug 20 16:40:29 k8s-master systemd: etcd.service: main process exited, code=exited, status=1/FAILURE
Aug 20 16:40:29 k8s-master systemd: Failed to start Etcd Server.
Aug 20 16:40:29 k8s-master systemd: Unit etcd.service entered failed state.
Aug 20 16:40:29 k8s-master systemd: etcd.service failed.
[[email protected] ~]#

明显 ?/etc/etcd/cert/etcd-key.pem: permission denied ?没有权限

[[email protected] cert]# pwd
/etc/etcd/cert
[[email protected] cert]# ll
总用量 16
-rw-r--r-- 1 root root 1054 8月? 20 15:39 etcd.csr
-rw-r--r-- 1 root root? 213 8月? 20 15:39 etcd-csr.json
-rw------- 1 root root 1679 8月? 20 15:39 etcd-key.pem
-rw-r--r-- 1 root root 1415 8月? 20 15:39 etcd.pem
[[email protected] cert]#

我们启用启动etcd的用户是k8s,而且这里没有x的权限。
修改权限设置

[[email protected] etc]# chown -R k8s /etc/etcd/cert/
[[email protected] cert]# chmod +x -R /etc/etcd/cert/
[[email protected] cert]# ll
总用量 16
-rwxr-xr-x 1 k8s root 1054 8月? 20 15:39 etcd.csr
-rwxr-xr-x 1 k8s root? 213 8月? 20 15:39 etcd-csr.json
-rwx--x--x 1 k8s root 1679 8月? 20 15:39 etcd-key.pem
-rwxr-xr-x 1 k8s root 1415 8月? 20 15:39 etcd.pem

/etc/kubernetes/cert/?权限也不对

[[email protected] cert]# cd /etc/kubernetes/cert/
[[email protected] cert]# ll
总用量 20
-rw-r--r-- 1 root root? 292 8月? 16 16:05 ca-config.json
-rw-r--r-- 1 root root? 993 8月? 16 16:05 ca.csr
-rw-r--r-- 1 root root? 201 8月? 16 16:05 ca-csr.json
-rw------- 1 root root 1675 8月? 16 16:05 ca-key.pem
-rw-r--r-- 1 root root 1338 8月? 16 16:05 ca.pem

[[email protected] kubernetes]# chown -R k8s /etc/kubernetes/cert/
[[email protected] kubernetes]# chmod -R +x /etc/kubernetes/cert

正常启动的配置文件,见下:

[[email protected] cert]# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https:github.com/coreos
[Service]
User=k8s
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/opt/k8s/bin/etcd --data-dir=/var/lib/etcd --name=k8s-master --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem --trusted-ca-file=/etc/kubernetes/cert/ca.pem --peer-cert-file=/etc/etcd/cert/etcd.pem --peer-key-file=/etc/etcd/cert/etcd-key.pem --peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem --peer-client-cert-auth --client-cert-auth --listen-peer-urls=https://192.168.1.92:2380 --initial-advertise-peer-urls=https://192.168.1.92:2380 --listen-client-urls=https://192.168.1.92:2379,http://127.0.0.1:2379 --advertise-client-urls=https://192.168.1.92:2379 --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-master=https://192.168.1.92:2380,k8s-node1=https://192.168.1.93:2380,k8s-node2=https://192.168.1.95:2380 --initial-cluster-state=new
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[[email protected] cert]#

4.5
验证etcd集群
报错:

[[email protected] ~]# etcdctl cluster-health
failed to check the health of member 64fe8a986fbba907 on https://192.168.1.95:2379: Get https://192.168.1.95:2379/health: dial tcp 192.168.1.95:2379: getsockopt: no route to host
member 64fe8a986fbba907 is unreachable: [https://192.168.1.95:2379] are all unreachable
failed to check the health of member 9eddf87b04c89943 on https://192.168.1.93:2379: Get https://192.168.1.93:2379/health: dial tcp 192.168.1.93:2379: getsockopt: no route to host
member 9eddf87b04c89943 is unreachable: [https://192.168.1.93:2379] are all unreachable
failed to check the health of member d71352a6aad35c57 on https://192.168.1.92:2379: Get https://192.168.1.92:2379/health: x509: certificate signed by unknown authority
member d71352a6aad35c57 is unreachable: [https://192.168.1.92:2379] are all unreachable
cluster is unavailable
[[email protected] ~]#
[[email protected] ~]# etcdctl member list
client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint https://192.168.1.95:2379 exceeded header timeout
; error #1: client: endpoint https://192.168.1.93:2379 exceeded header timeout
; error #2: x509: certificate signed by unknown authority
[[email protected] ~]#

logs

[[email protected] ~]# cat /var/log/messages

Aug 20 18:06:36 k8s-master etcd: health check for peer 64fe8a986fbba907 could not connect: dial tcp 192.168.1.95:2380: getsockopt: no route to host
Aug 20 18:06:36 k8s-master etcd: health check for peer 9eddf87b04c89943 could not connect: dial tcp 192.168.1.93:2380: getsockopt: no route to host
Aug 20 18:06:36 k8s-master etcd: failed to reach the peerURL(https://192.168.1.95:2380) of member 64fe8a986fbba907 (Get https://192.168.1.95:2380/version: dial tcp 192.168.1.95:2380: getsockopt: no route to host)
Aug 20 18:06:36 k8s-master etcd: cannot get the version of member 64fe8a986fbba907 (Get https://192.168.1.95:2380/version: dial tcp 192.168.1.95:2380: getsockopt: no route to host)
Aug 20 18:06:36 k8s-master etcd: failed to reach the peerURL(https://192.168.1.93:2380) of member 9eddf87b04c89943 (Get https://192.168.1.93:2380/version: dial tcp 192.168.1.93:2380: getsockopt: no route to host)
Aug 20 18:06:36 k8s-master etcd: cannot get the version of member 9eddf87b04c89943 (Get https://192.168.1.93:2380/version: dial tcp 192.168.1.93:2380: getsockopt: no route to host)
Aug 20 18:06:39 k8s-master etcd: rejected connection from "192.168.1.92:50868" (error "remote error: tls: bad certificate", ServerName "")
Aug 20 18:06:40 k8s-master etcd: failed to reach the peerURL(https://192.168.1.95:2380) of member 64fe8a986fbba907 (Get https://192.168.1.95:2380/version: dial tcp 192.168.1.95:2380: getsockopt: no route to host)
Aug 20 18:06:40 k8s-master etcd: cannot get the version of member 64fe8a986fbba907 (Get https://192.168.1.95:2380/version: dial tcp 192.168.1.95:2380: getsockopt: no route to host)
Aug 20 18:06:40 k8s-master etcd: failed to reach the peerURL(https://192.168.1.93:2380) of member 9eddf87b04c89943 (Get https://192.168.1.93:2380/version: dial tcp 192.168.1.93:2380: getsockopt: no route to host)
Aug 20 18:06:40 k8s-master etcd: cannot get the version of member 9eddf87b04c89943 (Get https://192.168.1.93:2380/version: dial tcp 192.168.1.93:2380: getsockopt: no route to host)
Aug 20 18:06:41 k8s-master etcd: health check for peer 64fe8a986fbba907 could not connect: dial tcp 192.168.1.95:2380: getsockopt: no route to host
Aug 20 18:06:41 k8s-master etcd: health check for peer 9eddf87b04c89943 could not connect: dial tcp 192.168.1.93:2380: getsockopt: no route to host
Aug 20 18:06:42 k8s-master etcd: rejected connection from "192.168.1.92:50902" (error "remote error: tls: bad certificate", ServerName "")
Aug 20 18:06:44 k8s-master etcd: failed to reach the peerURL(https://192.168.1.95:2380) of member 64fe8a986fbba907 (Get https://192.168.1.95:2380/version: dial tcp 192.168.1.95:2380: getsockopt: no route to host)
Aug 20 18:06:44 k8s-master etcd: cannot get the version of member 64fe8a986fbba907 (Get https://192.168.1.95:2380/version: dial tcp 192.168.1.95:2380: getsockopt: no route to host)
[[email protected] ~]#

分析思路:
出问题的可能性:
配置文件配置出错
证书
网络
防火墙屏蔽了端口

一个个来测试

用telnet检查发现2379和2380,防火墙没有关闭。
关闭防火墙再测试,还是报错:

Aug 21 09:04:02 k8s-node1 etcd: rejected connection from "192.168.1.92:36138" (error "remote error: tls: bad certificate", ServerName "")
Aug 21 09:04:19 k8s-node1 etcd: rejected connection from "192.168.1.93:51698" (error "remote error: tls: bad certificate", ServerName "")

[[email protected] ~]# etcdctl cluster-health
failed to check the health of member 64fe8a986fbba907 on https://192.168.1.95:2379: Get https://192.168.1.95:2379/health: x509: certificate signed by unknown authority
member 64fe8a986fbba907 is unreachable: [https://192.168.1.95:2379] are all unreachable
failed to check the health of member 9eddf87b04c89943 on https://192.168.1.93:2379: Get https://192.168.1.93:2379/health: x509: certificate signed by unknown authority
member 9eddf87b04c89943 is unreachable: [https://192.168.1.93:2379] are all unreachable
failed to check the health of member d71352a6aad35c57 on https://192.168.1.92:2379: Get https://192.168.1.92:2379/health: x509: certificate signed by unknown authority
member d71352a6aad35c57 is unreachable: [https://192.168.1.92:2379] are all unreachable
cluster is unavailable

这个报错应该是证书的问题了
找资料发现,如果不带证书测试就是报这个错误,带证书后,测试正常,见下:

[[email protected] cert]# etcdctl --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem --endpoints=https://192.168.1.92:2379,https://192.168.1.93:2379,https://192.168.1.95:2379 cluster-health
member 64fe8a986fbba907 is healthy: got healthy result from https://192.168.1.95:2379
member 9eddf87b04c89943 is healthy: got healthy result from https://192.168.1.93:2379
member d71352a6aad35c57 is healthy: got healthy result from https://192.168.1.92:2379
cluster is healthy
[[email protected] cert]#
[[email protected] ~]# etcdctl? --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem --endpoints=https://192.168.1.92:2379,https://192.168.1.93:2379,https://192.168.1.95:2379? member list
64fe8a986fbba907: name=k8s-node2 peerURLs=https://192.168.1.95:2380 clientURLs=https://192.168.1.95:2379 isLeader=true
9eddf87b04c89943: name=k8s-node1 peerURLs=https://192.168.1.93:2380 clientURLs=https://192.168.1.93:2379 isLeader=false
d71352a6aad35c57: name=k8s-master peerURLs=https://192.168.1.92:2380 clientURLs=https://192.168.1.92:2379 isLeader=false
[[email protected] ~]#

执行命令看看

master创建

[[email protected] cert]# etcdctl --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem mkdir test
[[email protected] cert]# etcdctl --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem mkdir ls
[[email protected] cert]# etcdctl --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem? ls
/test
/ls

node2检索

[[email protected] cert]# etcdctl --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem mkdir test
[[email protected] cert]# etcdctl --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem mkdir ls
[[email protected] cert]# etcdctl --ca-file=/etc/kubernetes/cert/ca.pem --cert-file=/etc/etcd/cert/etcd.pem --key-file=/etc/etcd/cert/etcd-key.pem? ls
/test
/ls

数据同步了

4.6
执行文件的属主和有没有执行x的权限,请小心对比检查。

以上是关于部署k8s ssl集群实践4:部署etcd集群的主要内容,如果未能解决你的问题,请参考以下文章

基于k8s集群部署prometheus监控etcd

k8s之二进制安装etcd集群

部署k8s ssl集群实践3:部署kubectl命令工具行

使用kubeadm部署k8s集群02-配置etcd高可用

部署k8s ssl集群实践11:work节点配置flanneld

部署k8s ssl集群实践1:基础环境准备