宕机处理:Kubernetes集群高可用实战总结
Posted 运维
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了宕机处理:Kubernetes集群高可用实战总结相关的知识,希望对你有一定的参考价值。
来自公众号:云加社区
导语 | 在企业生产环境,Kubernetes高可用是一个必不可少的特性,其中最通用的场景就是如何在Kubernetes集群宕机一个节点的情况下保障服务依旧可用。本文对在该场景下实现集群和应用高可用过程中遇到的各种问题进行了梳理和总结,希望与大家一同交流。文章作者,腾讯云架构服务研发工程师。
一、整体架构
二、网络
1. service backend 剔除
(kubelet)node-status-update-frequency:default 10s
(kube-controller-manager)node-monitor-period:default 5s
(kube-controller-manager)node-monitor-grace-period:default 40s
- Amount of time which we allow running Node to be unresponsive
before marking it unhealthy. Must be N times more than kubelet's
nodeStatusUpdateFrequency, where N means number of retries
allowed for kubelet to post node status.
- Currently nodeStatusUpdateRetry is constantly set to 5 in kubelet.go
2. 连接复用
第二,底层调整 TCP ARQ 设置(/proc/sys/net/ipv4/tcp_retries2),缩小超时重试周期,作用于整个集群。
# 0.2+0.4+0.8+1.6+3.2+6.4+12.8+25.6+51.2+102.4 = 222.6s
$ echo 9 > /proc/sys/net/ipv4/tcp_retries2
# 30 + 30*5 = 180s
$ echo 30 > /proc/sys/net/ipv4/tcp_keepalive_time
$ echo 30 > /proc/sys/net/ipv4/tcp_keepalive_intvl
$ echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes
name: init-sysctl
image: busybox
command:
/bin/sh
-c
|
sysctl -w net.ipv4.tcp_keepalive_time=30
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=5
securityContext:
privileged: true
三、节点驱逐
1. 应用相关
(kube-apiserver)default-not-ready-toleration-seconds:default 300
(kube-apiserver)default-unreachable-toleration-seconds:default 300
Kubernetes automatically adds a toleration for node.kubernetes.io/not-ready and node.kuber- netes.io/unreachable with toleration- Seconds=300, These automatically-added tolerations mean that Pods remain bound to Nodes for 5 minutes after one of these problems is detected. 摘自Kubernetes官方文献
引用中涉及到两个 tolerations [7] :
node.kubernetes.io/not-ready: Node is not ready. This corresponds to the NodeCondition Ready being “False”;
node.kubernetes.io/unreachable: Node is unreachable from the node controller. This corresponds to the NodeCondition Ready being “Unknown”.
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
spec:
...
taints:
- effect: NoSchedule
key: node.kubernetes.io/unreachable
timeAdded: "2020-07-02T03:50:47Z"
- effect: NoExecute
key: node.kubernetes.io/unreachable
timeAdded: "2020-07-02T03:50:53Z"
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet main- tains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier... 摘自Kubernetes官方文献
tolerations:
- effect: NoSchedule
key: dns
operator: Equal
value: "false"
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/disk-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/memory-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/pid-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/unschedulable
operator: Exists
tolerations:
- effect: NoExecute
operator: Exists
2. 存储相关
$ kubectl get pods -o wide
nginx-7b4d5d9fd-bmc8g 0/1 ContainerCreating 0 0s <none> 10.0.0.1 <none> <none>
nginx-7b4d5d9fd-nqgfz 1/1 Terminating 0 19m 192.28.1.165 10.0.0.2 <none> <none>
$ kubectl describe pods/nginx-7b4d5d9fd-bmc8g
[...truncate...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m5s default-scheduler Successfully assigned default/nginx-7b4d5d9fd-bmc8g to 10.0.0.1
Warning FailedAttachVolume 3m5s attachdetach-controller Multi-Attach error for volume "pvc-7f68c087-9e56-11ea-a2ef-5254002f7cc9" Volume is already used by pod(s) nginx-7b4d5d9fd-nqgfz
Warning FailedMount 62s kubelet, 10.0.0.1 Unable to mount volumes for pod "nginx-7b4d5d9fd-bmc8g_default(bb5501ca-9fea-11ea-9730-5254002f7cc9)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx-7b4d5d9fd-nqgfz". list of unmounted volumes=[nginx-data]. list of unattached volumes=[root-certificate default-token-q2vft nginx-data]
Multi-Attach error for volume "pvc-7f68c087-9e56-11ea-a2ef-5254002f7cc9" Volume
is already used by pod(s) nginx-7b4d5d9fd-nqgfz
$ kubectl delete pods/nginx-7b4d5d9fd-nqgfz --force --grace-period=0
W0811 04:01:25.024422 1 reconciler.go:328] Multi-Attach error for volume "pvc-e97c6ce6-d8a6-11ea-b832-7a866c097df1" (UniqueName: "kubernetes.io/rbd/k8s:kubernetes-dynamic-pvc-f35fc6fa-d8a6-11ea-bd98-aeb6842de1e3") from node "10.0.0.3" Volume is already exclusively attached to node 10.0.0.2 and can't be attached to another
I0811 04:01:25.024480 1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"default-nginx-6584f7ddb7-jx9s2", UID:"5322240f-db87-11ea-b832-7a866c097df1", APIVersion:"v1", ResourceVersion:"28275287", FieldPath:""}): type: 'Warning' reason: 'FailedAttachVolume' Multi-Attach error for volume "pvc-e97c6ce6-d8a6-11ea-b832-7a866c097df1" Volume is already exclusively attached to one node and can't be attached to another
W0811 04:07:25.047767 1 reconciler.go:232] attacherDetacher.DetachVolume started for volume "pvc-e97c6ce6-d8a6-11ea-b832-7a866c097df1" (UniqueName: "kubernetes.io/rbd/k8s:kubernetes-dynamic-pvc-f35fc6fa-d8a6-11ea-bd98-aeb6842de1e3") on node "10.0.0.2" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
I0811 04:07:25.047860 1 operation_generator.go:500] DetachVolume.Detach succeeded for volume "pvc-e97c6ce6-d8a6-11ea-b832-7a866c097df1" (UniqueName: "kubernetes.io/rbd/k8s:kubernetes-dynamic-pvc-f35fc6fa-d8a6-11ea-bd98-aeb6842de1e3") on node "10.0.0.2"
I0811 04:07:25.148094 1 reconciler.go:288] attacherDetacher.AttachVolume started for volume "pvc-e97c6ce6-d8a6-11ea-b832-7a866c097df1" (UniqueName: "kubernetes.io/rbd/k8s:kubernetes-dynamic-pvc-f35fc6fa-d8a6-11ea-bd98-aeb6842de1e3") from node "10.0.0.3"
I0811 04:07:25.148180 1 operation_generator.go:377] AttachVolume.Attach succeeded for volume "pvc-e97c6ce6-d8a6-11ea-b832-7a866c097df1" (UniqueName: "kubernetes.io/rbd/k8s:kubernetes-dynamic-pvc-f35fc6fa-d8a6-11ea-bd98-aeb6842de1e3") from node "10.0.0.3"
I0811 04:07:25.148266 1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"default-nginx-6584f7ddb7-jx9s2", UID:"5322240f-db87-11ea-b832-7a866c097df1", APIVersion:"v1", ResourceVersion:"28275287", FieldPath:""}): type: 'Normal' reason: 'SuccessfulAttachVolume' AttachVolume.Attach succeeded for volume "pvc-e97c6ce6-d8a6-11ea-b832-7a866c097df1"
--attach-detach-reconcile-max-wait-unmount-duration duration maximum amount of time the attach detach controller will wait for a volume to be safely unmounted from its node. Once this time has expired, the controller will assume the node or kubelet are unresponsive and will detach the volume anyway. (default 6m0s)
四、存储
1. 系统存储 - etcd
2. 应用存储 - persistent volume
五、应用
1. Stateless Application
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-server
spec:
selector:
matchLabels:
app: web-store
replicas: 3
template:
metadata:
labels:
app: web-store
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-store
topologyKey: "kubernetes.io/hostname"
containers:
- name: web-app
image: nginx:1.16-alpine
2. Stateful Application
结语
参考资料:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/#stacked-etcd-topology
https://github.com/kubernetes/client-go/tree/master/examples/leader-election
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#more-practical-use-cases
https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables
https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables
https://duyanghao.github.io/kubernetes-ha-http-keep-alive-bugs/
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions
https://github.com/kubernetes/kubernetes/issues/55713#issuecomment-518340883
https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#statefulset-considerations
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md#current-guarantees-for-pod-lifecycle
https://github.com/kubernetes/kubernetes/pull/93776
https://www.infoq.cn/article/raft-paper/
https://ramcloud.atlassian.net/wiki/download/attachments/6586375/raft.pdf
https://github.com/kubernetes/enhancements/pull/1116/files
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md#recommended-mechanism-for-deploying-csi-drivers-on-kubernetes
https://github.com/container-storage-interface/spec/blob/master/spec.md
●输入m获取文章目录
Linux学习
以上是关于宕机处理:Kubernetes集群高可用实战总结的主要内容,如果未能解决你的问题,请参考以下文章