k8s Pod Scheduling
Posted by X糊涂仙儿
I. Taints and Tolerations
Taints and tolerations work together: unless a Pod explicitly tolerates a node's taints, it cannot run on that node. A toleration is a Pod attribute that allows the Pod to run on nodes carrying matching taints.
# Add / remove a taint
kubectl taint node [node-name] key=value:[effect]
kubectl taint node [node-name] key=value:[effect]-   # trailing "-" removes the taint
# effect values (case-sensitive):
- effect: NoSchedule | PreferNoSchedule | NoExecute
- NoSchedule: Pods without a matching toleration will not be scheduled onto the node
- PreferNoSchedule: the scheduler tries to avoid the node, but may still use it (soft)
- NoExecute: no new Pods are scheduled, and running Pods without a matching toleration are evicted
1. Applying taints
# Apply PreferNoSchedule: scheduling onto km01 is discouraged but still possible
[root@km01 ~]# kubectl taint node km01 key_1=value_1:PreferNoSchedule
[root@km01 ~]# kubectl describe node km01 | grep Taint
Taints: key_1=value_1:PreferNoSchedule
Deploy nacos; nacos-2 lands on node km01:
[root@km01 tomcat]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nacos-0 1/1 Running 0 49m 10.244.2.15 node01 <none> <none>
nacos-1 1/1 Running 0 49m 10.244.0.15 node02 <none> <none>
nacos-2 1/1 Running 0 39m 10.244.1.14 km01 <none> <none>
# Apply NoExecute: new Pods are refused and existing Pods without a matching toleration are evicted
[root@km01 tomcat]# kubectl taint node km01 key_1=value_1:NoExecute
[root@km01 tomcat]# kubectl describe node km01 | grep Taint
Taints: key_1=value_1:NoExecute
nacos-2 has been evicted from km01 and rescheduled onto km02:
[root@km01 tomcat]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql-x6slw 1/1 Running 1 19h 10.244.3.11 km02 <none> <none>
nacos-0 1/1 Running 0 60m 10.244.2.15 node01 <none> <none>
nacos-1 1/1 Running 0 60m 10.244.0.15 node02 <none> <none>
nacos-2 1/1 Running 0 3m12s 10.244.3.13 km02 <none> <none>
nfs-client-provisioner-9d8f7f9c5-8d2qh 1/1 Running 1 19h 10.244.0.13 node02 <none> <none>
# Apply NoSchedule: no new Pods will be scheduled onto this node
[root@km01 tomcat]# kubectl taint node km01 key_1=value_1:NoSchedule
node/km01 tainted
2. Configuring tolerations
Adding tolerations under a Pod's spec lets the Pod be scheduled onto nodes that carry matching taints (the toleration must match the taint's key, value, and effect).
km01 is tainted with NoSchedule, so untolerated Pods cannot be scheduled there.
1) Configure the toleration in the tomcat Deployment
# Add tolerations to the Deployment so its Pods may land on km01
kubectl apply -f tomcat-deployment.yaml
[root@km01 tomcat]# cat tomcat-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-tomcat
spec:
  selector:
    matchLabels:
      app: my-tomcat
  replicas: 8
  template:
    metadata:
      labels:
        app: my-tomcat
    spec:
      tolerations:            # toleration matching the taint on km01
      - key: "key_1"
        operator: "Equal"
        value: "value_1"
        effect: "NoSchedule"
      containers:
      - name: my-tomcat
        image: tomcat
        ports:
        - containerPort: 8080
# Pods can now be scheduled onto km01
[root@km01 tomcat]# kubectl get pods -o wide | grep tomcat | grep -v node01 | grep -v node02 | grep -v km02
my-tomcat-6995474cf9-c88v8 1/1 Running 0 6m19s 10.244.1.15 km01 <none> <none>
my-tomcat-6995474cf9-pxm59 1/1 Running 0 6m19s 10.244.1.16 km01 <none> <none>
2) Toleration syntax
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoExecute"
    tolerationSeconds: 10   # plain integer seconds, not "10s"
# When a NoExecute taint is added to a node, every Pod without a matching toleration is evicted immediately. For a Pod that does carry this toleration, tolerationSeconds is how long it may remain bound to the node after the taint is added before being evicted.
or
spec:
  tolerations:
  - key: "key"
    operator: "Exists"
    effect: "NoSchedule"
- With operator Exists, no value field is needed
- With operator Equal, both key and value must be specified
- An empty key combined with Exists matches every key and value
- An empty effect matches all effects
- A node may carry multiple taints, and a Pod may carry multiple tolerations
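As a hedged illustration of the "empty key with Exists" rule above (this catch-all toleration is not part of the original deployment), a Pod can be made to tolerate every taint:

```yaml
# Matches any key, any value, and any effect, so the Pod tolerates every
# taint on every node. Use sparingly; this defeats the purpose of taints.
spec:
  tolerations:
  - operator: "Exists"
```

This is the pattern some system DaemonSets use so they can run even on tainted or cordoned nodes.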
3) Multiple taints, only one tolerated
Taints on km01:
Taints: key_1=value_1:NoExecute
key_1=value_1:NoSchedule
The tomcat Deployment keeps its original toleration, which matches only one of the two taints:
spec:
  tolerations:
  - key: "key_1"
    operator: "Equal"
    value: "value_1"
    effect: "NoSchedule"
kubectl apply -f tomcat-deployment.yaml   # tolerating only one of several taints is not enough; Pods cannot be scheduled onto km01
[root@km01 tomcat]# kubectl get pods -o wide | grep km01
(no output; nothing is running on km01)
4) Multiple taints: the Pod must tolerate all of them
Taints on km01:
Taints: key_1=value_1:NoExecute
key_1=value_1:NoSchedule
Add tolerations covering both taints to the tomcat Deployment:
spec:
  tolerations:
  - key: "key_1"
    operator: "Equal"
    value: "value_1"
    effect: "NoSchedule"
  - key: "key_1"
    operator: "Exists"
    effect: "NoExecute"
kubectl apply -f tomcat-deployment.yaml   # km01 can now receive Pods
3. Dedicating a node
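The original leaves this section empty; the usual dedicated-node pattern combines a taint with a node label: taint the node so ordinary Pods are repelled, label it, and give the privileged workload both a matching toleration and a nodeSelector. A minimal sketch, where the dedicated key and groupA value are illustrative names, not from the original text:

```yaml
# On the node (illustrative key/value):
#   kubectl taint node km01 dedicated=groupA:NoSchedule
#   kubectl label node km01 dedicated=groupA
# In the privileged Pod spec: the toleration lets it onto the tainted
# node, and the nodeSelector pins it there, so the node is effectively
# reserved for this group of Pods.
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "groupA"
    effect: "NoSchedule"
  nodeSelector:
    dedicated: groupA
```

The taint alone keeps other Pods off the node; the nodeSelector alone does not keep these Pods off other nodes, which is why both halves are needed.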
4. Built-in taints
1) Well-known built-in taints
Kubernetes v1.18 [stable]
- node.kubernetes.io/not-ready: the node is not ready (its Ready condition is False)
- node.kubernetes.io/unreachable: the node is unreachable from the node controller (its Ready condition is Unknown)
- node.kubernetes.io/memory-pressure: the node is under memory pressure
- node.kubernetes.io/disk-pressure: the node is under disk pressure
- node.kubernetes.io/pid-pressure: the node is under PID pressure
- node.kubernetes.io/network-unavailable: the node's network is unavailable
- node.kubernetes.io/unschedulable: the node is unschedulable (cordoned)
2) Node-failure taints
Combined with tolerationSeconds, a Pod can specify how long it stays bound to a node once one or more of the problems above occurs. Kubernetes automatically adds tolerations for:
node.kubernetes.io/not-ready
node.kubernetes.io/unreachable
with tolerationSeconds=300
so when a node becomes not-ready or unreachable, its Pods remain bound to it for 5 minutes by default:
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300   # system default
II. nodeSelector
nodeSelector binds Pods to specific nodes through node labels: label the target node with kubectl label, then set a matching nodeSelector on the Pod. Labels can also partition nodes into groups, e.g. group=a1, group=b2, group=c3.
1. Adding and removing labels
[root@km01 tomcat]# kubectl label nodes node01 group=b1
node/node01 labeled
[root@km01 ~]# kubectl describe node node01
Name: node01
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
group=b1
# List nodes by label
[root@km01 tomcat]# kubectl get node -l group=b1
NAME STATUS ROLES AGE VERSION
node01 Ready <none> 5d3h v1.18.6
[root@km01 ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
km01 Ready <none> 9d v1.18.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=km01,kubernetes.io/os=linux
km02 Ready <none> 8d v1.18.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=km02,kubernetes.io/os=linux
node01 Ready <none> 9d v1.18.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,group=b1,kubernetes.io/arch=amd64,kubernetes.io/hostname=node01,kubernetes.io/os=linux
node02 Ready <none> 7d18h v1.18.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node02,kubernetes.io/os=linux
# Remove the label (trailing "-")
[root@km01 ~]# kubectl label node node01 group-
node/node01 labeled
[root@km01 ~]# kubectl get node -l group=b1
No resources found in default namespace.
2. Pinning Pods to labeled nodes
Add nodeSelector to the Deployment to target the labeled node:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-tomcat
spec:
  selector:
    matchLabels:
      app: my-tomcat
  replicas: 3
  template:
    metadata:
      labels:
        app: my-tomcat
    spec:
      containers:
      - name: my-tomcat
        image: tomcat
        ports:
        - containerPort: 8080
      nodeSelector:
        group: b1
All three replicas land on node01:
[root@km01 tomcat]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
my-tomcat-8b8bcbdbc-6s2hg 1/1 Running 0 109s 10.244.2.40 node01 <none> <none>
my-tomcat-8b8bcbdbc-mrwl4 1/1 Running 0 109s 10.244.2.39 node01 <none> <none>
my-tomcat-8b8bcbdbc-z4vjt 1/1 Running 0 109s 10.244.2.38 node01 <none> <none>
III. Node affinity
Node affinity is a more expressive replacement for nodeSelector-based scheduling:
- requiredDuringSchedulingIgnoredDuringExecution: the node must satisfy the rules for the Pod to be scheduled onto it (hard requirement)
- preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to place the Pod on a matching node but does not guarantee it (soft preference); multiple rules are ranked by weight
- the IgnoredDuringExecution suffix means that if a node's labels change after scheduling and no longer satisfy the Pod's affinity rules, the change is ignored and the Pod keeps running
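The walkthrough below uses only the required form; as a hedged sketch, a preferred rule with a weight might look like this (the weight value is illustrative):

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80              # 1-100; when several rules match, the node with the higher total weight wins
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```

Unlike the required form, a Pod with only this rule still schedules somewhere even when no node has disktype=ssd.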
1. Label node02 with disktype=ssd
Mark node02 as an SSD node:
[root@km01 ~]# kubectl label node node02 disktype=ssd
node/node02 labeled
[root@km01 ~]# kubectl get nodes --show-labels | grep node02
node02 Ready <none> 7d19h v1.18.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node02,kubernetes.io/os=linux
2. Adding nodeAffinity
vim tomcat-deployment.yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
All replicas now run on node02:
[root@km01 tomcat]# kubectl get pods -o wide |grep tomcat
my-tomcat-597775ddbd-mgtm9 1/1 Running 0 42s 10.244.0.39 node02 <none> <none>
my-tomcat-597775ddbd-nmxfh 1/1 Running 0 42s 10.244.0.40 node02 <none> <none>
my-tomcat-597775ddbd-vtv9v 1/1 Running 0 42s 10.244.0.41 node02 <none> <none>
3. How nodeAffinity, nodeSelectorTerms, and matchExpressions combine
- If both nodeSelector and nodeAffinity are specified, a node must satisfy both for the Pod to be scheduled onto it.
- If nodeAffinity lists multiple nodeSelectorTerms, a node that satisfies any one of them (OR) is eligible.
- Within a single nodeSelectorTerms entry, all of its matchExpressions must match (AND) for the node to be eligible.
- If the labels of the node a Pod runs on are changed or removed, the Pod is not evicted; affinity is evaluated only at scheduling time.
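As a hedged illustration of the OR/AND semantics above (the group=b1 term and the nvme value are illustrative, not from the walkthrough):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: disktype=ssd AND group=b1 must both match...
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
          - key: group
            operator: In
            values: ["b1"]
        # ...OR term 2: disktype=nvme alone is sufficient.
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["nvme"]
```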
IV. Pod affinity and anti-affinity
1. Meaning
Pod affinity matches against the labels of Pods already running on a node (not the node's own labels), so both a node condition and a Pod condition are evaluated. The rule reads: if one or more Pods matching condition Y are already running in the topology domain labeled X, then this Pod should run there too (or, for anti-affinity, must not). X denotes a topology domain such as a node, zone, or region, declared through the key of a built-in node label; this key is called topologyKey and expresses the topology scope a node belongs to. Common keys:
- kubernetes.io/hostname
- kubernetes.io/arch
- kubernetes.io/os
- failure-domain.beta.kubernetes.io/zone
- failure-domain.beta.kubernetes.io/region
podAffinity declares which Pods this Pod may be co-located with in the same topology domain; podAntiAffinity declares which Pods it must not share a topology domain with. Together they express Pod-to-Pod placement relationships.
2. A redis Deployment with podAntiAffinity
Run a redis Deployment with 3 replicas, a label selector, and podAntiAffinity, ensuring no two of its Pods are scheduled onto the same node:
# redis-deployment
[root@km01 ~]# cat redis-affinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2
3. A web Deployment with both podAffinity and podAntiAffinity
web-server communicates with redis, so use podAffinity to co-locate each web Pod with a redis Pod, and podAntiAffinity to keep two web Pods from running on the same node:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:          # anti-affinity: no two web Pods on one node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:              # affinity: co-locate with a redis Pod
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx