k8s Pod Scheduling

一、Taints and tolerations

Taints and tolerations work together: unless a Pod explicitly tolerates a node's taints, it cannot run on that node. A toleration is a Pod attribute that allows the Pod to be scheduled onto nodes carrying matching taints.

#Add or remove a taint
kubectl taint node [node] key=value:[effect]
kubectl taint node [node] key=value:[effect]-  #remove the taint
#Meaning of the effect values (case sensitive)
- effect values: NoSchedule | PreferNoSchedule | NoExecute
- NoSchedule: pods are never scheduled onto the node
- PreferNoSchedule: the scheduler tries to avoid the node, but may still use it
- NoExecute: no new pods are scheduled, and existing pods without a matching toleration are evicted

1、Configuring taints

#Apply PreferNoSchedule: pods can still end up on the node
[root@km01 ~]# kubectl taint node km01 key_1=value_1:PreferNoSchedule
[root@km01 ~]# kubectl describe node km01 | grep Taint
Taints:             key_1=value_1:PreferNoSchedule
Deploy nacos; nacos-2 is running on node km01
[root@km01 tomcat]# kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nacos-0                                  1/1     Running   0          49m   10.244.2.15   node01   <none>           <none>
nacos-1                                  1/1     Running   0          49m   10.244.0.15   node02   <none>           <none>
nacos-2                                  1/1     Running   0          39m   10.244.1.14   km01     <none>           <none>
#Apply NoExecute: no new pods are scheduled and existing pods are evicted
[root@km01 tomcat]# kubectl taint node km01 key_1=value_1:NoExecute
[root@km01 tomcat]# kubectl describe node km01 | grep Taint
Taints:             key_1=value_1:NoExecute
Check again: nacos-2 has been rescheduled onto node km02
[root@km01 tomcat]# kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
mysql-x6slw                              1/1     Running   1          19h     10.244.3.11   km02     <none>           <none>
nacos-0                                  1/1     Running   0          60m     10.244.2.15   node01   <none>           <none>
nacos-1                                  1/1     Running   0          60m     10.244.0.15   node02   <none>           <none>
nacos-2                                  1/1     Running   0          3m12s   10.244.3.13   km02     <none>           <none>
nfs-client-provisioner-9d8f7f9c5-8d2qh   1/1     Running   1          19h     10.244.0.13   node02   <none>           <none>
#Apply NoSchedule: no new pods are scheduled onto the node
[root@km01 tomcat]# kubectl taint node km01 key_1=value_1:NoSchedule
node/km01 tainted

2、Configuring tolerations

Adding tolerations under the Pod spec lets the Pod be scheduled onto nodes that carry matching taints (the toleration must match the taint).
In the examples below, km01 is tainted with NoSchedule so that pods are normally not scheduled onto it.

1> Configure it in the tomcat deployment
#Add tolerations to the deployment so that its pods can be scheduled onto the tainted node

kubectl apply -f tomcat-deployment.yaml

[root@km01 tomcat]# cat tomcat-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-tomcat
spec:
  selector:
    matchLabels:
      app: my-tomcat
  replicas: 8
  template:
    metadata:
      labels:
        app: my-tomcat
    spec:
      tolerations:      #toleration settings
      - key: "key_1"
        operator: "Equal"
        value: "value_1"
        effect: "NoSchedule"
      containers:
      - name: my-tomcat
        image: tomcat
        ports:
        - containerPort: 8080
#The pods can be scheduled onto km01
[root@km01 tomcat]# kubectl get pods -o wide | grep tomcat | grep  -v node01 | grep -v node02 | grep -v km02
my-tomcat-6995474cf9-c88v8               1/1     Running   0          6m19s   10.244.1.15   km01     <none>           <none>
my-tomcat-6995474cf9-pxm59               1/1     Running   0          6m19s   10.244.1.16   km01     <none>           <none>

2> Syntax

spec:
      tolerations:
      - key: "key"
        operator: "Equal"
        value: "value"
        effect: "NoExecute"
        tolerationSeconds: 10
#When a NoExecute taint is added to a node, every pod without a matching toleration is evicted immediately. tolerationSeconds specifies how long a pod that does tolerate the taint may keep running on that node after the taint is added before it too is evicted.
or
spec:
      tolerations:
      - key: "key"
        operator: "Exists"
        effect: "NoSchedule"
  • With operator: Exists, no value field is needed
  • With operator: Equal, both key and value must be specified
  • An empty key combined with Exists matches every key and value (see the sketch below)
  • An empty effect matches every effect
  • A node may carry multiple taints, and a pod may declare multiple tolerations
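
As a minimal sketch of the empty-key rule above (not taken from the original article), a toleration that sets only operator: Exists tolerates every taint on a node, which is the pattern DaemonSet-style pods often rely on:

spec:
      tolerations:
      - operator: "Exists"   #no key and no effect: tolerates all taints on the node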

3> Multiple taints on the node, only one toleration on the pod
The node carries multiple taints:

Taints:             key_1=value_1:NoExecute
                    key_1=value_1:NoSchedule

tomcat-deployment keeps its original configuration and only matches one of the taints

 spec:
      tolerations:
      - key: "key_1"
        operator: "Equal"
        value: "value_1"
        effect: "NoSchedule"

kubectl apply -f tomcat-deployment | with multiple taints but a toleration for only one of them, the pods cannot be scheduled onto km01

[root@km01 tomcat]# kubectl get pods -o wide | grep km01
(no pods on km01)

4> Multiple taints: the pod must tolerate every taint
The node carries multiple taints:

Taints:             key_1=value_1:NoExecute
                    key_1=value_1:NoSchedule

Add tolerations for both taints to the tomcat deployment:

spec:
      tolerations:
      - key: "key_1"
        operator: "Equal"
        value: "value_1"
        effect: "NoSchedule"
      - key: "key_1"
        operator: "Exists"
        effect: "NoExecute"

kubectl apply -f tomcat-deployment |  pods can now be scheduled onto km01

3、Dedicated nodes
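
A common pattern (not spelled out in the original article) combines the primitives above: taint the node so ordinary pods stay off it, label it, and give only the dedicated workload a matching toleration plus a nodeSelector so it lands nowhere else. A minimal sketch, assuming an illustrative dedicated=group1 key on node01:

#Taint and label the node to be dedicated
kubectl taint node node01 dedicated=group1:NoSchedule
kubectl label node node01 dedicated=group1

#In the dedicated workload's pod template
    spec:
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "group1"
        effect: "NoSchedule"
      nodeSelector:
        dedicated: group1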

4、Built-in taints

1> Some built-in taints
Kubernetes v1.18 [stable]

- node.kubernetes.io/not-ready: the node is not ready (its Ready condition is "False")
- node.kubernetes.io/unreachable: the node is unreachable from the node controller (its Ready condition is "Unknown")
- node.kubernetes.io/memory-pressure: the node is under memory pressure
- node.kubernetes.io/disk-pressure: the node is under disk pressure
- node.kubernetes.io/pid-pressure: the node is under PID pressure
- node.kubernetes.io/network-unavailable: the node's network is unavailable
- node.kubernetes.io/unschedulable: the node is unschedulable (cordoned)

2> Node failure taints
Combined with tolerationSeconds, a Pod can specify how long it stays bound to a node after one or more of the problems above appears. Kubernetes automatically adds the following tolerations to pods:
node.kubernetes.io/not-ready
node.kubernetes.io/unreachable
tolerationSeconds=300
so when a node becomes not-ready or unreachable, Pods remain bound to it for 5 minutes before being evicted.

tolerations:
- key: "node.alpha.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300 #default duration added by the system

二、nodeSelector

nodeSelector pins a Pod to specific nodes: the Pod's nodeSelector field is matched against node labels, and kubectl label is used to put labels on the target nodes. Nodes can be grouped this way, for example group=a1, group=b2, group=c3.

1、Adding and removing labels

[root@km01 tomcat]# kubectl label nodes node01 group=b1
node/node01 labeled

[root@km01 ~]# kubectl describe node node01
Name:               node01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    group=b1

#List nodes by label
[root@km01 tomcat]# kubectl get node -l group=b1
NAME     STATUS   ROLES    AGE     VERSION
node01   Ready    <none>   5d3h    v1.18.6
[root@km01 ~]# kubectl get nodes --show-labels
NAME     STATUS   ROLES    AGE     VERSION   LABELS
km01     Ready    <none>   9d      v1.18.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=km01,kubernetes.io/os=linux
km02     Ready    <none>   8d      v1.18.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=km02,kubernetes.io/os=linux
node01   Ready    <none>   9d      v1.18.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,group=b1,kubernetes.io/arch=amd64,kubernetes.io/hostname=node01,kubernetes.io/os=linux
node02   Ready    <none>   7d18h   v1.18.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node02,kubernetes.io/os=linux

#Remove the label
[root@km01 ~]# kubectl label node node01 group-
node/node01 labeled
[root@km01 ~]# kubectl get node -l group=b1
No resources found in default namespace.

2、Pinning pods to nodes with nodeSelector

Add a nodeSelector to the deployment to target the labeled node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-tomcat
spec:
  selector:
    matchLabels:
      app: my-tomcat
  replicas: 3
  template:
    metadata:
      labels:
        app: my-tomcat
    spec:
      containers:
      - name: my-tomcat
        image: tomcat
        ports:
        - containerPort: 8080
      nodeSelector:
        group: b1
The result:
[root@km01 tomcat]# kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
my-tomcat-8b8bcbdbc-6s2hg                1/1     Running   0          109s    10.244.2.40   node01   <none>           <none>
my-tomcat-8b8bcbdbc-mrwl4                1/1     Running   0          109s    10.244.2.39   node01   <none>           <none>
my-tomcat-8b8bcbdbc-z4vjt                1/1     Running   0          109s    10.244.2.38   node01   <none>           <none>

三、Node affinity

Node affinity replaces nodeSelector with richer scheduling policies:

  • requiredDuringSchedulingIgnoredDuringExecution: the rules must be satisfied for the pod to be scheduled onto a node (hard requirement)
  • preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to place the pod on a matching node but does not guarantee it (soft preference); multiple rules can be ranked with a weight, as in the sketch after this list
  • the IgnoredDuringExecution suffix means that if a node's labels change after scheduling and no longer satisfy the affinity rules, the change is ignored and the pod keeps running
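
As a hedged sketch of the soft preference (reusing the disktype=ssd label from the example below; the weight value is arbitrary):

    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80          #1-100; higher-weight rules win when several nodes match
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd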

1、Label a node with disktype=ssd

Label node02 as an ssd node:

[root@km01 ~]# kubectl label node node02 disktype=ssd
node/node02 labeled  
[root@km01 ~]# kubectl get nodes --show-labels | grep node02
node02   Ready    <none>   7d19h   v1.18.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node02,kubernetes.io/os=linux

2、Adding nodeAffinity

vim tomcat-deployment
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype  #disktype
                operator: In
                values:
                - ssd   #ssd 

All replicas run on node02:

[root@km01 tomcat]# kubectl get pods -o wide |grep tomcat
my-tomcat-597775ddbd-mgtm9               1/1     Running   0          42s   10.244.0.39   node02   <none>           <none>
my-tomcat-597775ddbd-nmxfh               1/1     Running   0          42s   10.244.0.40   node02   <none>           <none>
my-tomcat-597775ddbd-vtv9v               1/1     Running   0          42s   10.244.0.41   node02   <none>           <none>

3、How nodeAffinity, nodeSelectorTerms, and matchExpressions combine

- If both nodeSelector and nodeAffinity are specified, both must be satisfied for the Pod to be scheduled onto a candidate node.
- If the nodeAffinity has multiple nodeSelectorTerms, the Pod can be scheduled onto a node when any one of the terms is satisfied (they are ORed); see the sketch after this list.
- If a nodeSelectorTerms entry has multiple matchExpressions, the Pod can be scheduled onto a node only when all of them are satisfied (they are ANDed).
- If the labels of the node a Pod is running on are changed or removed, the Pod is not evicted; affinity rules only apply at scheduling time.
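
A hedged sketch of the OR/AND behaviour described above (the disktype, group, and hostname labels are the ones used elsewhere in this article; the combination itself is illustrative). The two nodeSelectorTerms below are ORed, while the two matchExpressions inside the first term are ANDed:

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:            #term 1: disktype=ssd AND group=b1
              - key: disktype
                operator: In
                values:
                - ssd
              - key: group
                operator: In
                values:
                - b1
            - matchExpressions:            #term 2, ORed with term 1: hostname is node02
              - key: kubernetes.io/hostname
                operator: In
                values:
                - node02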

四、Pod affinity and anti-affinity

1、Concept

Scheduling decisions here are based on the labels of Pods already running on nodes (rather than on node labels alone), so both a node condition and a Pod condition must match. The rule reads: if one or more Pods matching condition Y are already running on a node with label X, then this Pod should run on that node (or, in the anti-affinity case, must not run there). X refers to a topology domain such as a node, zone, or region and is declared through the key of a built-in node label. That key is called topologyKey and expresses the topology scope a node belongs to. Common keys include:

  • kubernetes.io/hostname
  • kubernetes.io/arch
  • kubernetes.io/os
  • failure-domain.beta.kubernetes.io/zone
  • failure-domain.beta.kubernetes.io/region

    podAffinity declares which pods this pod may be co-located with in the same topology domain; podAntiAffinity is the opposite and declares which pods it must not share a topology domain with. Together they express pod-to-pod placement relationships; a zone-level sketch follows.
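
The examples below use kubernetes.io/hostname as the topologyKey, i.e. co-location on the same node. As a hedged sketch of a wider domain (the zone label must actually exist on the nodes; on a v1.18 cluster it is typically failure-domain.beta.kubernetes.io/zone, renamed topology.kubernetes.io/zone in later releases), the same rule with a zone key co-locates pods in the same zone rather than on the same node:

    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "failure-domain.beta.kubernetes.io/zone"   #same zone, not necessarily same node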


2、Creating a redis deployment with podAntiAffinity

Run a redis deployment with podAntiAffinity, 3 replicas, and a label selector, so that no two of its pods land on the same node.

#redis-deployment
[root@km01 ~]# cat redis-affinity.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2

3、Creating a web deployment with podAffinity and podAntiAffinity
The web servers talk to redis, so pod affinity ensures each web pod runs on the same node as a redis pod, while pod anti-affinity ensures the web pods themselves do not share a node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:        #anti-affinity: keep web pods on separate nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:            #affinity: co-locate each web pod with a redis pod
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx
