Notes on a network failure: pods cannot communicate with each other
Posted by jayce9102
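1. Background
- The cluster was deployed from binaries.
- After deployment everything looked normal, and all kinds of resource objects could be created.
- Once applications were deployed, pods could not communicate across nodes, and every pod IP was in the 172.17.0.0 range.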
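2. Troubleshooting
- Checked the routes on the node and found that the docker0 interface was, surprisingly, on the 172.17.0.0 range. That is Docker's default bridge network, not a flannel subnet.
- Found the following guidance: when flannel is deployed on top of Docker's CNM, /run/flannel/subnet.env must be loaded as an environment file for docker, and docker must be started with the flannel subnet settings.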
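The mismatch described in this section can also be checked mechanically. The following is a minimal Python sketch, not the author's method. It assumes subnet.env follows flannel's usual KEY=VALUE layout with a FLANNEL_SUBNET entry. The 10.244.x.x values are made up for illustration, and the function names parse_env and bridge_in_flannel_subnet are hypothetical.

```python
import ipaddress


def parse_env(text):
    """Parse KEY=VALUE lines (env-file style) into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def bridge_in_flannel_subnet(bridge_cidr, flannel_subnet):
    """Return True if the bridge address lies inside the flannel subnet."""
    bridge = ipaddress.ip_interface(bridge_cidr)
    subnet = ipaddress.ip_interface(flannel_subnet).network
    return bridge.ip in subnet


# Hypothetical file content; real values come from /run/flannel/subnet.env.
sample_env = """
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
"""

env = parse_env(sample_env)
# docker0 on Docker's default bridge range: not inside the flannel subnet.
print(bridge_in_flannel_subnet("172.17.0.1/16", env["FLANNEL_SUBNET"]))
# docker0 configured from flannel's settings: inside the flannel subnet.
print(bridge_in_flannel_subnet("10.244.1.1/24", env["FLANNEL_SUBNET"]))
```

With the sample values, the first check prints False and the second prints True. On a real node, the bridge address would be taken from the docker0 interface instead of being hard-coded.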
3. Solution (edit the configuration file /usr/lib/systemd/system/docker.service)
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target
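The two relevant lines are EnvironmentFile and ExecStart: dockerd is started with DOCKER_NETWORK_OPTIONS, loaded from /run/flannel/subnet.env, which sets the pod subnet. After editing the unit file, run systemctl daemon-reload and restart docker so the change takes effect.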
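To illustrate what dockerd receives through DOCKER_NETWORK_OPTIONS, here is a minimal Python sketch. It assumes the environment file was written in the style of flannel's mk-docker-opts.sh helper. The addresses and MTU in the sample are made-up illustration values, and the function names dockerd_extra_args and bridge_network are hypothetical. systemd splits an unquoted $VAR in ExecStart on whitespace into separate arguments, and the sketch mimics that with a plain split.

```python
import ipaddress

# Hypothetical content in the style written by flannel's mk-docker-opts.sh;
# the addresses and MTU are made-up illustration values.
SAMPLE_ENV = '''DOCKER_OPT_BIP="--bip=10.244.1.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=10.244.1.1/24 --ip-masq=true --mtu=1450"
'''


def dockerd_extra_args(env_text, key="DOCKER_NETWORK_OPTIONS"):
    """Return the words that $DOCKER_NETWORK_OPTIONS adds to ExecStart."""
    for line in env_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Drop the surrounding quotes, then split on whitespace,
            # as systemd does for an unquoted $VAR in ExecStart.
            return value.strip().strip('"').split()
    return []


def bridge_network(args):
    """Return the network that --bip assigns to docker0, or None."""
    for arg in args:
        if arg.startswith("--bip="):
            return ipaddress.ip_interface(arg.split("=", 1)[1]).network
    return None


args = dockerd_extra_args(SAMPLE_ENV)
print(args)
print(bridge_network(args))
```

With the sample content, the bridge network comes out as 10.244.1.0/24 instead of Docker's default 172.17.0.0/16. If the variable is missing or empty, dockerd gets no --bip argument and falls back to the default bridge range, which matches the symptom described in the background.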
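4. Additional notes
- With CNI, docker0's IP has nothing to do with pods; each pod requests its own IP dynamically when it is created.
- With CNM, the pod subnet is already fixed when the docker engine starts.
- CNI mode is recommended.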