Podman pod disappears after a few days, but process is still running and listening on a given port

Posted 2023-03-24


Asked: 2021-05-25 04:45:55

Question:

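I am running Elasticsearch containers in a Podman pod that is defined in YAML and started with podman play kube. The pod is created, a three-node cluster forms, and everything works as expected. However, after a few days of being idle, the Podman pod dies.

The podman pod ps command reports: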

ERRO[0000] Error refreshing container af05fafe31f6bfb00c2599255c47e35813ecf5af9bbe6760ae8a4abffd343627: error acquiring lock 1 for container af05fafe31f6bfb00c2599255c47e35813ecf5af9bbe6760ae8a4abffd343627: file exists
ERRO[0000] Error refreshing container b4620633d99f156bb59eb327a918220d67145f8198d1c42b90d81e6cc29cbd6b: error acquiring lock 2 for container b4620633d99f156bb59eb327a918220d67145f8198d1c42b90d81e6cc29cbd6b: file exists
ERRO[0000] Error refreshing pod 389b0c34313d9b23ecea3faa0e494e28413bd15566d66297efa9b5065e025262: error retrieving lock 0 for pod 389b0c34313d9b23ecea3faa0e494e28413bd15566d66297efa9b5065e025262: file exists
POD ID        NAME               STATUS   CREATED     INFRA ID      # OF CONTAINERS
389b0c34313d  elasticsearch-pod  Created  1 week ago  af05fafe31f6  2

Strangely, when we look up the process ID listening on port 9200 or 9300, the process is still listening:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::9200                 :::*                    LISTEN      1328607/containers-
tcp6       0      0 :::9300                 :::*                    LISTEN      1328607/containers-

The process that is left hanging (and keeps the ports listening):

user+ 1339220  0.0  0.1  45452  8284 ?        S    Jan11   2:19 /bin/slirp4netns --disable-host-loopback --mtu 65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /tmp/run-1002/netns/cni-e4bb2146-d04e-c3f1-9207-380a234efa1f tap0

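The only operations I perform on the pod are the routine ones: podman pod stop, podman pod rm, and podman play kube, which starts the pod.

What causes this strange behavior in Podman? What could prevent the locks from being released properly?

System information: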

NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3"
Red Hat Enterprise Linux release 8.3 (Ootpa)
Red Hat Enterprise Linux release 8.3 (Ootpa)

Podman version:

podman --version
podman version 2.2.1


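Answer 1:

The workaround that worked for me is to add the configuration file from the Podman repository [1] under /usr/lib/tmpfiles.d/ and /etc/tmpfiles.d/. This prevents Podman's temporary files from being removed from the /tmp directory [2]. As mentioned in [3], CNI also leaves network information behind in /var/lib/cni/networks when the system crashes or containers are not shut down properly. This behavior occurs when using rootless Podman and has been fixed in the latest Podman release [4].

Workaround

First, check the default runRoot directory set for your rootless Podman user: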

podman info | grep runRoot

Create the tmpfiles configuration file:

sudo vim /usr/lib/tmpfiles.d/podman.conf

Add the following content, replacing /tmp/podman-run-* with your default runRoot directory. For example, if your output is /tmp/run-6695/containers, use: x /tmp/run-*

# /tmp/podman-run-* directory can contain content for Podman containers that have run
# for many days. This following line prevents systemd from removing this content.
x /tmp/podman-run-*
x /tmp/containers-user-*
D! /run/podman 0700 root root
D! /var/lib/cni/networks
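
The substitution described above (runRoot path to exclusion line) can be sketched in shell. This is only an illustration: runroot_to_exclusion is a hypothetical helper, not part of Podman or systemd, and it assumes the runRoot ends in a numeric UID optionally followed by /containers, as in the examples in this thread.

```shell
#!/bin/sh
# Hypothetical helper: derive a tmpfiles.d "x" (exclude) line from a runRoot path.
# Assumption: the path looks like <prefix>-<uid>[/containers].
runroot_to_exclusion() {
    # Drop a trailing "/containers" component, if present.
    dir=${1%/containers}
    # Replace the trailing numeric UID with a wildcard.
    pattern=$(printf '%s\n' "$dir" | sed 's/[0-9][0-9]*$/*/')
    printf 'x %s\n' "$pattern"
}

# Example with the runRoot value quoted in this thread.
runroot_to_exclusion /tmp/run-1002/containers
```

On a real host, the argument would be the path reported by podman info | grep runRoot; review the generated line before placing it in podman.conf.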

Copy the configuration file from /usr/lib/tmpfiles.d to /etc/tmpfiles.d/:

sudo cp -p /usr/lib/tmpfiles.d/podman.conf /etc/tmpfiles.d/

After completing all the steps according to your configuration, the error should disappear.

References

[1] https://github.com/containers/podman/blob/master/contrib/tmpfile/podman.conf
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1888988#c9
[3] https://github.com/containers/podman/commit/2e0a9c453b03d2a372a3ab03b9720237e93a067c
[4] https://github.com/containers/podman/pull/8241
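
Because the problem only shows up after several days, it can help to confirm right away that the exclusion pattern actually covers your runRoot. The following is a rough sketch under stated assumptions: is_excluded is a hypothetical helper, and it approximates the matching with shell glob rules, which may differ in detail from how systemd-tmpfiles evaluates x lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of Podman or systemd): succeed if the tmpfiles.d
# file CONF contains an "x" (exclude) line whose glob covers TARGET.
is_excluded() {
    conf=$1
    target=$2
    # Read "type path ..." fields; also handle a last line without a newline.
    while read -r type glob _rest || [ -n "$type" ]; do
        [ "$type" = "x" ] || continue
        case "$target" in
            # Unquoted on purpose so the value is treated as a glob pattern.
            $glob|$glob/*) return 0 ;;
        esac
    done < "$conf"
    return 1
}

# Demo against a throwaway file; on a real host use /etc/tmpfiles.d/podman.conf.
demo_conf=$(mktemp)
printf '%s\n' '# sample' 'x /tmp/run-*' 'D! /run/podman 0700 root root' > "$demo_conf"
if is_excluded "$demo_conf" /tmp/run-1002/containers; then
    echo "runRoot is excluded from cleanup"
else
    echo "runRoot is NOT excluded from cleanup"
fi
rm -f "$demo_conf"
```

With the unmodified /tmp/podman-run-* pattern, the same check fails for a runRoot such as /tmp/run-1002/containers, which is the mismatch described in the comments on this answer.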

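Comments:

I have rolled the change out to the servers; let's see whether it helps. It is hard to test whether the problem reappears after a few days. ;) Thanks!

For anyone using this fix as a solution, make sure runRoot corresponds to the directory you set to be ignored in tmpfiles.d. In my case I had to change /tmp/podman-run-* to /tmp/run-* (e.g. /tmp/run-1001). Consult podman info: runRoot: /tmp/run-1002/containers. Related Podman issue: github.com/containers/podman/issues/9663

The workaround has been updated. Thanks @gczarnocki

@llopisga's solution does not seem to work. We still see the same error even after copying podman.conf to /etc/tmpfiles.d/: ERRO[0000] Error refreshing container 28d7f360049bd3c3bd7f55baf78af3e11e3baf9ad489586899c928767f51cb2d: error acquiring lock 0 for container 28d7f360049bd3c3bd7f55baf78af3e11e3baf9ad489586899c928767f51cb2d: file exists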
