Why does Python raise a SIGPIPE exception when closing a FIFO file?

Posted: 2020-02-07 08:42:34

Question:

TL;DR: Why does closing a FIFO file (named pipe) that has already raised a SIGPIPE exception raise another SIGPIPE exception?

My Python script writes bytes through a FIFO file to another process, which is a child of my Python process. (Because of some constraints, I have to use a named pipe.)

I have to account for the fact that the child process may terminate early. If that happens, my Python script must reap the dead child process and restart it.

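To check whether the child process has died, I simply try writing to the FIFO first. If I get a SIGPIPE exception (actually an IOError indicating a broken pipe), I know it is time to restart the child process.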

A minimal example follows:

#!/usr/bin/env python3
import os
import signal
import subprocess

# The FIFO file.
os.mkfifo('tmp.fifo')

# A subprocess to simply discard any input from the FIFO.
FNULL = open(os.devnull, 'w')
proc = subprocess.Popen(['/bin/cat', 'tmp.fifo'], stdout=FNULL, stderr=FNULL)
print('pid = %d' % proc.pid)

# Open the FIFO, and MUST BE BINARY MODE.
fifo = open('tmp.fifo', 'wb')

# Endlessly write to the FIFO.
while True:

    # Try to write to the FIFO, restart the subprocess on demand, until succeeded.
    while True:
        try:
            # Optimistically write to the FIFO.
            fifo.write(b'hello')
        except IOError as e:
            # The subprocess died. Close the FIFO and reap the subprocess.
            fifo.close()
            os.kill(proc.pid, signal.SIGKILL)
            proc.wait()

            # Start the subprocess again.
            proc = subprocess.Popen(['/bin/cat', 'tmp.fifo'], stdout=FNULL, stderr=FNULL)
            print('pid = %d' % proc.pid)
            fifo = open('tmp.fifo', 'wb')
        else:
            # The write goes on well.
            break

To reproduce the result, run the script and kill the child process manually with kill -9 <pid>. The traceback will show:

Traceback (most recent call last):
  File "./test.py", line 24, in <module>
    fifo.write(b'hello')
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./test.py", line 27, in <module>
    fifo.close()
BrokenPipeError: [Errno 32] Broken pipe

So why does closing the FIFO file raise another SIGPIPE exception?

I tested on the following platforms, and the results were the same.

Python 3.7.6 @ Darwin Kernel Version 19.3.0 (macOS 10.15.3)
Python 3.6.8 @ Linux 4.18.0-147.3.1.el8_1.x86_64 (CentOS 8)


Answer 1:

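This happens because Python does not clear the write buffer when fifo.write fails. When fifo.close runs, the buffered data is written to the broken pipe again, which raises the second SIGPIPE.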
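The retained-buffer behaviour can be reproduced without a child process. The following is a minimal sketch, assuming CPython on a POSIX system; it uses an anonymous pipe from os.pipe() in place of the FIFO, and the function name is made up for illustration. Closing the read end stands in for the reader dying.

```python
import os


def broken_pipe_errors_with_buffering():
    """Return the names of the calls that raised BrokenPipeError."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader is left, like a FIFO whose reader was killed

    writer = open(write_fd, 'wb')  # default buffering, as in the question
    raised_in = []

    try:
        writer.write(b'hello')  # only copies the bytes into the buffer
        writer.flush()          # the real write(2) happens here and fails
    except BrokenPipeError:
        raised_in.append('flush')

    try:
        writer.close()          # flushes the still-pending bytes again
    except BrokenPipeError:
        raised_in.append('close')

    return raised_in


if __name__ == '__main__':
    print(broken_pipe_errors_with_buffering())
```

If the explanation above holds, both flush and close should be reported, because the five pending bytes are still in the buffer when close runs.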

I found the cause with the help of strace. Here are some details.

First, modify a small part of the Python code as follows:

#!/usr/bin/env python3
import os
import signal
import subprocess

# The FIFO file.
os.mkfifo('tmp.fifo')

# A subprocess to simply discard any input from the FIFO.
FNULL = open(os.devnull, 'w')
proc = subprocess.Popen(['/bin/cat', 'tmp.fifo'], stdout=FNULL, stderr=FNULL)
print('pid = %d' % proc.pid)

# Open the FIFO, and MUST BE BINARY MODE.
fifo = open('tmp.fifo', 'wb')

i = 0
# Endlessly write to the FIFO.
while True:

    # Try to write to the FIFO, restart the subprocess on demand, until succeeded.
    while True:
        try:
            # Optimistically write to the FIFO.
            fifo.write(f'hello{i}'.encode())
            fifo.flush()
        except IOError as e:
            # The subprocess died. Close the FIFO and reap the subprocess.
            print('IOError is occured.')
            fifo.close()
            os.kill(proc.pid, signal.SIGKILL)
            proc.wait()

            # Start the subprocess again.
            proc = subprocess.Popen(['/bin/cat', 'tmp.fifo'], stdout=FNULL, stderr=FNULL)
            print('pid = %d' % proc.pid)
            fifo = open('tmp.fifo', 'wb')
        else:
            # The write goes on well.
            break
    os.kill(proc.pid, signal.SIGKILL)
    i += 1

and save it as test.py.

Then run strace -o strace.out python3 test.py in a shell. Checking strace.out, we find something like this:

openat(AT_FDCWD, "tmp.fifo", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
fstat(4, {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
ioctl(4, TCGETS, 0x7ffcba5cd290)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
write(4, "hello0", 6)                   = 6
kill(35626, SIGKILL)                    = 0
write(4, "hello1", 6)                   = 6
kill(35626, SIGKILL)                    = 0
write(4, "hello2", 6)                   = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=35625, si_uid=1000} ---
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=35626, si_uid=1000, si_status=SIGKILL, si_utime=0, si_stime=0} ---
write(1, "IOError is occured.\n", 20)   = 20
write(4, "hello2", 6)                   = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=35625, si_uid=1000} ---
close(4)                                = 0
write(2, "Traceback (most recent call last"..., 35) = 35
write(2, "  File \"test.py\", line 26, in <m"..., 39) = 39

Note that Python tries to write hello2 twice, once during fifo.flush and once during fifo.close. This output explains why two SIGPIPE exceptions are raised.

To work around this, we can disable the write buffer with open('tmp.fifo', 'wb', buffering=0). Then only one SIGPIPE exception is raised.
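The effect of buffering=0 can be checked in isolation. This is a sketch under the same assumptions as the answer (CPython on a POSIX system); it uses an anonymous pipe from os.pipe() instead of tmp.fifo, and the function name is hypothetical. With no buffer, the failing write leaves nothing pending, so close has nothing to flush.

```python
import os


def broken_pipe_errors_unbuffered():
    """Return the names of the calls that raised BrokenPipeError."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the reader having died

    # buffering=0 returns a raw file object; every write goes straight to write(2).
    writer = open(write_fd, 'wb', buffering=0)
    raised_in = []

    try:
        writer.write(b'hello')  # fails immediately with EPIPE
    except BrokenPipeError:
        raised_in.append('write')

    try:
        writer.close()          # nothing is buffered, so this only closes the fd
    except BrokenPipeError:
        raised_in.append('close')

    return raised_in


if __name__ == '__main__':
    print(broken_pipe_errors_unbuffered())
```

One trade-off to keep in mind: an unbuffered writer issues one system call per write, which may matter if the script writes many small chunks.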
