How to replicate tee behavior in Python when using subprocess?

Posted 2023-02-23


Asked: 2011-03-01 03:02:12

Question:

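I'm looking for a Python solution that lets me save a command's output to a file without hiding it from the console.

FYI: I'm asking about tee (the Unix command-line utility), not the function of the same name in Python's itertools module.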

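Details

- A Python solution (not calling tee, which is not available on Windows).
- I don't need to provide any input on stdin for the called process.
- I have no control over the called program. All I know is that it writes something to stdout and stderr and returns an exit code.
- It must work when calling external programs (subprocess).
- It must work for both stderr and stdout.
- It must be able to distinguish stdout from stderr, because I may want to show only one of them on the console, or print stderr in a different color. This means stderr=subprocess.STDOUT will not work.
- Live (progressive) output: the process can run for a long time and I can't wait for it to finish.
- Python 3 compatible code (important).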

References

Here are some incomplete solutions I have found so far:

- http://devlishgenius.blogspot.com/2008/10/logging-in-real-time-in-python.html (mkfifo works only on Unix)
- http://blog.kagesenshi.org/2008/02/teeing-python-subprocesspopen-output.html (doesn't work at all)

Diagram http://blog.i18n.ro/wp-content/uploads/2010/06/Drawing_tee_py.png

Current code (second attempt)

#!/usr/bin/python
from __future__ import print_function

import sys, os, time, subprocess, io, threading
cmd = "python -E test_output.py"

from threading import Thread
class StreamThread ( Thread ):
    def __init__(self, buffer):
        Thread.__init__(self)
        self.buffer = buffer
    def run ( self ):
        while 1:
            line = self.buffer.readline()
            print(line,end="")
            sys.stdout.flush()
            if line == '':
                break

proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutThread = StreamThread(io.TextIOWrapper(proc.stdout))
stderrThread = StreamThread(io.TextIOWrapper(proc.stderr))
stdoutThread.start()
stderrThread.start()
proc.communicate()
stdoutThread.join()
stderrThread.join()

print("--done--")

#### test_output.py ####

#!/usr/bin/python
from __future__ import print_function
import sys, os, time

for i in range(0, 10):
    if i%2:
        print("stderr %s" % i, file=sys.stderr)
    else:
        print("stdout %s" % i, file=sys.stdout)
    time.sleep(0.1)

Actual output

stderr 1
stdout 0
stderr 3
stdout 2
stderr 5
stdout 4
stderr 7
stdout 6
stderr 9
stdout 8
--done--

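The expected output would have the lines in order. Note that changing Popen to use only one PIPE is not allowed, because in real life I will want to do different things with stderr and stdout.

Even in the second attempt I could not get real-time output; all the results arrived only when the process finished. By default, Popen should not use any buffering (bufsize=0).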

Comments on the question:

Related: Python subprocess get children's output to file and terminal?

Related: Subprocess.Popen: cloning stdout and stderr both to terminal and variables

Possible duplicate of Python Popen: Write to stdout AND log file simultaneously

Voted this way because it's a community wiki :-)

Answer 1:

This is a straightforward port of tee to Python.

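import sys

# Every command-line argument names a file to copy stdin into.
sinks = [open(path, "w") for path in sys.argv[1:]]
sinks.append(sys.stderr)  # also echo to the console (this port uses stderr)
while True:
    chunk = sys.stdin.read(1024)
    if not chunk:  # an empty string means EOF
        break
    for sink in sinks:
        sink.write(chunk)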

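I'm running on Linux right now, but this should work on most platforms.

Now for the subprocess part: I don't know how you want to "wire" the subprocess's stdin, stdout and stderr to your own stdin, stdout, stderr and file sinks, but I know you can do this: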

import subprocess
callee = subprocess.Popen( ["python", "-i"],
                           stdin = subprocess.PIPE,
                           stdout = subprocess.PIPE,
                           stderr = subprocess.PIPE
                         )

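Now you can access callee.stdin, callee.stdout and callee.stderr like normal files, which lets the "solution" above work. If you want callee.returncode, you need an extra call to callee.poll().

Be careful when writing to callee.stdin: if the process has already exited when you do, an error may be raised (on Linux I get IOError: [Errno 32] Broken pipe).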
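
As a minimal sketch of how these pieces could fit together (this is not the answerer's code): one reader thread per pipe copies lines to the console and to a shared log file as they arrive, so stdout and stderr stay distinguishable. The names tee_stream and run_and_tee are hypothetical, and stdin is left untouched because the question does not need it.

```python
import subprocess
import sys
import threading


def tee_stream(stream, console, log, lock):
    """Copy each line from stream to the console and to the shared log file."""
    for line in iter(stream.readline, ""):  # "" signals EOF in text mode
        console.write(line)
        console.flush()
        with lock:  # serialize writes from the two reader threads
            log.write(line)
    stream.close()


def run_and_tee(cmd, log_path):
    """Run cmd, echoing stdout/stderr live while saving both to log_path."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, bufsize=1)
    lock = threading.Lock()
    with open(log_path, "w") as log:
        threads = [
            threading.Thread(target=tee_stream,
                             args=(proc.stdout, sys.stdout, log, lock)),
            threading.Thread(target=tee_stream,
                             args=(proc.stderr, sys.stderr, log, lock)),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # both pipes have reached EOF once the joins return
    return proc.wait()
```

Something like run_and_tee([sys.executable, "test_output.py"], "log.txt") would be a typical call. Note that the child may still buffer its own output when it writes to a pipe, which this sketch cannot change from the parent side.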

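Comments:

This is suboptimal on Linux, since Linux provides an ad-hoc tee(f_in, f_out, len, flags) API, but that's not the point, is it?

I updated the question. The problem is that I could not find how to use subprocess to get data from the two pipes gradually, rather than all at once when the process ends.

I know your code should work, but there is a small requirement that really breaks the whole logic: I want to be able to distinguish stdout from stderr, which means I have to read from both of them, but I don't know which one will get new data. Please look at the example code.

@Sorin, that means you'll have to use two threads: one reads stdout, one reads stderr. If you are going to write both to the same file, you can acquire a lock on the sink when you begin reading and release it after writing a line terminator. :/

Using threads for this doesn't sound too appealing to me; maybe we'll find something else. It's odd that this is a common problem and yet nobody has provided a complete solution.

Answer 2:

If you don't want to interact with the process, you can use the subprocess module.

Example: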

tester.py

import os
import sys

for file in os.listdir('.'):
    print(file)

sys.stderr.write("Oh noes, a shrubbery!")
sys.stderr.flush()
sys.stderr.close()

testing.py

import subprocess

p = subprocess.Popen(['python', 'tester.py'], stdout=subprocess.PIPE,
                     stdin=subprocess.PIPE, stderr=subprocess.PIPE)

stdout, stderr = p.communicate()
print(stdout, stderr)

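In your situation you could simply write stdout/stderr to a file first. You can also send input to your process with communicate, though I wasn't able to figure out how to interact with the subprocess continually.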
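
A minimal sketch of the "write to a file first" idea, assuming the output fits in memory and that it is acceptable to see nothing until the child exits. The function name run_and_save and its path parameters are made up for illustration.

```python
import subprocess
import sys


def run_and_save(cmd, out_path, err_path):
    """Run cmd to completion, save stdout/stderr to files, then echo them."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()  # blocks until the child exits
    with open(out_path, "w") as f:
        f.write(out)
    with open(err_path, "w") as f:
        f.write(err)
    # Echo afterwards; the two streams stay separate but are not live.
    sys.stdout.write(out)
    sys.stderr.write(err)
    return proc.returncode
```

Because communicate() collects everything before returning, this keeps stdout and stderr apart but does not give progressive output.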

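Comments:

This doesn't show error messages from STDERR in the context of STDOUT, which can make debugging shell scripts and the like nearly impossible.

Meaning...? In this script anything delivered via STDERR is printed to the screen along with STDOUT. If you are referring to return codes, just use p.poll() to retrieve them.

This doesn't satisfy the "progressive" condition.

Answer 3:

I see that this is a rather old post, but just in case someone is still looking for a way to do this: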

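import subprocess

# universal_newlines=True makes the pipes yield str rather than bytes.
proc = subprocess.Popen(["ping", "localhost"],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        universal_newlines=True)

# NOTE: each readline() blocks, so alternating between the two pipes
# can stall or deadlock when the child writes mostly to one of them.
with open("logfile.txt", "w") as log_file:
    while proc.poll() is None:
        line = proc.stderr.readline()
        if line:
            print("err: " + line.strip())
            log_file.write(line)
        line = proc.stdout.readline()
        if line:
            print("out: " + line.strip())
            log_file.write(line)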
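
One way to avoid blocking on the wrong pipe without threads is to wait on both pipes at once. The following is only a sketch under two assumptions: it runs on a Unix-like system (the selectors module cannot watch pipes on Windows), and tee_both is a hypothetical name. It logs raw bytes and echoes a decoded copy to the matching console stream.

```python
import os
import selectors
import subprocess
import sys


def tee_both(cmd, log_path):
    """Echo and log whichever of stdout/stderr has data, without threads."""
    sel = selectors.DefaultSelector()
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE) as proc, \
            open(log_path, "wb") as log:
        # Remember which console stream each pipe should be echoed to.
        sel.register(proc.stdout, selectors.EVENT_READ, sys.stdout)
        sel.register(proc.stderr, selectors.EVENT_READ, sys.stderr)
        while sel.get_map():  # loop until both pipes have hit EOF
            for key, _events in sel.select():
                chunk = os.read(key.fileobj.fileno(), 4096)
                if not chunk:  # EOF on this pipe
                    sel.unregister(key.fileobj)
                    continue
                log.write(chunk)
                console = key.data
                console.write(chunk.decode("utf-8", errors="replace"))
                console.flush()
    sel.close()
    return proc.returncode  # set when the Popen context manager exits
```

Chunks are read as they arrive rather than line by line, so a multi-byte character split across two reads would be shown as a replacement character on the console; the log file still receives the exact bytes.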

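Comments:

This worked for me, although I found stdout, stderr = proc.communicate() easier to use.

-1: this solution deadlocks for any subprocess that can generate enough output on stdout or stderr and whose stdout/stderr are not perfectly in sync.

@J.F.Sebastian: True, but you can work around that by replacing readline() with readline(size). I've done something similar in other languages. Ref: docs.python.org/3/library/io.html#io.TextIOBase.readline

@kevinarpe Wrong. readline(size) won't fix the deadlock. stdout/stderr should be read concurrently. See the links under the question, which show solutions using threads or asyncio.

@J.F.Sebastian Does this problem exist if I am only interested in reading one of the streams?

Answer 4:

My solution isn't elegant, but it works.

On Windows you can use PowerShell to get access to "tee".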

import subprocess
import sys

cmd = ['powershell', 'ping', 'google.com', '|', 'tee', '-a', 'log.txt']

if 'darwin' in sys.platform:
    cmd.remove('powershell')

p = subprocess.Popen(cmd)
p.wait()

Comments:

On macOS this produces an invalid command-line error message from ping.

Answer 5:

This can be done as follows:

import sys
from subprocess import Popen, PIPE

with open('log.log', 'w') as log:
    proc = Popen(["ping", "google.com"], stdout=PIPE, encoding='utf-8')
    while proc.poll() is None:
        text = proc.stdout.readline() 
        log.write(text)
        sys.stdout.write(text)
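
One caveat with the loop above: it stops as soon as poll() reports that the child has exited, so lines still sitting in the pipe at that moment may never be read. A hedged variation reads until EOF instead. The name tee_stdout is hypothetical, and like the code above it handles stdout only.

```python
import subprocess
import sys


def tee_stdout(cmd, log_path):
    """Copy the child's stdout to log_path and to the console, line by line."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    with open(log_path, "w") as log:
        # Iteration ends at EOF, i.e. once the child has closed its stdout.
        for line in proc.stdout:
            log.write(line)
            sys.stdout.write(line)
            sys.stdout.flush()
    proc.stdout.close()
    return proc.wait()
```

Calling tee_stdout(["ping", "google.com"], "log.log") would mirror the example above while also returning the exit code.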

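Comments:

For anyone wondering: yes, you can use print() instead of sys.stdout.write(). :-)

@progyammer print adds an extra newline, which is not what you want when you need to reproduce the output faithfully.

True, but print(line, end='') solves that.

Answer 6:

If requiring Python 3.6 is not a problem, there is now a way to do this with asyncio. This method lets you capture stdout and stderr separately while still streaming both to the tty, without using threads. Here is a rough outline: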

import asyncio
import os
import sys


class RunOutput:
    def __init__(self, returncode, stdout, stderr):
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr

async def _read_stream(stream, callback):
    while True:
        line = await stream.readline()
        if line:
            callback(line)
        else:
            break

async def _stream_subprocess(cmd, stdin=None, quiet=False, echo=False) -> RunOutput:
    if os.name == "nt":  # Windows; stands in for the undefined isWindows() helper
        platform_settings = {'env': os.environ}
    else:
        platform_settings = {'executable': '/bin/bash'}

    if echo:
        print(cmd)

    p = await asyncio.create_subprocess_shell(cmd,
                                              stdin=stdin,
                                              stdout=asyncio.subprocess.PIPE,
                                              stderr=asyncio.subprocess.PIPE,
                                              **platform_settings)
    out = []
    err = []

    def tee(line, sink, pipe, label=""):
        line = line.decode('utf-8').rstrip()
        sink.append(line)
        if not quiet:
            print(label, line, file=pipe)

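    # Read both pipes concurrently. asyncio.gather accepts coroutines directly;
    # passing bare coroutines to asyncio.wait no longer works on Python 3.11+.
    await asyncio.gather(
        _read_stream(p.stdout, lambda l: tee(l, out, sys.stdout)),
        _read_stream(p.stderr, lambda l: tee(l, err, sys.stderr, label="ERR:")),
    )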

    return RunOutput(await p.wait(), out, err)


def run(cmd, stdin=None, quiet=False, echo=False) -> RunOutput:
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(
        _stream_subprocess(cmd, stdin=stdin, quiet=quiet, echo=echo)
    )

    return result

The code above is based on this blog post: https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/
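
For comparison, here is a condensed sketch of the same asyncio idea, assuming Python 3.8 or later (asyncio.run, and subprocess support in the default event loop on Windows). It is not taken from the blog post: the names _pump, stream_command and run_command are hypothetical, and it uses create_subprocess_exec with an argument list instead of a shell string.

```python
import asyncio
import sys


async def _pump(stream, sink, console):
    """Append each decoded line to sink and echo it to console."""
    while True:
        line = await stream.readline()
        if not line:  # b"" means EOF
            break
        text = line.decode("utf-8", errors="replace").rstrip()
        sink.append(text)
        print(text, file=console)


async def stream_command(*args):
    """Run a command, streaming stdout/stderr live and capturing both."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = [], []
    # Both pipes are drained concurrently on a single thread.
    await asyncio.gather(
        _pump(proc.stdout, out, sys.stdout),
        _pump(proc.stderr, err, sys.stderr),
    )
    return await proc.wait(), out, err


def run_command(*args):
    """Blocking wrapper for callers that are not already inside an event loop."""
    return asyncio.run(stream_command(*args))
```

A call such as returncode, out, err = run_command("ping", "localhost") returns the exit code plus the captured lines of each stream as separate lists.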
