Catching stdout in realtime from subprocess

Posted 2023-02-23

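【Title】: catching stdout in realtime from subprocess 【Posted】: 2010-12-09 01:44:22 【Question】:

I want to subprocess.Popen() rsync.exe on Windows and print its stdout in Python.

My code works, but it doesn't catch the progress until a file transfer is done! I want to print the progress for each file in real time.

I am using Python 3.1 now, since I heard it should be better at handling IO.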

import subprocess, time, os, sys

cmd = "rsync.exe -vaz -P source/ dest/"
p, line = True, 'start'


p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=64,
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)

for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()

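【Comments】:

Duplicates: ***.com/questions/1085071/…, ***.com/questions/874815/…, ***.com/questions/527197/…

(Coming from Google?) All PIPEs will deadlock when the buffer of one of them fills up and is not read, e.g. stdout deadlocks when stderr is full. Never pass a PIPE you don't intend to read.

Could someone explain why you can't just set stdout to sys.stdout instead of subprocess.PIPE?

【Answer 1】: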

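Some rules of thumb for subprocess.

- Never use shell=True. It needlessly invokes an extra shell process to call your program.
- When calling processes, arguments are passed around as lists. sys.argv in Python is a list, and so is argv in C. So you pass a list to Popen to call subprocesses, not a string.
- Don't redirect stderr to a PIPE when you're not reading it.
- Don't redirect stdin when you're not writing to it.

Example: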

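import subprocess, sys
cmd = ["rsync.exe", "-vaz", "-P", "source/", "dest/"]

# stderr is merged into stdout so that no pipe is left unread
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)

# readline() hands over each line as it arrives; b'' signals EOF
for line in iter(p.stdout.readline, b''):
    # the pipe yields bytes in Python 3, so decode before concatenating
    print(">>> " + line.decode(errors="replace").rstrip())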

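That said, it is probable that rsync buffers its output when it detects that it is connected to a pipe instead of a terminal. This is the default behavior - when connected to a pipe, programs must explicitly flush stdout for realtime results, otherwise the standard C library will buffer.

To test for that, try running this instead:

cmd = [sys.executable, 'test_out.py']

and create a test_out.py file with the contents: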

import sys
import time
print ("Hello")
sys.stdout.flush()
time.sleep(10)
print ("World")

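Executing that subprocess should give you "Hello" and wait 10 seconds before giving "World". If that happens with the Python code above and not with rsync, that means rsync itself is buffering its output, so you are out of luck.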
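
To make the test above less dependent on eyeballing the console, here is a minimal sketch that records when each line arrives. The helper name timed_lines and the inline CHILD script (a shortened stand-in for test_out.py that sleeps 1 second instead of 10) are assumptions made for illustration.

```python
import subprocess
import sys
import time


def timed_lines(cmd):
    """Run cmd and return (seconds_since_start, text) for each output line."""
    start = time.monotonic()
    results = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT) as proc:
        # readline() returns as soon as a complete line is in the pipe.
        for raw in iter(proc.stdout.readline, b""):
            elapsed = time.monotonic() - start
            results.append((elapsed, raw.decode(errors="replace").rstrip()))
    return results


# Hypothetical stand-in for test_out.py, passed inline with -c.
CHILD = (
    "import sys, time\n"
    "print('Hello')\n"
    "sys.stdout.flush()\n"
    "time.sleep(1)\n"
    "print('World')\n"
)
```

Calling timed_lines([sys.executable, "-c", CHILD]) should give two timestamps roughly one second apart. If the same helper reports near-identical timestamps for another program, that program is most likely buffering on its own side.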

A solution would be to connect directly to a pty, using something like pexpect.
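
As a rough illustration of the pty idea, the sketch below uses the standard library pty module rather than pexpect. It is Unix-only, so it would not apply to rsync.exe on Windows, and the helper name run_under_pty is an assumption. The child sees a terminal on stdout, so a C program using default stdio buffering would normally switch to line buffering.

```python
import os
import pty
import subprocess
import sys


def run_under_pty(cmd):
    """Run cmd with stdout/stderr attached to a pseudo-terminal; return its output bytes."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(cmd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child keeps its own copy of the slave end
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                break  # Linux reports EIO once the child side is closed
            if not data:
                break  # other platforms report a plain EOF instead
            chunks.append(data)  # a real caller would print or parse here
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks)
```

Note that a terminal translates "\n" to "\r\n" on output, so lines read this way usually end in "\r\n".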

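【Comments】:

shell=False is right when you construct the command line yourself, especially from user-entered data. Nevertheless, shell=True is also useful when you get the whole command line from a trusted source (e.g. hardcoded in the script).

@Denis Otkidach: I don't think that warrants the use of shell=True. Think about it - you're invoking another process on your OS, involving memory allocation, disk usage, processor scheduling, just to split a string! And one you joined yourself!! You could split in Python, but it's easier to write each parameter separately anyway. Also, using a list means you don't have to escape special shell characters: spaces, ;, >, <, &.. Your parameters can contain those characters and you don't have to worry! I can't see a reason to use shell=True, really, unless you're running a pure shell command.

nosklo, that should be: p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

@mathtick: I'm not sure why you'd do those operations as separate processes... you can cut the file contents and extract the first field easily in Python by using the csv module. But as an example, your pipeline in Python would be: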

p = Popen(['cut', '-f1'], stdin=open('longfile.tab'), stdout=PIPE) ; p2 = Popen(['head', '-100'], stdin=p.stdout, stdout=PIPE) ; result, stderr = p2.communicate() ; print(result)

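Note that you can use long filenames and shell special characters without having to escape them, since the shell is not involved. It's also a lot faster since there's one less process.

In Python 2, use for line in iter(p.stdout.readline, b'') instead of for line in p.stdout, otherwise lines are not read in real time even if the source process doesn't buffer its output.

【Answer 2】: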

I know this is an old topic, but there is a solution now. Call rsync with the option --outbuf=L. Example:

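import subprocess

cmd = ['rsync', '-arzv', '--backup', '--outbuf=L', 'source/', 'dest']
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE)
for line in iter(p.stdout.readline, b''):
    # the format string needs a {} placeholder, and the bytes must be decoded
    print('>>> {}'.format(line.decode(errors='replace').rstrip()))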

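【Comments】:

This works and should be upvoted to save future readers from scrolling through all the dialog above.

@VectorVictor It doesn't explain what's going on, or why. It may be that your program works, until: 1. you add preexec_fn=os.setpgrp to make the program survive its parent script 2. you skip reading from the process's pipe 3. the process outputs a lot of data, filling the pipe 4. you are stuck for hours, trying to figure out why the program you are running quits after some random amount of time. The answer from @nosklo helped me a lot.

【Answer 3】: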

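Depending on the use case, you might also want to disable the buffering in the subprocess itself.

If the subprocess is a Python process, you could do this before the call: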

os.environ["PYTHONUNBUFFERED"] = "1"

Or alternatively pass this in the env argument to Popen.
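
A minimal sketch of the env variant, assuming the child is started with the same interpreter as the parent (sys.executable). The helper name popen_unbuffered_python is hypothetical. The variable is set only in a copy of the environment, so the parent process is left untouched.

```python
import os
import subprocess
import sys


def popen_unbuffered_python(script_args):
    """Start a Python child with PYTHONUNBUFFERED=1 set only in the child's environment."""
    # Copy the parent's environment and add the variable for the child only.
    child_env = dict(os.environ, PYTHONUNBUFFERED="1")
    return subprocess.Popen([sys.executable] + list(script_args),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            env=child_env)
```

For example, popen_unbuffered_python(["test_out.py"]) should deliver each print() of the child as it happens, even if the script never calls sys.stdout.flush().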

Otherwise, if you are on Linux/Unix, you can use the stdbuf tool, e.g.:

cmd = ["stdbuf", "-oL"] + cmd

See also here about stdbuf or other options.
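
Since stdbuf is not available everywhere, a script may want to add the prefix only when the tool is actually on PATH. The following is a small sketch under that assumption; the helper name with_line_buffered_stdout is made up for illustration.

```python
import shutil


def with_line_buffered_stdout(cmd):
    """Prefix cmd with `stdbuf -oL` if stdbuf is on PATH; otherwise return cmd unchanged."""
    stdbuf = shutil.which("stdbuf")
    if stdbuf is None:
        return list(cmd)  # e.g. on Windows, where stdbuf does not exist
    return [stdbuf, "-oL"] + list(cmd)
```

Keep in mind that stdbuf only changes the default stdio buffering; a program that sets its own buffering overrides it.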

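【Comments】:

You saved my day, thanks for PYTHONUNBUFFERED=1.

I had an issue running Python code within a thread with Popen, where stdout would only print after the thread terminated. This solved it.

This answer should really be recommended!!! I tried many different ways to solve the buffering problem, and this was the only one that worked...

【Answer 4】:

On Linux, I had the same problem getting rid of the buffering. I finally used "stdbuf -o0" (or, alternatively, unbuffer from expect) to get rid of the PIPE buffering.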

proc = Popen(['stdbuf', '-o0'] + cmd, stdout=PIPE, stderr=PIPE)
stdout = proc.stdout

I could then use select.select on stdout.
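
One way the select.select step might look, as a sketch only: it is Unix-only (select does not work on pipes under Windows), the helper names are assumptions, and it reads the file descriptor directly with os.read so that Python's own read buffer never holds data back.

```python
import os
import select
import subprocess
import sys


def read_when_ready(proc, timeout=0.5):
    """Return available stdout bytes, b'' at end of output, or None on timeout."""
    fd = proc.stdout.fileno()
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None  # nothing arrived within the timeout
    # os.read returns whatever is in the pipe without waiting for a newline.
    return os.read(fd, 4096)


def collect_output(cmd):
    """Drain cmd's stdout via read_when_ready and return all bytes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    chunks = []
    while True:
        data = read_when_ready(proc)
        if data is None:
            continue  # a GUI or progress bar could be refreshed here
        if data == b"":
            break  # the child closed its stdout
        chunks.append(data)
    proc.stdout.close()
    proc.wait()
    return b"".join(chunks)
```

The timeout branch is what makes this useful in practice: the caller regains control regularly even when the child is silent.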

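See also https://unix.stackexchange.com/questions/25372/

【Comments】:

For anyone trying to grab C code stdout from Python, I can confirm that this solution was the only one that worked for me. To be clear, I am talking about adding 'stdbuf', '-o0' to my existing command list in Popen.

Thank you! stdbuf -o0 proved really useful with a bunch of pytest/pytest-bdd tests I wrote that spawn a C++ application and verify that it emits certain log statements. Without stdbuf -o0, these tests needed 7 seconds to get the (buffered) output from the C++ program. Now they run almost instantaneously!

This answer saved me today! Running an application as a subprocess as part of pytest, it was impossible for me to get its output. stdbuf did it.

【Answer 5】: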

for line in p.stdout:
  ...

always blocks until the next line feed.

For "real-time" behaviour you have to do something like this:

while True:
  inchar = p.stdout.read(1)
  if inchar:  # empty bytes means EOF
    # single-byte decode: fine for ASCII output
    print(inchar.decode(errors='replace'), end='', flush=True)
  else:
    print('')  # finish the current line
    break

The while loop is left when the child process closes its stdout or exits. read()/read(-1) would block until the child process closed its stdout or exited.
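
Reading one byte at a time raises a practical question in Python 3: a multi-byte UTF-8 character arrives in several reads. A possible way around this, sketched here with a hypothetical helper copy_unbuffered, is an incremental decoder from the codecs module, which holds back incomplete sequences until the remaining bytes arrive.

```python
import codecs
import io


def copy_unbuffered(stream, out, encoding="utf-8"):
    """Copy a byte stream to a text stream one byte at a time, decoding incrementally."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    while True:
        chunk = stream.read(1)
        if not chunk:
            break  # EOF: the child closed its stdout
        text = decoder.decode(chunk)  # '' while a character is still incomplete
        if text:
            out.write(text)
            out.flush()
    # Emit whatever the decoder was still holding at end of input.
    tail = decoder.decode(b"", final=True)
    if tail:
        out.write(tail)
    out.flush()
```

With a subprocess this would be called as copy_unbuffered(p.stdout, sys.stdout), assuming p was started with stdout=subprocess.PIPE.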

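【Comments】:

inchar is never None; use if not inchar: instead (read() returns an empty string on EOF). By the way, it is worse: for line in p.stdout doesn't even print full lines in real time in Python 2 (for line in iter(p.stdout.readline, '') could be used instead).

I have tested this with Python 3.4 on OS X, and it doesn't work.

@qed: for line in p.stdout: works on Python 3. Be sure to understand the difference between '' (Unicode string) and b'' (bytes). See "Python: read streaming input from subprocess.communicate()".

【Answer 6】:

Your problem is: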

for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()

The iterator itself has extra buffering.

Try doing it like this:

while True:
  line = p.stdout.readline()
  if not line:
     break
  print(line.decode(errors='replace'), end='')

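【Comments】:

【Answer 7】:

You cannot get stdout to print unbuffered to a pipe (unless you can rewrite the program that prints to stdout), so here is my solution:

Redirect stdout to stderr, which is not buffered. '<cmd> 1>&2' should do it. Open the process as follows: myproc = subprocess.Popen('<cmd> 1>&2', stderr=subprocess.PIPE) You cannot distinguish between stdout and stderr, but you get all output immediately.

Hope this helps anyone tackling this problem.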

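【Comments】:

Have you tried it? Because it doesn't work.. If stdout is buffered in that process, it won't be redirected to stderr, in the same way it isn't redirected to a PIPE or a file..

This is plain wrong. stdout buffering occurs within the program itself. The shell syntax 1>&2 just changes which files the file descriptors point to before launching the program. The program itself can't distinguish between redirecting stdout to stderr (1>&2) or vice versa (2>&1), so this has no effect on the buffering behavior of the program. And either way the 1>&2 syntax is interpreted by the shell. subprocess.Popen('<cmd> 1>&2', stderr=subprocess.PIPE) would fail because you haven't specified shell=True.

In case people read this: I tried using stderr instead of stdout, and it shows the exact same behavior.

【Answer 8】:

To avoid caching of output you might want to try pexpect: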

import pexpect

# launchcmd is the program to run and args its argument list
child = pexpect.spawn(launchcmd, args, timeout=None)
while True:
    try:
        child.expect('\n')
        print(child.before)
    except pexpect.EOF:
        break

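PS: I know this question is pretty old; I am still providing the solution that worked for me.

PPS: I got this answer from another question.

【Comments】:

【Answer 9】: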

p = subprocess.Popen(command,
                     bufsize=0,
                     universal_newlines=True)

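I am writing a GUI for rsync in Python and had the same problem. It troubled me for several days until I found this in the pydoc:

If universal_newlines is True, the file objects stdout and stderr are opened as text files in universal newlines mode. Lines may be terminated by any of '\n' (the Unix end-of-line convention), '\r' (the old Macintosh convention), or '\r\n' (the Windows convention). All of these external representations are seen as '\n' by the Python program.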

It seems that rsync outputs '\r' while a transfer is in progress.
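
To show what universal newlines mode changes in practice, here is a minimal sketch, assuming the output is captured through a pipe (stdout=subprocess.PIPE is an addition, it is not part of the call above). The generator name progress_lines is hypothetical. Progress updates terminated by '\r' come out as separate lines.

```python
import subprocess
import sys


def progress_lines(cmd):
    """Yield each output line of cmd, treating '\\r', '\\n' and '\\r\\n' as line endings."""
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    with proc:
        for line in proc.stdout:
            # Every kind of line ending has already been translated to '\n'.
            yield line.rstrip("\n")
```

One caveat, as far as I understand the text layer: an update ending in a bare '\r' may be held back until the next character arrives, because the decoder first has to rule out a '\r\n' pair.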

【Comments】:

【Answer 10】:

Change the stdout of the rsync process to be unbuffered.

p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=0,  # 0=unbuffered, 1=line-buffered, else buffer-size
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)

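【Comments】:

Buffering happens on the rsync side; changing the bufsize attribute on the Python side won't help.

For anyone else searching, nosklo's answer is completely wrong: rsync's progress display isn't buffered. The real problem is that subprocess returns a file object, and the file iterator interface has a poorly documented internal buffer even with bufsize=0, so you need to call readline() repeatedly if you want results before the buffer fills.

【Answer 11】:

I've noticed that there is no mention of using a temporary file as an intermediate. The following gets around the buffering issues by outputting to a temporary file, and lets you parse the data coming from rsync without connecting to a pty. I tested the following on a Linux box; the output of rsync tends to differ across platforms, so the regular expression used to parse the output may vary: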

import os, re, subprocess, tempfile, time

# mkstemp() returns an open file descriptor plus the path of the file
pipe_output, file_name = tempfile.mkstemp()
cmd = ["rsync", "-vaz", "-P", "/src/", "/dest"]

p = subprocess.Popen(cmd, stdout=pipe_output,
                     stderr=subprocess.STDOUT)
while p.poll() is None:
    # p.poll() returns None while the program is still running
    # sleep for 1 second
    time.sleep(1)
    with open(file_name) as f:
        last_line = f.readlines()
    # it's possible that it hasn't output yet, so continue
    if len(last_line) == 0: continue
    last_line = last_line[-1]
    # Matching to "[bytes downloaded]  number%  [speed] number:number:number"
    match_it = re.match(".* ([0-9]*)%.* ([0-9]*:[0-9]*:[0-9]*).*", last_line)
    if not match_it: continue
    # in this case, the percentage is stored in match_it.group(1),
    # time in match_it.group(2).  We could do something with it here...

# clean up the descriptor and the temporary file
os.close(pipe_output)
os.remove(file_name)
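
The parsing step can be isolated into a small function, which makes it easier to adapt when rsync's output differs. This is a sketch only: the helper name parse_progress, the slightly stricter regular expression, and the sample line in the usage note are assumptions based on the "[bytes] number% [speed] h:mm:ss" layout described in the comment above.

```python
import re

# percentage, then the speed column, then a time such as 0:00:09
PROGRESS_RE = re.compile(r"(\d+)%\s+\S+\s+(\d+:\d+:\d+)")


def parse_progress(line):
    """Return (percent, time_string) from an rsync progress line, or None if it is not one."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For a line such as "     32,768  10%   31.25kB/s    0:00:09" this returns (10, "0:00:09"); lines like the file list header return None and can simply be skipped.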

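【Comments】:

It is not realtime. A file doesn't solve the buffering issue on rsync's side.

tempfile.TemporaryFile can delete itself, for easier cleanup in case of exceptions.

while not p.poll() leads to an infinite loop if the subprocess exits successfully with 0; use p.poll() is None instead.

Windows might forbid opening an already opened file, so open(file_name) may fail.

I just found this answer; unfortunately it only works on Linux, but it works like a charm (link). So I just extend my command as follows: command_argv = ["stdbuf","-i0","-o0","-e0"] + command_argv and call: popen = subprocess.Popen(cmd, stdout=subprocess.PIPE) and now I can read from it without any buffering.

【Answer 12】:

If you run something like this in a thread and store ffmpeg_time in an attribute of your object so you can access it, it works very nicely, for example when you use threading in tkinter.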

import re, subprocess

input = 'path/input_file.mp4'
output = 'path/output_file.mp4'
command = "ffmpeg -y -v quiet -stats -i \"" + str(input) + "\" -metadata title=\"@alaa_sanatisharif\" -preset ultrafast -vcodec copy -r 50 -vsync 1 -async 1 \"" + output + "\""
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True, shell=True)
for line in process.stdout:
    # pick the hh:mm:ss timestamp out of each ffmpeg status line
    reg = re.search(r'\d\d:\d\d:\d\d', line)
    ffmpeg_time = reg.group(0) if reg else ''
    print(ffmpeg_time)

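【Comments】:

【Answer 13】:

In Python 3, here is a solution that takes a command from the command line and delivers nicely decoded strings in real time as they are received.

Receiver (receiver.py):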

import subprocess
import sys

cmd = sys.argv[1:]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for line in p.stdout:
    # the format string needs a {} placeholder for the decoded line
    print("received: {}".format(line.rstrip().decode("utf-8")))

Example of a simple program that generates real-time output (dummy_out.py):

import time
import sys

for i in range(5):
    print("hello {}".format(i))
    sys.stdout.flush()
    time.sleep(1)

Output:

$ python receiver.py python dummy_out.py
received: hello 0
received: hello 1
received: hello 2
received: hello 3
received: hello 4

【Comments】:
