The difference between communicate and wait when using Python's Popen

Posted 2023-05-08


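Preface: This article was compiled by the editors of 小常识网 (cha138.com). It mainly covers the difference between communicate and wait when using Python's Popen, and we hope it serves as a useful reference.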

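In short: suppose you use Popen from the subprocess module to run an external program with stdout or stderr set to a pipe (subprocess.PIPE), and the program writes more output than the OS pipe buffer can hold. If you then wait for it with Popen.wait(), the program deadlocks. The child blocks while writing into the full pipe. The parent blocks in wait() and never reads from the pipe, so the program hangs in the wait() call.
The pipe size of 4 KB reported by ulimit -a is only one page, which is the size of an atomic pipe write (PIPE_BUF). The default pipe capacity on Linux is actually 64 KB.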
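You can check the 64 KB figure on your own machine with the following POSIX-only sketch (Python 3.5+). It is an illustration and not part of the original article. It fills an anonymous pipe without ever reading it and counts how many bytes fit before a write would block. The function name pipe_capacity is hypothetical. On x86-64 Linux it typically reports 65536.

```python
import os

def pipe_capacity():
    """Count how many bytes an unread anonymous pipe accepts before a write would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # a full pipe now raises BlockingIOError instead of blocking
    total = 0
    try:
        while True:
            try:
                total += os.write(w, b"x")  # one byte at a time gives an exact count
            except BlockingIOError:
                break
    finally:
        os.close(r)
        os.close(w)
    return total

if __name__ == "__main__":
    print("pipe capacity: %d bytes" % pipe_capacity())
```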
Here is an example:
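#!/usr/bin/env python
# coding: utf-8
# yc@2013/04/28

import subprocess

def test(size):
    print('start')

    # dd writes `size` random bytes to stdout; its status messages go to /dev/null
    cmd = 'dd if=/dev/urandom bs=1 count=%d 2>/dev/null' % size
    p = subprocess.Popen(args=cmd, shell=True, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, close_fds=True)
    # p.communicate()  # would read the pipe while waiting: no deadlock
    p.wait()           # never reads the pipe: blocks forever once the pipe is full

    print('end')

# 64KB: fits exactly in the default Linux pipe buffer
test(64 * 1024)

# 64KB + 1B: dd blocks on the last byte, and wait() blocks on dd
test(64 * 1024 + 1)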

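First, test the case where the output is exactly 64 KB. dd produces exactly 64 KB on standard output. It is started with subprocess.Popen, and wait() waits for it to finish. Both start and end are printed as expected. Then test output larger than 64 KB. This time only start is printed, which means the program is stuck in p.wait(): a deadlock. The actual output is: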
start
end
start
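On Python 3.3+, Popen.wait() accepts a timeout, so you can observe the hang without freezing the script. The sketch below is a variation of the test above with a few assumptions. It replaces dd with a Python one-liner child, so it also runs where dd is unavailable. The helper name wait_would_hang and the 2-second timeout are illustrative choices. A timeout here is only a symptom: the child is blocked writing into a full pipe that the parent never reads.

```python
import subprocess
import sys

def wait_would_hang(nbytes, timeout=2.0):
    """Return True if Popen.wait() does not return within `timeout` seconds
    while the child writes `nbytes` to a stdout pipe that nobody reads."""
    child = "import sys; sys.stdout.buffer.write(b'x' * %d)" % nbytes
    p = subprocess.Popen([sys.executable, "-c", child], stdout=subprocess.PIPE)
    try:
        p.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        return True
    finally:
        p.kill()         # harmless if the child has already exited
        p.communicate()  # drain the pipe and reap the child

if __name__ == "__main__":
    print(wait_would_hang(100))      # small output fits in the pipe buffer
    print(wait_would_hang(1000000))  # 1 MB is far beyond the default 64 KB on Linux
```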

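How can the deadlock be avoided? The official documentation recommends Popen.communicate(). This method keeps reading from the pipes while it waits and collects the output in memory instead of leaving it in the pipe. The limit therefore becomes available memory, which is usually not a problem. To get the program's return code, read Popen.returncode after communicate() returns.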
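Here is a minimal Python 3 sketch of the recommended pattern. It assumes a hypothetical child (a Python one-liner) that writes 1 MB, well past a 64 KB pipe buffer, and exits with status 3. Because stderr is merged into stdout with stderr=subprocess.STDOUT, the second element of the returned tuple is None.

```python
import subprocess
import sys

# Hypothetical child: writes 1,000,000 bytes to stdout, then exits with status 3.
child = "import sys; sys.stdout.buffer.write(b'x' * 1000000); sys.exit(3)"
p = subprocess.Popen([sys.executable, "-c", child],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
out, err = p.communicate()          # reads stdout until EOF, then waits: no deadlock
print(len(out), err, p.returncode)  # prints: 1000000 None 3
```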
Conclusion: when using subprocess.Popen with stdout or stderr set to PIPE, do not use Popen.wait() to wait for the external program to finish. Use Popen.communicate() instead.

The relevant passages of the official subprocess documentation (Python 2):
Popen.wait()
Wait for child process to terminate. Set and return returncode attribute.
Warning
This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
Popen.communicate(input=None)
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate. The optional input argument should be a string to be sent to the child process, or None, if no data should be sent to the child.
communicate() returns a tuple (stdoutdata, stderrdata).
Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.
Note
The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
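The note above warns against communicate() when the output is large or unbounded. One alternative in that case is to read the pipe incrementally and call wait() only after reaching end-of-file. This is a sketch, not something the documentation quoted here prescribes. stderr is merged into stdout so that a second, unread pipe cannot fill up. The function name and the 100,000-line child are hypothetical.

```python
import subprocess
import sys

def count_output_lines(cmd):
    """Stream a child's output line by line instead of buffering it all in memory."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)
    count = 0
    for _ in p.stdout:  # consumes data as it arrives, so the pipe never stays full
        count += 1
    p.stdout.close()
    return count, p.wait()  # safe here: stdout has already been read to EOF

if __name__ == "__main__":
    cmd = [sys.executable, "-c", "for i in range(100000): print(i)"]
    print(count_output_lines(cmd))  # prints: (100000, 0)
```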

Two ways to use subprocess:
1) If you want the call to block until the child process finishes:
Depending on how you want your script to work, you have two options. If you want the command to block and do nothing else while it is executing, you can just use subprocess.call.
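# start and block until done; ">" in an argument list is not shell redirection,
# so open the output file and pass it as stdout (data and diz come from the asker's script)
with open(diz['d'] + "/points.xml", "w") as out:
    subprocess.call([data["om_points"]], stdout=out)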
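On Python 3.5 and later, subprocess.run() covers both cases in one call. It starts the child, calls communicate() internally, and returns a CompletedProcess carrying returncode and the captured output. For the same deadlock reason, subprocess.call() itself should not be given stdout=PIPE or stderr=PIPE. Here is a minimal sketch with a hypothetical one-liner child:

```python
import subprocess
import sys

# run() = Popen + communicate() + returncode, in one call
result = subprocess.run([sys.executable, "-c", "print('hello')"],
                        stdout=subprocess.PIPE, universal_newlines=True)
print(result.returncode)  # prints: 0
print(result.stdout)      # prints: hello
```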

2) The non-blocking way:

If you want to do things while it is executing or feed things into stdin, you can use communicate after the popen call.
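# start and process things, then wait
# (">" in an argument list is not shell redirection, so the file is passed as stdout)
with open(diz['d'] + "/points.xml", "w") as out:
    p = subprocess.Popen([data["om_points"]], stdout=out)
    print("Happens while running")
    p.communicate()  # now wait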
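The snippet above only waits, but communicate() can also feed the child's stdin, which requires stdin=PIPE. The sketch below assumes a hypothetical child that upper-cases its input. The parent is free to do other work between Popen() and communicate():

```python
import subprocess
import sys

# Hypothetical child: reads all of stdin and writes it back upper-cased.
child = "import sys; sys.stdout.write(sys.stdin.read().upper())"
p = subprocess.Popen([sys.executable, "-c", child],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     universal_newlines=True)
print("Happens while running, pid %d" % p.pid)  # the parent keeps working here
out, _ = p.communicate(input="hello from the parent\n")  # write stdin, close it, read stdout, wait
print(out)           # prints: HELLO FROM THE PARENT
print(p.returncode)  # prints: 0
```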

As stated in the documentation, wait can deadlock, so communicate is advisable.

subprocess.Popen communicate() writes to the console but not to the log file

I have the following line in a Python script. It runs a separate Python script from within the original one:

subprocess.Popen("'/MyExternalPythonScript.py' " + theArgumentToPassToPythonScript, shell=True).communicate()

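With that line, any print() statements in the separate Python file show up in the main script's console.

However, those statements are not reflected in the .txt log file that the script writes.

Does anyone know how to fix this, so that the .txt file accurately reflects the main script's real console text?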

This is the method I use to save the console to a .txt file in real time:

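import sys

class Logger(object):
    """Tee-like wrapper: whatever is written to sys.stdout goes to the terminal and to a log file."""
    def __init__(self):
        self.terminal = sys.stdout
        # line-buffered text file (Python 3 does not allow unbuffered text mode, i.e. buffering=0)
        self.log = open("/ScriptLog.txt", "w", buffering=1)

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

    def flush(self):
        # called by print(..., flush=True) and at interpreter shutdown
        self.terminal.flush()
        self.log.flush()

sys.stdout = Logger()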

I'm not tied to this method. I'm interested in any approach that achieves what I described.

Answer

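Keep in mind that subprocess spawns a new process, and that process does not really communicate with the parent (they are pretty much independent entities). Despite its name, the communicate method is just a way to send data from the parent process to the child and receive data back (for example, to simulate a user typing something in a terminal).

To know where to write its output, a process uses numbers (file descriptors, or file numbers). When subprocess spawns a process, the child only knows that its standard output is file descriptor 1, which points to whatever file it inherited from the parent (the terminal, unless told otherwise), and that's it. The child independently asks the OS something like "Hey! What is file number 1? Give it to me, I have something to write in it." (Understanding what fork() does in C is very helpful here.)

Basically, the spawned subprocess knows nothing about your Logger class. It only knows that it must write its content to a file: the file the OS identifies by that number. Unless told otherwise, that number is the standard output file descriptor, but as case #2 below shows, you can change it if you need to.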
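A small experiment makes this concrete. The sketch uses io.StringIO as a stand-in for the Logger. Replacing sys.stdout only changes a Python object in the parent. The child writes straight to the file descriptor it inherited, so its output never passes through the replacement.

```python
import io
import subprocess
import sys

captured = io.StringIO()
real_stdout = sys.stdout
sys.stdout = captured  # replaces the Python-level object only; file descriptor 1 is untouched
try:
    print("from the parent")  # goes through sys.stdout, so it is captured
    # the child inherits file descriptor 1 and writes there directly, bypassing `captured`
    subprocess.call([sys.executable, "-c", "print('from the child')"])
finally:
    sys.stdout = real_stdout

print(repr(captured.getvalue()))  # prints: 'from the parent\n'
```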

So you have a few "solutions"...

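1) Clone (tee) stdout to a file, so that whatever is written to stdout is also written to your file by the OS (this isn't really Python-related... it's an OS thing):

import os
import subprocess
import tempfile

file_log = os.path.join(tempfile.gettempdir(), 'foo.txt')
p = subprocess.Popen("python ./run_something.py | tee %s" % file_log, shell=True)
p.wait()

2) Choose whether to write to the terminal OR to the file by passing the fileno() of the one you want. For example, to write only to the file:

import os
import subprocess
import tempfile

file_log = os.path.join(tempfile.gettempdir(), 'foo.txt')
with open(file_log, 'w') as f:
    p = subprocess.Popen("python ./run_something.py", shell=True, stdout=f.fileno())
    p.wait()

3) What I personally find "safer" (I feel sys.stdout shouldn't be overridden lightly): just let the command run, store its output in a variable, and retrieve it later (in the parent process):

import os
import subprocess
import tempfile

p = subprocess.Popen("python ./run_something.py", shell=True,
                     stdout=subprocess.PIPE, universal_newlines=True)
# communicate() reads the pipe while waiting; calling p.wait() before
# p.stdout.read() would deadlock once the output exceeds the pipe buffer
contents, _ = p.communicate()
# Whatever the subprocess printed is now stored in 'contents'.
# Let's write it to a file:
file_log = os.path.join(tempfile.gettempdir(), 'foo.txt')
with open(file_log, 'w') as f:
    f.write(contents)

This way, you can also call print(contents) somewhere in your code to show on the terminal what the subprocess "said".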

For example, the script ./run_something.py could be just this:

print("Foo1")
print("Foo2")
print("Foo3")
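The answer above offers a tee approach and a pipe approach. A combination of the two is also possible: the parent reads the child's output line by line and writes each line through sys.stdout itself. That way, a replaced sys.stdout such as the Logger above sees the output in near real time. This is a sketch, and run_and_echo is a hypothetical helper. The child should be run with python -u (unbuffered). Otherwise, when its stdout is a pipe, it may hold back its prints until it exits.

```python
import subprocess
import sys

def run_and_echo(cmd):
    """Run cmd, forward each output line through sys.stdout as it arrives, return the exit code."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)
    for line in p.stdout:       # drains the pipe continuously, so wait() below cannot deadlock
        sys.stdout.write(line)  # goes through whatever object sys.stdout currently is
    p.stdout.close()
    return p.wait()
```

Usage would look like run_and_echo([sys.executable, "-u", "./run_something.py"]).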

Another answer

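Do you really need the communicate() method of subprocess.Popen? It looks like you just want the output, and that's what subprocess.check_output() is for.

If you use it, you can use the built-in logging module to "tee" the output stream to multiple destinations.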

import logging
import subprocess
import sys

EXTERNAL_SCRIPT_PATH = '/path/to/talker.py'
LOG_FILE_PATH = '/path/to/debug.log'

logger = logging.getLogger('')
logger.setLevel(logging.INFO)

# Log to screen
console_logger = logging.StreamHandler(sys.stdout)
logger.addHandler(console_logger)

# Log to file
file_logger = logging.FileHandler(LOG_FILE_PATH)
logger.addHandler(file_logger)

# Driver script output
logger.info('Calling external script')

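# External script output (universal_newlines=True returns text instead of bytes,
# so the log shows the output itself rather than a b'...' repr)
logger.info(
    subprocess.check_output(EXTERNAL_SCRIPT_PATH, shell=True, universal_newlines=True)
)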

# More driver script output
logger.info('Finished calling external script')

As always, be careful with shell=True. If you can write the call as subprocess.check_output(['/path/to/script.py', 'arg1', 'arg2']), do it!
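Note also that check_output() raises subprocess.CalledProcessError when the child exits with a non-zero status, and the logging example above does not catch it. The sketch below handles it with the list form and no shell. run_logged is a hypothetical helper. The captured output is still available on the exception.

```python
import subprocess
import sys

def run_logged(args):
    """Return (exit_code, output_text) for a child run without a shell."""
    try:
        out = subprocess.check_output(args, stderr=subprocess.STDOUT, universal_newlines=True)
        return 0, out
    except subprocess.CalledProcessError as e:
        # raised for a non-zero exit status; what the child printed is kept in e.output
        return e.returncode, e.output

if __name__ == "__main__":
    print(run_logged([sys.executable, "-c", "print('ok')"]))  # prints: (0, 'ok\n')
    print(run_logged([sys.executable, "-c", "import sys; print('bad'); sys.exit(2)"]))  # prints: (2, 'bad\n')
```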
