ProcessPoolExecutor from concurrent.futures way slower than multiprocessing.Pool

Posted 2023-02-16


Question posted: 2013-09-11 08:46:41

Question:

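I was experimenting with the shiny new concurrent.futures module introduced in Python 3.2, and I noticed that, with almost identical code, using the pool from concurrent.futures (ProcessPoolExecutor) is way slower than using multiprocessing.Pool.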

This is the version using multiprocessing:

def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from multiprocessing import Pool, cpu_count

    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = Pool(processes=workers)
    result = pool.map(hard_work, range(100, 1000000))

And this is the version using concurrent.futures:

def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from concurrent.futures import ProcessPoolExecutor
    from multiprocessing import cpu_count
    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = ProcessPoolExecutor(max_workers=workers)
    result = pool.map(hard_work, range(100, 1000000))

Using a simple factorization function from an Eli Bendersky article, these are the results on my computer (i7, 64-bit, Arch Linux):

[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:10] $ time python pool_multiprocessing.py 

real    0m10.330s
user    1m13.430s
sys 0m0.260s
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:29] $ time python pool_futures.py 

real    4m3.939s
user    6m33.297s
sys 0m54.853s

I can't profile these with the Python profiler because I get pickle errors. Any ideas?

Comments:

Can you post an update? Maybe for Python 3.8?

Answer 1:

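When you use map from concurrent.futures, each element of the iterable is submitted separately to the executor, which creates a Future object for every call. map then returns an iterator that yields the results returned by those futures. Future objects are rather heavyweight: they do a lot of work to support all the features they provide (callbacks, the ability to cancel, checking their status, ...).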
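The per-item submission can be modelled in a few lines. This is a simplified sketch of what Executor.map does, not CPython's actual source (which also handles timeouts and multiple iterables); `map_via_submit` and `demo_map_model` are hypothetical names, and a ThreadPoolExecutor is used only so the sketch runs cheaply. ProcessPoolExecutor.submit returns the same kind of Future object.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def map_via_submit(executor, fn, iterable):
    # Simplified model (hypothetical, not the real implementation) of Executor.map:
    # every item gets its own submit() call and therefore its own Future.
    futures = [executor.submit(fn, item) for item in iterable]

    def results():
        for future in futures:      # yields in input order, like Executor.map
            yield future.result()   # blocks until that particular item is done

    return futures, results()


def demo_map_model(items):
    # ThreadPoolExecutor keeps the sketch cheap; the Future bookkeeping is the same
    # kind of object that ProcessPoolExecutor hands back for each submitted item.
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures, results = map_via_submit(executor, abs, items)
        return (
            len(futures),
            all(isinstance(f, Future) for f in futures),
            list(results),
        )
```

With the question's range(100, 1000000), this model creates 999,900 Future objects before the first result is consumed.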

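Compared to that, multiprocessing.Pool has much less overhead. It submits jobs in batches (reducing IPC overhead) and uses the results returned by the function directly. For big batches of jobs, multiprocessing is definitely the better option.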
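To put numbers on the batching claim, here is a small model of the default chunksize heuristic that CPython's multiprocessing.Pool.map applies when no chunksize is passed (roughly four chunks per worker, rounded up). It is written for illustration and is not a public API; `pool_map_chunksize` and `task_counts` are hypothetical names. For the question's range(100, 1000000) (999,900 items) and assuming 8 worker processes (the question doesn't say how many the i7 exposes), Pool.map sends 32 tasks, whereas Executor.map with its default chunksize of 1 creates 999,900 Futures.

```python
def pool_map_chunksize(n_items, n_workers):
    # Model of the default used by multiprocessing.Pool.map when chunksize is None:
    # about four chunks per worker process, rounded up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def task_counts(n_items, n_workers):
    # Returns (tasks sent by Pool.map with its default chunksize,
    #          Futures created by Executor.map with the default chunksize=1).
    if n_items == 0:
        return 0, 0
    chunksize = pool_map_chunksize(n_items, n_workers)
    pool_tasks = -(-n_items // chunksize)  # ceiling division
    return pool_tasks, n_items


if __name__ == "__main__":
    n = len(range(100, 1000000))  # 999900 items, as in the question
    print(task_counts(n, 8))      # assuming 8 workers; prints (32, 999900)
```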

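Futures are great if you want to submit long-running jobs where the overhead isn't that important, and where you want to be notified by a callback, check from time to time whether they're done, or be able to cancel an individual execution.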
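A minimal sketch of the Future features listed above: a done-callback, cancelling a job that hasn't started yet, and as_completed. It assumes a ThreadPoolExecutor with a single worker so that the second job is guaranteed to still be queued when it is cancelled; `gated_square` and `demo_future_features` are hypothetical names, and the same Future methods are available on futures returned by ProcessPoolExecutor.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def gated_square(n, gate):
    # Simulates a long-running job that only finishes once the caller opens the gate
    # (the 5 s timeout is just a safety net so the sketch can never hang forever).
    gate.wait(timeout=5)
    return n * n


def demo_future_features():
    gate = threading.Event()
    finished = []  # appended to by the done-callback
    with ThreadPoolExecutor(max_workers=1) as executor:
        first = executor.submit(gated_square, 3, gate)
        second = executor.submit(gated_square, 4, gate)
        first.add_done_callback(lambda f: finished.append(f.result()))
        # One worker, with `first` ahead in the queue: `second` cannot have started,
        # so it can still be cancelled.
        cancelled = second.cancel()
        gate.set()
        completed = [f.result() for f in as_completed([first])]
    # Leaving the with-block waits for the worker thread, so the callback has run by now.
    return cancelled, second.cancelled(), completed, finished
```

None of this is reachable through Executor.map, which only hands back the results.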

Personal note:

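I can't really think of many reasons to use Executor.map: it doesn't give you any of the features of futures, except for the ability to specify a timeout. If you're only interested in the results, you're better off using one of multiprocessing.Pool's map functions.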
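The timeout is the one extra that Executor.map keeps: the iterator it returns raises concurrent.futures.TimeoutError if a result is not available within `timeout` seconds of the original map() call. A minimal sketch, using a ThreadPoolExecutor and a threading.Event to simulate jobs that run too long; `wait_for_gate` and `map_with_deadline` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError


def wait_for_gate(gate):
    # Simulates a slow job: blocks until the gate opens (or 5 s pass as a safety net).
    gate.wait(timeout=5)
    return "finished"


def map_with_deadline(timeout):
    gate = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as executor:
        results = executor.map(wait_for_gate, [gate, gate], timeout=timeout)
        try:
            return list(results)  # raises once the deadline passes with results missing
        except FuturesTimeoutError:
            return "timed out"
        finally:
            gate.set()  # let the workers finish so leaving the with-block does not hang
```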

Comments:

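Thank you very much for your answer! Submitting in batches is probably the key here. By the way, in Python 3.5 ProcessPoolExecutor.map will accept a chunksize keyword argument, which will somewhat mitigate the IPC overhead problem. See this bug for more information.

Also, since Python 3.2 you can set maxtasksperchild for a multiprocessing pool; in my case this helped clean up resources after each worker finished its workload.

I prefer ProcessPoolExecutor.map() because of this bug in mp.Pool.map().

It looks like the bug @Ciprian mentioned still exists, and there have been some unfinished attempts to fix it, the latest being github.com/python/cpython/pull/16103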
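To see the effect of the chunksize argument on a workload similar to the question's, this timing sketch compares ProcessPoolExecutor.map with chunksize=1 and with a larger chunksize against multiprocessing.Pool.map. It assumes Python 3.5+ (for chunksize); `factorize`, `time_executor` and `time_pool` are hypothetical names, and `factorize` is a plain trial-division stand-in, not the exact function from the Eli Bendersky article. The input range is deliberately much smaller than the question's so the demo finishes quickly.

```python
import time
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def factorize(n):
    # Hypothetical stand-in workload: naive trial division, returns prime factors.
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def time_executor(items, chunksize):
    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        # chunksize > 1 batches items into one Future/IPC round trip per chunk
        results = list(executor.map(factorize, items, chunksize=chunksize))
    return time.perf_counter() - start, results


def time_pool(items):
    start = time.perf_counter()
    with Pool() as pool:  # Pool.map picks its own chunksize when none is given
        results = pool.map(factorize, items)
    return time.perf_counter() - start, results


if __name__ == "__main__":
    items = range(100, 10000)  # smaller than the question's range(100, 1000000)
    for chunksize in (1, 500):
        elapsed, _ = time_executor(items, chunksize)
        print(f"ProcessPoolExecutor.map, chunksize={chunksize}: {elapsed:.2f}s")
    elapsed, _ = time_pool(items)
    print(f"multiprocessing.Pool.map: {elapsed:.2f}s")
```

Run it as a script: the `if __name__ == "__main__":` guard stops start methods that re-import the main module (spawn, the default on Windows and macOS) from re-running the benchmark in every worker. Absolute timings depend on the machine, but chunksize=1 reproduces the one-Future-per-item situation from the question.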
