Python Multiprocessing: one argument of the function is an iterator, got TypeError

【Posted】2021-03-30 07:44:35 【Question】:

I have code like this:

    import multiprocessing
    from itertools import product,imap,ifilter

    def test(it):
        for x in it:
            print x     
        return None


    mp_pool = multiprocessing.Pool(multiprocessing.cpu_count())
    it = imap(lambda x: ifilter(lambda y: x+y > 10, xrange(10)), xrange(10))
    result = mp_pool.map(test, it)

I get this error message:

     File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib64/python2.7/multiprocessing/pool.py", line 102, in worker
        task = get()
      File "/usr/lib64/python2.7/multiprocessing/queues.py", line 376, in get
        return recv()
        task = get()
      File "/usr/lib64/python2.7/multiprocessing/queues.py", line 376, in get
    TypeError: ifilter expected 2 arguments, got 0
        return recv()

Can multiprocessing not run a function whose argument is an iterator? Thanks!

【Comments】:

This thread may be related.

【Answer 1】:

Your iterator it must yield individual values (each of which may be "complex", such as a tuple or a list). Right now we have:

>>> it
<itertools.imap object at 0x000000000283DB70>
>>> list(it)
[<itertools.ifilter object at 0x000000000283DC50>, <itertools.ifilter object at 0x000000000283DF98>, <itertools.ifilter object at 0x000000000283DBE0>, <itertools.ifilter object at 0x000000000283DF60>, <itertools.ifilter object at 0x000000000283DB00>, <itertools.ifilter object at 0x000000000283DCC0>, <itertools.ifilter object at 0x000000000283DD30>, <itertools.ifilter object at 0x000000000283DDA0>, <itertools.ifilter object at 0x000000000283DE80>, <itertools.ifilter object at 0x000000000284F080>]

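Each iteration of it yields another iterator, and that is the cause of your problem: the pool has to pickle every item to send it to a worker, and an ifilter object apparently does not survive that round trip (note that the traceback fails in recv(), while the worker is rebuilding the task).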
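Items handed to Pool.map are pickled on their way to the worker processes, so a quick way to check whether an item is usable is to try pickling it. The sketch below assumes Python 3 (the question uses Python 2.7) and uses a generator in place of ifilter, since generators cannot be pickled. The names make_inner and is_picklable are made up for this illustration.

```python
import pickle


def make_inner(x):
    # Generator standing in for the ifilter object from the question.
    return (y for y in range(10) if x + y > 10)


def is_picklable(obj):
    # True if obj survives pickle.dumps, which a Pool needs for every item.
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(is_picklable(make_inner(5)))        # a lazy iterator: False
print(is_picklable(list(make_inner(5))))  # a materialized list: True
```

Turning each inner iterator into a list (or iterating it in the parent process) is what makes the items safe to send.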

So you have to "iterate over your iterator":

import multiprocessing
from itertools import imap, ifilter


def test(t):
    return 't = ' + str(t) # return value rather than printing


if __name__ == '__main__': # required for Windows
    mp_pool = multiprocessing.Pool(multiprocessing.cpu_count())
    it = imap(lambda x: ifilter(lambda y: x+y > 10, xrange(10)), xrange(10))
    for the_iterator in it:
        result = mp_pool.map(test, the_iterator)
        print result
    mp_pool.close() # needed to ensure all processes terminate
    mp_pool.join() # needed to ensure all processes terminate

With it as you defined it, the printed result is:

[]
[]
['t = 9']
['t = 8', 't = 9']
['t = 7', 't = 8', 't = 9']
['t = 6', 't = 7', 't = 8', 't = 9']
['t = 5', 't = 6', 't = 7', 't = 8', 't = 9']
['t = 4', 't = 5', 't = 6', 't = 7', 't = 8', 't = 9']
['t = 3', 't = 4', 't = 5', 't = 6', 't = 7', 't = 8', 't = 9']
['t = 2', 't = 3', 't = 4', 't = 5', 't = 6', 't = 7', 't = 8', 't = 9']
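The output above can be checked without a pool. Here is a minimal sketch, assuming Python 3 (the original is Python 2), where the built-in map, filter and range are lazy and play the roles of imap, ifilter and xrange. The name expected is made up for this example.

```python
# Python 3: map/filter/range are lazy, like imap/ifilter/xrange in Python 2.
it = map(lambda x: filter(lambda y: x + y > 10, range(10)), range(10))

# Materialize each inner iterator; these are the values the pool maps over.
expected = [list(inner) for inner in it]

for values in expected:
    print(['t = ' + str(v) for v in values])
```

The first two inner iterators are empty because no y below 10 satisfies x + y > 10 when x is 0 or 1.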

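However, if you want to take full advantage of multiprocessing (assuming you have enough processors), you can use map_async so that all the jobs are submitted at once: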

import multiprocessing
from itertools import imap, ifilter


def test(t):
    return 't = ' + str(t) # return value rather than printing


if __name__ == '__main__': # required for Windows
    mp_pool = multiprocessing.Pool(multiprocessing.cpu_count())
    it = imap(lambda x: ifilter(lambda y: x+y > 10, xrange(10)), xrange(10))
    results = [mp_pool.map_async(test, the_iterator) for the_iterator in it]
    for result in results:
        print result.get()
    mp_pool.close() # needed to ensure all processes terminate
    mp_pool.join() # needed to ensure all processes terminate

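Or you might consider mp_pool.imap. Unlike mp_pool.map_async, it does not first convert the iterable argument to a list in order to work out the best chunksize for submitting jobs (see the documentation, which is not very good on this point). Instead it uses a default chunksize of 1, which is usually not what you want for very large iterables: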

results = [mp_pool.imap(test, the_iterator) for the_iterator in it]
for result in results:
    print list(result) # to get a comparable printout as when using map_async

Update: using multiprocessing to generate the lists

import multiprocessing
from itertools import imap, ifilter


def test(t):
    return 't = ' + str(t) # return value rather than printing

def generate_lists(x):
    return list(ifilter(lambda y: x+y > 10, xrange(10)))

if __name__ == '__main__': # required for Windows
    mp_pool = multiprocessing.Pool(multiprocessing.cpu_count())
    lists = mp_pool.imap(generate_lists, xrange(10))
    # lists, returned by mp_pool.imap, is an iterable
    # as each element of lists becomes available it is passed to test:
    results = mp_pool.imap(test, lists)
    # as each result becomes available
    for result in results:
        print result
    mp_pool.close() # needed to ensure all processes terminate
    mp_pool.join() # needed to ensure all processes terminate

Prints:

t = []
t = []
t = [9]
t = [8, 9]
t = [7, 8, 9]
t = [6, 7, 8, 9]
t = [5, 6, 7, 8, 9]
t = [4, 5, 6, 7, 8, 9]
t = [3, 4, 5, 6, 7, 8, 9]
t = [2, 3, 4, 5, 6, 7, 8, 9]
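If advancing the iterator is the expensive part, one option is to send each worker only the picklable input (here x) and let the worker build and drain the iterator itself, so the iteration runs in the worker process. This is a minimal Python 3 sketch under that assumption. The names slow_values and consume are hypothetical, and the generator stands in for the real, slow iterator.

```python
import multiprocessing


def slow_values(x):
    # Stand-in for an iterator that is expensive to advance.
    for y in range(10):
        if x + y > 10:
            yield y


def consume(x):
    # Runs in a worker: the iterator is built here and iterated here.
    return ['t = ' + str(y) for y in slow_values(x)]


def main():
    with multiprocessing.Pool() as pool:
        # Only the integers 0..9 cross the process boundary.
        for result in pool.imap(consume, range(10)):
            print(result)


if __name__ == '__main__':
    main()
```

The design choice is that nothing lazy is ever pickled: the parent sends plain integers and receives plain lists.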

【Discussion】:

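Sorry, my sample code confused you! In my actual code, each iteration of the iterator yields another iterator. In my real code, the yielded iterators take a long time to produce their values, so I want to hand each yielded iterator to a process and have the values produced there.

I have updated the answer. I'm not sure your iterator it produces the results you expect.

The difference between my code and yours is that I pass the iterator as the argument of the function. In my real code the iterator takes a long time to produce values, so I want to put the iterator into a process to produce them.

The difference between your code and mine is that your code is illegal. results = [mp_pool.map_async(test, the_iterator) for the_iterator in it] (or the next version, which uses mp_pool.imap) will run as much in parallel as possible (depending on how many CPUs you actually have). If you are saying the iterators themselves are time-consuming, nothing in your code uses multiprocessing to generate them. Are you saying you want to use multiprocessing to generate the iterators?

I'd like to know why my code is illegal. I want to use multiprocessing to iterate over many iterators in many processes.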
