Why is pool.map removing data operations?
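【Title】: Why is pool.map removing data operations?
【Posted】: 2021-09-05 08:43:39
【Question】: I have a function that operates on a globally defined numpy matrix and changes some positions in it. I call this function several times and change multiple points of the matrix. When I make ordinary sequential calls to the function, this works perfectly and does what I expect. I want to parallelize it with a pool, but when I do, the changes made by the function are not saved, and printing the matrix afterwards shows only the original zero matrix. Why does this happen, and what is the fix? Code attached: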
all_mutations = np.zeros((10, 10))  # the shape must be passed as a tuple
parallelMutate(all_mutation_settings[0])
parallelMutate(all_mutation_settings[1])
parallelMutate(all_mutation_settings[2])
print(all_mutations)
# The above works
pool.map(parallelMutate, all_mutation_settings)
print(all_mutations)
# This just gives back the zero matrix
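【Comments】:
You never assign to all_mutations
I'm not sure what you mean. all_mutations is a variable defined outside the function and shared by all calls to it. I do assign to all_mutations inside the function when it runs with different settings. I edited the code slightly to make the structure of the program clearer.
Multiple processes do not share state. This is described very clearly in the multiprocessing docs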
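To make that last comment concrete, here is a minimal sketch of the effect. The names `grid`, `mutate` and `run_demo` are hypothetical, and a plain list stands in for the numpy matrix to keep the example dependency-free; the behaviour should be the same for a numpy array. Each worker process gets its own copy of the module-level object, so its writes are visible inside that worker but never reach the parent.

```python
import multiprocessing as mp

# Module-level state, standing in for the question's all_mutations.
grid = [0.0, 0.0, 0.0]


def mutate(tpl):
    """Write into the global list and return the sum this process sees."""
    index, value = tpl
    grid[index] = value
    return sum(grid)


def run_demo():
    settings = [(0, 1.0), (1, 2.0), (2, 3.0)]
    with mp.Pool(2) as pool:
        seen_by_workers = pool.map(mutate, settings)
    # Return what the workers saw and a copy of the parent's list.
    return seen_by_workers, list(grid)


if __name__ == "__main__":
    seen_by_workers, parent_grid = run_demo()
    print(seen_by_workers)  # non-zero sums: each worker saw its own writes
    print(parent_grid)      # the parent's list is still all zeros
```

The exact sums reported by the workers depend on how the pool distributes the tasks, but the parent's list stays untouched either way.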
【Answer 1】:
You have to use shared memory as the backing store for the np array:
from multiprocessing import Pool, Array
import numpy as np
import ctypes

def to_numpy_array(shared_array, shape):
    '''Create a numpy array backed by a shared memory Array.'''
    arr = np.ctypeslib.as_array(shared_array)
    return arr.reshape(shape)

def to_shared_array(arr, ctype):
    # We do not have to provide a lock if each process is operating
    # on individual cells of the array:
    shared_array = Array(ctype, arr.size, lock=False)
    # Copy the initial contents into the shared buffer:
    temp = np.frombuffer(shared_array, dtype=arr.dtype)
    temp[:] = arr.flatten(order='C')
    return shared_array

def init_pool(shared_array, shape):
    """ Initialize global variable for each process in the process pool: """
    global all_mutations
    # create np array backed by shared array:
    all_mutations = to_numpy_array(shared_array, shape)

def parallelMutate(tpl):
    # unpack argument
    x, y, value = tpl
    all_mutations[x, y] = value

# required for Windows:
if __name__ == '__main__':
    # 10 by 10 array of doubles:
    all_mutations = np.zeros((10, 10))
    shape = all_mutations.shape
    shared_array = to_shared_array(all_mutations, ctypes.c_double)
    # now use the shared array as the base:
    all_mutations = to_numpy_array(shared_array, shape)
    pool = Pool(3, initializer=init_pool, initargs=(shared_array, shape))
    all_mutation_settings = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
    pool.map(parallelMutate, all_mutation_settings)
    print(all_mutations)
Prints:
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 2. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 3. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
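The answer's code passes `lock=False` because each task writes to its own cell. If several tasks may update the same cell, a lock is probably needed, since `+=` is a read-modify-write. The following is a rough sketch of that variant under stated assumptions, not part of the original answer: `SHAPE`, `counts`, `counts_lock`, `increment` and `run_demo` are hypothetical names, and the array is created with `lock=True` so that `get_lock()` and `get_obj()` are available.

```python
import ctypes
from multiprocessing import Array, Pool

import numpy as np

SHAPE = (2, 2)  # hypothetical small shape for the illustration


def init_pool(shared_array):
    """Give each worker a numpy view of the shared buffer plus its lock."""
    global counts, counts_lock
    counts_lock = shared_array.get_lock()
    counts = np.frombuffer(shared_array.get_obj(), dtype=np.float64).reshape(SHAPE)


def increment(cell):
    """Add 1 to a cell; the read-modify-write is guarded by the lock."""
    x, y = cell
    with counts_lock:
        counts[x, y] += 1


def run_demo(n_tasks=200):
    # lock=True wraps the shared buffer in a synchronized object.
    shared_array = Array(ctypes.c_double, SHAPE[0] * SHAPE[1], lock=True)
    with Pool(3, initializer=init_pool, initargs=(shared_array,)) as pool:
        # Every task targets the same cell, so updates would race without the lock.
        pool.map(increment, [(0, 0)] * n_tasks)
    view = np.frombuffer(shared_array.get_obj(), dtype=np.float64)
    return view.reshape(SHAPE).copy()


if __name__ == "__main__":
    print(run_demo())
```

With the lock held around each update, cell (0, 0) ends up equal to the number of tasks. The cost is that the workers serialize on that lock, so this only pays off when the work outside the locked section dominates.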
【Discussion】:
Does this answer your question satisfactorily?