Why is pool.map removing data operations?


Title: Why is pool.map removing data operations?
Posted: 2021-09-05 08:43:39
Question:

I have a function that operates on a globally defined numpy matrix and mutates some positions in it. I call this function several times and change several points in the matrix. This works perfectly well, and as I expect, when I make ordinary sequential calls to the function. I wanted to parallelize it with a Pool, but when I try, the changes the function makes are not saved: printing the matrix afterwards just shows the original zero matrix. Why does this happen, and what is the fix? Code attached:

all_mutations = np.zeros((10, 10))
parallelMutate(all_mutation_settings[0])
parallelMutate(all_mutation_settings[1])
parallelMutate(all_mutation_settings[2])
print(all_mutations)
#THE ABOVE WOULD WORK
pool.map(parallelMutate, all_mutation_settings)
print(all_mutations)
#This would just give back the zero matrix

Comments:

"You never assign to all_mutations." "I'm not sure what you mean. all_mutations is a variable defined outside the function and is shared by all calls of the function. I do assign to all_mutations inside the function when it runs with different settings. I've edited the code slightly to make the program's structure clear." "Separate processes do not share state. This is described very clearly in the multiprocessing docs."

Answer 1:

You have to use shared memory as the backing store for the np array:

from multiprocessing import Pool, Array
import numpy as np
import ctypes

def to_numpy_array(shared_array, shape):
    '''Create a numpy array backed by a shared memory Array.'''
    arr = np.ctypeslib.as_array(shared_array)
    return arr.reshape(shape)

def to_shared_array(arr, ctype):
    # We do not have to provide a lock if each process is operating on individual cells of the array:
    shared_array = Array(ctype, arr.size, lock=False)
    temp = np.frombuffer(shared_array, dtype=arr.dtype)
    temp[:] = arr.flatten(order='C')
    return shared_array

def init_pool(shared_array, shape):
    """ Initialize global variable for each process in the process pool: """
    global all_mutations
    # create np array backed by shared array:
    all_mutations = to_numpy_array(shared_array, shape)

def parallelMutate(tpl):
    # unpack argument
    x, y, value = tpl
    all_mutations[x, y] = value

# required for Windows:
if __name__ == '__main__':
    # 10 by 10 array of doubles:
    all_mutations = np.zeros((10,10))
    shape = all_mutations.shape
    shared_array = to_shared_array(all_mutations, ctypes.c_double)
    # now use the shared array as the base:
    all_mutations = to_numpy_array(shared_array, shape)
    pool = Pool(3, initializer=init_pool, initargs=(shared_array, shape))
    all_mutation_settings = [(1,1,1), (2,2,2), (3,3,3)]
    pool.map(parallelMutate, all_mutation_settings)
    print(all_mutations)

Prints:

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 2. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 3. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

Comments:

Does this satisfactorily answer your question?
