Why is pool.map removing data operations?

Posted 2023-02-16


[Posted]: 2021-09-05 08:43:39 [Question]:

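I have a function that operates on a globally defined numpy matrix and changes some positions in it. I call this function many times, changing several points of the matrix. When I call the function sequentially in the usual way, this works perfectly and does what I expect. When I try to parallelize it with a pool, however, the changes the function makes are not kept: when I print the matrix afterwards, it is still the original matrix of zeros. Why does this happen, and how can I work around it? Code attached: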

all_mutations = np.zeros((10, 10))
parallelMutate(all_mutation_settings[0])
parallelMutate(all_mutation_settings[1])
parallelMutate(all_mutation_settings[2])
print(all_mutations)
#THE ABOVE WOULD WORK
pool.map(parallelMutate, all_mutation_settings)
print(all_mutations)
#This would just give back the zero matrix
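The symptom can be reproduced in a few lines. Each pool worker is a separate process with its own copy of module-level state, so a write to the global inside a worker never reaches the parent. Below is a minimal sketch, assuming Python 3 and numpy; `mutate_in_place`, `compute_mutation` and `run_demo` are hypothetical stand-ins for the question's `parallelMutate`. It also shows one common workaround that is not from this thread: have the worker return its change and apply the results in the parent.

```python
import multiprocessing as mp

import numpy as np

# Module-level matrix playing the role of the question's global all_mutations.
all_mutations = np.zeros((10, 10))


def mutate_in_place(setting):
    """Hypothetical worker: writes into this worker process's own copy."""
    x, y, value = setting
    all_mutations[x, y] = value


def compute_mutation(setting):
    """Hypothetical worker: returns the change instead of applying it.

    Stands in for whatever computation produces (x, y, value).
    """
    x, y, value = setting
    return x, y, value


def run_demo(settings):
    # Start from a clean matrix so every call gives the same result.
    all_mutations[:] = 0
    with mp.Pool(3) as pool:
        # The writes land in the workers' copies; the parent's matrix is untouched.
        pool.map(mutate_in_place, settings)
        lost = all_mutations.copy()
        # Workaround: ship each change back and apply it in the parent.
        results = pool.map(compute_mutation, settings)
    for x, y, value in results:
        all_mutations[x, y] = value
    return lost, all_mutations


if __name__ == "__main__":
    lost, applied = run_demo([(1, 1, 1), (2, 2, 2), (3, 3, 3)])
    print(lost.sum())              # 0.0
    print(applied.diagonal()[:4])  # [0. 1. 2. 3.]
```

Returning results keeps every write in a single process, so no locking or shared memory is needed; the cost is pickling each result back to the parent, which is cheap when results are small tuples like these.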

[Comments]:

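You never assign to all_mutations.

I'm not sure what you mean. all_mutations is a variable defined outside the function and shared by every call to it. When the function runs with different settings, I do assign to all_mutations inside it. I edited the code slightly to make the structure of the program clearer.

Multiple processes do not share state. This is described very clearly in the multiprocessing docs.

[Answer 1]: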

You have to use shared memory as the backing store for the numpy array:

from multiprocessing import Pool, Array
import numpy as np
import ctypes

def to_numpy_array(shared_array, shape):
    '''Create a numpy array backed by a shared memory Array.'''
    arr = np.ctypeslib.as_array(shared_array)
    return arr.reshape(shape)

def to_shared_array(arr, ctype):
    # No lock is needed as long as each process writes to its own, distinct cells of the array:
    shared_array = Array(ctype, arr.size, lock=False)
    temp = np.frombuffer(shared_array, dtype=arr.dtype)
    temp[:] = arr.flatten(order='C')
    return shared_array

def init_pool(shared_array, shape):
    """ Initialize global variable for each process in the process pool: """
    global all_mutations
    # create np array backed by shared array:
    all_mutations = to_numpy_array(shared_array, shape)

def parallelMutate(tpl):
    # unpack argument
    x, y, value = tpl
    all_mutations[x, y] = value

# Required when workers are started with spawn (the default on Windows and macOS):
if __name__ == '__main__':
    # 10 by 10 array of doubles:
    all_mutations = np.zeros((10,10))
    shape = all_mutations.shape
    shared_array = to_shared_array(all_mutations, ctypes.c_double)
    # now use the shared array as the base:
    all_mutations = to_numpy_array(shared_array, shape)
    pool = Pool(3, initializer=init_pool, initargs=(shared_array, shape))
    all_mutation_settings = [(1,1,1), (2,2,2), (3,3,3)]
    pool.map(parallelMutate, all_mutation_settings)
    # no more tasks: shut the workers down cleanly
    pool.close()
    pool.join()
    print(all_mutations)

Prints:

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 2. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 3. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
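On Python 3.8 and later, `multiprocessing.shared_memory.SharedMemory` offers another way to give the numpy array a shared backing store, in place of the `Array` used above. The sketch below is an alternative under stated assumptions, not the answer's method: `init_worker`, `parallel_mutate`, `run`, `SHAPE` and `DTYPE` are hypothetical names, and each worker attaches to the block by its name instead of receiving a ctypes array through `initargs`.

```python
from multiprocessing import Pool, shared_memory

import numpy as np

# Hypothetical fixed layout of the shared matrix.
SHAPE = (10, 10)
DTYPE = np.float64

# Per-worker globals, filled in by init_worker. Keeping a reference to the
# SharedMemory object keeps the mapping open for the worker's lifetime.
_worker_shm = None
all_mutations = None


def init_worker(shm_name):
    """Attach this worker to the parent's shared block and wrap it in numpy."""
    global _worker_shm, all_mutations
    _worker_shm = shared_memory.SharedMemory(name=shm_name)
    all_mutations = np.ndarray(SHAPE, dtype=DTYPE, buffer=_worker_shm.buf)


def parallel_mutate(setting):
    """Same job as parallelMutate: write one value into the shared matrix."""
    x, y, value = setting
    all_mutations[x, y] = value


def _mutate_shared(shm, settings):
    # The parent's view over the same block the workers write into.
    matrix = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
    matrix[:] = 0
    with Pool(3, initializer=init_worker, initargs=(shm.name,)) as pool:
        pool.map(parallel_mutate, settings)
    # Copy out so no numpy view of shm.buf outlives this function;
    # otherwise shm.close() would raise BufferError.
    return matrix.copy()


def run(settings):
    nbytes = int(np.prod(SHAPE)) * np.dtype(DTYPE).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        return _mutate_shared(shm, settings)
    finally:
        shm.close()   # release this process's mapping
        shm.unlink()  # free the block itself (the creator's job)


if __name__ == "__main__":
    print(run([(1, 1, 1), (2, 2, 2), (3, 3, 3)]))
```

The process that creates the block is the one that should `unlink()` it, exactly once. Here the workers are terminated when the `with` block exits, so they never call `close()` themselves; their mappings go away with the processes.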

[Comments]:

Does this answer your question satisfactorily?
