在共享内存中使用 numpy 数组进行多处理

Posted 2023-02-16

技术标签:

【中文标题】在共享内存中使用 numpy 数组进行多处理【英文标题】：Use numpy array in shared memory for multiprocessing 【发布时间】：2011-12-15 05:44:02 【问题描述】：

我想在共享内存中使用一个 numpy 数组来与多处理模块一起使用。困难在于像使用 numpy 数组一样使用它，而不仅仅是作为 ctypes 数组。

from multiprocessing import Process, Array
import scipy

def f(a):
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = int(10)
    unshared_arr = scipy.rand(N)
    arr = Array('d', unshared_arr)
    print "Originally, the first two elements of arr = %s"%(arr[:2])

    # Create, start, and finish the child processes
    p = Process(target=f, args=(arr,))
    p.start()
    p.join()

    # Printing out the changed values
    print "Now, the first two elements of arr = %s"%arr[:2]

这会产生如下输出：

Originally, the first two elements of arr = [0.3518653236697369, 0.517794725524976]
Now, the first two elements of arr = [-0.3518653236697369, 0.517794725524976]

可以以 ctypes 方式访问数组，例如arr[i] 有道理。但是，它不是一个numpy数组，我无法执行-1*arr或arr.sum()之类的操作。我想一个解决方案是将 ctypes 数组转换为 numpy 数组。但是（除了无法完成这项工作），我不相信它会再被分享。

对于必须成为常见问题的问题，似乎会有一个标准解决方案。

【问题讨论】：

和这个不一样？ ***.com/questions/5033799/… 这不是同一个问题。链接的问题是询问subprocess 而不是multiprocessing。 【参考方案1】：

添加到@unutbu（不再可用）和@Henry Gomersall 的答案。您可以在需要时使用shared_arr.get_lock() 同步访问：

shared_arr = mp.Array(ctypes.c_double, N)
# ...
def f(i): # could be anything numpy accepts as an index such another numpy array
    with shared_arr.get_lock(): # synchronize access
        arr = np.frombuffer(shared_arr.get_obj()) # no data copying
        arr[i] = -arr[i]

示例

import ctypes
import logging
import multiprocessing as mp

from contextlib import closing

import numpy as np

info = mp.get_logger().info

def main():
    logger = mp.log_to_stderr()
    logger.setLevel(logging.INFO)

    # create shared array
    N, M = 100, 11
    shared_arr = mp.Array(ctypes.c_double, N)
    arr = tonumpyarray(shared_arr)

    # fill with random values
    arr[:] = np.random.uniform(size=N)
    arr_orig = arr.copy()

    # write to arr from different processes
    with closing(mp.Pool(initializer=init, initargs=(shared_arr,))) as p:
        # many processes access the same slice
        stop_f = N // 10
        p.map_async(f, [slice(stop_f)]*M)

        # many processes access different slices of the same array
        assert M % 2 # odd
        step = N // 10
        p.map_async(g, [slice(i, i + step) for i in range(stop_f, N, step)])
    p.join()
    assert np.allclose(((-1)**M)*tonumpyarray(shared_arr), arr_orig)

def init(shared_arr_):
    global shared_arr
    shared_arr = shared_arr_ # must be inherited, not passed as an argument

def tonumpyarray(mp_arr):
    return np.frombuffer(mp_arr.get_obj())

def f(i):
    """synchronized."""
    with shared_arr.get_lock(): # synchronize access
        g(i)

def g(i):
    """no synchronization."""
    info("start %s" % (i,))
    arr = tonumpyarray(shared_arr)
    arr[i] = -1 * arr[i]
    info("end   %s" % (i,))

if __name__ == '__main__':
    mp.freeze_support()
    main()

如果您不需要同步访问或创建自己的锁，则不需要mp.Array()。在这种情况下，您可以使用 mp.sharedctypes.RawArray。

【讨论】：

漂亮的答案！如果我想拥有多个共享数组，每个共享数组都可以单独锁定，但数组的数量在运行时确定，这是您在这里所做的直接扩展吗？ @Andrew：应该在子进程产生之前创建共享数组。关于操作顺序的要点。不过，这就是我的想法：创建用户指定数量的共享数组，然后生成一些子进程。这很简单吗？ @Chicony：你不能改变数组的大小。将其视为必须在子进程启动之前分配的共享内存块。您不需要使用所有内存，例如，您可以将count 传递给numpy.frombuffer()。您可以尝试使用mmap 或posix_ipc 之类的东西在较低级别上直接实现可调整大小（可能涉及在调整大小时复制）RawArray 模拟（或查找现有库）。或者，如果您的任务允许：分部分复制数据（如果您不需要一次全部）。 “如何调整共享内存的大小”是一个很好的单独问题。 @umopapisdn:Pool()定义进程数（默认使用可用CPU核数）。 M 是 f() 函数被调用的次数。【参考方案2】：

虽然已经给出的答案很好，但只要满足两个条件，这个问题就有一个更简单的解决方案：

POSIX 兼容

只读访问权限

在这种情况下，您不需要显式地使变量共享，因为子进程将使用 fork 创建。分叉的孩子自动共享父母的内存空间。在 Python 多处理的上下文中，这意味着它共享所有 模块级 变量；请注意，对于您显式传递给子进程或您在 multiprocessing.Pool 上调用的函数的参数，这 不成立。

一个简单的例子：

import multiprocessing
import numpy as np

# will hold the (implicitly mem-shared) data
data_array = None

# child worker function
def job_handler(num):
    # built-in id() returns unique memory ID of a variable
    return id(data_array), np.sum(data_array)

def launch_jobs(data, num_jobs=5, num_worker=4):
    global data_array
    data_array = data

    pool = multiprocessing.Pool(num_worker)
    return pool.map(job_handler, range(num_jobs))

# create some random data and execute the child jobs
mem_ids, sumvals = zip(*launch_jobs(np.random.rand(10)))

# this will print 'True' on POSIX OS, since the data was shared
print(np.all(np.asarray(mem_ids) == id(data_array)))

【讨论】：

+1 非常有价值的信息。你能解释一下为什么只有模块级的变量是共享的吗？为什么本地变量不是父级内存空间的一部分？例如，如果我有一个带有本地 var V 的函数 F 和一个 F 内部引用 V 的函数 G，为什么这不能工作？警告：这个答案有点欺骗性。子进程在分叉时接收到父进程状态的副本，包括全局变量。这些状态绝不是同步的，并且会从那一刻开始发散。这种技术在某些情况下可能有用（例如：分叉出每个处理父进程快照然后终止的临时子进程），但在其他情况下无用（例如：必须共享和共享的长时间运行的子进程）与父进程同步数据）。 @EelkeSpaak：您的说法 - “一个分叉的孩子自动共享父母的内存空间” - 是不正确的。如果我有一个子进程想要以严格的只读方式监视父进程的状态，则分叉不会让我到达那里：子进程只会在分叉时看到父状态的快照。事实上，当我发现这个限制时，这正是我试图做的（按照你的回答）。因此，您的答案附有附言。简而言之：父状态不是“共享的”，而只是复制给子状态。这不是通常意义上的“分享”。我是否错误地认为这是一种写时复制的情况，至少在 posix 系统上是这样？也就是说，在fork之后，我认为内存是共享的，直到写入新数据，此时会创建一个副本。所以是的，数据确实不是完全“共享”的，但它可以提供潜在的巨大性能提升。如果您的进程是只读的，那么将没有复制开销！我是否理解正确？ @senderle 是的，这正是我的意思！因此，我在关于只读访问的答案中的观点 (2)。【参考方案3】：

我编写了一个小型 python 模块，它使用 POSIX 共享内存在 python 解释器之间共享 numpy 数组。也许你会发现它很方便。

https://pypi.python.org/pypi/SharedArray

它是这样工作的：

import numpy as np
import SharedArray as sa

# Create an array in shared memory
a = sa.create("test1", 10)

# Attach it as a different array. This can be done from another
# python interpreter as long as it runs on the same computer.
b = sa.attach("test1")

# See how they are actually sharing the same memory block
a[0] = 42
print(b[0])

# Destroying a does not affect b.
del a
print(b[0])

# See how "test1" is still present in shared memory even though we
# destroyed the array a.
sa.list()

# Now destroy the array "test1" from memory.
sa.delete("test1")

# The array b is not affected, but once you destroy it then the
# data are lost.
print(b[0])

【讨论】：

【参考方案4】：

您可以使用sharedmem 模块：https://bitbucket.org/cleemesser/numpy-sharedmem

这是您的原始代码，这次使用行为类似于 NumPy 数组的共享内存（注意最后一条调用 NumPy sum() 函数的附加语句）：

from multiprocessing import Process
import sharedmem
import scipy

def f(a):
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = int(10)
    unshared_arr = scipy.rand(N)
    arr = sharedmem.empty(N)
    arr[:] = unshared_arr.copy()
    print "Originally, the first two elements of arr = %s"%(arr[:2])

    # Create, start, and finish the child process
    p = Process(target=f, args=(arr,))
    p.start()
    p.join()

    # Print out the changed values
    print "Now, the first two elements of arr = %s"%arr[:2]

    # Perform some NumPy operation
    print arr.sum()

【讨论】：

注意：这不再被开发并且似乎不适用于linux github.com/sturlamolden/sharedmem-numpy/issues/4【参考方案5】：

Array 对象有一个与之关联的get_obj() 方法，该方法返回呈现缓冲区接口的 ctypes 数组。我认为以下应该可行...

from multiprocessing import Process, Array
import scipy
import numpy

def f(a):
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = int(10)
    unshared_arr = scipy.rand(N)
    a = Array('d', unshared_arr)
    print "Originally, the first two elements of arr = %s"%(a[:2])

    # Create, start, and finish the child process
    p = Process(target=f, args=(a,))
    p.start()
    p.join()

    # Print out the changed values
    print "Now, the first two elements of arr = %s"%a[:2]

    b = numpy.frombuffer(a.get_obj())

    b[0] = 10.0
    print a[0]

运行时，这会打印出 a 的第一个元素现在是 10.0，显示 a 和 b 只是同一内存中的两个视图。

为了确保它仍然是多处理器安全的，我相信您将不得不使用存在于 Array 对象 a 上的 acquire 和 release 方法，以及它的内置锁来使确保所有内容都可以安全访问（尽管我不是多处理器模块方面的专家）。

【讨论】：

没有同步就无法工作，正如@unutbu 在他的（现已删除）答案中所展示的那样。想必，如果你只是想访问数组后处理，可以干干净净的，不用担心并发问题和锁定？在这种情况下你不需要mp.Array。处理代码可能需要锁定数组，但数据的后处理解释可能不一定。我想这来自于理解问题到底是什么。显然，同时访问共享数据需要一些保护，我认为这很明显！

以上是关于在共享内存中使用 numpy 数组进行多处理的主要内容，如果未能解决你的问题，请参考以下文章

在 python 多处理中传递共享内存变量

多处理中的共享内存对象

在多处理进程之间共享大型只读 Numpy 数组

使用 numpy 数组和共享内存并行化 python 循环

在 Python 多处理中将 Pool.map 与共享内存数组结合起来

『Numpy』内存分析_利用共享内存创建数组