How can a child process operate on variables in the parent process?


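Answer A: First of all, a child process and its parent do not share an address space, so the child cannot access the parent's variables. Put another way,
once fork has returned they are two independent processes.

As for why nothing changes in the parent even when the variable is accessed through its address:
the two processes do not share an address space. On 32-bit Linux each process has its own 4 GB virtual address space.
So although &m in the child has the same value as &m in the parent, the two addresses are backed by different physical memory once either process writes to m.
&m is identical in both because fork duplicates the parent's virtual address space, so every variable keeps its virtual address in the child.
For the implementation details, read the fork source code; it is not long.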
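The behaviour described above can be reproduced with a minimal sketch. It assumes a Unix-like system where `os.fork` is available; the variable `m` mirrors the name used in the answer, while `fork_demo` and the pipe-based reporting are hypothetical helpers added for illustration.

```python
import ctypes
import os


def fork_demo():
    """Fork, modify m in the child, and report what each process sees."""
    m = ctypes.c_int(1)                    # a C int with an inspectable address
    parent_addr = ctypes.addressof(m)
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: write to m, then report its address and value to the parent.
        try:
            os.close(read_fd)
            m.value = 100
            report = "%d %d" % (ctypes.addressof(m), m.value)
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)                    # never fall through into parent code

    # Parent: read the child's report until EOF, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_addr, child_value = map(int, pipe.read().split())
    os.waitpid(pid, 0)
    return parent_addr, child_addr, child_value, m.value


if __name__ == "__main__":
    parent_addr, child_addr, child_value, parent_value = fork_demo()
    print("same virtual address:", parent_addr == child_addr)
    print("child saw m =", child_value, "| parent still has m =", parent_value)
```

The virtual address matches in both processes, yet the child's write never reaches the parent, which is consistent with the two processes having separate address spaces.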

How are parent process global variables copied to sub-processes in Python multiprocessing

[Posted] 2021-07-24 04:59:54 [Question]:

Ubuntu 20.04

My understanding of how different child processes access global variables in Python is this:

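    A global variable (say b) is available to every child process in a copy-on-write capacity.
    If a child process modifies that variable, a copy of b is created first and the copy is then modified. The parent process does not see this change (I ask a question about this part later).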
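The second point can be checked with a small sketch. It assumes the `fork` start method (the default on Linux at the time of the question), and uses a short list as a hypothetical stand-in for the large array `b`; `child` and `run_demo` are illustrative names, not part of the original post.

```python
import multiprocessing as mp

b = [1, 2, 3]  # hypothetical small stand-in for the large global array


def child(conn):
    """Modify the global in the child and send the result back."""
    global b
    b = [x + 1 for x in b]
    conn.send(b)
    conn.close()


def run_demo():
    ctx = mp.get_context("fork")           # assumption: fork is available
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=child, args=(child_conn,))
    p.start()
    child_view = parent_conn.recv()        # what the child ended up with
    p.join()
    return child_view, b                   # b here is the parent's global


if __name__ == "__main__":
    child_view, parent_view = run_demo()
    print("child:", child_view, "| parent:", parent_view)
```

The child reports the modified values while the parent's `b` is unchanged, so the change is indeed invisible to the parent.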

I ran some experiments to understand when the object gets copied, but I could not conclude much:

Experiments:

import numpy as np
import multiprocessing as mp
import psutil
b=np.arange(200000000).reshape(-1,100).astype(np.float64)

Then I tried to see how memory consumption changes, using the function below:

def f2():
    print(psutil.virtual_memory().used/(1024*1024*1024))
    global b
    print(psutil.virtual_memory().used/(1024*1024*1024))
    b = b + 1 ### I changed this statement to study the different memory behaviors. I am posting the results for different statements in place of b = b + 1.
    print(psutil.virtual_memory().used/(1024*1024*1024))

p2 = mp.Process(target=f2)
p2.start()
p2.join()

Result format:

statement used in place of b = b + 1
print 1
print 2
print 3
Comments and questions

Results:

b = b+1
6.571144104003906
6.57244873046875
8.082862854003906 
Only a copy-on-write view was provided so no memory consumption till it hit b = b+1. At which point a copy of b was created and hence the memory usage spike

b[:, 1] = b[:, 1] + 1
6.6118621826171875
6.613414764404297
8.108139038085938
Only a copy-on-write view was provided so no memory consumption till it hit b[:, 1] = b[:, 1] + 1. It seems that even if some part of the memory is to be updated (here just one column) the entire object would be copied. Seems fair (so far)

b[0, :] = b[0, :] + 1
6.580562591552734
6.581851959228516
6.582511901855469
NO MEMORY CHANGE! When I tried to modify a column it copied the entire b. But when I try to modify a row, it does not create a copy? Can you please explain what happened here?


b[0:100000, :] = b[0:100000, :] + 1
6.572498321533203
6.5740814208984375
6.656215667724609
Slight memory spike. Assuming a partial copy since I modified just the first 1/20th of the rows. But that would mean that while modifying a column as well some partial copy should have been created, unlike the full copy that we saw in case 2 above. No? Can you please explain what happened here as well?

b[0:500000, :] = b[0:500000, :] + 1
6.593017578125
6.594577789306641
6.970676422119141
The assumption of partial copy was right I think. A moderate memory spike to reflect the change in 1/4th of the total rows

b[0:1000000, :] = b[0:1000000, :] + 1
6.570674896240234
6.5723876953125
7.318485260009766
In-line with partial copy hypothesis


b[0:2000000, :] = b[0:2000000, :] + 1
6.594249725341797
6.596080780029297
8.087333679199219
A full copy since now we are modifying the entire array. This is equal to b = b + 1 only. Just that we have now referred using a slice of all the rows

b[0:2000000, 1] = b[0:2000000, 1] + 1
6.564876556396484
6.566963195800781
8.069766998291016
Again a full copy. It seems that a partial copy is created for row slices and a full copy for a column slice, which is weird to me. Can you please help me understand what the exact copy semantics of global variables of a child process are?

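As you can see, I cannot find a way to justify the results I am seeing in the experimental setup described above. Can you help me understand how the parent process's global variables are copied when a child process modifies them in full or in part?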
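One detail that may help when reading the results, offered as a side note rather than as part of the original thread: `b = b + 1` makes NumPy allocate a brand-new array and rebind the name, whereas a slice assignment such as `b[:, 1] = b[:, 1] + 1` writes into the existing buffer. Only the second kind of statement writes to memory inherited from the parent. The sketch below uses a tiny array and two hypothetical helpers, `data_address` and `compare`, to show the difference.

```python
import numpy as np


def data_address(arr):
    """Return the address of the first byte of the array's data buffer."""
    return arr.__array_interface__["data"][0]


def compare():
    a = np.zeros((4, 3))
    original = data_address(a)

    a[:, 1] = a[:, 1] + 1                  # slice assignment: writes in place
    same_after_slice = data_address(a) == original

    a = a + 1                              # rebinding: a new buffer is allocated
    same_after_rebind = data_address(a) == original
    return same_after_slice, same_after_rebind


if __name__ == "__main__":
    print(compare())
```

So the jump seen for `b = b + 1` would come from allocating a second full-size array, while the jumps seen for slice assignments would come from writes landing on inherited pages.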

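I have also read that:

Child processes get a copy-on-write view of the parent's memory space. As long as you load the dataset before firing off the processes, and you don't pass a reference to that memory space in the multiprocessing call (that is, workers should use the global variable directly), then there is no copy.

Question 1: What does "As long as you load the dataset before firing off the processes, and you don't pass a reference to that memory space in the multiprocessing call (that is, workers should use the global variable directly), then there is no copy" mean?

As answered by Mr. Tim Roberts below, it means:

If you pass the dataset as a parameter, then Python has to make a copy to transfer it over. The parameter-passing mechanism does not use copy-on-write, partly because the reference-counting machinery would get confused. When you create it as a global before things start, there is a solid reference, so the multiprocessing code can make copy-on-write happen.

However, I am unable to verify this behavior. Here are a couple of tests I ran to verify it: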

import numpy as np
import multiprocessing as mp
import psutil
b=np.arange(200000000).reshape(-1,100).astype(np.float64)

Then I tried to see how memory consumption changes, using the function below:

def f2(b): ### Please notice that the array is passed as an argument and not picked as the global variable of parent process
    print(psutil.virtual_memory().used/(1024*1024*1024))
    b = b + 1 ### I changed this statement to study the different memory behaviors. I am posting the results for different statements in place of b = b + 1.
    print(psutil.virtual_memory().used/(1024*1024*1024))

print(psutil.virtual_memory().used/(1024*1024*1024))
p2 = mp.Process(target=f2,args=(b,)) ### Please notice that the array is passed as an argument and not picked as the global variable of parent process
p2.start()
p2.join()

Result format: same as above

Results:

b = b+1
6.692680358886719
6.69635009765625
8.189273834228516
The second print is arising from within the function hence, by then the copy should have been made and we should see the second print to be around 8.18

b = b
6.699306488037109
6.701808929443359
6.702671051025391
The second and third print should have been around 8.18. The results suggest that no copy is created even though the array b is passed to the function as an argument

[Comments on the question]:

[Answer 1]:

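Copy-on-write handles one virtual memory page at a time. As long as your changes stay within a single 4096-byte page, you only pay for that page. When you modify a column, your changes are spread across many pages: each row of b occupies 800 bytes (100 float64 values), so every page holds part of at least one row and a one-column write touches essentially every page. We Python programmers are not used to worrying about the layout in physical memory, but that is the issue here.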
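The page arithmetic can be made concrete with a rough sketch. It assumes 4096-byte pages, the 2,000,000 x 100 float64 layout from the question, and a data buffer that starts exactly on a page boundary (real allocations may be offset slightly); `pages_touched` is a hypothetical helper.

```python
import numpy as np

PAGE = 4096      # assumed page size in bytes
N_COLS = 100     # columns in b, from the question
ITEM = 8         # bytes per float64


def pages_touched(row_start, row_stop, col_start, col_stop):
    """Count distinct pages written by b[row_start:row_stop, col_start:col_stop] = ..."""
    rows = np.arange(row_start, row_stop, dtype=np.int64)
    first_byte = (rows * N_COLS + col_start) * ITEM
    last_byte = (rows * N_COLS + col_stop) * ITEM - 1
    # Each row's span is at most 800 bytes, so it covers at most two pages:
    # the one holding its first byte and the one holding its last byte.
    pages = np.concatenate((first_byte // PAGE, last_byte // PAGE))
    return np.unique(pages).size


if __name__ == "__main__":
    cases = {
        "b[0, :]": (0, 1, 0, N_COLS),
        "b[0:100000, :]": (0, 100_000, 0, N_COLS),
        "b[:, 1]": (0, 2_000_000, 1, 2),
    }
    for label, args in cases.items():
        n = pages_touched(*args)
        print(label, "->", n, "pages,", round(n * PAGE / 2**30, 3), "GiB")
```

Under these assumptions a single row costs one page, the first 100,000 rows cost 19,532 pages, and one column costs all 390,625 pages of the array, which lines up with the row-versus-column difference seen in the measurements.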

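Question 1: If you pass the dataset as a parameter, then Python has to make a copy to transfer it over. The parameter-passing mechanism does not use copy-on-write, partly because the reference-counting machinery would get confused. When you create it as a global before things start, there is a solid reference, so the multiprocessing code can make copy-on-write happen.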

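[Discussion]:

Could you also answer my Question 1? Then I can accept your answer and close this post.
Thank you for answering Question 1 as well, sir. Initially I thought the same. However, I ran some tests to verify it and the results do not agree. It seems that passing b as an argument behaves exactly the same as picking b directly from the parent process's global namespace. Can you help me understand where I am going wrong? I have modified the question for your reference.
Things may have improved since the article you read was written. Perhaps it depends on the operating system. It is also possible that other memory-management magic is going on; measuring memory usage in a virtual memory system is not an exact science.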
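One possible explanation for the asker's observation, added here as an assumption rather than something stated in the thread: with the `fork` start method, `multiprocessing.Process` does not pickle its `args`; the child simply inherits the parent's objects through the fork, so an argument behaves much like a global. With `spawn` the arguments are pickled and sent to the child. The sketch below probes this by passing an object that cannot be pickled; `call_it` and `probe` are hypothetical names.

```python
import multiprocessing as mp
import pickle


def call_it(fn, conn):
    """Run fn in the child and send the result back to the parent."""
    conn.send(fn(20))
    conn.close()


def probe():
    unpicklable = lambda x: x + 1          # local lambdas cannot be pickled
    try:
        pickle.dumps(unpicklable)
        picklable = True
    except Exception:
        picklable = False

    ctx = mp.get_context("fork")           # assumption: fork is available
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=call_it, args=(unpicklable, child_conn))
    p.start()                              # succeeds: args are inherited, not pickled
    result = parent_conn.recv()
    p.join()
    return picklable, result


if __name__ == "__main__":
    print(probe())
```

If the argument were being serialized, starting the process would fail; under `fork` it runs, which fits the measurement where `b = b` showed no extra memory.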
