Python 2.6: Process-local storage while using multiprocessing.Pool
【Posted】: 2011-03-18 17:42:25
【Question】: I am trying to build a Python script that runs a pool of worker processes (using multiprocessing.Pool) across a large set of data.
I want each process to have its own unique object that can be reused across the many executions of work in that process.
Pseudocode:
def work(data):
    # connection should be unique per process
    connection.put(data)
    print 'work done with connection:', connection

if __name__ == '__main__':
    pPool = Pool()  # pool of 4 processes
    datas = [1..1000]
    for process in pPool:
        # this is the part I'm asking about: how do I really do this?
        process.connection = Connection(conargs)
    for data in datas:
        pPool.apply_async(work, (data,))
【Answer 1】: I think something like this should work (untested):
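from multiprocessing import Pool

def init(*args):
    # Runs once in each worker process, so the global is per-process.
    global connection
    connection = Connection(*args)

pPool = Pool(initializer=init, initargs=conargs)  # conargs must be a tuple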
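For reference, here is a fuller sketch of the same initializer idea that can be run as-is. The `Connection` class is a hypothetical stand-in for the real resource, and `init_worker`, the `"demo"` argument, and the pool size of 4 are illustrative choices, not anything taken from the question.

```python
import multiprocessing as mp
import os


class Connection(object):
    """Hypothetical stand-in for a real per-process resource."""

    def __init__(self, name):
        self.name = name
        self.pid = os.getpid()  # records which process created this object


connection = None  # set separately inside each worker by init_worker()


def init_worker(name):
    # Pool calls this once in every worker process, right after it starts.
    global connection
    connection = Connection(name)


def work(data):
    # Every task run by the same worker reuses that worker's connection.
    return (data, connection.name, connection.pid == os.getpid())


if __name__ == "__main__":
    pool = mp.Pool(processes=4, initializer=init_worker, initargs=("demo",))
    try:
        for data, name, created_here in pool.map(work, range(8)):
            print("data %d handled with connection %s (created in this worker: %s)"
                  % (data, name, created_here))
    finally:
        pool.close()
        pool.join()
```

Because `init_worker` runs inside each worker, the `Connection` object is never pickled or sent between processes; only the `initargs` tuple is.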
【Comments】:
Could you mark this as the answer?
【Answer 2】: Creating the mp.Process objects directly (without mp.Pool) is probably the simplest approach:
import multiprocessing as mp
import time

class Connection(object):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

def work(inqueue, conn):
    name = mp.current_process().name
    while 1:
        data = inqueue.get()
        time.sleep(.5)
        print('{n}: work done with connection {c} on data {d}'.format(
            n=name, c=conn, d=data))
        inqueue.task_done()

if __name__ == '__main__':
    N = 4
    procs = []
    inqueue = mp.JoinableQueue()
    for i in range(N):
        # Each worker process gets its own Connection for its whole lifetime.
        conn = Connection(name='Conn-' + str(i))
        proc = mp.Process(target=work, name='Proc-' + str(i), args=(inqueue, conn))
        proc.daemon = True
        proc.start()
        procs.append(proc)
    datas = range(1, 11)
    for data in datas:
        inqueue.put(data)
    # Blocks until task_done() has been called for every item put on the queue.
    inqueue.join()
Output:
Proc-0: work done with connection Conn-0 on data 1
Proc-1: work done with connection Conn-1 on data 2
Proc-3: work done with connection Conn-3 on data 3
Proc-2: work done with connection Conn-2 on data 4
Proc-0: work done with connection Conn-0 on data 5
Proc-1: work done with connection Conn-1 on data 6
Proc-3: work done with connection Conn-3 on data 7
Proc-2: work done with connection Conn-2 on data 8
Proc-0: work done with connection Conn-0 on data 9
Proc-1: work done with connection Conn-1 on data 10
Note that each Proc number always corresponds to the same Conn number.
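The workers above loop forever and are cleaned up only because they are daemon processes. If each worker should exit cleanly (for example, to close its connection), one common pattern is to send one sentinel value per worker. The following is a minimal sketch of that pattern, not part of the original answer: `STOP`, `worker_loop`, `run`, and the result queue are hypothetical names, and a plain string stands in for the connection object.

```python
import multiprocessing as mp

STOP = None  # hypothetical sentinel meaning "no more work"


def worker_loop(inqueue, outqueue, conn_name):
    # conn_name stands in for the per-process connection object.
    handled = 0
    while True:
        data = inqueue.get()
        if data is STOP:
            break
        handled += 1
    # A real worker would close its connection here before exiting.
    outqueue.put((conn_name, handled))


def run(n_workers, datas):
    inqueue, outqueue = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker_loop,
                        args=(inqueue, outqueue, "Conn-%d" % i))
             for i in range(n_workers)]
    for proc in procs:
        proc.start()
    for data in datas:
        inqueue.put(data)
    for _ in procs:
        inqueue.put(STOP)  # one sentinel per worker
    results = [outqueue.get() for _ in procs]  # drain results before join()
    for proc in procs:
        proc.join()
    return dict(results)


if __name__ == "__main__":
    print(run(4, range(1, 11)))
```

How the items are split between workers depends on scheduling, so the per-connection counts vary from run to run; only their total is fixed.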
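【Answer 3】: For anyone else who arrives here from Google looking for something similar: process-local storage is easy to implement as a mapping container. (Note that this is Py3, but it converts easily to Python 2 syntax: just inherit from object.)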
import os

class ProcessLocal:
    """
    Provides a basic per-process mapping container that wipes itself if the current PID changed since the last get/set.
    Aka `threading.local()`, but for processes instead of threads.
    """
    __pid__ = -1

    def __init__(self, mapping_factory=dict):
        self.__mapping_factory = mapping_factory

    def __handle_pid(self):
        # Start from a fresh store whenever we find ourselves in a new process.
        new_pid = os.getpid()
        if self.__pid__ != new_pid:
            self.__pid__, self.__store = new_pid, self.__mapping_factory()

    def __delitem__(self, key):
        self.__handle_pid()
        return self.__store.__delitem__(key)

    def __getitem__(self, key):
        self.__handle_pid()
        return self.__store.__getitem__(key)

    def __setitem__(self, key, val):
        self.__handle_pid()
        return self.__store.__setitem__(key, val)
See more at https://github.com/akatrevorjay/pytutils/blob/develop/pytutils/mappings.py
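【Answer 4】: You want an object that lives in shared memory, right?
Python has some support for this in its standard library, but it is fairly limited. As far as I know, only integers and a few other primitive types can be stored.
Try POSH (Python Object Sharing): http://poshmodule.sourceforge.net/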
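As an illustration of the standard-library support mentioned above, `multiprocessing.Value` places a single C-typed value in shared memory. This is a minimal sketch; `add_to_total` and the choice of four processes are hypothetical, picked only for the example.

```python
import multiprocessing as mp


def add_to_total(total, amount):
    # get_lock() guards the read-modify-write so concurrent updates are not lost.
    with total.get_lock():
        total.value += amount


if __name__ == "__main__":
    total = mp.Value("i", 0)  # a C int living in shared memory
    procs = [mp.Process(target=add_to_total, args=(total, n))
             for n in range(1, 5)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    print(total.value)
```

The type code `"i"` restricts the value to a C int, which matches the limitation to primitive types described in this answer.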