How to use PyMongo with a multiprocessing Pool in Python 2.7

Question

I am using PyMongo with a multiprocessing Pool to run 10 processes that fetch data from an API and insert the output into MongoDB.

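I think something is wrong with the way I wrote the code, because it opens about twice as many connections as I expect. For example, if I run 10 processes, MongoDB reports 20 or more established connections, and I get the following warning at startup: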

UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing>
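For reference, the second remedy named in the warning (creating the client after forking) might be sketched roughly as follows. This is only an illustration under assumptions: `MONGO_URI`, `make_client`, `init_worker`, `store_user`, and `make_pool` are hypothetical names, the URI is modeled on the one in the question, `insert_one` assumes PyMongo 3 or later, and the API request is left out.

```python
from multiprocessing import Pool

# Assumption: a replica-set URI modeled on the one in the question.
MONGO_URI = 'mongodb://192.168.0.1:27017,192.168.0.2:27017/?replicaSet=rs0'

_users = None  # per-process collection handle, set by init_worker()


def make_client():
    # Imported and constructed lazily, so the parent never opens a client.
    from pymongo import MongoClient
    return MongoClient(MONGO_URI)


def init_worker(client_factory=make_client):
    """Pool initializer: runs once inside each worker, after the fork."""
    global _users
    _users = client_factory().database.users


def store_user(user):
    """Worker task: insert one user document (the API call is omitted)."""
    if user:
        _users.insert_one(user)
        return True
    return False


def make_pool(processes=10):
    # Every worker calls init_worker() itself, so no client crosses the fork.
    return Pool(processes, initializer=init_worker)
```

With this layout, something like `make_pool().map(store_user, list_of_user_dicts)` would give each of the 10 workers its own client, while the parent process holds none.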

I get this warning even though I pass connect=False to the MongoClient. Here is sample code showing how I use pymongo and requests to call the API from inside the pool:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import json # to decode and encode json
import requests # web POST and GET requests. 
from pymongo import MongoClient # the mongo driver / connector
from bson import ObjectId # to generate bson object for MongoDB
from multiprocessing import Pool # pool of worker processes

# Create the mongoDB Database object, declare collections
client = MongoClient('mongodb://192.168.0.1:27017,192.168.0.2:27017/?replicaSet=rs0', maxPoolSize=20, connect=False)
index = client.database.index
users = client.database.users

def get_user(userid):

    params = {"userid":userid}
    r = requests.get("https://exampleapi.com/getUser",params=params)
    j = json.loads(r.content)
    return j

def process(index_line):

    user = get_user(index_line["userid"])
    if(user):
        users.insert(user)

def main():

    # number of index documents to fetch on each pass (the batch size)
    limited = 100
    # skip number of lines for the loop (getting updated)
    skipped = 0
    while True:
        # get cursor with data from index collection
        cursor = index.find({},no_cursor_timeout=True).skip(skipped).limit(limited)
        # create a pool of 10 worker processes
        p = Pool(10)
        # start multiprocessing the pool with the dataset
        p.map(process, cursor)
        # after pool finished, kill it with fire
        p.close()
        p.terminate()
        p.join()
        # after finishing this batch, go for another round (loops forever)
        skipped = skipped + limited
        print "[-] Skipping %s " % skipped

if __name__ == '__main__':
    main()

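Is there anything wrong with the logic of my code? Is there a way to make it more efficient, make it work better, and give me better control over my pool?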
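To make the "better control over my pool" part concrete, here is a minimal sketch of one possible layout: a single long-lived pool reused for every batch, a loop that stops when a page comes back empty, and a clean `close()`/`join()` shutdown. The names `batches`, `run`, `run_with_process_pool`, and the `fetch_page(skip, limit)` callback are hypothetical and not part of the original code.

```python
from multiprocessing import Pool


def batches(fetch_page, page_size):
    """Yield successive pages until fetch_page returns an empty list."""
    skipped = 0
    while True:
        page = fetch_page(skipped, page_size)
        if not page:
            break  # no more data: stop instead of looping forever
        yield page
        skipped += len(page)


def run(pool, fetch_page, worker, page_size=100):
    """Feed every page to one pool; return the number of items processed."""
    done = 0
    for page in batches(fetch_page, page_size):
        # imap_unordered yields results as the workers finish them
        for _ in pool.imap_unordered(worker, page, chunksize=10):
            done += 1
    return done


def run_with_process_pool(fetch_page, worker, processes=10):
    """Create the pool once, reuse it for all pages, then shut it down."""
    pool = Pool(processes)
    try:
        return run(pool, fetch_page, worker)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to exit cleanly
```

With the collection from the question, `fetch_page` could be something like `lambda skip, limit: list(index.find({}).skip(skip).limit(limit))`. It runs only in the parent process, whereas `worker` has to be a module-level function so that it can be pickled and sent to the workers.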

I have been researching this for a long time but cannot find a better way to do what I want, so I would appreciate some help.

Thanks.