Python gevent crawler: http://blog.hownowstephen.com/post/50743415449/gevent-tutorial



# Monkey-patch the standard library first, so that blocking socket calls
# (including those made by requests) cooperate with gevent
import gevent.monkey
gevent.monkey.patch_all()

import gevent.pool
import gevent.queue

import sys
import re
import requests

# Prepare a pool of 5 workers and a messaging queue
pool = gevent.pool.Pool(5)
queue = gevent.queue.Queue()
crawled = 0

def crawler():
    '''A very simple queued gevent web crawler'''

    print('starting crawler...')
    global crawled

    while True:
        try:
            # With timeout=0 an empty queue raises Empty instead of blocking
            u = queue.get(timeout=0)
            response = requests.get(u)
            print(response.status_code, u)

            # Extract some links to follow (response.text is the decoded body)
            for link in re.findall(r'<a href="(http.*?)"', response.text):
                # Limit the crawl to 10 queued links; later links are ignored
                if crawled < 10:
                    crawled += 1
                    queue.put(link)

        except gevent.queue.Empty:
            break

    print('stopping crawler...')

# The start URL is taken from the first command-line argument
queue.put(sys.argv[1])
pool.spawn(crawler)

# Keep topping up the pool while URLs are queued or workers are still running
while not queue.empty() or pool.free_count() != 5:
    gevent.sleep(0.1)
    for x in range(min(queue.qsize(), pool.free_count())):
        pool.spawn(crawler)

# Wait for everything to complete
pool.join()
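
The regular expression in the crawler only matches anchors written exactly as <a href="http...">, so it misses relative links and tags that carry other attributes before href. As a rough sketch of a more tolerant approach, the standard library's html.parser can collect href values and urllib.parse.urljoin can resolve them against the page URL. The names LinkCollector and extract_links are hypothetical helpers introduced here for illustration; they are not part of the original tutorial.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute http(s) links from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Resolve relative links against the URL of the page
                absolute = urljoin(self.base_url, value)
                # Skip non-web schemes such as mailto: or javascript:
                if absolute.startswith(('http://', 'https://')):
                    self.links.append(absolute)


def extract_links(base_url, html):
    """Return the http(s) links found in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Inside the crawler, the loop could iterate over extract_links(u, page_text), where page_text is the decoded response body, in place of the re.findall(...) call. The 10-link limit would stay unchanged.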
