scrapy_redis settings

Posted by wangdongpython



from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
from scrapy_redis.spiders import RedisCrawlSpider


class MyCrawler(RedisCrawlSpider):
    """Spider that reads urls from redis queue (mycrawler:start_urls)."""
    name = 'mycrawler_redis'
    redis_key = 'mycrawler:start_urls'

    rules = (
        # follow all links
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    def __init__(self, *args, **kwargs):
        # Dynamically define the allowed domains list.
        domain = kwargs.pop('domain', '')
        # list(...) because filter() returns a lazy iterator on Python 3
        self.allowed_domains = list(filter(None, domain.split(',')))
        super(MyCrawler, self).__init__(*args, **kwargs)

    def parse_page(self, response):
        # Return one item per crawled page: its title and URL.
        return {
            'name': response.css('title::text').extract_first(),
            'url': response.url,
        }
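
The spider above only takes its start URLs from Redis. To also share the request queue and the duplicate filter between crawler processes, the project settings usually point at the scrapy_redis classes. The following is a minimal settings.py sketch, not the author's configuration: the REDIS_URL value assumes a local Redis instance on the default port, and the pipeline priority of 300 is an arbitrary choice.

```python
# settings.py (sketch; values marked "assumption" are not from the original post)

# Store the request queue in Redis so several crawler processes can share it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests through Redis instead of per-process memory.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and the seen-set in Redis when the spider closes,
# so a crawl can be paused and resumed.
SCHEDULER_PERSIST = True

# Optionally push scraped items into Redis as well.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,  # assumption: priority value
}

# Assumption: Redis runs locally on the default port.
REDIS_URL = "redis://localhost:6379"
```

With settings like these, the crawl would typically be started with `scrapy crawl mycrawler_redis -a domain=example.com` and seeded with `redis-cli lpush mycrawler:start_urls http://example.com`, where example.com is only a placeholder.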
