Force close a scrapy spider
So, I've been banging my head against the wall on this for a few weeks. I've tried multiple solutions, but I can't get anything elegant working. Ideally, I need to check for a file when the spider is opened and, depending on that file, stop execution. I could do this in the parse method, but that's ugly and hard to maintain. I may eventually write some middleware for this, but for now I just want to implement it in each of my spiders. Here's what I have so far:
from scrapy import Spider, signals
from scrapy.exceptions import CloseSpider
from scrapy.xlib.pydispatch import dispatcher

class MySpider(Spider):
    name = "myspider"

    def __init__(self):
        # connect spider_opened to the signal on this instance
        dispatcher.connect(self.spider_opened, signals.spider_opened)

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened)
        return spider

    def spider_opened(self):
        raise CloseSpider("Testing force close")
This doesn't work. I get the following exception:
2018-06-15 13:05:46 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ?.spider_opened of <MySpider 'myspider' at 0x10c450050>>
Traceback (most recent call last):
  File "/Users/.../Library/Python/2.7/lib/python/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "build/bdist.macosx-10.11-intel/egg/pydispatch/robustapply.py", line 55, in robustApply
  File "/Users/.../myspider.py", line 72, in spider_opened
    raise CloseSpider("Testing force close")
CloseSpider
In my IDE, pylint says:
E1101:Instance of 'Spider' has no 'spider_opened' member
Can anyone point me toward a solution? Is it because I'm running Scrapy v1.3.0?
Answer
It should look like this:
from scrapy import Spider, signals
from scrapy.exceptions import CloseSpider
from scrapy.xlib.pydispatch import dispatcher

class MySpider(Spider):
    name = "myspider"

    def __init__(self):
        # connect the handler on the spider instance, not in from_crawler
        dispatcher.connect(self.spider_opened, signals.spider_opened)

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        return spider

    def spider_opened(self):
        raise CloseSpider("Testing force close")
Note that you cannot put crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened) in the classmethod, because spider_opened should be called on the spider instance.
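Applying the same pattern to the file check described in the question, a minimal sketch might look like the following. The marker-file path, spider name, and start URL are hypothetical, the scrapy.xlib.pydispatch import matches the Scrapy 1.x version mentioned in the question, and whether CloseSpider propagates cleanly out of a signal handler may depend on the Scrapy version:

import os

from scrapy import Spider, signals
from scrapy.exceptions import CloseSpider
from scrapy.xlib.pydispatch import dispatcher

# Hypothetical marker file; replace with whatever file the crawl should check for.
STOP_FILE = "/tmp/stop_crawl"

class FileGuardSpider(Spider):
    name = "fileguardspider"  # hypothetical name
    start_urls = ["http://example.com"]  # hypothetical URL

    def __init__(self, *args, **kwargs):
        super(FileGuardSpider, self).__init__(*args, **kwargs)
        # Connect the handler on the instance, as in the answer above.
        dispatcher.connect(self.spider_opened, signals.spider_opened)

    def spider_opened(self):
        # Abort the crawl as soon as the spider opens if the marker file exists.
        if os.path.exists(STOP_FILE):
            raise CloseSpider("Stop file present: %s" % STOP_FILE)

    def parse(self, response):
        # Normal parsing runs only when the marker file is absent.
        pass

The check then lives in one place per spider instead of being repeated inside parse, which was the maintainability concern raised in the question.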