Python Scrapy: What is the difference between the "runspider" and "crawl" commands?

Posted 2023-03-12

[Question posted]: 2016-10-03 02:34:57

[Question]:

Can someone explain the difference between the runspider and crawl commands? In which situations should each one be used?

[Answer 1]:

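A short explanation and the syntax of each:

runspider

Syntax: scrapy runspider <spider_file.py>

Requires project: no

Runs a spider that is self-contained in a single Python file, without having to create a project.

Usage example: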

$ scrapy runspider myspider.py

crawl

Syntax: scrapy crawl <spider>

Requires project: yes

Starts crawling using the spider with the given name.

Usage example:

$ scrapy crawl myspider

[Answer 2]:

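The main difference is that runspider does not need a project. That is, you can write a spider in a myspider.py file and call scrapy runspider myspider.py.

The crawl command requires a project so that it can find the project's settings, load the available spiders from the modules listed in the SPIDER_MODULES setting, and look up the spider by its name.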
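To make the project requirement concrete, the settings that crawl relies on might look roughly like the fragment below. The project name "myproject" is hypothetical; BOT_NAME, SPIDER_MODULES, and NEWSPIDER_MODULE are standard Scrapy settings.

```python
# settings.py of a hypothetical project called "myproject".
BOT_NAME = "myproject"

# Packages that `scrapy crawl <spider>` searches for spider classes;
# the spider whose `name` attribute matches <spider> is the one that runs.
SPIDER_MODULES = ["myproject.spiders"]

# Package in which `scrapy genspider` creates new spider files.
NEWSPIDER_MODULE = "myproject.spiders"
```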

If you need a quick spider for a short task, runspider requires less boilerplate.

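[Comments]:

So how does runspider get its settings?

@hAcKnRoCk That is optional. If you use the runspider command inside a project directory, it will use the project's settings. Otherwise, it will run with the defaults.

[Answer 3]: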

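In the command:

scrapy crawl [options] <spider>

<spider> is the name of the spider, that is, the value of the name attribute on the spider class. It is not the project name, which settings.py defines as BOT_NAME.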

In the command:

scrapy runspider [options] <spider_file>

<spider_file> is the path to the file that contains the spider.

Otherwise the options are the same:

Options
=======
--help, -h              show this help message and exit
-a NAME=VALUE           set spider argument (may be repeated)
--output=FILE, -o FILE  dump scraped items into FILE (use - for stdout)
--output-format=FORMAT, -t FORMAT
                        format to use for dumping items with -o

Global Options
--------------
--logfile=FILE          log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                        log level (default: DEBUG)
--nolog                 disable logging completely
--profile=FILE          write python cProfile stats to FILE
--lsprof=FILE           write lsprof profiling stats to FILE
--pidfile=FILE          write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                        set/override setting (may be repeated)
--pdb                   enable pdb on failure

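Because runspider takes a file path and does not depend on project settings such as BOT_NAME, you may find it more flexible, depending on how you customize your scrapers.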
