Python Crawler Programming Ideas (152): Scraping Data with Scrapy, Saving Multiple Scraped Items with ItemLoader

Posted 2022-06-30 蒙娜丽宁


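The previous article used ItemLoader to save a single scraped record. To save several records, or every record scraped from the page, the parse method needs to return a list of MyscrapyItem objects.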

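The example below scrapes the same blog list page as the previous article, but saves the data for every blog entry on the page: each entry's title, abstract, and URL.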

import scrapy
from scrapy.loader import *
# Note: scrapy.loader.processors is deprecated since Scrapy 2.2
# in favour of itemloaders.processors
from scrapy.loader.processors import *
from bs4 import *
from myscrapy.items import MyscrapyItem

class ItemLoaderSpider1(scrapy.Spider):
    name = 'ItemLoaderSpider1'
    start_urls = [
        'https://geekori.com/blogsCenter.php?uid=geekori'
    ]

    def parse(self, response):
        # List of MyscrapyItem objects to return
        items = []

        # Extract the HTML of every blog entry on the list page
        sectionList = response.xpath('//*[@id="all"]/div[1]/section').extract()
        # Process each blog entry in turn
        for section in sectionList:
            ...  # loop body is cut off in the source
