Scrapy crawler deep-dive: JSON content not being written to the file

Posted by my-ordinary


Editor's note: this article, compiled by the cha138.com editors, covers the problem of Scrapy JSON content not being written to the output file; hopefully it offers some reference value.

settings.py

ITEM_PIPELINES = {
    'tets.pipelines.TetsPipeline': 300,
}
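The integer in `ITEM_PIPELINES` is a priority: Scrapy runs enabled pipelines in ascending order of this value. A minimal sketch of that ordering (the second pipeline name is hypothetical, added only for illustration):

```python
# Scrapy runs pipelines in ascending order of their ITEM_PIPELINES value
# (lower number = runs earlier). "AnotherPipeline" is a hypothetical entry.
pipelines = {
    "tets.pipelines.TetsPipeline": 300,
    "tets.pipelines.AnotherPipeline": 100,
}
ordered = sorted(pipelines, key=pipelines.get)
print(ordered)
# → ['tets.pipelines.AnotherPipeline', 'tets.pipelines.TetsPipeline']
```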

Spider code

Append `.extract()` to each XPath query, and make sure `parse()` ends with `return item`.

import scrapy
from tets.items import TetsItem

class KugouSpider(scrapy.Spider):
    name = 'kugou'
    allowed_domains = ['www.kugou.com']
    start_urls = ['http://www.kugou.com/']

    def parse(self, response):
        item = TetsItem()
        # .extract() converts the SelectorList into a plain list of strings,
        # so the pipeline can JSON-serialize it later
        item['title'] = response.xpath("/html/head/title/text()").extract()
        print(item['title'])
        return item
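Why `.extract()` matters: without it, `xpath()` returns Selector objects, which `json.dumps()` cannot serialize. A small sketch of the failure mode, using a stand-in class instead of Scrapy's real Selector (hypothetical, for illustration only):

```python
import json

class FakeSelector:  # stands in for scrapy's Selector object
    pass

# A Selector-like object inside the item breaks JSON serialization:
try:
    json.dumps({"title": [FakeSelector()]})
    serializable = True
except TypeError:
    serializable = False
print("selector serializable:", serializable)  # → False

# After .extract(), the value is a plain list of strings, which serializes fine:
print(json.dumps({"title": ["KuGou"]}, ensure_ascii=False))
```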

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import codecs
import json

class TetsPipeline(object):
    def __init__(self):
        # self.file = codecs.open("D:/git/learn_scray/day11/mydata2.txt", "wb", encoding="utf-8")
        self.file = codecs.open("D:/git/learn_scray/day11/1.json", "wb", encoding="utf-8")

    # Plain-text variant (writes to xx.txt)
    # def process_item(self, item, spider):
    #     l = str(item) + "\n"
    #     print(l)
    #     self.file.write(l)
    #     return item
    def process_item(self, item, spider):
        print("entered process_item")
        # print(item)
        i = json.dumps(dict(item), ensure_ascii=False)
        # print("JSON serialized")
        # print(i)
        l = i + "\n"
        print(l)
        self.file.write(l)
        return item

    def close_spider(self, spider):
        self.file.close()  # fixed typo: was "slef.file.close()"
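The pipeline's write-then-close logic can be tried standalone. A minimal sketch, with the post's hard-coded `D:/` path replaced by a temporary file (the path and data here are hypothetical):

```python
import codecs
import json
import os
import tempfile

# Write one JSON line, the same way process_item does:
path = os.path.join(tempfile.gettempdir(), "tets_demo.json")
f = codecs.open(path, "w", encoding="utf-8")
f.write(json.dumps({"title": ["KuGou"]}, ensure_ascii=False) + "\n")
f.close()  # without closing, buffered output may never reach the file

with codecs.open(path, "r", encoding="utf-8") as f:
    content = f.read()
print(content, end="")  # → {"title": ["KuGou"]}
```

Forgetting to close the file (or misspelling `self`, as in the original `slef.file.close()`) is a common reason the JSON never appears on disk: the data sits in the write buffer and is lost when the process exits.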

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class TetsItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()

Result: the crawled title is written to 1.json as a JSON line (the original post's screenshots are omitted here).
