A Collection of Scrapy Item Pipelines

Posted by hanjian200ok

This post walks through several Scrapy item pipelines for one crawler project: stamping crawl metadata, exporting to JSON and CSV, and writing to MongoDB and Redis.

from datetime import datetime
from scrapy.exporters import JsonItemExporter, CsvItemExporter
import pymongo
import redis
# Connection settings come from the project's settings.py
from .settings import REDIS_HOST, REDIS_PORT, MONGO_HOST, MONGO_PORT


# Metadata pipeline: stamps every item with crawl time and spider name
class AqiDataPipeline(object):
    def process_item(self, item, spider):
        # Record the crawl time (UTC)
        item['crawl_time'] = datetime.utcnow()
        # Record which spider produced the item
        item['spider'] = spider.name
        return item
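Pipelines only run if they are enabled in the project's settings.py via `ITEM_PIPELINES`, where the integer priority (lower runs first) decides the order items flow through them. A minimal sketch follows; the module path `AQI.pipelines` is an assumption, so substitute your own project package name:

```python
# Hypothetical settings.py fragment -- "AQI.pipelines" is a placeholder
# for this project's actual package name.
ITEM_PIPELINES = {
    "AQI.pipelines.AqiDataPipeline": 100,   # lowest number: runs first
    "AQI.pipelines.AqiJsonPipeline": 200,
    "AQI.pipelines.AqiCsvPipeline": 300,
    "AQI.pipelines.AqiMongoPipeline": 400,
    "AQI.pipelines.AqiRedisPipeline": 500,
}

# Scrapy sorts pipelines by ascending priority, so the metadata pipeline
# must carry the smallest value to stamp items before any exporter sees them.
first = min(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```

Giving the metadata pipeline the lowest number guarantees `crawl_time` and `spider` are present in every item the later pipelines export.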


# JSON export pipeline
class AqiJsonPipeline(object):
    def open_spider(self, spider):
        # Scrapy exporters require a binary-mode file handle
        self.file = open("aqi.json", 'wb')
        self.exporter = JsonItemExporter(self.file)
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()
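Note that `JsonItemExporter` buffers items and writes a single JSON array when exporting finishes, while Scrapy's `JsonLinesItemExporter` writes one object per line and is friendlier for streaming large crawls. The difference in output shape can be sketched with the stdlib `json` module alone (the sample items are made up for illustration):

```python
import json

items = [{"city": "Beijing", "aqi": 57}, {"city": "Shanghai", "aqi": 42}]

# Array style (what JsonItemExporter produces): one document, parsed whole.
array_style = json.dumps(items)

# Lines style (JsonLinesItemExporter): one object per line, parseable lazily.
lines_style = "\n".join(json.dumps(i) for i in items)
```

Both encode the same items; the lines style just avoids holding the whole result in memory before writing.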


# CSV export pipeline
class AqiCsvPipeline(object):
    def open_spider(self, spider):
        # Scrapy exporters require a binary-mode file handle
        self.file = open("aqi.csv", 'wb')
        self.exporter = CsvItemExporter(self.file)
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()


# MongoDB pipeline
class AqiMongoPipeline(object):
    def open_spider(self, spider):
        self.client = pymongo.MongoClient(host=MONGO_HOST, port=MONGO_PORT)
        self.db = self.client['Aqi']
        self.collection = self.db['aqi']

    def process_item(self, item, spider):
        # insert() is deprecated and removed in PyMongo 4; use insert_one()
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()


import json  # used below to serialize items before pushing to Redis


# Redis pipeline
class AqiRedisPipeline(object):
    def open_spider(self, spider):
        self.client = redis.Redis(host=REDIS_HOST, port=REDIS_PORT)

    def process_item(self, item, spider):
        # Redis cannot store a Python dict directly; serialize it to JSON
        # (default=str renders the datetime crawl_time as a string)
        self.client.lpush('aqi', json.dumps(dict(item), default=str))
        return item

    def close_spider(self, spider):
        self.client.close()
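Redis stores flat byte strings, so pushing a raw item dict with `lpush` raises an error in redis-py 3 and later; the usual fix is to JSON-encode the item first and decode it when reading the list back. A minimal round-trip sketch (the sample item values are made up for illustration):

```python
import json
from datetime import datetime

# A stand-in for dict(item): note the datetime field that plain json.dumps
# cannot encode without help.
item = {"city": "Beijing", "aqi": 57, "crawl_time": datetime(2020, 1, 1)}

# default=str falls back to str() for non-JSON types, turning the
# datetime into "2020-01-01 00:00:00".
payload = json.dumps(item, default=str)

# A consumer popping from the Redis list would decode the same way:
restored = json.loads(payload)
```

The trade-off of `default=str` is that the datetime comes back as a string, so a consumer that needs a real `datetime` must re-parse it (e.g. with `datetime.fromisoformat`).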

That covers the main pipeline implementations for this crawler.
