Common handling methods for Scrapy middlewares

Posted by zengxm

tags:

This post collects some common handling patterns for Scrapy downloader middlewares; hopefully it is a useful reference.

import user_agent
import requests
from twisted.internet.error import TimeoutError  # Scrapy download timeouts raise Twisted's TimeoutError


class UA_midd(object):
    """Downloader middleware that sets a random User-Agent and a Referer."""

    def process_request(self, request, spider):
        # attach a randomly generated User-Agent to every request
        request.headers['User-Agent'] = user_agent.generate_user_agent()
        # use the request's own URL as the Referer
        referer = request.url
        if referer:
            request.headers['Referer'] = referer


class Proxy_midd(object):
    """Downloader middleware that attaches a proxy IP fetched from a proxy pool."""

    def __init__(self):
        self.ip = ''
        self.url = 'http://188.131.212.24:5010/get/'  # proxy pool API
        self.count = 0

    def process_request(self, request, spider):
        # fetch a fresh proxy on the first request and again after ~20 uses
        if self.count == 0 or self.count >= 20:
            res = requests.get(url=self.url).content.decode()
            if 'no' not in res:      # the pool answers with "no ..." when it is empty
                self.ip = res
            self.count = 1

        if self.ip:
            request.meta['proxy'] = 'http://' + self.ip
            self.count += 1
        else:
            # no usable proxy yet: bump the counter so a new one is fetched soon
            self.count += 5

    def process_exception(self, request, exception, spider):
        # on a timeout, force a proxy change and reschedule the request
        if isinstance(exception, TimeoutError):
            self.count += 20
            return request

These two middlewares simply handle the User-Agent and the proxy IP.
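Neither middleware takes effect until it is registered in the project settings. A minimal sketch, assuming the classes above live in middlewares.py of a project package called myproject (the module path and priority numbers are illustrative, not from the original post):

# settings.py -- register the custom downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.UA_midd': 543,
    'myproject.middlewares.Proxy_midd': 544,
    # disable Scrapy's built-in UserAgentMiddleware so it cannot
    # overwrite the random User-Agent set above
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}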

# If you maintain cookies through a cookies pool, switch to fresh cookies when a request does not get through.

# Note: cookies set in a middleware must be in dict form.
import json
import requests


class cookies_mid(object):
    """Downloader middleware that attaches cookies fetched from a cookies pool."""

    def __init__(self):
        self.cookies_url = ''  # URL of the cookies pool you maintain

    def process_request(self, request, spider):
        request.cookies = self.get_cookies()

    def get_cookies(self):
        # the pool is expected to return a JSON object of cookie name/value pairs
        cookies = requests.get(self.cookies_url).content.decode()
        if cookies:
            return json.loads(cookies)

Cookie rotation
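Because request.cookies must be a dict, the cookies pool endpoint is expected to return a JSON object of cookie name/value pairs. A small illustration (the cookie names and values below are made up for demonstration):

import json

# example payload the cookies pool might return (made-up values)
raw = '{"sessionid": "abc123", "csrftoken": "xyz789"}'

# json.loads() produces the dict form that Scrapy expects on request.cookies
cookies = json.loads(raw)   # {'sessionid': 'abc123', 'csrftoken': 'xyz789'}

Like the other two middlewares, cookies_mid must also be added to DOWNLOADER_MIDDLEWARES before it runs.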

Related material: https://blog.csdn.net/sc_lilei/article/details/80702449
