scrapy | downloader middleware

Posted by 404noofound

Foreword: This article was compiled by the editors of 小常识网 (cha138.com). It covers the scrapy downloader middleware, and we hope you find it useful.

1. User-Agent

By default, Scrapy's built-in UserAgentMiddleware sets the header to "User-Agent": "Scrapy/1.5.1 (+https://scrapy.org)". The version number matches the installed Scrapy release.

I. Set USER_AGENT in settings.py

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36'

II. Write a custom middleware that picks a random User-Agent, then enable it in settings.py (DOWNLOADER_MIDDLEWARES).

from random import choice


class RandomMiddlewares(object):
    def __init__(self):
        # Pool of User-Agent strings; one is picked at random for each request
        self.user_agent = [
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
            'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16',
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
            'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
            'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        ]

    def process_request(self, request, spider):
        # Overwrite the header on every outgoing request; returning None (implicitly)
        # lets Scrapy continue processing the request as usual
        request.headers['User-Agent'] = choice(self.user_agent)
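The "enable it in settings.py" step can be sketched as the fragment below. This assumes the class lives in `myproject/middlewares.py`. The package name `myproject` is a placeholder for your own project, and 543 is only an example order value. Scrapy calls `process_request` in ascending order of these numbers. The built-in UserAgentMiddleware sits at 500 and only sets the header if it is missing. A custom middleware ordered after it therefore overwrites that header. For this reason, disabling the built-in one with `None` is optional.

```python
# settings.py (fragment)
# 'myproject' is a hypothetical package name: replace it with your project's package.
DOWNLOADER_MIDDLEWARES = {
    # The custom random User-Agent middleware; 543 is just an example order value
    'myproject.middlewares.RandomMiddlewares': 543,
    # Optional: mapping a component to None disables the built-in UserAgentMiddleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```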

 
