UA Spoofing in the Scrapy Framework

Posted by duanhaoxin



For example, searching for "ip" on Baidu normally shows your own machine's IP. By spoofing the User-Agent header and routing requests through a proxy pool, the requests appear to come from a different machine.

Spider code:

import scrapy


class UatestSpider(scrapy.Spider):
    name = 'UATest'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.baidu.com/s?wd=ip']

    def parse(self, response):
        # Save the response so we can inspect which IP Baidu saw
        with open('./ip.html', 'w', encoding='utf-8') as fp:
            fp.write(response.text)
            print('over!!!')

Middleware code (middlewares.py):

import random

from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

user_agent_list = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
        "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"
]


class UAPool(UserAgentMiddleware):
    def process_request(self, request, spider):
        # Assign a random User-Agent to every outgoing request
        ua = random.choice(user_agent_list)
        request.headers['User-Agent'] = ua
        print(request.headers['User-Agent'])

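The rotation logic in UAPool can be exercised outside Scrapy. The sketch below uses a trimmed three-entry pool and a hypothetical helper name (`pick_user_agent`, not part of the middleware above) just to show that every pick comes from the pool:

```python
import random

# Trimmed user-agent pool (same idea as the full list above)
user_agent_list = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
    "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
    "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
]


def pick_user_agent():
    """Return a random User-Agent string from the pool."""
    return random.choice(user_agent_list)


# Every pick is a member of the pool, so downstream sites only
# ever see one of the whitelisted browser identities.
assert all(pick_user_agent() in user_agent_list for _ in range(100))
```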
proxy_http = ['125.27.10.150:56292', '114.34.168.157:46160']
proxy_https = ['1.20.101.81:35454', '113.78.254.156:9000']


class UapoolDownloaderMiddleware(object):
    # request is the intercepted request object
    # spider is the spider instance that issued it
    def process_request(self, request, spider):
        if request.url.split(':')[0] == 'https':
            request.meta['proxy'] = 'https://' + random.choice(proxy_https)
        else:
            request.meta['proxy'] = 'http://' + random.choice(proxy_http)
        print(request.meta['proxy'])
        return None

Note: in settings.py you must uncomment the DOWNLOADER_MIDDLEWARES setting and register the middleware classes written above.
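That note might look like the following settings.py fragment. The project module name `uapool` is an assumption here; substitute your own project name, and note that lower order numbers run earlier:

```python
# settings.py -- 'uapool' is a hypothetical project module name
DOWNLOADER_MIDDLEWARES = {
    'uapool.middlewares.UAPool': 543,
    'uapool.middlewares.UapoolDownloaderMiddleware': 544,
}
```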
