Python 3 Web Crawler

Posted 2020-06-09


Saving the current cookies to a local file

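import urllib.request as ur
import http.cookiejar as hc

url = 'http://www.xxxx.com/admin/'
filename = 'cookie.txt'
# MozillaCookieJar stores cookies in the Netscape cookies.txt format
cookie = hc.MozillaCookieJar(filename)
# The opener records the cookies from every response in the jar
handler = ur.HTTPCookieProcessor(cookie)
opener = ur.build_opener(handler)
req = ur.Request(url)
res = opener.open(req)
# Write the jar to cookie.txt, keeping session and expired cookies too
cookie.save(ignore_discard=True, ignore_expires=True)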

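Loading the local cookies to log in to the site (first log in to the site manually, read the cookie values from the browser developer tools (F12), and edit the local cookie.txt to match; the code below can then log in to the site)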

import urllib.request as ur
import http.cookiejar as hc

url = 'http://www.xxxx.com/admin/'
cookie = hc.MozillaCookieJar()
# Read cookie.txt into the jar, keeping session and expired cookies too
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
# The opener sends the loaded cookies with every request
handler = ur.HTTPCookieProcessor(cookie)
opener = ur.build_opener(handler)
req = ur.Request(url)
res = opener.open(req)
print(res.read().decode('utf8'))
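
Editing cookie.txt by hand requires following the Netscape cookie file format that MozillaCookieJar reads: a header line, then one cookie per line as seven tab-separated fields. The sketch below writes such a file and loads it back without touching the network. The cookie name `sessionid`, the value `abc123`, and the helper `write_cookie_file` are hypothetical placeholders, not values from the site; substitute whatever the browser developer tools show.

```python
import http.cookiejar as hc
import os
import tempfile

# Hypothetical values: replace with the name/value shown in the browser (F12).
COOKIE_NAME = "sessionid"
COOKIE_VALUE = "abc123"


def write_cookie_file(path, domain, name, value):
    """Write a one-cookie file in the Netscape format read by MozillaCookieJar."""
    fields = [
        domain,   # host the cookie belongs to
        "FALSE",  # TRUE only when the domain starts with a dot
        "/",      # path
        "FALSE",  # secure flag
        "",       # expiry as a Unix timestamp; empty marks a session cookie
        name,
        value,
    ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Netscape HTTP Cookie File\n")  # required header line
        f.write("\t".join(fields) + "\n")


path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
write_cookie_file(path, "www.xxxx.com", COOKIE_NAME, COOKIE_VALUE)

jar = hc.MozillaCookieJar()
# A session cookie (empty expiry) is skipped on load unless ignore_discard=True.
jar.load(path, ignore_discard=True, ignore_expires=True)
loaded = {c.name: c.value for c in jar}
print(loaded)
```

Because the expiry field is left empty here, the cookie counts as a session cookie, which is one reason to pass ignore_discard=True when loading a hand-edited file.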

The official documentation describes the last two parameters of cookie.save and cookie.load as follows:

ignore_discard: save even cookies set to be discarded.
ignore_expires: save even cookies that have expired.
The file is overwritten if it already exists.

I have tested this: both parameters have to be passed, otherwise the code does not run correctly.
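
The effect of the two flags can be checked offline. The following is a minimal sketch, not the original test: it puts one session cookie and one long-expired cookie into a jar, saves it with and without the flags, and reports which names reach the file. The helpers `make_cookie` and `names_saved` and the cookie names are made up for illustration. In this sketch, leaving the flags out does not raise an exception by itself; the cookies are simply not written, which would make a later login attempt fail.

```python
import http.cookiejar as hc
import os
import tempfile


def make_cookie(name, value, expires):
    """Build a Cookie for the placeholder host; expires=None makes a session cookie."""
    return hc.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="www.xxxx.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=(expires is None),
        comment=None, comment_url=None, rest={},
    )


def names_saved(ignore_discard, ignore_expires):
    """Save two test cookies with the given flags; return the names found in the file."""
    path = os.path.join(tempfile.mkdtemp(), "cookie.txt")
    jar = hc.MozillaCookieJar(path)
    jar.set_cookie(make_cookie("session", "s1", None))  # session cookie, no expiry
    jar.set_cookie(make_cookie("old", "o1", 1))         # expired in 1970
    jar.save(ignore_discard=ignore_discard, ignore_expires=ignore_expires)

    # Reload with both flags on so that nothing present in the file is filtered out.
    reloaded = hc.MozillaCookieJar()
    reloaded.load(path, ignore_discard=True, ignore_expires=True)
    return sorted(c.name for c in reloaded)


default_names = names_saved(False, False)
all_names = names_saved(True, True)
print("default flags:", default_names)
print("both flags on:", all_names)
```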
