Python Study Notes 22 (the urllib module)
The urllib module differs between Python 3 and Python 2. This article assumes Python 3.
1. Using urlopen
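    import urllib.request

    # Signature as given in the documentation (not meant to be run as-is):
    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
    # url: the page to fetch
    # data: the data to submit by POST. Defaults to None, which sends a GET request;
    #       if data is provided, the request is sent as POST
    # timeout: how long to wait for the site, in seconds
    # cafile, capath, cadefault: deprecated since Python 3.6 and removed in 3.13;
    #       pass an ssl.SSLContext as context instead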
    import urllib.request

    response = urllib.request.urlopen('http://www.baidu.com')
    # response.read() returns bytes;
    # call decode() to convert it to str
    print(response.read().decode('utf-8'))
    # POST request
    import urllib.parse
    import urllib.request

    # urlencode() builds the form string; data must be bytes, so encode it
    data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
    response = urllib.request.urlopen('http://httpbin.org/post', data=data)
    print(response.read())
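To see what the `data` argument actually contains, the encoding step can be run on its own without sending a request. This is a minimal sketch; the extra `q` field is made up here only to show how spaces and `&` are escaped.

```python
from urllib import parse

# Hypothetical form fields; only 'word' comes from the example above.
form = {'word': 'hello', 'q': 'a b&c'}

# urlencode() produces a str in application/x-www-form-urlencoded form.
encoded = parse.urlencode(form)

# urlopen() and Request() need bytes, so encode the str.
data = encoded.encode('utf-8')

# parse_qs() reverses the encoding; every value comes back as a list.
decoded = parse.parse_qs(encoded)

print(encoded)
print(data)
print(decoded)
```

`encoded.encode('utf-8')` is equivalent to the `bytes(..., encoding='utf8')` call used above; either spelling works.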
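    # Setting a timeout
    import urllib.request

    # With a timeout as short as 0.1 seconds this call will most likely
    # raise an exception instead of returning a response
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
    print(response.read())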
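When the timeout expires, `urlopen` usually raises `urllib.error.URLError` whose `reason` is a `socket.timeout`, while a timeout during `read()` can surface as a bare `socket.timeout`. The sketch below handles both cases. `is_timeout` and `fetch_or_none` are hypothetical helper names introduced for illustration, not part of urllib.

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if exc looks like a timeout from urlopen() or read()."""
    if isinstance(exc, socket.timeout):
        return True
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, socket.timeout))


def fetch_or_none(url, timeout):
    """Hypothetical helper: return the body as bytes, or None on timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as e:
        if is_timeout(e):
            return None
        raise  # any other failure is passed on to the caller
```

A call such as `fetch_or_none('http://httpbin.org/get', timeout=0.1)` would then return `None` rather than crash when the server is too slow.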
2. Using Request
    # GET request
    import urllib.request

    request = urllib.request.Request('https://python.org')
    response = urllib.request.urlopen(request)
    print(response.read().decode('utf-8'))

    # POST request
    from urllib import request, parse

    url = 'http://httpbin.org/post'
    headers = {
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
        'Host': 'httpbin.org'
    }
    # named form_data rather than dict, to avoid shadowing the built-in
    form_data = {
        'name': 'Germey'
    }
    data = bytes(parse.urlencode(form_data), encoding='utf8')
    req = request.Request(url=url, data=data, headers=headers, method='POST')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
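A `Request` object can be inspected before it is sent, which is a convenient way to check the method and headers. The sketch below makes no network call. The `Accept` header is an assumption added for illustration.

```python
from urllib import request, parse

# Without data the method defaults to GET.
get_req = request.Request('http://httpbin.org/get')

# With data (and no explicit method) it defaults to POST.
post_req = request.Request(
    'http://httpbin.org/post',
    data=parse.urlencode({'name': 'Germey'}).encode('utf-8'),
    headers={'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'},
)

# Headers can also be added after construction.
post_req.add_header('Accept', 'application/json')

get_method = get_req.get_method()
post_method = post_req.get_method()

# Request stores header names capitalized, so look them up as 'User-agent'.
user_agent = post_req.get_header('User-agent')

print(get_method, post_method, user_agent)
```

Passing `method='POST'` explicitly, as in the example above, is still useful for methods such as PUT or DELETE that cannot be inferred from `data`.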
3. Proxies
    import urllib.request

    # Map each URL scheme to the proxy that should handle it
    proxy_handler = urllib.request.ProxyHandler({
        'http': 'http://127.0.0.1:9743',
        'https': 'https://127.0.0.1:9743'
    })
    opener = urllib.request.build_opener(proxy_handler)
    response = opener.open('http://httpbin.org/get')
    print(response.read().decode('utf-8'))
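Two related details may be worth knowing: `urllib.request.getproxies()` reports the proxies picked up from the environment, and passing an empty dictionary to `ProxyHandler` turns proxying off. The sketch below only builds the openers and sends nothing. The proxy address is the same placeholder as above and is assumed not to be a real service.

```python
import urllib.request

# Proxies detected from the environment (for example the http_proxy variable).
env_proxies = urllib.request.getproxies()

# An opener that ignores any environment proxy settings.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# An opener that sends http:// requests through the placeholder proxy.
proxied_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({'http': 'http://127.0.0.1:9743'})
)

# To make plain urlopen() calls use an opener, install it globally:
# urllib.request.install_opener(proxied_opener)

print(env_proxies)
```

Keeping the `install_opener` call commented out leaves the rest of the program unaffected; calling `proxied_opener.open(url)` directly limits the proxy to the requests that need it.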
4. Cookies
    # Get cookies
    import http.cookiejar, urllib.request

    cookie = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    for item in cookie:
        print(item.name + "=" + item.value)

    # Get cookies and save them to a file
    # There are two file formats; just remember to load a file
    # with the same format it was saved in

    # Format 1 (Mozilla/Netscape cookies.txt)
    import http.cookiejar, urllib.request

    filename = "cookie.txt"
    cookie = http.cookiejar.MozillaCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)

    # Format 2 (libwww-perl, LWP)
    import http.cookiejar, urllib.request

    filename = 'cookie.txt'
    cookie = http.cookiejar.LWPCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)

    # Load cookies saved in format 2 and use them to access a URL
    import http.cookiejar, urllib.request

    cookie = http.cookiejar.LWPCookieJar()
    cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    print(response.read().decode('utf-8'))
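The advice to load a cookie file with the same format it was saved in can be checked offline. The sketch below saves an empty jar in LWP format to a temporary file and then tries to load it with both jar classes. `can_load` is a hypothetical helper written for this illustration.

```python
import http.cookiejar
import os
import tempfile


def can_load(jar_class, path):
    """Return True if jar_class accepts the cookie file at path."""
    jar = jar_class()
    try:
        jar.load(path, ignore_discard=True, ignore_expires=True)
    except http.cookiejar.LoadError:
        # Raised when the file header does not match the expected format.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cookie.txt')

    # Save an (empty) jar in LWP format; only the format header is written.
    http.cookiejar.LWPCookieJar(path).save(ignore_discard=True,
                                           ignore_expires=True)

    lwp_ok = can_load(http.cookiejar.LWPCookieJar, path)
    mozilla_ok = can_load(http.cookiejar.MozillaCookieJar, path)

print(lwp_ok, mozilla_ok)
```

Catching `http.cookiejar.LoadError` around `load()` in real code gives a clearer failure than letting a format mismatch propagate.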
5. Exception handling
    # urllib.error defines two exception classes, URLError and HTTPError.
    # HTTPError is a subclass of URLError, so catch the more specific
    # HTTPError first and the more general URLError after it.
    from urllib import request, error

    try:
        response = request.urlopen('http://cuiqingcai.com/index.htm')
    except error.HTTPError as e:
        print(e.reason, e.code, e.headers, sep='\n')
    except error.URLError as e:
        print(e.reason)
    else:
        print('Request Successfully')
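Why the order of the `except` clauses matters can be shown without touching the network by raising the exceptions by hand. This is a sketch; `classify` and `classify_wrong_order` are hypothetical names, and the example URL and messages are made up.

```python
from urllib import error


def classify(exc):
    """Catch the subclass first, then the base class."""
    try:
        raise exc
    except error.HTTPError as e:
        return 'HTTP error %d: %s' % (e.code, e.reason)
    except error.URLError as e:
        return 'URL error: %s' % e.reason


def classify_wrong_order(exc):
    """Base class first: the HTTPError branch can never be reached."""
    try:
        raise exc
    except error.URLError as e:
        return 'URL error: %s' % e.reason
    except error.HTTPError as e:
        return 'HTTP error %d: %s' % (e.code, e.reason)


# Hand-built exceptions standing in for real failures.
http_exc = error.HTTPError('http://example.com/missing', 404, 'Not Found', None, None)
url_exc = error.URLError('no route to host')

print(classify(http_exc))
print(classify(url_exc))
print(classify_wrong_order(http_exc))
```

With the wrong order the status code is lost, because the HTTP failure is handled by the generic `URLError` branch.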