python: requests, a powerful tool for web scraping
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers python: requests, a powerful tool for web scraping, and we hope you find it useful.
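requests is not part of the Python standard library. It is a third-party package and must be installed before use (pip install requests).
Using the requests library
Enough talk, let's get to the code.
A quick look at the basics: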
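import requests

# A Session reuses the underlying connection and keeps cookies between requests
session = requests.Session()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
}
url = "http://httpbin.org/get"  # this endpoint returns JSON, so .json() below works
response = session.get(url, headers=headers, timeout=None)  # timeout=None: wait indefinitely
print(response.text)                     # body decoded to str
print(response.cookies)                  # cookies set by the server
print(response.content)                  # raw body as bytes
print(response.content.decode("utf-8"))  # bytes decoded explicitly
print(response.json())                   # body parsed as JSON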
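The example passes headers on each call. As a sketch (reusing the User-Agent string from above; build_session and preview_headers are hypothetical helper names), default headers can instead be stored on the Session, which merges them into every request it sends. Session.prepare_request shows the merged headers without any network traffic:

```python
import requests

UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"


def build_session(user_agent=UA):
    """Create a Session whose default headers apply to every request it sends."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def preview_headers(session, url, extra_headers=None):
    """Prepare (but do not send) a GET request and return its final headers."""
    request = requests.Request("GET", url, headers=extra_headers or {})
    prepared = session.prepare_request(request)  # merges session + request headers
    return dict(prepared.headers)


if __name__ == "__main__":
    s = build_session()
    print(preview_headers(s, "http://httpbin.org/get", {"Accept": "application/json"}))
```

Per-request headers override the session defaults with the same name, so a single call can still change Accept or User-Agent.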
A basic POST request:
import requests

data = {
    "name": "zhaofan",
    "age": 23
}
# data= sends the dict as a form-encoded request body
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)
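To see what data= actually sends, and how it differs from json=, a request can be prepared offline with requests.Request(...).prepare(). A minimal sketch (prepare_post is a hypothetical helper) using the same payload:

```python
import json

import requests

data = {"name": "zhaofan", "age": 23}


def prepare_post(url, payload, as_json=False):
    """Build a POST request offline so its headers and body can be inspected."""
    if as_json:
        req = requests.Request("POST", url, json=payload)  # JSON-encoded body
    else:
        req = requests.Request("POST", url, data=payload)  # form-encoded body
    return req.prepare()


form = prepare_post("http://httpbin.org/post", data)
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)                     # name=zhaofan&age=23

json_req = prepare_post("http://httpbin.org/post", data, as_json=True)
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # the original dict, round-tripped
```

In short, data= fits servers that expect an HTML form, and json= fits APIs that expect a JSON body.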
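For sites with an invalid certificate, pass verify=False to skip certificate verification. urllib3 would otherwise emit an InsecureRequestWarning for every such request, so that warning is silenced first:
import requests
import urllib3

# Silence only the warning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)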
Proxy settings:
import requests

# Keys are the scheme of the URL being requested; values are the proxy to use for it
proxies = {
    "http": "http://127.0.0.1:9999",
    "https": "http://127.0.0.1:8888"
}
response = requests.get("https://www.baidu.com", proxies=proxies)
print(response.text)
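The keys of the proxies dict are matched against the scheme of the URL being requested, not the scheme of the proxy itself; an "https" entry whose value starts with http:// means HTTPS traffic is tunneled through a plain HTTP proxy. The sketch below checks the lookup with requests.utils.select_proxy, an internal helper used by requests' transport adapter (not part of the documented public API, so it may change). The "scheme://host" entry and the 127.0.0.1:7777 proxy are illustrative assumptions:

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://127.0.0.1:9999",
    "https": "http://127.0.0.1:8888",
    # Hypothetical host-specific entry: "scheme://host" takes precedence over the bare scheme
    "https://www.example.com": "http://127.0.0.1:7777",
}

for target in ("http://httpbin.org/get", "https://www.baidu.com", "https://www.example.com/x"):
    # select_proxy only looks the proxy up; nothing is sent over the network
    print(target, "->", select_proxy(target, proxies))
```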
If the proxy requires a username and password, put them in the proxy URL:
proxies = {
    "http": "http://user:password@127.0.0.1:9999"
}
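If the username or password contains characters such as @ or :, they must be percent-encoded, or the URL will be split in the wrong place; requests decodes them again before building the Proxy-Authorization header. A minimal sketch with a hypothetical proxy_url helper and made-up credentials:

```python
from urllib.parse import quote, unquote, urlsplit


def proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL, percent-encoding credentials that contain '@', ':' and similar."""
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


url = proxy_url("user", "p@ss:word", "127.0.0.1", 9999)
print(url)  # http://user:p%40ss%3Aword@127.0.0.1:9999

# Round trip: parsing and unquoting recovers the original credentials
parts = urlsplit(url)
print(unquote(parts.username), unquote(parts.password))  # user p@ss:word

proxies = {"http": url}  # ready to pass as requests.get(..., proxies=proxies)
```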
If your proxy uses SOCKS, first install the extra dependency with pip install "requests[socks]", then use the socks5 scheme:
proxies = {
    "http": "socks5://127.0.0.1:9999",
    "https": "socks5://127.0.0.1:8888"
}
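With the socks5 scheme, hostnames are resolved on your own machine; the socks5h scheme asks the proxy to resolve them, which matters when the target name only resolves from the proxy's side. A small sketch (socks_proxies is a hypothetical helper) that builds the dict for either case:

```python
def socks_proxies(host, port, remote_dns=True):
    """Return a proxies dict routing both HTTP and HTTPS through one SOCKS5 proxy.

    'socks5h' lets the proxy resolve hostnames; 'socks5' resolves them locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    endpoint = f"{scheme}://{host}:{port}"
    return {"http": endpoint, "https": endpoint}


# Usage (needs `pip install "requests[socks]"` and a running SOCKS proxy):
# requests.get("https://www.baidu.com", proxies=socks_proxies("127.0.0.1", 9999))
print(socks_proxies("127.0.0.1", 9999))
```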
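Timeout settings
The timeout parameter sets how many seconds requests waits for the server.
With no timeout (the default), requests waits indefinitely:
timeout=None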
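Besides None, timeout accepts a single number of seconds or a (connect, read) tuple. The read value limits how long requests waits for the server to send data; it is not a cap on the total download time. The sketch below starts a throwaway local server (the SlowHandler class and its 1-second delay are assumptions made for the demo) so it needs no external site: a 0.2 s read timeout triggers ReadTimeout, while 5 s is enough.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that waits DELAY seconds before answering."""
    DELAY = 1.0

    def do_GET(self):
        time.sleep(self.DELAY)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client may already have given up and closed the socket

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch(url, connect=3.05, read=0.2):
    """GET with separate connect/read timeouts; return the body or 'timeout'."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for this local demo
    try:
        return session.get(url, timeout=(connect, read)).text
    except requests.exceptions.ReadTimeout:
        return "timeout"
    finally:
        session.close()


def demo():
    """Run one request that times out and one that succeeds against the slow server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    try:
        return fetch(url, read=0.2), fetch(url, read=5)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # ('timeout', 'ok')
```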
Exception handling:
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
except ConnectionError:  # requests' ConnectionError, which shadows the builtin of the same name
    print("connection error")
except RequestException:  # base class of every other requests error
    print("error")
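The except clauses are tried in order, so subclasses must come before their base classes. One subtlety: ConnectTimeout, raised when the connection cannot be established in time, inherits from both ConnectionError and Timeout, so in the example above it lands in the ConnectionError branch rather than the ReadTimeout one. Also, 4xx/5xx responses are not errors by default; HTTPError is only raised when you call response.raise_for_status(). A sketch that makes the hierarchy explicit (classify is a hypothetical helper):

```python
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    HTTPError,
    ReadTimeout,
    RequestException,
)


def classify(exc):
    """Map an exception to a label, checking subclasses before their parents."""
    if isinstance(exc, ConnectTimeout):    # subclass of BOTH ConnectionError and Timeout
        return "connect timeout"
    if isinstance(exc, ReadTimeout):
        return "read timeout"
    if isinstance(exc, ConnectionError):   # requests' class, not the builtin
        return "connection error"
    if isinstance(exc, HTTPError):         # raised by Response.raise_for_status()
        return "http error"
    if isinstance(exc, RequestException):  # base class of everything above
        return "other requests error"
    return "not a requests error"


print(issubclass(ConnectTimeout, ConnectionError))  # True
print(classify(ConnectTimeout()))                   # connect timeout
print(classify(ValueError()))                       # not a requests error
```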