Python module: requests
Posted Sch01aR#
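requests is not part of the Python standard library, so it must be installed before use (for example with pip install requests)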
-
Sending requests
Common HTTP request methods include GET, POST, PUT, DELETE, HEAD and OPTIONS
requests provides one function for each of them:
>>> import requests
>>> r = requests.get("http://httpbin.org/get")  # send a GET request
>>> r = requests.post("http://httpbin.org/post")  # send a POST request
>>> r = requests.put("http://httpbin.org/put")  # send a PUT request
>>> r = requests.delete("http://httpbin.org/delete")  # send a DELETE request
>>> r = requests.head("http://httpbin.org/get")  # send a HEAD request
>>> r = requests.options("http://httpbin.org/get")  # send an OPTIONS request
-
Passing URL parameters
The params argument encodes the values it is given and appends them to the URL as a query string
The parameters are usually passed as a dict
>>> import requests
>>> payload = {"word": "test", "page": 11}
>>> r = requests.get("http://httpbin.org/get", params=payload)
>>> print(r.url)  # print the URL that was actually requested
http://httpbin.org/get?word=test&page=11
A value in the dict can also be a list
>>> payload = {"word": "test", "page": [1, 2, 3]}
>>> r = requests.get("http://httpbin.org/get", params=payload)
>>> print(r.url)
http://httpbin.org/get?word=test&page=1&page=2&page=3
Keys whose value is None are not added to the URL
>>> payload = {"word": "test", "page": None}
>>> r = requests.get("http://httpbin.org/get", params=payload)
>>> print(r.url)
http://httpbin.org/get?word=test
params can also be a plain string
>>> payload = "word=test&page=11"
>>> r = requests.get("http://httpbin.org/get", params=payload)
>>> print(r.url)
http://httpbin.org/get?word=test&page=11
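The query-string rules above can be checked without sending anything. This is a minimal sketch that assumes requests.Request(...).prepare() builds the same URL that requests.get would use; build_url is a hypothetical helper name, not part of requests.

```python
import requests


def build_url(base, params):
    """Hypothetical helper: return the final URL without touching the network."""
    return requests.Request("GET", base, params=params).prepare().url


base = "http://httpbin.org/get"
url_dict = build_url(base, {"word": "test", "page": 11})
url_list = build_url(base, {"word": "test", "page": [1, 2, 3]})  # one pair per item
url_none = build_url(base, {"word": "test", "page": None})       # None is dropped
url_str = build_url(base, "word=test&page=11")                   # string used as-is

print(url_dict)
print(url_list)
print(url_none)
print(url_str)
```

Preparing a request this way is handy in unit tests, because the URL can be inspected even when no server is reachable.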
-
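Response content
requests can read the content of the server's response
>>> r = requests.get("https://www.cnblogs.com/")
>>> r.text  # the page source, decoded to str
'''page source here'''
>>> r.encoding  # the encoding used to decode r.text
'utf-8'
>>> r.encoding = 'GBK'  # tell requests to decode r.text as GBK instead
>>> r.encoding  # the new encoding is now in effect
'GBK'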
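The effect of changing r.encoding can be seen offline. The sketch below assumes a Response object built by hand with an in-memory body (in real code the object comes from requests.get or a similar call); the sample text is made up for the illustration.

```python
import io

import requests

# Hand-built Response so the example runs offline (assumption: real code
# would get this object back from requests.get).
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO("中文页面".encode("gbk"))  # body bytes encoded as GBK

r.encoding = "utf-8"  # wrong encoding: undecodable bytes get replaced
garbled = r.text

r.encoding = "gbk"    # right encoding: r.text is decoded again on access
decoded = r.text

print(repr(garbled))
print(decoded)
```

r.text is decoded from r.content each time it is read, so setting r.encoding before reading r.text is enough; the bytes in r.content never change.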
-
Binary response content
For non-text responses, requests can also give access to the response body as bytes
>>> r = requests.get("http://p1.ifengimg.com/a/2018_06/75880eeacd0823d_size11_w230_h152.jpg")
>>> r.content
'''image content as bytes'''
>>> r.text
'''garbled text'''
r.content also works for text responses, but the result is of type bytes
>>> r = requests.get("https://www.cnblogs.com/")
>>> r.text
'''page source as str'''
>>> r.content
'''page source as bytes'''
-
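JSON response content
requests also has a built-in JSON decoder for handling JSON data
>>> import requests
>>> r = requests.get("https://github.com/timeline.json")
>>> r.json()
{'message': 'Hello there, wayfaring stranger. If you’re reading this then you probably didn’t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.', 'documentation_url': 'https://developer.github.com/v3/activity/events/#list-public-events'}
>>> r.status_code
410
>>> r.raise_for_status  # without parentheses this only displays the method, it checks nothing
<bound method Response.raise_for_status of <Response [410]>>
If the JSON data cannot be decoded, r.json() raises a ValueError (on Python 2 the message is "No JSON object could be decoded"; the exact message depends on the Python and requests versions)
A successful call to r.json() does not mean that the request succeeded, however: some servers include a JSON object in a failed response, such as the error details of an HTTP 500, and that JSON is decoded and returned as well
To check whether the request succeeded, inspect r.status_code or call r.raise_for_status()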
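The order of the two checks matters, so a small wrapper can make it explicit. This is a sketch under assumptions: fake_response and load_json are hypothetical helpers, and the responses are built by hand so the example runs offline.

```python
import io

import requests


def fake_response(status, body):
    """Hypothetical helper: build an offline Response for demonstration."""
    r = requests.Response()
    r.status_code = status
    r.raw = io.BytesIO(body)
    return r


def load_json(r):
    """Return the decoded JSON body, or None if the body is not JSON.

    Raises requests.exceptions.HTTPError for 4xx/5xx responses, even when
    the error response carries a JSON body.
    """
    r.raise_for_status()
    try:
        return r.json()
    except ValueError:  # JSON decoding errors are ValueError subclasses
        return None


good = load_json(fake_response(200, b'{"message": "hello"}'))
not_json = load_json(fake_response(200, b"<html>oops</html>"))

try:
    load_json(fake_response(500, b'{"error": "server exploded"}'))
    failed = False
except requests.exceptions.HTTPError:
    failed = True

print(good, not_json, failed)
```

Checking the status first means a JSON error document from the server is never mistaken for a successful result.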
-
Raw response content
requests can also give access to the raw socket response from the server
>>> import requests
>>> r = requests.get("http://httpbin.org/get", stream=True)
>>> r.raw
<urllib3.response.HTTPResponse object at 0x000001B93F230518>
>>> r.raw.read(300)
b', \n "Accept-Encoding": "gzip, deflate", \n "Connection": "close", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.18.4"\n }, \n "origin": "110.90.39.155", \n "url": "http://httpbin.org/get"\n}\n'
To do this, set stream=True on the initial request, then use r.raw and read the content with r.raw.read()
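For saving a streamed body to disk, iterating with r.iter_content is a common alternative to calling r.raw.read() directly. The sketch below is an illustration under assumptions: save_stream is a hypothetical helper, and the Response is built by hand with an in-memory body so that it runs offline; real code would pass the result of requests.get(url, stream=True).

```python
import io
import os
import tempfile

import requests


def save_stream(response, path, chunk_size=1024):
    """Write the response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as fd:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written


# Offline stand-in (assumption) for requests.get(url, stream=True).
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO(b"x" * 2500)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "body.bin")
    size = save_stream(r, target)
    on_disk = os.path.getsize(target)

print(size, on_disk)
```

Writing chunk by chunk keeps memory use bounded by chunk_size instead of by the size of the download.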
-
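Custom headers
HTTP request headers are passed as a dict through the headers parameter
>>> headers = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
>>> r = requests.get("http://httpbin.org/get", headers=headers)
Custom headers have lower precedence than some more specific sources of information, for example:
- If credentials are specified in .netrc, an authorization header set through headers= is overridden; the .netrc credentials are in turn overridden by the auth= parameter
- Authorization headers are removed if the request is redirected to a different host
- Proxy-Authorization headers are overridden by proxy credentials provided in the URL
- Content-Length headers are overridden whenever the length of the content can be determined
requests does not change its behavior based on which custom headers are specified
The headers are simply passed on into the final request
All header values must be a string, a bytestring, or unicode
Passing unicode header values is permitted, but not recommended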
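The Content-Length rule from the list above can be observed on a prepared request. This is a minimal sketch, assuming that requests.Request(...).prepare() reflects what would be sent; the User-Agent value is a made-up client name.

```python
import requests

headers = {
    "User-Agent": "my-app/0.1",  # hypothetical client name
    "Content-Length": "999",     # deliberately wrong
}
prepared = requests.Request(
    "POST", "http://httpbin.org/post", headers=headers, data="test"
).prepare()

user_agent = prepared.headers["user-agent"]          # header lookup ignores case
content_length = prepared.headers["Content-Length"]  # recomputed from the body

print(user_agent, content_length)
```

The custom User-Agent is passed through untouched, while the bogus Content-Length is replaced by the real length of the 4-byte body.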
-
POST requests
To send POST data to a website, for example when logging in, use requests to send a POST request that carries the data
The data is passed through the data parameter of requests.post
The data is usually a dict
>>> payload = "test"  # POST data as a string
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  "args": {},
  "data": "test",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "4",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "110.90.39.155",
  "url": "http://httpbin.org/post"
}
>>> payload = {'username': 'test', 'password': 'test1234'}  # POST data as a dict
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "password": "test1234",
    "username": "test"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "31",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "110.90.39.155",
  "url": "http://httpbin.org/post"
}
>>> payload = {'username': ['test', 'test123'], 'password': 'test1234'}  # POST data as a dict containing a list
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "password": "test1234",
    "username": [
      "test",
      "test123"
    ]
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "48",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "110.90.39.155",
  "url": "http://httpbin.org/post"
}
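The three payload shapes produce different request bodies, which can be inspected before anything is sent. This is a sketch that assumes a prepared request mirrors what requests.post would transmit; prepare_post is a hypothetical helper name.

```python
import requests

URL = "http://httpbin.org/post"


def prepare_post(data):
    """Hypothetical helper: build the POST request without sending it."""
    return requests.Request("POST", URL, data=data).prepare()


raw = prepare_post("test")
form = prepare_post({"username": "test", "password": "test1234"})
multi = prepare_post({"username": ["test", "test123"], "password": "test1234"})

print(raw.body, raw.headers.get("Content-Type"))   # string sent as-is, no form type
print(form.body, form.headers["Content-Length"])   # form-encoded pairs
print(multi.body, multi.headers["Content-Length"])  # list becomes repeated keys
```

A string is sent unchanged and gets no form Content-Type, which is why httpbin reports it under "data" rather than "form" in the transcripts above.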
-
Sending files with POST
A POST request can carry binary files as well as data; the files are passed through the files parameter
>>> import requests
>>> files = {'file': open('python.txt', 'rb')}  # open the file in binary mode
>>> r = requests.post('http://httpbin.org/post', files=files)
>>> print(r.text)
{
  "args": {},
  "data": "",
  "files": {
    "file": "Python\n"  # contents of the file
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "153",
    "Content-Type": "multipart/form-data; boundary=03080f2f96834a78b2d509d2741ff17a",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  },
  "json": null,
  "origin": "110.90.39.155",
  "url": "http://httpbin.org/post"
}
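The upload above leaves the file handle open. A variant using a with block closes it reliably; the sketch below assumes a temporary python.txt created on the spot and only prepares the request instead of sending it, so it runs offline.

```python
import os
import tempfile

import requests

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "python.txt")
    with open(path, "wb") as f:
        f.write(b"Python\n")

    # The file is read while the request is prepared, so prepare (or send)
    # inside the with block and let the block close the handle afterwards.
    with open(path, "rb") as fh:
        prepared = requests.Request(
            "POST", "http://httpbin.org/post", files={"file": fh}
        ).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)
```

With a real upload the same shape applies: call requests.post(url, files={"file": fh}) inside the with block.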
-
Response status codes
The status code of a response can be checked
>>> r = requests.get('http://httpbin.org/get')
>>> r.status_code
200
>>> r.status_code == requests.codes.ok  # check whether the status code is 200
True
If a request went wrong (a 4XX client error or a 5XX server error response), raise_for_status() raises an exception
>>> r = requests.get('http://httpbin.org/status/404')
>>> r.status_code
404
>>> r.raise_for_status
<bound method Response.raise_for_status of <Response [404]>>
>>> r.raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\hp\AppData\Roaming\Python\Python36\site-packages\requests\models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: http://httpbin.org/status/404
If the status code is 200, raise_for_status() returns None
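Which status codes make raise_for_status() raise can be tried out offline. This sketch assumes hand-built Response objects with only status_code set; status_summary is a hypothetical helper.

```python
import requests


def status_summary(status_code):
    """Hypothetical helper: report how raise_for_status treats a status code."""
    r = requests.Response()
    r.status_code = status_code  # offline stand-in for a real response
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return f"error {exc.response.status_code}"
    return "no error"


print(status_summary(200))
print(status_summary(302))  # redirects are not treated as errors
print(status_summary(404))
print(status_summary(500))
print(requests.codes.not_found)
```

The exception keeps a reference to the response, so the handler can still read the status code or body of the failed request.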
-
Response headers
Getting the response headers of a request
>>> r = requests.get('http://httpbin.org/get')
>>> r.headers
{
  'Connection': 'keep-alive',
  'Server': 'meinheld/0.6.1',
  'Date': 'Sun, 04 Feb 2018 10:27:03 GMT',
  'Content-Type': 'application/json',
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Credentials': 'true',
  'X-Powered-By': 'Flask',
  'X-Processed-Time': '0.000623941421509',
  'Content-Length': '266',
  'Via': '1.1 vegur'
}
Getting specific fields from the response headers, such as Content-Type and X-Powered-By
>>> r = requests.get('http://httpbin.org/get')
>>> r.headers.get("Content-Type")
'application/json'
>>> r.headers["Content-Type"]
'application/json'
>>> r.headers.get("X-Powered-By")
'Flask'
>>> r.headers["X-Powered-By"]
'Flask'
In other words, a value is looked up by its key as in a dict, except that header names are matched case-insensitively
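r.headers is a requests.structures.CaseInsensitiveDict, so the lookup behavior can be tried directly on that class. A minimal sketch with made-up header values copied from the transcript above:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict(
    {"Content-Type": "application/json", "X-Powered-By": "Flask"}
)

by_lower = headers["content-type"]       # any capitalization finds the header
by_upper = headers.get("CONTENT-TYPE")
missing = headers.get("X-Missing")       # .get returns None for absent headers
has_powered = "x-powered-by" in headers

print(by_lower, by_upper, missing, has_powered)
```

Using .get avoids a KeyError when a server does not send the header at all.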
-
Cookies
If a response contains cookies, they can be accessed quickly
>>> r = requests.get("http://httpbin.org/get")
>>> r.cookies['example_cookie_name']  # placeholder name; this only works if the response sets such a cookie
'example_cookie_value'
To send cookies to a website, use the cookies parameter
>>> cookies = {'username': 'test', 'password': 'test1234'}
>>> r = requests.get('http://httpbin.org/cookies', cookies=cookies)
>>> print(r.text)
{
  "cookies": {
    "password": "test1234",
    "username": "test"
  }
}
Cookies are returned in a RequestsCookieJar, which behaves like a dict but offers a more complete interface, suitable for use across multiple domains or paths. A cookie jar can also be passed in to requests
>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'http://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
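Why only tasty_cookie reaches the server can be seen from the Cookie header that would be sent for each URL. This is a sketch assuming a prepared request reflects what requests.get sends; cookie_header is a hypothetical helper.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")


def cookie_header(url):
    """Hypothetical helper: the Cookie header requests would send for url."""
    prepared = requests.Request("GET", url, cookies=jar).prepare()
    return prepared.headers.get("Cookie")


for_cookies = cookie_header("http://httpbin.org/cookies")
for_elsewhere = cookie_header("http://httpbin.org/elsewhere")
for_other = cookie_header("http://httpbin.org/get")

print(for_cookies, for_elsewhere, for_other)
```

Each cookie is attached only when the request URL matches the domain and path it was stored with, which a plain dict of cookies cannot express.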
-
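Redirection and request history
By default, requests follows redirects automatically for every method except HEAD
The history attribute of the response object can be used to trace redirects; it is a list of the Response objects that were created to complete the request, ordered from oldest to most recent
>>> r = requests.get('https://www.baidu.com/test.php')
>>> r.status_code
200
>>> r.url
'http://www.baidu.com/forbiddenip/forbidden.html'
>>> r.history
[<Response [302]>]
For GET, POST, PUT, OPTIONS, PATCH or DELETE requests, redirect handling can be disabled with allow_redirects=False
>>> r = requests.get('https://www.baidu.com/test.php', allow_redirects=False)
>>> r.status_code
302
>>> r.url
'https://www.baidu.com/test.php'
>>> r.history
[]
For HEAD requests, redirects can be enabled by passing allow_redirects=True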
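The two behaviors can be reproduced against a throwaway local server instead of a public site. Everything in this sketch is an assumption made for the demonstration: the handler class, the /old and /new paths, and the use of a Session with trust_env disabled so that proxy settings do not interfere.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Tiny local server: /old redirects to /new (hypothetical paths)."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment

followed = session.get(base + "/old", timeout=5)
stopped = session.get(base + "/old", allow_redirects=False, timeout=5)

session.close()
server.shutdown()
server.server_close()

print(followed.status_code, followed.url, followed.history)
print(stopped.status_code, stopped.url, stopped.history)
```

With redirects followed, the final response is the 200 from /new and the 302 sits in history; with allow_redirects=False the 302 itself is returned and history stays empty.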
-
Timeouts
requests stops waiting for a response after the number of seconds given by the timeout parameter
Without a timeout, the program may hang indefinitely
>>> requests.get('https://www.baidu.com', timeout=0.01)
'''long traceback omitted'''
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x000001B93F2C3D68>, 'Connection to www.baidu.com timed out. (connect timeout=0.01)'))
timeout can also be a (connect, read) tuple that sets the connect timeout and the read timeout separately
>>> requests.get('https://www.baidu.com', timeout=(5, 1))
Here the connect timeout is 5 seconds and the read timeout is 1 second
-
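Errors and exceptions
On a network problem (such as a DNS failure or a refused connection), requests raises a ConnectionError exception
If an HTTP request returned an unsuccessful status code, Response.raise_for_status() raises an HTTPError exception
If a request times out, a Timeout exception is raised
If a request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised
All exceptions that requests raises explicitly inherit from requests.exceptions.RequestException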
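The exceptions listed above can be handled in one place. The wrapper below is a sketch, not a recommended API: safe_get and its return format are hypothetical, and the default timeout values are arbitrary. Timeout is caught before ConnectionError because ConnectTimeout inherits from both.

```python
import requests


def safe_get(url, timeout=(3.05, 10)):
    """Hypothetical wrapper: return (kind, detail) instead of raising.

    timeout is a (connect, read) pair in seconds.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
    except requests.exceptions.Timeout:
        return ("timeout", url)
    except requests.exceptions.TooManyRedirects:
        return ("too many redirects", url)
    except requests.exceptions.HTTPError as exc:
        return ("http error", exc.response.status_code)
    except requests.exceptions.ConnectionError:
        return ("connection error", url)
    except requests.exceptions.RequestException as exc:
        return ("request error", type(exc).__name__)  # anything else from requests
    return ("ok", r.status_code)


# A URL without a scheme is rejected before any network traffic happens.
result = safe_get("not-a-url")
print(result)
```

The final RequestException branch catches everything requests raises explicitly, including errors such as a malformed URL that never reach the network.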