requests and BeautifulSoup
Posted by hbxzj
Reposted from https://www.cnblogs.com/wupeiqi/articles/6283017.html
I. requests
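Python's standard library provides modules such as urllib, urllib2, and httplib for making HTTP requests (in Python 3, urllib2 and httplib became urllib.request and http.client), but their APIs are painful to use. They were built for a different era and a different internet, and they demand a great deal of work, sometimes even overriding methods, to accomplish the simplest tasks.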
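To make the complaint about the standard library concrete, here is a minimal sketch of what preparing a GET with query parameters and a POST with a JSON body looks like using only Python 3's `urllib`. The helper names `build_get` and `build_json_post` are hypothetical, introduced only for this illustration, and the requests are built but never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_get(url, params):
    """Build (but do not send) a GET request with query parameters."""
    # The query string has to be encoded and appended by hand.
    return Request(url + "?" + urlencode(params), method="GET")


def build_json_post(url, payload):
    """Build (but do not send) a POST request with a JSON body."""
    # The body must be serialized and encoded to bytes manually,
    # and the Content-Type header must be set explicitly.
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


get_req = build_get("http://httpbin.org/get", {"key1": "value1", "key2": "value2"})
post_req = build_json_post("http://httpbin.org/post", {"some": "data"})

print(get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Actually sending either request would additionally need `urllib.request.urlopen(...)` and manual decoding of the response bytes. That is the kind of bookkeeping Requests takes care of.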
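Requests is an HTTP library written in Python and released under the Apache 2.0 license. It wraps Python's built-in modules at a much higher level, which makes network requests far more pleasant for Python developers. With Requests you can easily perform nearly any HTTP operation a browser can.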
1. GET requests

    import requests

    # 1. Without parameters
    ret = requests.get("https://github.com/timeline.json")
    print(ret.url)
    # Output: https://github.com/timeline.json
    print(ret.text)
    # Output: {"message":"Hello there, wayfaring stranger. If you’re reading this then you probably didn’t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.","documentation_url":"https://developer.github.com/v3/activity/events/#list-public-events"}

    # 2. With parameters
    payload = {'key1': 'value1', 'key2': 'value2'}
    ret = requests.get("http://httpbin.org/get", params=payload)
    print(ret.url)
    # Output: http://httpbin.org/get?key1=value1&key2=value2
    print(ret.text)
    # Output: {"args":{"key1":"value1","key2":"value2"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"httpbin.org","User-Agent":"python-requests/2.18.4"},"origin":"182.48.111.194","url":"http://httpbin.org/get?key1=value1&key2=value2"}

2. POST requests

    import json
    import requests

    # 1. Basic POST (form-encoded body)
    payload = {"key1": "value1", "key2": "value2"}
    ret = requests.post("http://httpbin.org/post", data=payload)
    print(ret.text)
    # Output: {"args":{},"data":"","files":{},"form":{"key1":"value1","key2":"value2"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Content-Length":"23","Content-Type":"application/x-www-form-urlencoded","Host":"httpbin.org","User-Agent":"python-requests/2.18.4"},"json":null,"origin":"182.48.111.194","url":"http://httpbin.org/post"}

    # 2. POST with a JSON body and custom headers
    url = 'https://api.github.com/some/endpoint'
    payload = {"some": "data"}
    headers = {'content-type': 'application/json'}
    ret = requests.post(url, data=json.dumps(payload), headers=headers)
    print(ret.text)
    # Output: {"message":"Not Found","documentation_url":"https://developer.github.com/v3"}
    print(ret.cookies)
    # Output: <RequestsCookieJar[]>

3. Other requests

    # Signatures of the helper functions:
    # requests.get(url, params=None, **kwargs)
    # requests.post(url, data=None, json=None, **kwargs)
    # requests.put(url, data=None, **kwargs)
    # requests.head(url, **kwargs)
    # requests.delete(url, **kwargs)
    # requests.patch(url, data=None, **kwargs)
    # requests.options(url, **kwargs)

    # All of the methods above are built on top of this one:
    # requests.request(method, url, **kwargs)
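Because every helper is a thin wrapper over `requests.request(method, url, **kwargs)`, a call such as `requests.get(url, params=payload)` behaves the same as `requests.request("GET", url, params=payload)`. One way to see what the `params`, `data`, and `json` arguments do, without touching the network, is to build a prepared request and inspect it. The sketch below assumes the `requests` package is installed and reuses the httpbin URLs from the examples above purely as placeholders; nothing is sent.

```python
import requests

payload = {"key1": "value1", "key2": "value2"}

# GET: params are URL-encoded into the query string.
get_req = requests.Request(
    "GET", "http://httpbin.org/get", params=payload
).prepare()

# POST with data=: the dict is form-encoded into the request body.
form_req = requests.Request(
    "POST", "http://httpbin.org/post", data=payload
).prepare()

# POST with json=: the dict is serialized to JSON and the
# Content-Type header is set for you.
json_req = requests.Request(
    "POST", "http://httpbin.org/post", json={"some": "data"}
).prepare()

print(get_req.url)
print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)
```

Passing `json=` saves the manual `json.dumps(...)` call and the explicit `content-type` header used in the JSON POST example. To actually send a prepared request, hand it to `requests.Session().send(...)`.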