requests and BeautifulSoup

Posted hbxzj

Reposted from https://www.cnblogs.com/wupeiqi/articles/6283017.html

I. requests

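Python's standard library provides modules such as urllib, urllib2, and httplib for making HTTP requests (in Python 3 these became urllib.request, urllib.error, and http.client), but their APIs are clumsy. They were built for a different era and a different internet, and they demand a great deal of work, sometimes even method overrides, to accomplish the simplest tasks.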

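Requests is an HTTP library written in Python and released under the Apache2 License. It is a high-level wrapper over Python's lower-level HTTP machinery, which makes network requests far more pleasant for Python developers. With Requests, most HTTP operations a browser performs can be done in a few lines of code.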

1. Making requests (GET, POST, and other methods)

# <1> GET requests
import requests

# 1. Without parameters
ret = requests.get("https://github.com/timeline.json")

print(ret.url)
# Output: https://github.com/timeline.json
print(ret.text)
# Output: {"message":"Hello there, wayfaring stranger. If you’re reading this then you probably didn’t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.","documentation_url":"https://developer.github.com/v3/activity/events/#list-public-events"}

# 2. With parameters (the dict is encoded into the query string)
payload = {"key1": "value1", "key2": "value2"}
ret = requests.get("http://httpbin.org/get", params=payload)

print(ret.url)
# Output: http://httpbin.org/get?key1=value1&key2=value2
print(ret.text)
# Output: {"args":{"key1":"value1","key2":"value2"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Host":"httpbin.org","User-Agent":"python-requests/2.18.4"},"origin":"182.48.111.194","url":"http://httpbin.org/get?key1=value1&key2=value2"}
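
To see how `params` ends up in the query string without sending anything over the network, the request can be built and prepared locally. This is a minimal sketch using `requests.Request` and its `prepare()` method; the variable names are illustrative, and httpbin.org is simply the URL used in the example above.

```python
import requests

payload = {"key1": "value1", "key2": "value2"}

# Build the request object without sending it.
req = requests.Request("GET", "http://httpbin.org/get", params=payload)
prepared = req.prepare()

print(prepared.method)  # GET
print(prepared.url)     # http://httpbin.org/get?key1=value1&key2=value2
```

Inspecting the prepared request this way is handy for checking the final URL before making a real call.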


# <2> POST requests
# 1. Basic POST example (the dict is sent as a form-encoded body)
payload = {"key1": "value1", "key2": "value2"}
ret = requests.post("http://httpbin.org/post", data=payload)

print(ret.text)
# Output: {"args":{},"data":"","files":{},"form":{"key1":"value1","key2":"value2"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"close","Content-Length":"23","Content-Type":"application/x-www-form-urlencoded","Host":"httpbin.org","User-Agent":"python-requests/2.18.4"},"json":null,"origin":"182.48.111.194","url":"http://httpbin.org/post"}


# 2. POST with a JSON body and custom headers
import json

url = "https://api.github.com/some/endpoint"
payload = {"some": "data"}
headers = {"content-type": "application/json"}

# json.dumps serializes the dict to a JSON string for the request body
ret = requests.post(url, data=json.dumps(payload), headers=headers)

print(ret.text)
# Output: {"message":"Not Found","documentation_url":"https://developer.github.com/v3"}
print(ret.cookies)
# Output: <RequestsCookieJar[]>
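
The two POST examples differ in how the body is encoded: `data=` with a dict produces a form-encoded body, while a JSON body needs either `json.dumps` plus a `Content-Type` header, or the `json=` parameter, which handles both. The sketch below prepares both requests locally (nothing is sent) so the difference is visible. The helper name `prepare_post` is hypothetical and not part of requests.

```python
import json
import requests

def prepare_post(url, **kwargs):
    """Build a POST request locally and return it without sending (hypothetical helper)."""
    return requests.Request("POST", url, **kwargs).prepare()

# data= with a dict: form-encoded body
form = prepare_post("http://httpbin.org/post", data={"key1": "value1"})
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)                     # key1=value1

# json= with a dict: JSON body, Content-Type set automatically
as_json = prepare_post("http://httpbin.org/post", json={"some": "data"})
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body))         # {'some': 'data'}
```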

# 3. Other request methods (function signatures, not runnable calls)
requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)

# All of the methods above are built on this one
requests.request(method, url, **kwargs)
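
Because every helper passes through `requests.request`, a single wrapper around it can serve all HTTP methods. The following is a rough sketch under stated assumptions: the function name `send` and the 5-second default timeout are illustrative choices, not something the library prescribes. It catches `requests.exceptions.RequestException`, the base class for errors raised by requests.

```python
import requests

def send(method, url, **kwargs):
    """Call requests.request and return None instead of raising on failure."""
    kwargs.setdefault("timeout", 5)  # assumed default; requests itself sets no timeout
    try:
        return requests.request(method, url, **kwargs)
    except requests.exceptions.RequestException as exc:
        print(f"{method} {url} failed: {exc}")
        return None

# A URL without a scheme is rejected before any network traffic happens.
result = send("GET", "httpbin.org/get")
print(result)  # None
```

With a valid URL such as "http://httpbin.org/get", the same call returns a `Response` object, just as `requests.get` would.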

 
