Scrapy: the requests module
Posted benchdog
1. requests
pip3 install requests
import requests
response = requests.get('http://www.autohome.com.cn/news/')
response.text
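Summary:
response = requests.get('URL')
response.text                # body decoded to str using response.encoding
response.content             # raw body as bytes
response.encoding            # encoding used to decode response.text (can be reassigned)
response.apparent_encoding   # encoding guessed from the body content
response.status_code         # HTTP status code
response.cookies.get_dict()  # cookies set by the server, as a dict
requests.get('http://www.autohome.com.cn/news/', cookies={'xx': 'xxx'})  # send cookies with the request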
2. The BeautifulSoup module
pip3 install beautifulsoup4
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, features='html.parser')
target = soup.find(id=‘auto-channel-lazyload-article‘)
print(target)
Summary:
soup = BeautifulSoup('<html>...</html>', features='html.parser')
v1 = soup.find('div')                # first matching tag, or None
v1 = soup.find(id='i1')
v1 = soup.find('div', id='i1')
v2 = soup.find_all('div')            # list of all matching tags
v2 = soup.find_all(id='i1')
v2 = soup.find_all('div', id='i1')
obj = v1
obj = v2[0]
obj.text                             # text inside the tag
obj.attrs                            # the tag's attributes, as a dict
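The lookups summarized above can be tried without any network access by parsing a literal string. This is a minimal sketch; the HTML snippet, its ids, and its class name are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this demonstration.
html = """
<html><body>
  <div id="i1" class="news">First <a href="/a1">link</a></div>
  <div id="i2">Second</div>
</body></html>
"""
soup = BeautifulSoup(html, features='html.parser')

v1 = soup.find('div', id='i1')   # a single Tag, or None when nothing matches
v2 = soup.find_all('div')        # a list-like collection of Tags

print(v1.text)                   # text of the tag and its children
print(v1.attrs)                  # attributes as a dict; class is a list
print(len(v2))                   # number of <div> tags found
print(v1.find('a').get('href'))  # searches can be chained on a Tag
print(soup.find(id='missing'))   # None, so check before using the result
```

Because find returns None when nothing matches, it is worth checking the result before reading .text or .attrs from it.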
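Detailed module usage
requests
- How the methods relate
requests.get(...)
requests.post(...)
requests.put(...)
requests.delete(...)
...
requests.request('POST', ...)  # the helpers above all delegate to this function
- Parameters
requests.request
- method: HTTP method
- url: URL to send the request to
- params: parameters passed in the URL query string (typical for GET)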
requests.request(
    method='GET',
    url='http://www.oldboyedu.com',
    params={'k1': 'v1', 'k2': 'v2'}
)
# resulting URL: http://www.oldboyedu.com/?k1=v1&k2=v2
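To see how params ends up in the URL without sending anything, one option is to build a requests.Request and call prepare() on it. The sketch below assumes a hypothetical /search path and page parameter; it also shows that a list value is expanded into repeated keys and appended to any query string already in the URL.

```python
from urllib.parse import urlsplit, parse_qs

import requests

# The '/search?page=1' part is a made-up example, not a real endpoint.
req = requests.Request(
    method='GET',
    url='http://www.oldboyedu.com/search?page=1',
    params={'k1': 'v1', 'k2': ['a', 'b']},
)
prepared = req.prepare()  # builds the final URL; nothing is sent

print(prepared.url)
query = parse_qs(urlsplit(prepared.url).query)
print(query)  # each key maps to the list of values found in the URL
```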
- data: data sent in the request body
requests.request(
    method='POST',
    url='http://www.oldboyedu.com',
    params={'k1': 'v1', 'k2': 'v2'},
    data={'use': 'alex', 'pwd': '123', 'x': [11, 2, 3]}
)
Request header:
content-type: application/x-www-form-urlencoded
Request body:
use=alex&pwd=123&x=11&x=2&x=3
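The header and body shown above can be checked locally. A minimal sketch, again using a prepared request so that no network call is made:

```python
import requests

prepared = requests.Request(
    method='POST',
    url='http://www.oldboyedu.com',
    data={'use': 'alex', 'pwd': '123', 'x': [11, 2, 3]},
).prepare()

# A dict passed as data is form-encoded; a list value repeats its key.
print(prepared.headers['Content-Type'])
print(prepared.body)
```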
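- json: data sent in the request body, serialized as JSON
requests.request(
    method='POST',
    url='http://www.oldboyedu.com',
    params={'k1': 'v1', 'k2': 'v2'},
    json={'use': 'alex', 'pwd': '123'}
)
Request header:
content-type: application/json
Request body:
{"use": "alex", "pwd": "123"}
PS: use json when the dictionary contains nested dictionaries, which form encoding cannot represent.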
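The note about nested dictionaries can be made concrete by preparing the same payload both ways. This is a sketch with a made-up profile key; with data the inner dictionary's values are lost, while json keeps the full structure.

```python
import json

import requests

# 'profile' is a hypothetical nested field used for illustration.
payload = {'use': 'alex', 'profile': {'pwd': '123'}}
url = 'http://www.oldboyedu.com'

as_form = requests.Request('POST', url, data={'profile': {'pwd': '123'}}).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()

print(as_form.body)                     # only the inner key survives form encoding
print(as_json.headers['Content-Type'])
decoded = json.loads(as_json.body)      # body may be bytes; json.loads accepts both
print(decoded)
```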
- headers: request headers
requests.request(
    method='POST',
    url='http://www.oldboyedu.com',
    params={'k1': 'v1', 'k2': 'v2'},
    json={'use': 'alex', 'pwd': '123'},
    headers={
        'Referer': 'http://dig.chouti.com/',
        'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
    }
)
- cookies: cookies to send with the request
- files: files to upload
requests.post(
    url='xxx',
    files={
        'f1': open('s1.py', 'rb'),
        'f2': ('ssssss1.py', open('s1.py', 'rb'))  # (filename to report, file object)
    }
)
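As a rough illustration of what files does to the request, the sketch below prepares an upload from an in-memory buffer instead of a real s1.py; the buffer contents are made up. In real code, opening the file in a with block ensures the handle is closed afterwards.

```python
import io

import requests

# In-memory stand-in for open('s1.py', 'rb'); contents are hypothetical.
fake_file = io.BytesIO(b"print('hello')\n")

prepared = requests.Request(
    method='POST',
    url='http://www.oldboyedu.com',
    files={'f2': ('ssssss1.py', fake_file)},
).prepare()

content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])                 # the media type, without the boundary
print(b'filename="ssssss1.py"' in prepared.body)  # the tuple's first item is the reported name
```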
- auth: HTTP basic authentication (adds the Base64-encoded username and password to the headers)
- timeout: timeouts for connecting and for reading the response
- allow_redirects: whether to follow redirects
- proxies: proxy servers
import requests
post_dict = {
    "phone": 'username',
    'password': 'pwd',
    'oneMonth': 1
}
response = requests.post(
    url="http://dig.chouti.com/login",
    data=post_dict,
    proxies={
        'http': "http://4.19.128.5:8099"
    }
)
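The auth parameter listed above has no example of its own, so here is a small sketch of what it produces. The credentials are placeholders. Note that basic authentication only Base64-encodes the username and password, which is reversible, so it should be combined with HTTPS.

```python
import base64

import requests

prepared = requests.Request(
    method='GET',
    url='http://www.oldboyedu.com',
    auth=('alex', '123'),  # placeholder credentials; a 2-tuple means basic auth
).prepare()

scheme, token = prepared.headers['Authorization'].split(' ', 1)
decoded = base64.b64decode(token).decode('utf-8')

print(scheme)   # authentication scheme named in the header
print(decoded)  # the token decodes straight back to username:password
```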
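- verify: whether to verify the server's TLS certificate (set to False to skip verification)
- cert: client certificate file
requests.get(
    url='https://www.12306.cn',
    # verify=False  # ignore the certificate, skip verification
    # cert='asd.pem'  # or cert=('asd.crt', 'fgh.key')
)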
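- stream: read the body in chunks instead of loading it all at once (for responses larger than local memory)
from contextlib import closing
import requests
with closing(requests.get('http://baidu.com/123.mkv', stream=True)) as fr:
    # the default chunk_size is 1 byte, so pass a larger value
    for i in fr.iter_content(chunk_size=8192):
        print(i)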
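In practice the chunks are usually written to disk rather than printed. The helper below is a sketch under that assumption; the name download, its parameters, and the default values are all hypothetical, and it relies on the response object being usable as a context manager, which recent versions of requests support.

```python
import requests


def download(url, path, chunk_size=8192, timeout=10):
    """Stream the body at `url` into the file at `path`; return bytes written."""
    written = 0
    # stream=True defers the body download until iter_content() is consumed.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # stop early on 4xx/5xx responses
        with open(path, 'wb') as out:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
                written += len(chunk)
    return written
```

A call such as download('http://baidu.com/123.mkv', '123.mkv') would then keep at most one chunk in memory at a time.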
- session: keeps client state (such as cookies) across requests
import requests

session = requests.Session()

# 1. Visit any page first to obtain a cookie
i1 = session.get(url="http://dig.chouti.com/help/service")

# 2. Log in; the session sends the cookie from step 1, and the server authorizes the gpsd value in that cookie
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "username",
        'password': "pwd",
        'oneMonth': ""
    }
)

# 3. Later requests on the same session carry the authorized cookie
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=11837086",
)
print(i3.text)
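What the session contributes to each request can be inspected offline with Session.prepare_request. In this sketch the gpsd value and the User-Agent string are placeholders; on the real site the cookie would be set by the server's response in step 1.

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'demo-client/0.1'})  # hypothetical value
session.cookies.set('gpsd', 'abc123')                      # placeholder cookie

# Merge session state into a request without sending it.
prepared = session.prepare_request(
    requests.Request(
        method='POST',
        url='http://dig.chouti.com/login',
        data={'phone': 'username', 'password': 'pwd'},
    )
)

print(prepared.headers['Cookie'])      # cookie stored on the session
print(prepared.headers['User-Agent'])  # default header stored on the session
```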