The requests module

Posted by a438842265


Preface: this article was compiled by the editors of cha138.com and mainly covers the requests module. We hope you find it useful.

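I. The requests module

  • Concept:
    • A third-party Python library for sending HTTP requests. It simulates a browser sending requests and retrieves the page data.
  • Installation: pip install requests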

II. Steps for using requests

  • 1 Specify the URL
  • 2 Send the request with the requests module
  • 3 Get the data from the response object (text)
  • 4 Persist the data to storage
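The four steps can be wrapped into two small helpers. This is a minimal sketch, not the post's own code. The helper names `fetch_page` and `save_page` are hypothetical. The `timeout` argument and the `raise_for_status()` call are additions, so that a hung or failed request does not silently produce a useless file.

```python
import requests


def fetch_page(url: str, **kwargs) -> str:
    """Steps 1-3: take a URL, send a GET request, return the decoded body."""
    # timeout keeps the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=10, **kwargs)
    # Turn 4xx/5xx status codes into an exception instead of saving an error page
    response.raise_for_status()
    return response.text


def save_page(text: str, filename: str) -> None:
    """Step 4: persist the page text to a UTF-8 encoded file."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(text)
```

Usage: `save_page(fetch_page("https://www.sogou.com/"), "sogou.html")`.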

III. Getting past anti-scraping measures

  • 1 Set a proxy IP
  • 2 Set the User-Agent (UA)
import requests

word = input('Enter the word you want to search for: ')

url = 'https://www.sogou.com/web'

params = {
    'query': word
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}

# UA is set via headers=, IP via proxies= (this proxy address is the post's original example and may no longer work)
response = requests.get(url=url, params=params, headers=headers,
                        proxies={'https': 'http://62.103.68.8:8080'})

page_text = response.text

filename = word + '.html'

with open(filename, 'w', encoding='utf-8') as f:
    f.write(page_text)
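When several requests need the same UA and proxy, a `requests.Session` can hold them as defaults, so you do not have to pass `headers=` and `proxies=` on every call. This is a minimal sketch. `make_session` is a hypothetical helper, and any proxy address you pass in is a placeholder; substitute a proxy you actually control.

```python
from typing import Optional

import requests

# UA string from the example above
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
)


def make_session(proxy: Optional[str] = None) -> requests.Session:
    """Return a Session that sends our UA (and an optional proxy) by default."""
    session = requests.Session()
    # Merged into the headers of every request sent through this session
    session.headers.update({"User-Agent": USER_AGENT})
    if proxy is not None:
        # Same proxy for http:// and https:// URLs. Caveat: proxies from the
        # HTTP(S)_PROXY environment variables can take precedence over these.
        session.proxies.update({"http": proxy, "https": proxy})
    return session
```

Usage, with a hypothetical local proxy: `make_session("http://127.0.0.1:8080").get("https://www.sogou.com/web", params={"query": "python"}, timeout=10)`.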

 

IV. Examples

No.1 GET requests with the requests module

Task 1: scrape the Sogou homepage

import requests

# 1 Specify the URL
url = 'https://www.sogou.com/'
# 2 Send the request with the requests module
response = requests.get(url=url)
# 3 Get the data from the response object
page_text = response.text
# 4 Persist it to disk
with open('./sogou.html', 'w', encoding='utf-8') as f:
    f.write(page_text)

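Note: besides text, the response object in the code above also offers:

response.content             the page data as raw bytes
response.headers             the response headers
response.status_code         the HTTP status code (200 on success)
response.url                 the URL that was requested (the final one, after any redirects)
response.encoding            the encoding requests uses to decode content into text (usually taken from the Content-Type header)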
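The sketch below gathers those attributes in one place and shows how `content`, `encoding` and `text` relate: `text` is `content` decoded with `encoding`. `summarize_response` is a hypothetical helper. The fallback to `apparent_encoding` is a common workaround, not part of the post. It targets `text/*` pages whose headers declare no charset: requests falls back to ISO-8859-1 for those, and Chinese text then comes out garbled.

```python
import requests


def summarize_response(response: requests.Response) -> dict:
    """Collect the response attributes listed above into one dict."""
    # No usable charset declared: guess one from the body bytes instead
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return {
        "status_code": response.status_code,  # e.g. 200
        "url": response.url,  # final URL, after any redirects
        "encoding": response.encoding,  # used to decode .content into .text
        "content_type": response.headers.get("Content-Type"),
        "num_bytes": len(response.content),  # raw body as bytes
        "num_chars": len(response.text),  # body decoded to str
    }
```

Usage: `print(summarize_response(requests.get("https://www.sogou.com/", timeout=10)))`.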

Task 2: scrape the Sogou search results page for a given word

import requests

word = input('Enter the word you want to search for: ')
url = 'https://www.sogou.com/web'

param = {'query': word}
response = requests.get(url=url, params=param)
page_text = response.text
filename = word + '.html'
with open(filename, 'w', encoding='utf-8') as f:
    f.write(page_text)
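`params` is URL-encoded and appended to the URL as a query string, and non-ASCII search words are percent-encoded as UTF-8. A `requests.Request` can be prepared without being sent, which makes the final URL easy to inspect. `build_search_url` is a hypothetical helper for illustration.

```python
import requests


def build_search_url(word: str) -> str:
    """Return the URL requests would actually request for a Sogou search."""
    # prepare() applies the same URL encoding as requests.get(), but sends nothing
    prepared = requests.Request(
        "GET", "https://www.sogou.com/web", params={"query": word}
    ).prepare()
    return prepared.url


print(build_search_url("python"))  # https://www.sogou.com/web?query=python
```

A Chinese word such as "爬虫" ends up as `query=%E7%88%AC%E8%99%AB` in the URL.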

No.2 POST requests with the requests module

Task 3: log in to Douban Movie and scrape the page returned after a successful login

# Follow the steps described above
import requests

url = 'https://www.douban.com/accounts/login'
data = {  # field names found in the browser's developer tools
    'source': 'index_nav', 'form_email': 'xxxxxxxxx', 'form_password': 'xxxxxxxxx'}
response = requests.post(url=url, data=data)
with open('douban.html', 'w', encoding='utf-8') as f:
    f.write(response.text)
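`data=` sends the dict as an `application/x-www-form-urlencoded` body, the same format a browser uses for an HTML form. To scrape pages after logging in, the login cookies must be kept and sent with later requests, and `requests.Session` does exactly that. This sketch rests on several assumptions. The form fields are the ones from the post. Douban's login flow has very likely changed since, for example with captchas or a different endpoint. `login_form`, `build_login_request` and `login_and_fetch` are hypothetical helpers.

```python
import requests

LOGIN_URL = "https://www.douban.com/accounts/login"  # from the post; may be outdated


def login_form(email: str, password: str) -> dict:
    # Field names taken from the post; check the current form in the browser
    return {"source": "index_nav", "form_email": email, "form_password": password}


def build_login_request(email: str, password: str) -> requests.PreparedRequest:
    """Prepare (without sending) the login POST so its body can be inspected."""
    return requests.Request("POST", LOGIN_URL, data=login_form(email, password)).prepare()


def login_and_fetch(email: str, password: str, page_url: str) -> str:
    """Log in, then fetch page_url carrying the cookies the login set."""
    with requests.Session() as session:
        # Cookies set by the login response are stored on the session...
        session.post(LOGIN_URL, data=login_form(email, password), timeout=10)
        # ...and sent automatically with this follow-up request
        return session.get(page_url, timeout=10).text
```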

Task 4: an AJAX GET request with requests: scrape the movie details from the category chart on Douban Movie (https://movie.douban.com/)

import requests

url = 'https://movie.douban.com/j/chart/top_list'

param = {                               # query-string data sent with the request
    'type': '13',
    'interval_id': '100:90',
    'action': '',
    'start': '20',
    'limit': '20',
}

response = requests.get(url=url, params=param)
print(response.text)
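The AJAX endpoint returns JSON rather than HTML, so `response.json()` is more convenient than `response.text`. This sketch assumes each item in the returned list has `title` and `score` keys; check the actual JSON in the browser's Network tab first. `fetch_top_list` and `titles_and_scores` are hypothetical helpers, and `start`/`limit` page through the chart. Douban may also reject the default requests User-Agent, in which case pass `headers=` as in section III.

```python
import requests

TOP_LIST_URL = "https://movie.douban.com/j/chart/top_list"


def fetch_top_list(start: int = 0, limit: int = 20) -> list:
    """Fetch one page of the chart as parsed JSON (a list of dicts)."""
    params = {"type": "13", "interval_id": "100:90", "action": "",
              "start": start, "limit": limit}
    response = requests.get(TOP_LIST_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json()  # parses the JSON body into Python objects


def titles_and_scores(movies: list) -> list:
    # Assumed keys "title" and "score"; .get() returns None if one is missing
    return [(m.get("title"), m.get("score")) for m in movies]
```

Usage: `print(titles_and_scores(fetch_top_list(start=20)))`.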

Task 5: an AJAX POST request with requests: scrape the restaurants at a given location from the KFC store locator (http://www.kfc.com.cn/kfccda/index.aspx)

import requests

url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
city = input('Enter the city you want to look up: ')
data = {
    'cname': '',
    'pid': '',
    'keyword': city,
    'pageIndex': '1',
    'pageSize': '10',
}
response = requests.post(url=url, data=data)
print(response.text)
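This endpoint also answers with JSON, and `pageIndex`/`pageSize` page through the results. The sketch below walks the pages until one comes back empty. The response layout is an assumption: the store list is read from a `Table1` key, so inspect the real response before relying on it. `kfc_form` and `fetch_all_stores` are hypothetical names.

```python
import requests

URL = "http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword"


def kfc_form(keyword: str, page_index: int, page_size: int = 10) -> dict:
    """Form fields from the post, with the page number filled in."""
    return {"cname": "", "pid": "", "keyword": keyword,
            "pageIndex": str(page_index), "pageSize": str(page_size)}


def fetch_all_stores(keyword: str, max_pages: int = 50) -> list:
    stores = []
    for page in range(1, max_pages + 1):
        response = requests.post(URL, data=kfc_form(keyword, page), timeout=10)
        response.raise_for_status()
        # Assumed layout: this page's store list sits under "Table1"
        batch = response.json().get("Table1", [])
        if not batch:  # an empty page means we are past the last result
            break
        stores.extend(batch)
    return stores
```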

Task 6: a simple scraper for the first few pages of cnblogs.com


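import requests
import os

# Caveat: everything after '#' is a URL fragment and is never sent to the server,
# so every request below actually fetches the same homepage
url = 'https://www.cnblogs.com/#p'
if not os.path.exists('boke'):
    os.mkdir('boke')

start_page = int(input('enter a start page:'))
end_page = int(input('enter an end page:'))

for page in range(start_page, end_page + 1):
    # Build each page's URL from the base; reassigning url would keep appending numbers
    page_url = url + str(page)
    response = requests.get(url=page_url, proxies={'https': 'http://62.103.68.8:8080'})
    page_text = response.text

    fileName = str(page) + '.html'
    filePath = './boke/' + fileName
    with open(filePath, 'w', encoding='utf-8') as f:
        f.write(page_text)
        print('Saved page %s' % page)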
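A `#p2`-style fragment is handled by the browser's JavaScript and never reaches the server, so different pages must be requested by a different path or query. This sketch assumes cnblogs serves later pages at URLs of the form `https://www.cnblogs.com/sitehome/p/<n>`. That pattern is an assumption; confirm the real paging URL in the browser's Network tab. `page_url` and `save_pages` are hypothetical helpers, and `os.makedirs(..., exist_ok=True)` replaces the exists/mkdir pair.

```python
import os

import requests

# Assumed paging URL pattern; verify before use
PAGE_URL_TEMPLATE = "https://www.cnblogs.com/sitehome/p/{page}"


def page_url(page: int) -> str:
    # Build each URL from the template, so page numbers never accumulate
    return PAGE_URL_TEMPLATE.format(page=page)


def save_pages(start_page: int, end_page: int, out_dir: str = "boke") -> None:
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists
    for page in range(start_page, end_page + 1):
        response = requests.get(page_url(page), timeout=10)
        response.raise_for_status()
        file_path = os.path.join(out_dir, f"{page}.html")
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(response.text)
        print(f"Saved page {page} to {file_path}")
```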

 


