Crawler 1: 429

Posted by zhangchen-sx


Crawl, crawl, crawl --

Two tools:
Anaconda -- ships with Jupyter, the interactive notebook editor used here
Fiddler4 -- an HTTP proxy tool for capturing requests
Case 1: fetch an entire page (the Sogou homepage)
import requests

url = 'https://www.sogou.com/'      # 1. specify the URL
res = requests.get(url=url)         # 2. send the request and get the response object
# print(res.text)
page_text = res.text                # 3. the text attribute returns the response data as a string
with open('./sg.html', 'w', encoding='utf-8') as f:    # 4. persist the data
    f.write(page_text)
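As a side note (not in the original post), it can help to check that the request actually succeeded before writing the file, and res.content gives the raw bytes when the target is binary rather than text. A minimal sketch using standard requests attributes (the sg_raw.html file name is just an illustration):

import requests

url = 'https://www.sogou.com/'
res = requests.get(url=url)
print(res.status_code)              # 200 means the request succeeded
print(res.encoding)                 # the encoding requests guessed from the response headers
with open('./sg_raw.html', 'wb') as f:   # hypothetical file name for this sketch
    f.write(res.content)            # content holds the raw bytes of the response body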
Case 2: a Sogou search results page
# The search results page runs User-Agent (UA) detection, so the plain request fails; the fix is to add a User-Agent (browser identifier) to the request headers
import requests

url = 'https://www.sogou.com/web'
wd = input('What do you want to search for: ')
param = {
    'query': wd
}
res = requests.get(url=url, params=param)
# print(res.encoding)  # ISO-8859-1 -- check the response encoding
res.encoding = 'utf-8'  # switch the encoding
page_text = res.text
name = wd + '.html'
with open(name, 'w', encoding='utf-8') as f:
    f.write(page_text)
    print(name, 'crawl finished!')
Case 2 (updated): add a User-Agent key-value pair to the request headers
import requests

url = 'https://www.sogou.com/web'
wd = input('What do you want to search for: ')
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
param = {
    'query': wd
}
res = requests.get(url=url, params=param, headers=headers)  # params plus headers -- defeats the UA check, a common anti-crawl mechanism
# print(res.encoding)  # ISO-8859-1 -- check the response encoding
res.encoding = 'utf-8'  # switch the encoding
page_text = res.text
name = wd + '.html'
with open(name, 'w', encoding='utf-8') as f:
    f.write(page_text)
    print(name, 'crawl finished!')
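Every later example reuses the same User-Agent header, so an optional refactor (not part of the original post) is to keep a requests.Session that sends it automatically; a small sketch:

import requests

session = requests.Session()
# headers registered on the Session are sent with every request made through it
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'
})
res = session.get('https://www.sogou.com/web', params={'query': 'python'})
res.encoding = 'utf-8'
print(len(res.text))                # same result as passing headers=... on every call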
Case 3
# Fetch the result data from Baidu Fanyi (百度翻译)
# The page may contain dynamically loaded data
import requests

url = 'https://fanyi.baidu.com/sug'
wd = input('enter a word: ')
data = {'kw': wd}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
res = requests.post(url=url, data=data, headers=headers)  # POST request
obj_json = res.json()  # parse the JSON response into a dict
for i in obj_json['data']:
    print(i['k'], ' ', i['v'])
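Since the response here is already JSON, one optional follow-up (an assumption, not shown in the original post) is to persist the parsed dict with the standard json module instead of building an HTML file; this continues from the variables wd and obj_json above:

import json

with open(wd + '.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps the Chinese text readable in the saved file
    json.dump(obj_json, f, ensure_ascii=False, indent=2)
print(wd + '.json', 'saved')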
Case 4
# Douban movie detail data
# In some cases the page contains dynamically loaded data: scroll the mouse wheel down and more data keeps loading
import requests

url = 'https://movie.douban.com/j/chart/top_list'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
param = {
    "type": "5",
    "interval_id": "100:90",
    "action": "",
    "start": "0",
    "limit": "50",
}
obj_json = requests.get(url=url, params=param, headers=headers).json()  # GET request with params
# print(obj_json)
print(len(obj_json))
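The start and limit parameters suggest the endpoint is paginated, so looping over start should collect more than one page. A hedged sketch along those lines (the page size and number of pages are arbitrary choices, not taken from the original post):

import requests

url = 'https://movie.douban.com/j/chart/top_list'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
movies = []
for start in range(0, 100, 50):     # assumed: two pages of 50 items each
    param = {
        "type": "5",
        "interval_id": "100:90",
        "action": "",
        "start": str(start),
        "limit": "50",
    }
    movies += requests.get(url=url, params=param, headers=headers).json()
print(len(movies))                  # number of movies collected across the pages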
Case 5
# Cosmetics company data from the drug administration (药监局) site
import requests

post_url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
all_data = []
IDs = []
for page in range(1, 3):
    data = {
        "on": "true",
        "page": str(page),
        "pageSize": "15",
        "productName": "",
        "conditionType": "1",
        "applyname": "",
        "applysn": "",
    }
    # response data returned by the home page's ajax request
    json_obj = requests.post(url=post_url, data=data, headers=headers).json()
    for dic in json_obj["list"]:
        IDs.append(dic['ID'])
print(len(IDs))
for id in IDs:
    detail_post_url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById'
    data = {'id': id}
    detail_dic = requests.post(url=detail_post_url, data=data, headers=headers).json()
    all_data.append(detail_dic)
print(all_data[0])
print(len(all_data))
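Firing the detail requests in a tight loop like this is exactly the pattern that tends to produce HTTP 429 (Too Many Requests), the status code the title refers to. A small, hedged sketch of pacing the loop and retrying on 429 (the delays and retry count are arbitrary, and post_json is a helper invented for this sketch):

import time
import requests

def post_json(url, data, headers, retries=3):
    # retry with a pause whenever the server answers 429 (Too Many Requests)
    for _ in range(retries):
        res = requests.post(url=url, data=data, headers=headers)
        if res.status_code != 429:
            return res.json()
        time.sleep(5)               # arbitrary back-off; a real crawler might honour the Retry-After header
    res.raise_for_status()          # give up after the retries are exhausted

# usage, continuing from the loop above:
# detail_dic = post_json(detail_post_url, {'id': id}, headers)
# time.sleep(0.5)                   # polite pause between detail requests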

 
