Python Web Scraping Study Notes

Posted by 阿里云的奥斯卡


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers the basics of web scraping with Python, and we hope you find it useful.

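import requests

response = requests.get("http://www.baidu.com")

response.content                  # raw response body, type bytes
response.content.decode("utf-8")  # decode the bytes yourself; returns str

response.encoding = "gbk"         # change the encoding requests uses for .text
response.text                     # body decoded with response.encoding; returns str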
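requests builds `response.text` roughly by decoding `response.content` with `response.encoding`. Bytes it cannot decode are replaced instead of raising an error. The sketch below runs offline and shows why the chosen encoding matters. It assumes a server that sends GBK-encoded Chinese text. `decode_body` is a hypothetical helper, not part of requests.

```python
def decode_body(body: bytes, encoding: str) -> str:
    """Decode raw response bytes, roughly the way requests builds response.text."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    return body.decode(encoding, errors="replace")


# Pretend these bytes are response.content from a GBK-encoded page
body = "百度一下".encode("gbk")

print(type(body))                  # <class 'bytes'>
print(decode_body(body, "gbk"))    # 百度一下
print(decode_body(body, "utf-8"))  # garbled text containing replacement characters
```

If `.text` looks garbled, set `response.encoding` to the page's real charset before reading `.text`.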

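Downloading an image

# coding=utf-8
import requests
url = "http://wap.jiapai.net.cn/images/1.jpg"

response = requests.get(url)
# The server sends JPEG data, so save it with a .jpg extension.
# "wb" = write in binary mode; response.content holds the raw image bytes
with open("baidu.jpg","wb") as f:
    f.write(response.content)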
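One way to keep the file extension consistent with what the server actually returned is to derive it from the `Content-Type` response header. The sketch below is minimal and makes one assumption: a small, hand-written MIME-to-extension table. `save_image` and `EXTENSIONS` are hypothetical names, not a requests API.

```python
from pathlib import Path

# Hypothetical, deliberately small Content-Type -> extension table
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/gif": ".gif"}


def save_image(data: bytes, content_type: str, stem: str) -> Path:
    """Write binary data to <stem><ext>, choosing ext from the Content-Type header."""
    # Drop parameters such as "; charset=..." and normalise case
    mime = content_type.split(";")[0].strip().lower()
    # Unknown or missing types fall back to a generic .bin extension
    path = Path(stem + EXTENSIONS.get(mime, ".bin"))
    # "wb" opens the file in binary mode, so the bytes are written unchanged
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With requests you would call it as `save_image(response.content, response.headers.get("Content-Type", ""), "1")`. A response with `Content-Type: image/jpeg` is then written to `1.jpg`.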
                             

---

# Status code

response.status_code

# Response headers

response.headers

# Request headers (the headers requests actually sent)

response.request.headers

Example output from printing these three attributes for the image request:

200
{'Content-Length': '20851', 'Content-Type': 'image/jpeg', 'Last-Modified': 'Sun, 28 Jul 2019 04:29:48 GMT', 'Accept-Ranges': 'bytes', 'ETag': '"1f3f6d17fd44d51:0"', 'Set-Cookie': 'sdwaf-test-item=1ed57f5405075208510954035156575b5c5754065406040d015701515e520c; path=/; HttpOnly', 'X-Powered-By': 'SDWAF', 'Date': 'Tue, 05 May 2020 01:56:48 GMT'}
{'User-Agent': 'python-requests/2.23.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
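`response.status_code` is a plain integer. The minimal sketch below interprets it with the standard library's `http.HTTPStatus`. `describe_status` and `is_success` are hypothetical helpers. For comparison, requests' own `response.ok` is True unless the status is a 4xx or 5xx code. `response.headers` is a case-insensitive dict, so `response.headers["content-type"]` works as well.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return e.g. '200 OK' for a numeric status code."""
    status = HTTPStatus(code)  # raises ValueError for codes HTTPStatus does not know
    return f"{status.value} {status.phrase}"


def is_success(code: int) -> bool:
    """True only for 2xx codes (stricter than requests' response.ok)."""
    return 200 <= code < 300


for code in (200, 301, 404):
    print(describe_status(code), is_success(code))
# 200 OK True
# 301 Moved Permanently False
# 404 Not Found False
```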

 

---

# Sending a request with custom headers

# coding=utf-8
import requests
url = "http://wap.jiapai.net.cn/images/1.jpg"
# Identify as desktop Chrome instead of the default "python-requests/x.y.z" User-Agent
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"}

response = requests.get(url,headers=headers)
print(response.status_code)
print(response.headers)
print(response.request.headers)  # now shows the browser User-Agent
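For comparison, here is the same header-bearing request built with the standard library's `urllib.request` instead of requests. It is a sketch that only constructs the `Request` object and inspects it without sending anything, so it runs offline. `URL` and `HEADERS` simply reuse the values from the snippet above.

```python
from urllib.request import Request

URL = "http://wap.jiapai.net.cn/images/1.jpg"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
}

# Build the request object only; nothing is sent over the network here
req = Request(URL, headers=HEADERS)

# urllib stores header names via str.capitalize(), so "User-Agent" becomes "User-agent"
print(req.get_header("User-agent"))
print(req.get_method())  # GET, because no request body (data) was given

# To actually send it: urllib.request.urlopen(req, timeout=10)
```

The name normalisation is a urllib quirk. requests keeps header names as you wrote them and matches them case-insensitively.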

---

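# Placeholders: str.format() with {} is recommended over % formatting

input_string = input("")

# Recommended: str.format() with a {} placeholder
url = "http://www.baidu.com/s?wd={}".format(input_string)
# Equivalent older style with a %s placeholder
url = "https://www.baidu.com/s?wd=%s" % input_string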
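Whichever placeholder style you use, a keyword containing spaces or Chinese characters should be percent-encoded before it goes into the URL. The minimal sketch below uses the standard library's `urllib.parse`, and the helper names are hypothetical. With requests, passing `params={"wd": input_string}` to `requests.get` performs this encoding for you.

```python
from urllib.parse import quote, urlencode

BASE = "https://www.baidu.com/s"


def search_url_format(keyword: str) -> str:
    # str.format with a {} placeholder; quote() percent-encodes the keyword as UTF-8
    return "{}?wd={}".format(BASE, quote(keyword, safe=""))


def search_url_percent(keyword: str) -> str:
    # Old-style % formatting, same result
    return "%s?wd=%s" % (BASE, quote(keyword, safe=""))


def search_url_urlencode(keyword: str) -> str:
    # urlencode() builds the whole query string; spaces become "+"
    return "{}?{}".format(BASE, urlencode({"wd": keyword}))


print(search_url_format("python 爬虫"))     # https://www.baidu.com/s?wd=python%20%E7%88%AC%E8%99%AB
print(search_url_urlencode("python 爬虫"))  # https://www.baidu.com/s?wd=python+%E7%88%AC%E8%99%AB
```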

---

 
