Web Crawler Example

Posted by nxrs


# Scrape the images from the Qiushibaike (糗图) picture pages
import os
import re
import urllib.request

# Browser-like User-Agent sent with every request
HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}


def handler_request(url, page):
    # Build the request for one listing page: base URL + page number + "/"
    url = url + str(page) + "/"
    return urllib.request.Request(url, headers=HEADERS)


def download_image(page, html):
    # Capture the src attribute of every image tag on the page
    pattern = re.compile(r'<img src="(.*?)" alt=".*?" />')
    src_list = pattern.findall(html)
    dirs = os.path.join(os.getcwd(), "糗图")
    os.makedirs(dirs, exist_ok=True)
    for i, src in enumerate(src_list):
        # The script assumes src is protocol-relative ("//host/path")
        src = "https:" + src
        # The "_" separator is an addition: the original joined page and
        # index with empty strings, so page 1 / image 12 and page 11 /
        # image 2 would both become "112.jpg"
        name = str(page) + "_" + str(i) + ".jpg"
        file_name = os.path.join(dirs, name)
        print("Image %s: download started..." % name)
        try:
            request = urllib.request.Request(src, headers=HEADERS)
            image = urllib.request.urlopen(request).read()
        except Exception as e:
            print("Image %s: download failed (%s)" % (name, e))
            continue
        with open(file_name, "wb") as f:
            f.write(image)
        print("Image %s: download finished" % name)


if __name__ == "__main__":
    url = "https://www.qiushibaike.com/pic/page/"
    start_page = int(input("Enter the first page to fetch: "))
    end_page = int(input("Enter the last page to fetch: "))
    for page in range(start_page, end_page + 1):
        print("Page %s: download started..." % page)
        request = handler_request(url, page)
        content = urllib.request.urlopen(request).read().decode()
        download_image(page, content)
        print("Page %s: download finished" % page)
        print()
        print()
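The image-extraction step of the script above can be tried on its own, without touching the network. The following is a minimal sketch, not the author's code: the helper name `extract_image_urls`, the `scheme` parameter, and the sample markup (with its `pic.example.com` host) are all assumptions made up for illustration. Only the regular expression is taken from the script. Unlike the script, the sketch prepends `https:` only when a source actually starts with `//`.

```python
import re

# Same pattern as in the script: capture the src of each <img ... alt="..." /> tag.
IMG_PATTERN = re.compile(r'<img src="(.*?)" alt=".*?" />')


def extract_image_urls(html, scheme="https:"):
    """Return the image URLs found in html (hypothetical helper).

    Protocol-relative sources (starting with "//") get the scheme
    prepended; any other source is returned unchanged.
    """
    urls = []
    for src in IMG_PATTERN.findall(html):
        urls.append(scheme + src if src.startswith("//") else src)
    return urls


# Made-up sample markup, for illustration only; the real page may differ.
SAMPLE_HTML = '''
<div class="thumb">
<img src="//pic.example.com/a/1.jpg" alt="first" />
<img src="//pic.example.com/b/2.jpg" alt="second" />
</div>
'''

if __name__ == "__main__":
    for url in extract_image_urls(SAMPLE_HTML):
        print(url)
```

Keeping the parsing in a function that takes a string makes it easy to check the pattern against saved HTML before running a full download. A regular expression like this one is tied to the exact attribute order and spacing of the tags, so it may stop matching if the page markup changes.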
