pyppeteer crawler example
Posted by c-x-a
import asyncio
from collections import namedtuple

import pyppeteer

Response = namedtuple("rs", "title url html cookies headers history status")


async def get_html(url, timeout=30):
    # timeout is given in seconds (default 30); page.goto expects milliseconds.
    browser = await pyppeteer.launch(headless=True, args=['--no-sandbox'])
    try:
        page = await browser.newPage()
        res = await page.goto(url, options={'timeout': int(timeout * 1000)})
        data = await page.content()
        title = await page.title()
        resp_cookies = await page.cookies()
        resp_headers = res.headers
        resp_history = None  # redirect history is not collected here
        resp_status = res.status
    finally:
        # Always close the browser so no Chromium process is left running.
        await browser.close()
    response = Response(title=title, url=url,
                        html=data,
                        cookies=resp_cookies,
                        headers=resp_headers,
                        history=resp_history,
                        status=resp_status)
    return response


if __name__ == '__main__':
    url_list = ["http://www.10086.cn/index/tj/index_220_220.html",
                "http://www.10010.com/net5/011/",
                "http://python.jobbole.com/87541/"]
    # One coroutine per URL; gather runs them concurrently and keeps input order.
    task = (get_html(url) for url in url_list)
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(asyncio.gather(*task))
    for res in results:
        print(res.title)
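
The script above starts every page load at once and lets a single failure propagate out of asyncio.gather. With a longer URL list it may help to cap how many loads run at a time and to collect errors instead of raising them. The following is only a minimal sketch of that idea, not part of the original example: fake_fetch is a hypothetical stand-in for get_html (it does not start a browser), and the names fetch_all, crawl and limit are assumptions chosen for illustration.

```python
import asyncio


async def fake_fetch(url, delay=0.01):
    # Hypothetical stand-in for get_html(); it only simulates a page load.
    await asyncio.sleep(delay)
    if "bad" in url:
        raise RuntimeError(f"failed to load {url}")
    return f"title of {url}"


async def fetch_all(urls, limit=2):
    # Allow at most `limit` fetches to be in progress at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # return_exceptions=True places a failure in the result list
    # instead of raising it out of gather(); input order is preserved.
    return await asyncio.gather(*(bounded(u) for u in urls),
                                return_exceptions=True)


def crawl(urls, limit=2):
    # Synchronous entry point for scripts.
    return asyncio.run(fetch_all(urls, limit))
```

To try this with a real browser, fake_fetch would be replaced by get_html, and each entry in the returned list would be checked with isinstance(entry, Exception) before its title is read.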