Scraping the Python 100 Exercises


Preface: this article was compiled by the editors of 小常识网 (cha138.com). It covers scraping the Python 100 exercises, and we hope it serves as a useful reference.

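import requests
from bs4 import BeautifulSoup


def getHTMLText(url):
    """Download a page and return its text, or '' if the request fails."""
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        # Network errors and non-2xx responses all end up here.
        return ''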

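def fillUnivList(ulist, html):
    """Append the page's <meta name="description"> content to ulist."""
    soup = BeautifulSoup(html, 'html.parser')
    meta = soup.find_all('meta', attrs={'name': 'description'})
    # Raises IndexError when the page has no description tag; the caller skips such pages.
    ulist.append(meta[0].attrs['content'])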


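def main():
    start_url = 'http://www.runoob.com/python/python-exercise-example'
    uinfo = []
    # The exercise pages are numbered 1 to 100, so index 0 is not requested.
    for i in range(1, 101):
        url = start_url + str(i) + '.html'
        try:
            html = getHTMLText(url)
            fillUnivList(uinfo, html)
        except (IndexError, KeyError):
            # Skip pages that failed to download or lack a description.
            continue
    # Open the output file once and write one description per line.
    with open('100.txt', 'a', encoding='utf-8') as f:
        for text in uinfo:
            f.write(text + '\n')

    print(uinfo)


if __name__ == '__main__':
    main()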
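
The description-extraction step can also be tried offline, without any network access or third-party packages. The following is only a minimal sketch under stated assumptions: it swaps BeautifulSoup for the standard library's html.parser module, and the names DescriptionParser, extract_description, exercise_urls, and SAMPLE_PAGE are hypothetical helpers introduced here for illustration. SAMPLE_PAGE is made-up markup, not a copy of a real exercise page.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "description":
            self.description = attr_map.get("content")


def extract_description(html):
    """Return the meta description text, or None if the page has none."""
    parser = DescriptionParser()
    parser.feed(html)
    return parser.description


def exercise_urls(base, first=1, last=100):
    """Build the exercise page URLs for first..last inclusive."""
    return [f"{base}{i}.html" for i in range(first, last + 1)]


# Made-up markup for illustration only (an assumption, not real page content).
SAMPLE_PAGE = (
    '<html><head><meta charset="utf-8">'
    '<meta name="description" content="Exercise 1: sample text">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(extract_description(SAMPLE_PAGE))
    print(exercise_urls("http://www.runoob.com/python/python-exercise-example")[:2])
```

Returning None for a page without a description, rather than raising, lets the caller decide whether to skip the page or record a placeholder.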
