Writing a Web Crawler in Python, from Scratch (3): Writing an ID Iteration Crawler
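When browsing a website, we sometimes find that page IDs are sequential numbers. In that case we can crawl the content by iterating over the IDs. The limitation is that some sites use IDs around ten digits long, and crawling them this way becomes very inefficient, because the crawler would have to try an enormous range of numbers.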
import itertools
# download() is the author's helper from a local "common" module;
# the code below expects it to return None when a download fails
from common import download

def iteration():
    max_errors = 5  # maximum number of consecutive download errors allowed
    num_errors = 0  # current number of consecutive download errors
    for page in itertools.count(1):
        url = 'http://example.webscraping.com/view/-{}'.format(page)
        html = download(url)
        if html is None:
            # received an error trying to download this webpage
            num_errors += 1
            if num_errors == max_errors:
                # reached maximum number of errors in a row,
                # so assume we have reached the last country ID and stop downloading
                break
        else:
            # success - can scrape the result
            # ...
            num_errors = 0
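
The function above depends on the download helper and on a live site, so it is hard to try out in isolation. Below is a minimal sketch of the same stopping rule with the downloader passed in as an argument. The names iter_pages and fake_download are hypothetical and do not come from the original post; fake_download is only a stand-in that pretends IDs 1, 2, 3 and 5 exist.

```python
import itertools


def iter_pages(download, url_template, max_errors=5, start=1):
    """Yield (page_id, html) for consecutive IDs.

    Stops after max_errors failed downloads in a row.
    `download` is assumed to return None on failure.
    """
    num_errors = 0
    for page in itertools.count(start):
        html = download(url_template.format(page))
        if html is None:
            num_errors += 1
            if num_errors == max_errors:
                # too many consecutive misses: assume we are past the last ID
                break
        else:
            # a successful download resets the consecutive-error counter
            num_errors = 0
            yield page, html


def fake_download(url):
    # Hypothetical stand-in for a real downloader (no network access):
    # only IDs 1, 2, 3 and 5 "exist", everything else fails with None.
    existing = {1, 2, 3, 5}
    page = int(url.rsplit('-', 1)[1])
    return '<html>page {}</html>'.format(page) if page in existing else None


pages = [page for page, _ in iter_pages(
    fake_download, 'http://example.webscraping.com/view/-{}', max_errors=3)]
print(pages)
```

Because the counter resets on every success, the single missing ID 4 does not stop the crawl; only the run of misses after ID 5 does. Passing the downloader in also makes it easy to swap in the real download function later.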