Scraping Hot Jokes and Hot-Image Links from Qiushibaike (糗事百科)
Posted life is tough,so are you
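Preface: this article was compiled by the editors of 小常识网 (cha138.com). It shows how to scrape the hot jokes and the hot-image links from Qiushibaike, and is offered as a reference.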
# -*- coding:utf-8 -*-
# Python 2 script; requires the third-party package beautifulsoup4 (bs4).
import urllib2
from bs4 import BeautifulSoup

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}

# Walk through hot pages 1 to 9.
for page in range(1, 10):
    url = 'http://www.qiushibaike.com/hot/page/' + str(page)
    try:
        request = urllib2.Request(url, headers=headers)
        response = urllib2.urlopen(request)
        qiubai_html = response.read()
    except urllib2.URLError as e:
        # HTTPError carries a status code; a plain URLError only has a reason.
        if hasattr(e, "code"):
            print e.code
        if hasattr(e, "reason"):
            print e.reason
        continue  # skip this page instead of retrying it forever

    soup = BeautifulSoup(qiubai_html, "html.parser")
    print 'this is page ', page

    # Append the joke text and the image URLs to imgsrc.txt.
    with open('imgsrc.txt', 'a') as f:
        # Each joke's text sits in a <span> inside <a class="contentHerf">.
        for x in soup.find_all("a", class_="contentHerf"):
            print x.span.text
            f.write(x.span.text.encode('utf-8') + '\r\n')
            print '\n'
        # Each hot image sits in an <img> inside <div class="thumb">.
        imgSrclist = soup.find_all("div", class_="thumb")
        for x in imgSrclist:
            f.write(x.img['src'].encode('utf-8') + '\r\n')

    # Show the first image URL on the page, if there is one.
    if imgSrclist:
        print imgSrclist[0].img['src']
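For readers on Python 3, where `urllib2` no longer exists, the extraction step of the script above can be sketched with the standard library alone. This is a minimal sketch, not the author's method: it swaps BeautifulSoup for `html.parser.HTMLParser`, and the names `HotPageParser` and `extract_hot_page`, as well as the sample HTML, are hypothetical and made up for illustration. It assumes the page still uses the `contentHerf` and `thumb` class names that the script relies on.

```python
from html.parser import HTMLParser


class HotPageParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <span> inside each
    <a class="contentHerf"> and the src of each <img> inside <div class="thumb">."""

    def __init__(self):
        super().__init__()
        self.jokes = []          # one string per joke
        self.img_srcs = []       # one URL per thumbnail image
        self._want_span = False  # inside a.contentHerf, first <span> not yet seen
        self._in_span = False    # currently collecting joke text
        self._thumb_depth = 0    # > 0 while inside div.thumb (counts nested divs)
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "contentHerf" in classes:
            self._want_span = True
        elif tag == "span" and self._want_span:
            self._want_span = False
            self._in_span = True
            self._text = []
        elif tag == "div":
            if self._thumb_depth or "thumb" in classes:
                self._thumb_depth += 1
        elif tag == "img" and self._thumb_depth and attrs.get("src"):
            self.img_srcs.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "span" and self._in_span:
            self._in_span = False
            self.jokes.append("".join(self._text).strip())
        elif tag == "a":
            self._want_span = False
        elif tag == "div" and self._thumb_depth:
            self._thumb_depth -= 1

    def handle_data(self, data):
        if self._in_span:
            self._text.append(data)


def extract_hot_page(html):
    """Return (jokes, img_srcs) parsed from one page of HTML text."""
    parser = HotPageParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.jokes, parser.img_srcs


if __name__ == "__main__":
    # Made-up markup that only imitates the structure the script expects.
    sample = (
        '<a href="/article/1" class="contentHerf">'
        '<div class="content"><span>first joke</span></div></a>'
        '<div class="thumb"><a href="/article/1">'
        '<img src="//pic.example.com/1.jpg" alt="x"></a></div>'
    )
    jokes, img_srcs = extract_hot_page(sample)
    print(jokes)
    print(img_srcs)
```

To use it on a live page, the HTML could be fetched with `urllib.request.urlopen` on a `urllib.request.Request` that carries the same `User-Agent` header, then decoded to text before being passed to `extract_hot_page`. Unlike BeautifulSoup's `x.span.text`, this sketch stops collecting at the first closing `</span>`, so a joke with nested spans would be cut short.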