Python web crawler: scraping Baidu hot search (百度热搜榜) ranking data
Posted abcdefgh9
This post walks through a small Python web crawler that scrapes the Baidu hot search list (百度热搜榜) and saves the results to an Excel file; hopefully it serves as a useful reference.
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import xlwt

# Open the page with Selenium (the ranking table is rendered in the browser)
url = "http://top.baidu.com/buzz?b=1&fr=topindex"
driver = webdriver.Chrome()
driver.get(url)
# time.sleep(5)  # uncomment to give the page extra time to load

# Grab the rendered HTML and hand it to BeautifulSoup
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
driver.quit()

# Each ranking entry sits in a <tr> row
rows = soup.find_all('tr')
result = []

# Extract rank, keyword and search index from every qualifying row
for each in rows:
    rank = each.find('span')
    key = each.find('a', {'class': 'list-title'})
    point = each.find('td', {'class': 'last'})
    if point is not None:
        point = point.find('span')
    if rank is not None and key is not None and point is not None:
        result.append([rank.string, key.string, point.string])

# Create the xls workbook and write the header row
workbook = xlwt.Workbook(encoding='utf-8')
worksheet = workbook.add_sheet('Baidu Rank Data')
worksheet.write(0, 0, label='rank')
worksheet.write(0, 1, label='key')
worksheet.write(0, 2, label='point')

# Widen the keyword column
col = worksheet.col(1)
col.width = 5000

# Write the scraped data
i = 1
for each in result:
    worksheet.write(i, 0, str(each[0]))
    worksheet.write(i, 1, str(each[1]))
    worksheet.write(i, 2, str(each[2]))
    i += 1

# Save the spreadsheet (adjust the path for your own machine)
workbook.save(r'C:\Users\me\Desktop\Data.xls')
print(result)
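Running the script as written opens a visible Chrome window each time. If that is not wanted, Selenium's headless mode can be used instead; the snippet below is a minimal sketch of that variant, assuming chromedriver is installed and on the PATH. Only the page-fetching part changes, while the parsing and xlwt code stay the same.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Minimal sketch: fetch the page without opening a browser window.
# Assumes chromedriver is available on PATH.
options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get("http://top.baidu.com/buzz?b=1&fr=topindex")
html = driver.page_source
driver.quit()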