Python Daily: Scraping 250 Movie Records from Douban

Posted zxycb

#  If you are interested, feel free to get in touch and compare notes.

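import csv

import lxml.html
import requests

doubanUrl = 'https://movie.douban.com/top250?start={}&filter='

# Assumption (not in the original post): send a browser-like User-Agent,
# since the site may reject the default one used by requests.
headers = {'User-Agent': 'Mozilla/5.0'}


def getSource(doubanUrl):
    response = requests.get(doubanUrl, headers=headers)  # fetch the page
    # Return the raw bytes; setting response.encoding only affects
    # response.text, so it is not needed here.
    return response.content


def getEveryItem(source):
    # Build the HTML tree
    selector = lxml.html.document_fromstring(source)
    # Select the block that holds all the information for each movie
    movieItemList = selector.xpath('//div[@class="info"]')
    # List that collects one dict per movie
    movieList = []
    for eachMovie in movieItemList:
        movieDict = {}
        # Extract the fields level by level
        title = eachMovie.xpath('div[@class="hd"]/a/span[@class="title"]/text()')
        otherTitle = eachMovie.xpath('div[@class="hd"]/a/span[@class="other"]/text()')
        link = eachMovie.xpath('div[@class="hd"]/a/@href')[0]
        # The rating and quote are searched anywhere below the item, so the
        # lookup does not depend on which child div contains them.
        star = eachMovie.xpath('.//div[@class="star"]/span[@class="rating_num"]/text()')
        quote = eachMovie.xpath('.//p[@class="quote"]/span/text()')
        # Save the fields; xpath() returns lists, so join them into strings
        movieDict['title'] = ''.join(title + otherTitle)
        movieDict['url'] = link
        movieDict['star'] = ''.join(star)
        movieDict['quote'] = ''.join(quote)  # empty when a movie has no quote
        movieList.append(movieDict)
    return movieList


def writeData(movieList):
    with open('./Douban.csv', 'w', encoding='UTF-8', newline='') as f:
        # The field names must match the dict keys exactly
        writer = csv.DictWriter(f, fieldnames=['title', 'star', 'quote', 'url'])
        # Write the header row
        writer.writeheader()
        for each in movieList:
            writer.writerow(each)


if __name__ == '__main__':
    # 250 movies in total, 25 per page, 10 pages
    movieList = []
    for i in range(10):
        # Build the URL of each page
        pageLink = doubanUrl.format(i * 25)
        print(pageLink)
        # Fetch the page source
        source = getSource(pageLink)
        # Accumulate the results instead of overwriting them on every page
        movieList.extend(getEveryItem(source))
    print(movieList[:10])
    writeData(movieList)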
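
The paging and CSV steps can be checked without touching the network. The following is only a minimal sketch under some assumptions: the helper names `page_urls` and `rows_to_csv`, the in-memory buffer, and the sample row are all hypothetical and not part of the original script. It shows how the `start` offset advances by 25 per page, and that `csv.DictWriter` raises `ValueError` by default when a dict key is missing from `fieldnames` (for example, a misspelled `titlr`).

```python
import csv
import io

URL_TEMPLATE = 'https://movie.douban.com/top250?start={}&filter='


def page_urls(pages=10, page_size=25):
    """Return one URL per page; the start offset is page index * page size."""
    return [URL_TEMPLATE.format(i * page_size) for i in range(pages)]


def rows_to_csv(rows, fieldnames):
    """Write dict rows to an in-memory CSV and return it as a string."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


urls = page_urls()

# Hypothetical sample row with the same keys the scraper produces.
sample_rows = [
    {'title': 'Sample Title', 'star': '9.0', 'quote': '',
     'url': 'https://example.com/1'},
]
csv_text = rows_to_csv(sample_rows, ['title', 'star', 'quote', 'url'])

# A misspelled field name makes the 'title' key unexpected, so DictWriter
# (with its default extrasaction='raise') rejects the row.
try:
    rows_to_csv(sample_rows, ['titlr', 'star', 'quote', 'url'])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(urls[0])
print(urls[-1])
print(mismatch_detected)
```

Writing to a `StringIO` buffer first is just a convenience for testing; the same `DictWriter` calls work on a file opened with `newline=''`.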

