Tutorial: Writing a Douban Movie Scraper
Posted by secsafe
Foreword: This article was compiled by the editors of 小常识网 (cha138.com). It shows how to write a scraper for Douban movie listings and is meant as a reference.
    import requests
    from lxml import etree

    # Send browser-like headers so the request looks like a normal page visit.
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
        ),
        "Referer": "https://movie.douban.com/",
    }
    url = "https://movie.douban.com/cinema/nowplaying/shijiazhuang/"

    # Download the page and parse it into an element tree.
    response = requests.get(url, headers=headers)
    text = response.text
    html = etree.HTML(text)

    # Take the first <ul class="lists"> element; each <li> under it is one movie.
    ul = html.xpath("//ul[@class='lists']")[0]
    # print(etree.tostring(ul, encoding="utf-8").decode("utf-8"))
    lis = ul.xpath("./li")

    movies = []
    for li in lis:
        # print(etree.tostring(li, encoding="utf-8").decode("utf-8"))
        # The movie details are stored as data-* attributes on the <li> itself.
        title = li.xpath("@data-title")[0]
        score = li.xpath("@data-score")[0]
        duration = li.xpath("@data-duration")[0]
        region = li.xpath("@data-region")[0]
        director = li.xpath("@data-director")[0]
        actors = li.xpath("@data-actors")[0]
        # The poster thumbnail is the src of the <img> nested inside the <li>.
        thumbnail = li.xpath(".//img/@src")[0]
        movie = {
            "title": title,
            "score": score,
            "duration": duration,
            "region": region,
            "director": director,
            "actors": actors,
            "thumbnail": thumbnail,
        }
        movies.append(movie)

    print(movies)
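The scraper above indexes every XPath result with `[0]`, so one missing attribute or a changed page layout raises `IndexError`. The sketch below shows one way to make the parsing step more forgiving and to try it without touching the network. It is not part of the original code: `SAMPLE_HTML`, `FIELDS`, `first` and `parse_movies` are hypothetical names, and the sample markup only imitates the structure the scraper expects rather than reproducing the real Douban page.

```python
from lxml import etree

# Hypothetical sample markup (an assumption for illustration, not real Douban HTML).
SAMPLE_HTML = """
<div id="nowplaying">
  <ul class="lists">
    <li data-title="Movie A" data-score="8.1" data-duration="120 min"
        data-region="China" data-director="Director A"
        data-actors="Actor 1 / Actor 2">
      <img src="https://example.com/a.jpg"/>
    </li>
    <li data-title="Movie B" data-score="0">
      <img src="https://example.com/b.jpg"/>
    </li>
  </ul>
</div>
"""

# The data-* attributes read by the scraper above.
FIELDS = ("title", "score", "duration", "region", "director", "actors")


def first(results, default=""):
    """Return the first item of an XPath result list, or `default` if it is empty."""
    return results[0] if results else default


def parse_movies(text):
    """Build one dict per <li> in the first <ul class="lists"> found in `text`."""
    html = etree.HTML(text)
    uls = html.xpath("//ul[@class='lists']")
    if not uls:
        # No matching list: the layout may have changed or the request was blocked.
        return []
    movies = []
    for li in uls[0].xpath("./li"):
        # Missing attributes become empty strings instead of raising IndexError.
        movie = {name: first(li.xpath(f"@data-{name}")) for name in FIELDS}
        movie["thumbnail"] = first(li.xpath(".//img/@src"))
        movies.append(movie)
    return movies


movies = parse_movies(SAMPLE_HTML)
print(movies)
```

To use it with the live page, pass `response.text` from the request above to `parse_movies` instead of `SAMPLE_HTML`. Keeping parsing separate from downloading makes the XPath expressions easy to check against saved HTML.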
The code above is for reference and learning purposes only!