selenium+PhantomJS mini example: scraping all movies from Douban (Python code)

Posted by reyinever


#coding=utf-8
# Written for Python 3 with Selenium 3.x: PhantomJS support and the
# find_element_by_* helpers are gone from current Selenium 4 releases.
from selenium import webdriver

def crawMovie():
    driver=webdriver.PhantomJS()
    driver.get("https://movie.douban.com/")
    movie_list=[]
    # Open the full listing via the first "more" link on the home page.
    more_btn=driver.find_element_by_xpath('(//a[@class="more-link"])[1]')
    more_btn.click()

    while True:
        # Select only the items that appeared since the previous pass.
        start_index=len(movie_list)
        xpath_str='//a[@class="item"][position()>%d]'%start_index
        item_tags=driver.find_elements_by_xpath(xpath_str)
        print("start_index:",start_index)
        print(item_tags)
        print("number:",len(item_tags))
        for item_tag in item_tags:
            img_tag=item_tag.find_element_by_tag_name('img')
            cover=img_tag.get_attribute("src")
            title=img_tag.get_attribute("alt")
            rating=item_tag.find_element_by_xpath(".//p/strong").text

            movie={'cover':cover,
                   'title':title,
                   'rating':rating
                   }

            movie_list.append(movie)
        print("--"*20)
        # Stop once the "load more" link carries an inline style
        # (presumably it is hidden when nothing is left to load).
        load_more_btn=driver.find_element_by_xpath('//a[@class="more"]')
        if load_more_btn.get_attribute("style"):
            break
        load_more_btn.click()

    driver.quit()

    # Raw string keeps the backslash literal; UTF-8 so Chinese titles can be written.
    with open(r"e:\movie_list.txt","w",encoding="utf-8") as fp:
        for d in movie_list:
            temp=""
            for k in d:
                temp+=k+":"+d[k]+","
            fp.write("{"+temp.strip(",")+"}"+" ")

if __name__=="__main__":
    crawMovie()
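
The file-writing step at the end of crawMovie() can be tried without a browser. The following is a minimal sketch of that formatting only; the helper name format_movie and the sample values are hypothetical and do not come from the original post. It uses str.join instead of building the string and stripping the trailing comma.

```python
def format_movie(movie):
    """Render one movie dict as '{key:value,key:value,...}' (hypothetical helper)."""
    # Joining the pairs avoids the trailing comma the loop version has to strip.
    pairs = ",".join("%s:%s" % (key, value) for key, value in movie.items())
    return "{" + pairs + "}"


# Sample record with made-up values, shaped like the dicts built in crawMovie().
sample = {"cover": "https://example.com/p1.jpg", "title": "Example", "rating": "9.0"}
print(format_movie(sample))
```

On Python 3.7 and later, dicts keep insertion order, so the fields come out as cover, title, rating, the same order in which crawMovie() builds them.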