Rendering JavaScript with webdrivers

Posted 2021-02-24


# This snippet uses webdrivers to load and render a page with Scrapy and Selenium.
#
# It is based on the snippet that [wynbennett](http://snippets.scrapy.org/users/wynbennett/) [posted here](http://snippets.scrapy.org/snippets/21/) some time ago.
 
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request
from myItem.items import myItem
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
 
import time
import pprint
 
class WebDriverSpider(CrawlSpider):
    name = "WebDriverSpider"
    start_urls = ["http://yourDomain.com/yourUrl.html"]
 
    rules = (
        Rule(SgmlLinkExtractor(allow=(r'\.html', ), allow_domains=('yourDomain.com', )), callback='parse_page', follow=False),
        )
 
    def __init__(self):
        CrawlSpider.__init__(self)
        self.verificationErrors = []
        # create a profile with specific add-ons
        # and tell Firefox to load it
        profile = FirefoxProfile(profile_directory="/home/yourUser/.mozilla/firefox/selenium/")
        self.selenium = webdriver.Firefox(profile)
 
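    def __del__(self):
        # shut the browser down when the spider is garbage-collected
        self.selenium.quit()
        print(self.verificationErrors)
        # CrawlSpider defines no __del__, so there is no parent destructor to call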
 
    def parse_page(self, response):
        #normal scrapy result
        hxs = HtmlXPathSelector(response)
        #webdriver rendered page
        sel = self.selenium
        sel.get(response.url)
 
        if sel:
            # Wait for JavaScript to load in Selenium
            time.sleep(2.5)

        # Do some crawling of JavaScript-created content with Selenium
        item = myItem()
        item['url'] = response.url
        item['title'] = hxs.select('//title/text()').extract()
 
 
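        # something you can do only with webdrivers:
        # read an element from the rendered page (its text, not the WebElement object)
        item['thatDiv'] = sel.find_element_by_id("thatDiv").text

        # hand the populated item back to Scrapy
        return item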
 
# Snippet imported from snippets.scrapy.org (which no longer works)
# author: rollsappletree
# date  : Aug 25, 2011
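
The fixed `time.sleep(2.5)` in `parse_page` is either wasted time or too short when the page's scripts are slow. A minimal sketch of an alternative, assuming Python 3: poll a condition until it holds or a timeout expires. `wait_until` and `FakeDriver` are hypothetical names introduced here for illustration and are not part of the original snippet. With a real browser, Selenium's own `WebDriverWait` is probably the more natural choice.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


class FakeDriver(object):
    """Hypothetical stand-in for a webdriver: the element shows up on the third lookup."""

    def __init__(self):
        self.lookups = 0

    def find_elements_by_id(self, element_id):
        # Mimics a page whose JavaScript creates the element a little later.
        self.lookups += 1
        return ["<%s>" % element_id] if self.lookups >= 3 else []


driver = FakeDriver()
found = wait_until(lambda: driver.find_elements_by_id("thatDiv"), timeout=2.0, poll=0.01)
```

In the spider, the same helper could stand in for the sleep, for example `wait_until(lambda: sel.find_elements_by_id("thatDiv"))`, so the callback proceeds as soon as the element exists instead of always waiting 2.5 seconds.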
