Scrapy框架中结合splash 解析js ——环境配置

Posted 那时的吻狠陶醉

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Scrapy框架中结合splash 解析js ——环境配置相关的知识,希望对你有一定的参考价值。

环境配置:

http://splash.readthedocs.io/en/stable/install.html

 

pip install scrapy-splash
 

 service docker start

 

docker pull scrapinghub/splash

 

docker run -p 8050:8050 scrapinghub/splash

 

----

 

settings.py

#--
SPLASH_URL = \'http://localhost:8050\'
#--
DOWNLOADER_MIDDLEWARES = {
\'scrapy_splash.SplashCookiesMiddleware\': 723,
\'scrapy_splash.SplashMiddleware\': 725,
\'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware\': 810,
}
#--
SPIDER_MIDDLEWARES = {
\'scrapy_splash.SplashDeduplicateArgsMiddleware\': 100,
}
#--
DUPEFILTER_CLASS = \'scrapy_splash.SplashAwareDupeFilter\'
#--
HTTPCACHE_STORAGE = \'scrapy_splash.SplashAwareFSCacheStorage\'
import scrapy
from scrapy_splash import SplashRequest

class MySpider(scrapy.Spider):
    start_urls = ["http://example.com", "http://example.com/foo"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse, args={\'wait\': 0.5})

    def parse(self, response):
        # response.body is a result of render.html call; it
        # contains HTML processed by a browser.
        # ...       
参考链接: https://germey.gitbooks.io/python3webspider/content/7.2-Splash%E7%9A%84%E4%BD%BF%E7%94%A8.html
      http://blog.csdn.net/qq_23849183/article/details/51287935
      http://ae.yyuap.com/pages/viewpage.action?pageId=919763

  

以上是关于Scrapy框架中结合splash 解析js ——环境配置的主要内容,如果未能解决你的问题,请参考以下文章

爬虫进阶-JS自动渲染Scrapy_splash组件的使用

利用scrapy-splash爬取JS生成的动态页面

有人有scrapy-splash或者scrapyjs的使用经验么

使用scrapy+splash+Lua滚动爬取CSDN

Splash对接Scrapy

scrapy飞溅渲染js页面的问题