Downloading images into folders with Python
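I have a problem with Python and Scrapy. I think the script is working and puts all the data into MongoDB, but when it scrapes it only records the photos in the database instead of downloading them. I want the files downloaded into this structure: /project/photos/link-page/name.jpg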
Here is my code. This is items.py:
import scrapy
from PIL import Image


class RedditItem(scrapy.Item):
    '''
    Defining the storage containers for the data we
    plan to scrape
    '''
    title = scrapy.Field()
    photoLink = scrapy.Field()
This is from settings.py:
ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = '/ProjectX/reddit/reddit/photos/'
And here is scrapper.py:
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import HtmlResponse
from scrapy.selector import Selector
from datetime import datetime as dt
import scrapy
from reddit.items import RedditItem
from PIL import Image


def parse_following_urls(self, response):
    item = RedditItem()
    item['title'] = response.css('h1.kiwii-font-xlarge::text').extract_first()
    item['photoLink'] = response.css("div.kiwii-carousel-picture span::attr(src)").extract()
Answer
If you want to store images at a path such as {IMAGES_STORE}/link-page/name.jpg, you need to subclass the default ImagesPipeline class and override its file_path method.
For example:
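from scrapy.pipelines.images import ImagesPipeline


class MyImagesPipeline(ImagesPipeline):
    # The keyword-only item argument is passed by Scrapy 2.4 and later.
    def file_path(self, request, response=None, info=None, *, item=None):
        # Code to generate the {link-page/name.jpg} value goes here; the
        # returned path is interpreted relative to IMAGES_STORE.
        raise NotImplementedError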
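The placeholder comment in file_path leaves open how the link-page/name.jpg value is built. The following is only a minimal sketch of one way to do it, written as a standalone helper that uses nothing but the standard library. The names relative_image_path and _slug are hypothetical, and so is the assumption that the URL of the page the image came from is available alongside the image URL.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse


def _slug(text, fallback):
    """Reduce text to characters that are safe in a directory or file name."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", text).strip("-.")
    return cleaned or fallback


def relative_image_path(page_url, image_url):
    """Return '<link-page>/<name>' as a path relative to IMAGES_STORE.

    page_url  -- URL of the page the image was found on (assumed to be known)
    image_url -- URL of the image itself
    """
    # Folder: last non-empty segment of the page URL's path.
    page_path = urlparse(page_url).path.rstrip("/")
    folder = _slug(unquote(posixpath.basename(page_path)), "index")

    # File name: last segment of the image URL's path; the query string
    # is ignored because only .path is used.
    image_path = urlparse(image_url).path
    name = _slug(unquote(posixpath.basename(image_path)), "image.jpg")

    return f"{folder}/{name}"
```

Inside file_path, the return value could then be something like relative_image_path(request.meta['page_url'], request.url), assuming get_media_requests has been overridden to attach a hypothetical page_url key to each request's meta. On Scrapy 2.4 and later the item argument is another possible source for the folder name.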
Then register the subclass as a pipeline in your settings file, in place of the default ImagesPipeline:
ITEM_PIPELINES = {'your_project.pipelines.MyImagesPipeline': 1}
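For the project in the question, the settings might end up looking roughly like the sketch below. It assumes the subclass lives in a pipelines.py module inside the reddit package (the package name comes from the spider's imports; the module location is a guess). It also uses the IMAGES_URLS_FIELD and IMAGES_RESULT_FIELD settings, because the stock ImagesPipeline reads URLs from an image_urls field and writes results to an images field, neither of which RedditItem defines. The photos field is hypothetical.

```python
# settings.py (sketch under the assumptions stated above)

# Assumed location of the subclass: reddit/pipelines.py
ITEM_PIPELINES = {"reddit.pipelines.MyImagesPipeline": 1}

# Directory that the paths returned by file_path() are resolved against.
IMAGES_STORE = "/ProjectX/reddit/reddit/photos/"

# Point the pipeline at the item field that holds the image URLs.
IMAGES_URLS_FIELD = "photoLink"

# Hypothetical field for download results; it would have to be declared
# on RedditItem for the results to be recorded.
IMAGES_RESULT_FIELD = "photos"
```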