Downloading images into folders with Python

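I have a problem with Python and Scrapy. The script appears to run and stores all the data in MongoDB, but while scraping it only saves the photo links in the database. I want the images downloaded into this structure: /project/photos/link-page/name.jpg

Here is my code. This is items.py: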

import scrapy
from PIL import Image
class RedditItem(scrapy.Item):
    '''
    Defining the storage containers for the data we
    plan to scrape
    '''

    title = scrapy.Field()
    photoLink = scrapy.Field()

This is from settings.py:

ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = '/ProjectX/reddit/reddit/photos/'

And here is scrapper.py:

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import HtmlResponse
from scrapy.selector import Selector
from datetime import datetime as dt
import scrapy
from reddit.items import RedditItem
from PIL import Image

def parse_following_urls(self, response):
    item = RedditItem()
    item['title'] = response.css('h1.kiwii-font-xlarge::text').extract_first()
    item['photoLink'] = response.css("div.kiwii-carousel-picture span::attr(src)").extract()

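Answer

If you want to store the images as, for example, {IMAGES_STORE}/link-page/name.jpg, you need to subclass the default ImagesPipeline class and override its file_path method.

For example: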

from scrapy.pipelines.images import ImagesPipeline

class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # Build and return the path relative to IMAGES_STORE,
        # e.g. 'link-page/name.jpg'.
        raise NotImplementedError
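
As a rough sketch of what that method body might look like: the version below assumes the folder name should come from the item's `title` field (the question does not say what "link-page" is derived from) and that the file name is the last segment of the image URL. `build_image_path` is a hypothetical helper, not part of Scrapy. The keyword-only `item` argument is, as far as I know, only passed by newer Scrapy releases (2.4 and later); on older versions it stays `None` and the folder falls back to `untitled`.

```python
import posixpath
import re
from urllib.parse import urlparse

try:
    from scrapy.pipelines.images import ImagesPipeline
except ImportError:
    # Fallback so the helper below can be tried without Scrapy installed.
    # In a real project Scrapy is required and this branch is never taken.
    ImagesPipeline = object


def build_image_path(folder_name, image_url):
    """Hypothetical helper: return '<folder>/<file name>' relative to IMAGES_STORE."""
    # Replace anything that is not safe in a directory name with '-'.
    safe_folder = re.sub(r'[^A-Za-z0-9._-]+', '-', folder_name or '').strip('-.')
    # Use the last segment of the URL path (query string ignored) as the file name.
    file_name = posixpath.basename(urlparse(image_url).path)
    return f"{safe_folder or 'untitled'}/{file_name or 'image.jpg'}"


class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):
        # Assumption: the item's 'title' field names the per-page folder.
        title = item.get('title') if item is not None else None
        return build_image_path(title, request.url)
```

With this sketch, an item titled "link page" whose image URL ends in /name.jpg would be stored under {IMAGES_STORE}/link-page/name.jpg.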

Then add it as a pipeline in your settings file in place of the default ImagesPipeline:

ITEM_PIPELINES = {'your_project.pipelines.MyImagesPipeline': 1}
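
One more thing that may matter here: by default ImagesPipeline reads the URLs to download from an item field named `image_urls`, while the item in the question stores them in `photoLink`. A settings sketch that points the pipeline at that field could look like the following. It assumes the project package is called `reddit` (taken from the `from reddit.items import RedditItem` import in the question) and that `MyImagesPipeline` lives in a `reddit/pipelines.py` module, which the question does not show.

```python
# settings.py (sketch; the module path below is an assumption)
ITEM_PIPELINES = {'reddit.pipelines.MyImagesPipeline': 1}

# Directory that file_path() results are resolved against (value from the question).
IMAGES_STORE = '/ProjectX/reddit/reddit/photos/'

# Tell ImagesPipeline to read image URLs from 'photoLink' instead of 'image_urls'.
IMAGES_URLS_FIELD = 'photoLink'
```

The spider callback also has to yield the item, otherwise no pipeline ever sees it.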
