How to check dimensions of all images in a directory using python?

【Posted】: 2010-12-03 04:28:56 【Question】:

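I need to check the dimensions of the images in a directory, which currently holds about 700 images. I only need to check the size; any image whose size does not match a given dimension should be moved to a different folder. How do I get started?

【Comments】:

【Answer 1】:

If you don't need the rest of PIL and only want the image dimensions of PNG, JPEG, and GIF files, this small function (BSD license) does the job nicely: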

http://code.google.com/p/bfg-pages/source/browse/trunk/pages/getimageinfo.py

# Python 2 code: str holds raw bytes here, and StringIO is a Python 2 module.
import StringIO
import struct

def getImageInfo(data):
    data = str(data)
    size = len(data)
    height = -1
    width = -1
    content_type = ''

    # handle GIFs
    if (size >= 10) and data[:6] in ('GIF87a', 'GIF89a'):
        # Check to see if content_type is correct
        content_type = 'image/gif'
        w, h = struct.unpack("<HH", data[6:10])
        width = int(w)
        height = int(h)

    # See PNG 2. Edition spec (http://www.w3.org/TR/PNG/)
    # Bytes 0-7 are below, 4-byte chunk length, then 'IHDR'
    # and finally the 4-byte width, height
    elif ((size >= 24) and data.startswith('\211PNG\r\n\032\n')
          and (data[12:16] == 'IHDR')):
        content_type = 'image/png'
        w, h = struct.unpack(">LL", data[16:24])
        width = int(w)
        height = int(h)

    # Maybe this is for an older PNG version.
    elif (size >= 16) and data.startswith('\211PNG\r\n\032\n'):
        # Check to see if we have the right content type
        content_type = 'image/png'
        w, h = struct.unpack(">LL", data[8:16])
        width = int(w)
        height = int(h)

    # handle JPEGs
    elif (size >= 2) and data.startswith('\377\330'):
        content_type = 'image/jpeg'
        jpeg = StringIO.StringIO(data)
        jpeg.read(2)
        b = jpeg.read(1)
        w, h = -1, -1  # defaults in case no SOF marker is found
        try:
            while (b and ord(b) != 0xDA):
                while (ord(b) != 0xFF): b = jpeg.read(1)
                while (ord(b) == 0xFF): b = jpeg.read(1)
                if (ord(b) >= 0xC0 and ord(b) <= 0xC3):
                    jpeg.read(3)
                    h, w = struct.unpack(">HH", jpeg.read(4))
                    break
                else:
                    jpeg.read(int(struct.unpack(">H", jpeg.read(2))[0])-2)
                b = jpeg.read(1)
            width = int(w)
            height = int(h)
        except struct.error:
            pass
        except ValueError:
            pass

    return content_type, width, height
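
The function expects the raw bytes of an image file, and only the header bytes matter. The following is a minimal Python 3 sketch of that calling pattern, not the original author's code. It assumes a hypothetical helper, read_png_or_gif_size, that reimplements only the GIF and PNG branches above so that the example stays self-contained:

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def read_png_or_gif_size(path):
    """Return (width, height) read from the file header, or None.

    Hypothetical helper: covers only the GIF and PNG branches of the
    function above, and reads just the first 24 bytes of the file.
    """
    with open(path, 'rb') as f:  # binary mode yields the raw bytes
        data = f.read(24)

    # GIF: 6-byte signature, then little-endian 16-bit width and height
    if len(data) >= 10 and data[:6] in (b'GIF87a', b'GIF89a'):
        return struct.unpack('<HH', data[6:10])

    # PNG: 8-byte signature, 4-byte chunk length, b'IHDR',
    # then big-endian 32-bit width and height
    if (len(data) >= 24 and data[:8] == PNG_SIGNATURE
            and data[12:16] == b'IHDR'):
        return struct.unpack('>LL', data[16:24])

    return None  # unknown or unsupported format
```

Reading only the header keeps the check cheap even for a directory with hundreds of large images. JPEG files need the marker scan shown above, because their dimensions are not at a fixed offset.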

【Comments】:

This worked like a charm for me; +1 for a solution without third-party libraries.
How do you call this function? What is your data?

【Answer 2】:

A common way is to use PIL, the Python Imaging Library, to get the dimensions:

from PIL import Image
import os.path

filename = os.path.join('path', 'to', 'image', 'file')
img = Image.open(filename)
print(img.size)  # (width, height) tuple

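You then need to loop over the files in your directory, check each image's dimensions against your required dimensions, and move the files that do not match.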
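
The loop described above could look roughly like the following sketch. The names move_mismatched and pil_size, the get_size parameter, and the example paths are illustrative assumptions, not part of the original answer. Pillow is imported lazily so that a different size reader can be swapped in:

```python
import os
import shutil


def pil_size(path):
    """Return (width, height) using Pillow, imported lazily."""
    from PIL import Image  # third-party: pip install Pillow
    with Image.open(path) as img:
        return img.size


def move_mismatched(src_dir, dst_dir, expected_size, get_size=pil_size):
    """Move files in src_dir whose size differs from expected_size.

    Returns the list of moved file names. Files that get_size cannot
    read (it raises OSError) are left in place.
    """
    os.makedirs(dst_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        try:
            size = get_size(path)
        except OSError:
            continue  # not a readable image
        if tuple(size) != tuple(expected_size):
            shutil.move(path, os.path.join(dst_dir, name))
            moved.append(name)
    return moved
```

A call such as move_mismatched('/path/to/images', '/path/to/rejected', (1920, 1080)) would then leave only the 1920 x 1080 images in the source folder; both paths are placeholders.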

【Comments】:

【Answer 3】:

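You can use the Python Imaging Library (aka PIL) to read the image headers and query the dimensions.

One way to approach it is to write a function that takes a filename and returns the dimensions (using PIL). Then use the os.path.walk function (Python 2 only; os.walk is the Python 3 equivalent) to traverse all the files in the directory, applying this function. Collecting the results, you can build a dictionary mapping filename -> dimensions, then use a list comprehension (see itertools) to filter out those that do not match the required size.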
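
As a rough sketch of that approach, using os.walk: the names collect_sizes and mismatched are hypothetical, and get_size stands for any callable that returns (width, height) for a path, for example a small wrapper around PIL's Image.open(path).size.

```python
import os


def collect_sizes(root, get_size):
    """Map each file path under root to whatever get_size returns."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes[path] = get_size(path)
            except OSError:
                pass  # unreadable or not an image: skip it
    return sizes


def mismatched(sizes, wanted):
    """Return the paths whose dimensions differ from wanted."""
    return [path for path, dims in sizes.items()
            if tuple(dims) != tuple(wanted)]
```

Keeping the traversal separate from the filter means the dictionary can be reused, for instance to report the distribution of sizes before anything is moved.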

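【Comments】:

I did this, but used os.listdir instead. It works pretty well with ~700 images. Is os.path.walk better?
If os.listdir does what you need, that's fine. The main difference is that os.walk recurses into subdirectories.

【Answer 4】: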

Here is a script that does what you need:

#!/usr/bin/env python

"""
Get information about images in a folder.
"""

from os import listdir
from os.path import isfile, join

from PIL import Image


def print_data(data):
    """
    Parameters
    ----------
    data : dict
    """
    for k, v in data.items():
        print("%s:\t%s" % (k, v))
    print("Min width: %i" % data["min_width"])
    print("Max width: %i" % data["max_width"])
    print("Min height: %i" % data["min_height"])
    print("Max height: %i" % data["max_height"])


def main(path):
    """
    Parameters
    ----------
    path : str
        Path where to look for image files.
    """
    onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]

    # Filter files by extension
    onlyfiles = [f for f in onlyfiles if f.endswith(".jpg")]

    data = {}
    data["images_count"] = len(onlyfiles)
    data["min_width"] = 10 ** 100  # No image will be bigger than that
    data["max_width"] = 0
    data["min_height"] = 10 ** 100  # No image will be bigger than that
    data["max_height"] = 0

    for filename in onlyfiles:
        # Open relative to `path`, not the current working directory
        with Image.open(join(path, filename)) as im:
            width, height = im.size
        data["min_width"] = min(width, data["min_width"])
        data["max_width"] = max(width, data["max_width"])
        data["min_height"] = min(height, data["min_height"])
        data["max_height"] = max(height, data["max_height"])

    print_data(data)


if __name__ == "__main__":
    main(path=".")

【Comments】:

【Answer 5】:
import os
from PIL import Image 

folder_images = "/tmp/photos"
size_images = dict()

for dirpath, _, filenames in os.walk(folder_images):
    for path_image in filenames:
        image = os.path.abspath(os.path.join(dirpath, path_image))
        with Image.open(image) as img:
            width, height = img.size
            size_images[path_image] = {'width': width, 'height': height}
print(size_images)

folder_images is the directory that contains your images. size_images is a dictionary that holds the image sizes, in the following format.

Example:

{'image_name.jpg': {'width': 100, 'height': 100}}

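【Comments】:

While the idea behind your code is good, it lacks an explanation. I would also point out that all-uppercase variable names are normally reserved for constants, so I would not recommend using one for a dictionary as you did with SIZE_IMAGES.
All right, I can revise my answer.

【Answer 6】:

You can also use the cv2 library to check the dimensions of an image.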

import cv2

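# read image; cv2.imread returns None when the file cannot be read
img = cv2.imread('boarding_pass.png', cv2.IMREAD_UNCHANGED)
if img is None:
    raise IOError('could not read boarding_pass.png')

# get dimensions of image
dimensions = img.shape

# height, width, number of channels in image
# (grayscale images have a 2-element shape, so default to 1 channel)
height = img.shape[0]
width = img.shape[1]
channels = img.shape[2] if img.ndim == 3 else 1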

print('Image Dimension    : ',dimensions)
print('Image Height       : ',height)
print('Image Width        : ',width)
print('Number of Channels : ',channels)

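【Comments】:

【Answer 7】:

I am quite satisfied with the answers above, because they helped me write another simple answer to this question. Since the answers above contain only scripts, readers have to run them to check that they work. So I decided to solve the problem interactively, in the Python shell, which I think makes things clear. I am using Python 2.7.12 and have installed the Pillow library to access images through PIL. My current directory contains many jpg images and 1 png image. Now let's move on to the Python shell.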

>>> #Date of creation : 3 March 2017
>>> #Python version   : 2.7.12
>>>
>>> import os         #Importing os module
>>> import glob       #Importing glob module to list the same type of image files like jpg/png(here)
>>> 
>>> for extension in ["jpg", 'png']:
...     print "List of all " + extension + " files in current directory:-"
...     i = 1
...     for imgfile in glob.glob("*."+extension):
...         print i,") ",imgfile
...         i += 1
...     print "\n"
... 
List of all jpg files in current directory:-
1 )  002-tower-babel.jpg
2 )  1454906.jpg
3 )  69151278-great-hd-wallpapers.jpg
4 )  amazing-ancient-wallpaper.jpg
5 )  Ancient-Rome.jpg
6 )  babel_full.jpg
7 )  Cuba-is-wonderfull.jpg
8 )  Cute-Polar-Bear-Images-07775.jpg
9 )  Cute-Polar-Bear-Widescreen-Wallpapers-07781.jpg
10 )  Hard-work-without-a-lh.jpg
11 )  jpeg422jfif.jpg
12 )  moscow-park.jpg
13 )  moscow_city_night_winter_58404_1920x1080.jpg
14 )  Photo1569.jpg
15 )  Pineapple-HD-Photos-03691.jpg
16 )  Roman_forum_cropped.jpg
17 )  socrates.jpg
18 )  socrates_statement1.jpg
19 )  steve-jobs.jpg
20 )  The_Great_Wall_of_China_at_Jinshanling-edit.jpg
21 )  torenvanbabel_grt.jpg
22 )  tower_of_babel4.jpg
23 )  valckenborch_babel_1595_grt.jpg
24 )  Wall-of-China-17.jpg


List of all png files in current directory:-
1 )  gergo-hungary.png


>>> #So let's display all the resolutions with the filename
... from PIL import Image   #Importing Python Imaging library(PIL)
>>> for extension in ["jpg", 'png']:
...     i = 1
...     for imgfile in glob.glob("*." + extension):
...         img = Image.open(imgfile)
...         print i,") ",imgfile,", resolution: ",img.size[0],"x",img.size[1]
...         i += 1
...     print "\n"
... 
1 )  002-tower-babel.jpg , resolution:  1024 x 768
2 )  1454906.jpg , resolution:  1920 x 1080
3 )  69151278-great-hd-wallpapers.jpg , resolution:  5120 x 2880
4 )  amazing-ancient-wallpaper.jpg , resolution:  1920 x 1080
5 )  Ancient-Rome.jpg , resolution:  1000 x 667
6 )  babel_full.jpg , resolution:  1464 x 1142
7 )  Cuba-is-wonderfull.jpg , resolution:  1366 x 768
8 )  Cute-Polar-Bear-Images-07775.jpg , resolution:  1600 x 1067
9 )  Cute-Polar-Bear-Widescreen-Wallpapers-07781.jpg , resolution:  2300 x 1610
10 )  Hard-work-without-a-lh.jpg , resolution:  650 x 346
11 )  jpeg422jfif.jpg , resolution:  2048 x 1536
12 )  moscow-park.jpg , resolution:  1920 x 1200
13 )  moscow_city_night_winter_58404_1920x1080.jpg , resolution:  1920 x 1080
14 )  Photo1569.jpg , resolution:  480 x 640
15 )  Pineapple-HD-Photos-03691.jpg , resolution:  2365 x 1774
16 )  Roman_forum_cropped.jpg , resolution:  4420 x 1572
17 )  socrates.jpg , resolution:  852 x 480
18 )  socrates_statement1.jpg , resolution:  1280 x 720
19 )  steve-jobs.jpg , resolution:  1920 x 1080
20 )  The_Great_Wall_of_China_at_Jinshanling-edit.jpg , resolution:  4288 x 2848
21 )  torenvanbabel_grt.jpg , resolution:  1100 x 805
22 )  tower_of_babel4.jpg , resolution:  1707 x 956
23 )  valckenborch_babel_1595_grt.jpg , resolution:  1100 x 748
24 )  Wall-of-China-17.jpg , resolution:  1920 x 1200


1 )  gergo-hungary.png , resolution:  1236 x 928


>>> 

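【Comments】:

【Answer 8】:

If you are using an IPython / Jupyter notebook, this function works like a charm. It relies on the file command available in a Linux terminal. The advantages:

It is extremely fast, which suits folders that contain thousands of images when you need the distribution of image sizes.
It does not load the images into memory, which avoids memory overhead.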
def get_image_size_faster(file_dir, ext='png'):
        """
        Function to retrieve image size without loading the image at all

        params:
        file_dir = path of the folder containing image files
        dim_index = index of image dimensions in the `file $file_path` call output
                    For PNG : -3 # Downloads/test.png: PNG image data, 4032 x 3024, 8-bit/color RGB, non-interlaced
                    For JPEG/JPG : -2 # Downloads/test.jpg: JPEG image data,..., baseline, precision 8, 2252x1400, components 3
                    For GIF : -1 # Downloads/test.gif: GIF image data, version 89a, 498 x 373
        """
        dim_index_map = {
            'png' : -3,
            'jpg' : -2,
            'jpeg': -2,
            'gif' : -1
        }

        dim_index = dim_index_map[ext]

        files_regex = "{file_dir}/*.{ext}".format(file_dir=file_dir, ext=ext)
        # IPython-only shell escape; this line does not run in plain Python
        outputs = !file $files_regex
        dims = [tuple(map(int, x.split(',')[dim_index].strip().split('x'))) for x in outputs]
        return dims

A plain Python script alternative to this function can be written with the subprocess package, which produces the same result.
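
One possible shape for that subprocess variant is sketched below. It is an illustration under assumptions rather than the answer author's code: the names parse_dims and get_image_sizes are made up here, the comma positions are the ones given in the docstring above, and the -b (brief) flag of file is used so that file names containing commas do not shift the fields. The output of file varies between versions and image files, so the positions may need adjusting.

```python
import glob
import os
import subprocess

# Position of the "W x H" field, counted from the end of the
# comma-separated `file` output (same mapping as the function above).
DIM_INDEX = {'png': -3, 'jpg': -2, 'jpeg': -2, 'gif': -1}


def parse_dims(description, ext):
    """Extract (width, height) from one line of `file` output."""
    field = description.split(',')[DIM_INDEX[ext]]
    width, height = (int(part) for part in field.strip().split('x'))
    return width, height


def get_image_sizes(file_dir, ext='png'):
    """Run `file -b` once on all *.ext files in file_dir and parse sizes."""
    paths = sorted(glob.glob(os.path.join(file_dir, '*.' + ext)))
    if not paths:
        return []
    result = subprocess.run(['file', '-b'] + paths,
                            capture_output=True, text=True, check=True)
    # With -b, `file` prints one description line per input path, in order
    return [parse_dims(line, ext) for line in result.stdout.splitlines()]
```

Passing every path to a single file invocation keeps the speed advantage of the notebook version. For very large folders the argument list may exceed the operating system limit, in which case the paths would have to be processed in batches.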

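【Comments】:

【Answer 9】:

I tried using @JohnTESlade's answer but ran into bytes/str conversion problems, so I corrected it, followed some PEPs, and added support for the EMF type, which I needed.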

import struct
from io import BytesIO
from typing import Tuple


def get_image_info(data: bytes) -> Tuple[str, int, int]:
    size = len(data)
    height = -1
    width = -1
    content_type = ''

    # handle GIFs
    if (size >= 10) and data[:6] in (b'GIF87a', b'GIF89a'):
        # Check to see if content_type is correct
        content_type = 'image/gif'
        w, h = struct.unpack("<HH", data[6:10])
        width = int(w)
        height = int(h)

    # See PNG 2. Edition spec (http://www.w3.org/TR/PNG/)
    # Bytes 0-7 are below, 4-byte chunk length, then 'IHDR'
    # and finally the 4-byte width, height
    elif ((size >= 24) and data[0:8] == b'\211PNG\r\n\032\n'
          and (data[12:16] == b'IHDR')):
        content_type = 'image/png'
        w, h = struct.unpack(">LL", data[16:24])
        width = int(w)
        height = int(h)

    # Maybe this is for an older PNG version.
    elif (size >= 16) and data[0:8] == b'\211PNG\r\n\032\n':
        # Check to see if we have the right content type
        content_type = 'image/png'
        w, h = struct.unpack(">LL", data[8:16])
        width = int(w)
        height = int(h)

    # handle JPEGs
    elif (size >= 2) and data[0:2] == b'\377\330':
        content_type = 'image/jpeg'
        jpeg = BytesIO(data)
        jpeg.read(2)
        b = jpeg.read(1)
        w, h = -1, -1
        try:
            while b and ord(b) != 0xDA:
                while ord(b) != 0xFF:
                    b = jpeg.read(1)
                while ord(b) == 0xFF:
                    b = jpeg.read(1)
                if 0xC0 <= ord(b) <= 0xC3:
                    jpeg.read(3)
                    h, w = struct.unpack(">HH", jpeg.read(4))
                    break
                else:
                    jpeg.read(int(struct.unpack(">H", jpeg.read(2))[0]) - 2)
                b = jpeg.read(1)
            width = int(w)
            height = int(h)
        except struct.error:
            pass
        except ValueError:
            pass

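    # Maybe this will work for most EMF types. Bytes 24-40 of the header hold
    # the frame rectangle, which EMF stores in 0.01 mm units, not pixels.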
    elif (size >= 40) and data[0:4] == b'\001\000\000\000':
        # Check to see if we have the right content type
        content_type = 'image/x-emf'
        x, y, r, b = struct.unpack("<LLLL", data[24:40])
        width = int(r - x)
        height = int(b - y)

    return content_type, width, height

【Comments】:
