How to attach or get filenames from MapDataset from image_dataset_from_directory() in Keras?

Posted 2023-02-16


Originally posted: 2022-01-12 13:11:56

Question:

I'm training a convolutional autoencoder, and I have this code for loading the data (images):

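import tensorflow as tf
from tensorflow.keras import layers

image_size = (128, 128)  # placeholder: the question does not show the actual value

train_ds = tf.keras.utils.image_dataset_from_directory(
    'path/to/images',
    image_size=image_size
)
# On older TF versions this layer lived at layers.experimental.preprocessing.Rescaling
normalization_layer = layers.Rescaling(1./255)

def adjust_inputs(images, labels):
    # Autoencoder: drop the class labels and use the normalized image as both input and target
    x = normalization_layer(images)
    return x, x

normalized_train_ds = train_ds.map(adjust_inputs)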

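Since I don't need the class labels but want the image itself as Y, I map the function adjust_inputs over the dataset. But now, when I try to access the attribute filenames, I get the error AttributeError: 'MapDataset' object has no attribute 'filenames'. That makes sense: map() returns a new MapDataset object, and it does not carry over the extra attributes of the original dataset.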

How can I attach or get the filenames of the loaded images in the dataset?

I'm really surprised there isn't a simpler interface for this; it seems like a very common need.
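The error is not specific to images. image_dataset_from_directory attaches extras such as class_names (and, in recent TensorFlow versions, the file_paths attribute that Answer 1 relies on) as plain Python attributes on the dataset object it returns, and map() builds a new dataset object that does not copy them. A minimal sketch, using a toy tf.data.Dataset.range dataset and a made-up file_paths list in place of real images:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(3)
# A plain Python attribute, set the same way Keras sets class_names on its datasets.
# The file names here are made up for illustration.
ds.file_paths = ["a.png", "b.png", "c.png"]

mapped = ds.map(lambda x: x * 2)
print(hasattr(ds, "file_paths"))      # True
print(hasattr(mapped, "file_paths"))  # False: map() returned a new dataset object

# Workaround: read the attribute from the original dataset before (or instead of) mapping.
file_paths = ds.file_paths
```

In the question's code this amounts to reading the attribute from train_ds rather than from normalized_train_ds.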


Answer 1:

In case you want to add the file paths as part of the dataset:

import tensorflow as tf
import pathlib
import matplotlib.pyplot as plt

dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)

batch_size = 32
# shuffle=False keeps the image batches in the same order as train_ds.file_paths
train_ds = tf.keras.utils.image_dataset_from_directory(data_dir, shuffle=False, batch_size=batch_size)

# Batch the paths with the same batch size, so each path batch lines up with one image batch
paths_ds = tf.data.Dataset.from_tensor_slices(train_ds.file_paths).batch(batch_size)

normalization_layer = tf.keras.layers.Rescaling(1./255)
def change_inputs(images, labels, paths):
  # Drop the labels: the normalized image is both input and target, and the paths ride along
  x = normalization_layer(images)
  return x, x, paths

# Each zipped element is ((images, labels), paths)
normalized_ds = tf.data.Dataset.zip((train_ds, paths_ds)).map(
    lambda batch, paths: change_inputs(batch[0], batch[1], paths))

images, _, paths = next(iter(normalized_ds.take(1)))

image = images[0]
path = paths[0]
print(path)
plt.imshow(image.numpy())

Found 3670 files belonging to 5 classes.
tf.Tensor(b'/root/.keras/datasets/flower_photos/daisy/100080576_f52e8ee070_n.jpg', shape=(), dtype=string)
<matplotlib.image.AxesImage at 0x7f9b113d1a10>

You have to make sure the paths are batched with the same batch size as the images.
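A dataset that yields (images, images, paths) cannot be passed to model.fit() as-is: fit() unpacks a three-element tuple as (x, y, sample_weight), so the string paths would be treated as sample weights. The sketch below uses synthetic tensors and a hypothetical toy dense autoencoder in place of the real data and convolutional model, and shows two options: dropping the paths with another map() before fit(), or a custom training loop in which each batch's paths stay available.

```python
import tensorflow as tf

# Synthetic stand-in for normalized_ds: batches of (input, target, path).
images = tf.random.uniform((8, 4, 4, 1))
paths = tf.constant([f"img_{i}.png" for i in range(8)])
ds = tf.data.Dataset.from_tensor_slices((images, images, paths)).batch(4)

# Hypothetical toy autoencoder; replace it with the real convolutional model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4, 4, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(16),
    tf.keras.layers.Reshape((4, 4, 1)),
])
model.compile(optimizer="adam", loss="mse")

# Option 1: strip the paths so fit() receives plain (x, y) pairs.
history = model.fit(ds.map(lambda x, y, p: (x, y)), epochs=1, verbose=0)

# Option 2: a custom training loop; the paths of each batch remain accessible.
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()
seen_paths = []
for x, y, p in ds:
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    seen_paths.extend(p.numpy().tolist())  # e.g. log the paths of this batch
```

Option 1 is the least code if the paths are only needed after training; option 2 is useful when, for example, per-image losses should be logged alongside their file names.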

Comments:

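How would I feed this into my model? I had thought of this idea, but it puts three values in one tuple, and I don't think model.fit() will be happy with that.

Yes, but if you use a custom training loop, you can control which data is fed to the model.

Answer 2: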

I did it the following way.

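After training my model, I simply reloaded all the images, this time with the option shuffle=False, and ran them through the model to extract features. Since shuffling is off, the images and the file paths are in the same order: the image at index 0 produces the features at index 0, and its file path is also at index 0.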
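A sketch of this index-alignment approach, assuming a TensorFlow version whose image_dataset_from_directory sets the file_paths attribute used in Answer 1. To stay self-contained it writes a throwaway folder of constant-colour PNGs, and it uses a hypothetical stand-in encoder (global average pooling) instead of a trained autoencoder; row i of the predictions is then paired with file_paths[i].

```python
import os
import tempfile

import tensorflow as tf

# Throwaway folder: two classes, three 8x8 PNGs each; i.png is filled with value i * 40.
root = tempfile.mkdtemp()
for cls in ("a", "b"):
    os.makedirs(os.path.join(root, cls))
    for i in range(3):
        pixels = tf.fill((8, 8, 3), tf.constant(i * 40, tf.uint8))
        tf.io.write_file(os.path.join(root, cls, f"{i}.png"), tf.io.encode_png(pixels))

# shuffle=False: images are yielded in the same order as ds.file_paths.
ds = tf.keras.utils.image_dataset_from_directory(
    root, image_size=(8, 8), batch_size=4, shuffle=False)
file_paths = list(ds.file_paths)

# Hypothetical stand-in for the trained encoder: mean pixel value per channel.
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
])
features = encoder.predict(ds.map(lambda x, y: x), verbose=0)

# Row i of features belongs to file_paths[i].
features_by_path = dict(zip(file_paths, features))
```

The pairing only holds because nothing reorders the data between loading and predict(); turning shuffle back on, or calling shuffle() on the dataset, would break it.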

