The preprocess_input() method in Keras

Posted 2023-02-23

【Title】: preprocess_input() method in keras 【Posted】: 2018-05-13 08:12:16 【Question】:

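I am trying out the sample Keras code from the following Keras documentation page: https://keras.io/applications/

What does the preprocess_input(x) function of the keras module do in the code below? And why do we have to call expand_dims(x, axis=0) before passing the array to preprocess_input()?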

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input
import numpy as np

model = ResNet50(weights='imagenet')

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

Is there any documentation that explains these functions well?

Thanks!

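【Comments】:

github.com/fchollet/keras/blob/master/keras/applications/…

Thanks! So I understand it normalizes the pixels to between -1 and +1. Any explanation of why we have to call np.expand_dims(x, axis=0) on the input passed to the function?

@AKSHAYAAVAIDYANATHAN Sorry for this (maybe) silly question, but how did you work out from the link above that it normalizes the pixels into [-1, 1]?

@RicS By looking directly at the output array after calling preprocess_input().

@AKSHAYAAVAIDYANATHAN Silly indeed. Thank you :)

【Answer 1】: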

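As you can see in tensorflow/python/keras/_impl/keras/applications/imagenet_utils.py, the main purpose of the preprocessing (the torch mode is the clearest case) is to normalize the color channels according to the dataset the network was originally trained on. It is the same thing we usually do with a simple (Data - Mean) / Std.

Source code: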

def _preprocess_numpy_input(x, data_format, mode):
  """Preprocesses a Numpy array encoding a batch of images.
  Arguments:
      x: Input array, 3D or 4D.
      data_format: Data format of the image array.
      mode: One of "caffe", "tf" or "torch".
          - caffe: will convert the images from RGB to BGR,
              then will zero-center each color channel with
              respect to the ImageNet dataset,
              without scaling.
          - tf: will scale pixels between -1 and 1,
              sample-wise.
          - torch: will scale pixels between 0 and 1 and then
              will normalize each channel with respect to the
              ImageNet dataset.
  Returns:
      Preprocessed Numpy array.
  """
  if mode == 'tf':
    x /= 127.5
    x -= 1.
    return x

  if mode == 'torch':
    x /= 255.
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
  else:
    if data_format == 'channels_first':
      # 'RGB'->'BGR'
      if x.ndim == 3:
        x = x[::-1, ...]
      else:
        x = x[:, ::-1, ...]
    else:
      # 'RGB'->'BGR'
      x = x[..., ::-1]
    mean = [103.939, 116.779, 123.68]
    std = None

  # Zero-center by mean pixel
  if data_format == 'channels_first':
    if x.ndim == 3:
      x[0, :, :] -= mean[0]
      x[1, :, :] -= mean[1]
      x[2, :, :] -= mean[2]
      if std is not None:
        x[0, :, :] /= std[0]
        x[1, :, :] /= std[1]
        x[2, :, :] /= std[2]
    else:
      x[:, 0, :, :] -= mean[0]
      x[:, 1, :, :] -= mean[1]
      x[:, 2, :, :] -= mean[2]
      if std is not None:
        x[:, 0, :, :] /= std[0]
        x[:, 1, :, :] /= std[1]
        x[:, 2, :, :] /= std[2]
  else:
    x[..., 0] -= mean[0]
    x[..., 1] -= mean[1]
    x[..., 2] -= mean[2]
    if std is not None:
      x[..., 0] /= std[0]
      x[..., 1] /= std[1]
      x[..., 2] /= std[2]
  return x
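As a rough illustration of the three modes described in the docstring above, here is a NumPy-only sketch. The helper name preprocess_sketch is made up for this example and is not part of Keras; it handles only channels_last input and, unlike the library code, works on a copy instead of modifying the array in place.

```python
import numpy as np

# Hypothetical helper for illustration only (not a Keras function).
# Assumes channels_last input: (..., height, width, 3) in RGB order, values 0..255.
def preprocess_sketch(x, mode="caffe"):
    x = x.astype("float32")  # astype copies, so the caller's array is untouched
    if mode == "tf":
        # Scale pixels to the range [-1, 1].
        return x / 127.5 - 1.0
    if mode == "torch":
        # Scale to [0, 1], then normalize each channel with ImageNet mean/std.
        mean = np.array([0.485, 0.456, 0.406], dtype="float32")
        std = np.array([0.229, 0.224, 0.225], dtype="float32")
        return (x / 255.0 - mean) / std
    # caffe: RGB -> BGR, then subtract the ImageNet channel means (BGR order), no scaling.
    mean = np.array([103.939, 116.779, 123.68], dtype="float32")
    return x[..., ::-1] - mean

batch = np.full((1, 2, 2, 3), 255.0)  # one all-white 2x2 RGB image
tf_out = preprocess_sketch(batch, "tf")
caffe_out = preprocess_sketch(batch, "caffe")
torch_out = preprocess_sketch(batch, "torch")
```

For the all-white test image, the tf mode maps every pixel to 1.0, while the caffe mode leaves large, unscaled values (255 minus the channel mean), which is why the output range alone tells you which mode a given model's preprocess_input uses.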

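【Comments】:

【Answer 2】:

I have found that if you apply the pretrained model's preprocessing to your data while your dataset differs too much from the one the model was pretrained on, it can hurt your accuracy to some degree. If you are doing transfer learning and freezing some layers of the pretrained model and its weights, simply dividing your original dataset by 255.0 does the job well, at least for a large 1/2 million sample food dataset. Ideally, you should know the std/mean of your own dataset and use those, rather than the std/mean used by the pretrained model's preprocessing.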
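A minimal sketch of the two alternatives mentioned in this answer, using random data as a stand-in for a real dataset. The array shapes and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for your own dataset: 8 RGB images of 32x32 with pixel values 0..255.
images = rng.integers(0, 256, size=(8, 32, 32, 3)).astype("float32")

# Option 1: plain rescaling to [0, 1].
scaled = images / 255.0

# Option 2: standardize with this dataset's own per-channel statistics.
mean = scaled.mean(axis=(0, 1, 2))  # one value per channel, shape (3,)
std = scaled.std(axis=(0, 1, 2))    # one value per channel, shape (3,)
standardized = (scaled - mean) / std
```

In practice the mean and std would be computed once on the training set and reused unchanged for validation and test data.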

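【Comments】:

Yes. A good approach is to first try the default preprocessing function that ships with the model. For some models, accuracy suffers. In that case, use the alternative Steve suggests. This is helpful when building ensembles.

【Answer 3】:

This loads an image and resizes it to (224, 224):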

 img = image.load_img(img_path, target_size=(224, 224))

The img_to_array() function adds the channel dimension: x.shape = (224, 224, 3) for RGB and (224, 224, 1) for a grayscale image:

 x = image.img_to_array(img)

expand_dims() is used to add the number-of-images dimension: x.shape = (1, 224, 224, 3):

x = np.expand_dims(x, axis=0)

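preprocess_input subtracts the mean RGB channel values of the ImageNet dataset. This is needed because the model you are using was trained on a different dataset. x.shape is still (1, 224, 224, 3):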

x = preprocess_input(x)

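If you append each x to a list called images, then at the end of the loop you need to add images = np.vstack(images) so that the shape of images becomes (n, 224, 224, 3), where n is the number of images processed.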
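A small sketch of that loop, with zero-filled arrays standing in for images loaded through img_to_array(); the count of 4 is arbitrary.

```python
import numpy as np

images = []
for _ in range(4):
    x = np.zeros((224, 224, 3), dtype="float32")  # stand-in for image.img_to_array(img)
    x = np.expand_dims(x, axis=0)                 # shape (1, 224, 224, 3)
    images.append(x)

# vstack concatenates the list along axis 0, giving one batch array.
batch = np.vstack(images)

# Equivalent alternative: skip expand_dims and let np.stack create the new axis.
batch_alt = np.stack([x[0] for x in images], axis=0)
```

Both results have shape (4, 224, 224, 3). Collecting arrays in a Python list and stacking once at the end avoids re-allocating the batch on every iteration.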

【Comments】:

I think appending the images to a list and then stacking them would be more efficient than using np.vstack().

【Answer 4】:

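Keras works with batches of images, so the first dimension is the number of samples (or images) you have.

When you load a single image, you get the shape of one image, which is (size1, size2, channels).

To create a batch of images, you need an extra dimension: (samples, size1, size2, channels).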
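The shape change can be checked directly in NumPy. This is only a sketch with a zero-filled array standing in for a loaded image.

```python
import numpy as np

# A single image: (size1, size2, channels).
x = np.zeros((224, 224, 3), dtype="float32")

# Add the samples dimension in front: (samples, size1, size2, channels).
batched = np.expand_dims(x, axis=0)

# Indexing with np.newaxis does the same thing.
same = x[np.newaxis, ...]
```

Here batched.shape and same.shape are both (1, 224, 224, 3), a batch containing exactly one image, which is the input shape a model built for batches expects.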

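The preprocess_input function is meant to bring your images into the format the model requires.

Some models use images with values ranging from 0 to 1. Others use values from -1 to +1. Still others use the "caffe" style, which is not normalized but centered.

According to the source code, ResNet uses the caffe style.

You do not need to worry about the internal details of preprocess_input. Ideally, though, you should load your images with the Keras functions intended for that purpose, so that the loaded images are guaranteed to be compatible with preprocess_input.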

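【Comments】:

Yes, but sometimes the function is declared explicitly in the model file and sometimes it is loaded from imagenet_utils. See this model, which uses a different function: github.com/fchollet/keras/blob/master/keras/applications/… End users should import it from the model's own module to make sure the correct function is loaded.

Yes, thanks again. I also see the same function for different models; for example, there is one for vgg16, one for resnet50, and so on. Until now I assumed all of these models would actually work with images in any range. Do you think the model's performance will suffer if I do not use the corresponding preprocess_input() method?

Yes, the performance will change. A model's weights are tuned and optimized for certain input values. When you import preprocess_input from the correct module (the module of the model you chose, e.g. from keras.applications.vgg16 import preprocess_input), you get a function that correctly converts a standard image into the appropriate input. I do not know which model uses what either, but we can always check the source code. If the model is custom (not pretrained), you can use any kind of input.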
