TensorFlow: How to encode and read BMP images?

Posted 2023-02-16

Question (asked 2018-06-15 07:51:44):

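I am trying to read .bmp images, apply some augmentation to them, save them to a .tfrecords file, and then open the .tfrecords file and use the images for image classification. I know there are tf.image.encode_jpeg() and tf.image.encode_png() functions, but there is no tf.image.encode_bmp() function. I know .bmp images are uncompressed, so I tried simply base64-encoding the image, as well as np.tostring() and np.tobytes(), but I get the following error when trying to decode these formats: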

tensorflow.python.framework.errors_impl.InvalidArgumentError: channels attribute 3 does not match bits per pixel from file <some long number>

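My guess is that TensorFlow does something extra to the image bytes when encoding to JPEG or PNG, such as saving information about the array dimensions. However, I know nothing about this, so any help would be great!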

Some code to show what I am trying to achieve:

import tensorflow as tf  # TensorFlow 1.x API

with tf.gfile.FastGFile(filename, 'rb') as f:
    image_data = f.read()

bmp_data = tf.placeholder(dtype=tf.string)
decode_bmp = tf.image.decode_bmp(bmp_data, channels=3)
augmented_bmp = <do some augmentation on decode_bmp>
sess = tf.Session()
np_img = sess.run(augmented_bmp, feed_dict={bmp_data: image_data})
byte_img = np_img.tostring()

# Write byte_img to file using tf.train.Example
writer = tf.python_io.TFRecordWriter(<output_tfrecords_filename>)
example = tf.train.Example(features=tf.train.Features(feature={
    'encoded_img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[byte_img]))}))
writer.write(example.SerializeToString())

# Read img from file
dataset = tf.data.TFRecordDataset(<img_file>)
dataset = dataset.map(parse_img_fn)

parse_img_fn can be condensed to the following:

def parse_img_fn(serialized_example):
    features = tf.parse_single_example(serialized_example, feature_map)
    image = features['encoded_img']
    image = tf.image.decode_bmp(image, channels=3)  # This is where the decoding fails
    features['encoded_img'] = image

    return features

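Comments:

It seems the question is only about encoding BMP images, since you already know how to read them. What is your use case for encoding to BMP? Why not use PNG instead?

Good point! I did not know that PNG is a lossless compression format, which is why I tried to fix the BMP encryption. I will use PNG instead, so thanks! Still, I would like to know how TensorFlow encrypts images, and whether it is possible to encrypt BMP images. This would be a great opportunity to learn how it works under the hood!

Answer 1: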

(In your comment you surely meant encode, not encrypt.)

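The BMP file format is very simple: a set of headers followed by almost raw pixel data. That is why BMP images are so large. I suppose it is also why the TensorFlow developers did not bother to write a function that encodes an array (representing an image) into this format; few people still use it. PNG is recommended instead, since it compresses images losslessly. Or, if you can live with lossy compression, use JPG.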
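To make the "headers" part concrete, here is a minimal sketch that reads the standard 54-byte BMP header (a 14-byte file header followed by a 40-byte BITMAPINFOHEADER) using only the Python standard library. The function name read_bmp_header is hypothetical, and the sketch assumes the common uncompressed variant of the format. It also hints at where the error in the question probably comes from: bytes produced by tostring() carry no header, so whatever happens to sit at the bits-per-pixel offset is read as a meaningless number.

```python
import struct

def read_bmp_header(data: bytes) -> dict:
    """Parse the 14-byte file header and the start of the 40-byte info header."""
    if len(data) < 54:
        raise ValueError("need at least 54 bytes for a BMP header")
    # File header: magic, file size, two reserved fields, offset of the pixel data
    magic, file_size, _, _, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    # Info header (starts at byte 14): header size, width, height, planes, bits per pixel
    _, width, height, _, bits_per_pixel = struct.unpack_from("<IiiHH", data, 14)
    return {
        "magic": magic,
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,
        "bits_per_pixel": bits_per_pixel,
    }

# A hand-built header for a 2x2, 24-bit image (rows are padded to 8 bytes each)
valid_header = struct.pack("<2sIHHI", b"BM", 70, 0, 0, 54) + struct.pack(
    "<IiiHHIIiiII", 40, 2, 2, 1, 24, 0, 16, 2835, 2835, 0, 0
)
info = read_bmp_header(valid_header)

# Raw pixel bytes (what tostring() returns) have no header at all
raw_pixels = bytes(range(54))
raw_info = read_bmp_header(raw_pixels)
```

For the hand-built header the parser reports 24 bits per pixel, while for the raw bytes both the magic marker and the bits-per-pixel value are arbitrary, which is consistent with the "does not match bits per pixel from file" message.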

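TensorFlow does not do anything special when encoding images. It simply returns the bytes that represent the image in that format, similar to what matplotlib does when you call savefig (except that matplotlib also writes the bytes to a file).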

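Suppose you create a numpy array whose top rows are 0 and whose bottom rows are 255. This is an array of numbers which, viewed as a picture, represents two horizontal bands: black on top and white at the bottom.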

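If you want to view this picture in another program (GIMP), you need to encode this information in a standard format such as PNG. Encoding means adding some headers and metadata and, optionally, compressing the data.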
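As an illustration of "adding headers", the two-band picture described above can be encoded as an uncompressed 24-bit BMP by hand. This is only a sketch: encode_bmp here is a hypothetical helper written for this example (it is not a TensorFlow function), and it takes plain Python rows of (r, g, b) tuples instead of a numpy array so that it needs nothing beyond the standard library.

```python
import struct

def encode_bmp(rows):
    """Encode rows of (r, g, b) tuples as an uncompressed 24-bit BMP."""
    height = len(rows)
    width = len(rows[0])
    padding = (-width * 3) % 4            # each row is padded to a multiple of 4 bytes
    pixel_data = bytearray()
    for row in reversed(rows):            # BMP stores rows bottom-up
        for r, g, b in row:
            pixel_data += bytes((b, g, r))  # channel order in the file is B, G, R
        pixel_data += b"\x00" * padding
    file_header = struct.pack("<2sIHHI", b"BM", 54 + len(pixel_data), 0, 0, 54)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height, 1, 24,         # header size, dimensions, planes, bits per pixel
        0, len(pixel_data),               # no compression, size of the pixel data
        2835, 2835, 0, 0,                 # about 72 DPI, no palette
    )
    return file_header + info_header + bytes(pixel_data)

# Two horizontal bands: black on top, white at the bottom (4 pixels wide, 4 rows)
black, white = (0, 0, 0), (255, 255, 255)
bands = [[black] * 4 for _ in range(2)] + [[white] * 4 for _ in range(2)]
bmp_bytes = encode_bmp(bands)
```

Writing bmp_bytes to a file opened in 'wb' mode should give an image that GIMP or any other viewer can open. The only difference from the raw pixel bytes is the 54-byte header, the row order and the row padding.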

Now that it is clearer what encoding means, I suggest you work with PNG images.

import tensorflow as tf  # TensorFlow 1.x API

with tf.gfile.FastGFile('image.png', 'rb') as f:
    # get the bytes representing the image
    # this is a 1D array (string) which includes the header and metadata
    raw_png = f.read()

# decode the raw representation into an array
# so we have a 3D array [height, width, channels] representing the image
image = tf.image.decode_png(raw_png)

# augment the image using e.g. a random brightness shift
# (max_delta is required; 0.2 is an arbitrary example value)
augmented_img = tf.image.random_brightness(image, max_delta=0.2)

# convert the array back into a compressed representation
# by encoding it into png; we now end up with a string again
augmented_png = tf.image.encode_png(augmented_img, compression=9)

# in graph mode augmented_png is a tensor, so evaluate it once to get the bytes
with tf.Session() as sess:
    png_bytes = sess.run(augmented_png)

# Write png_bytes to file using tf.train.Example
writer = tf.python_io.TFRecordWriter(<output_tfrecords_filename>)
example = tf.train.Example(features=tf.train.Features(feature={
    'encoded_img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[png_bytes]))}))
writer.write(example.SerializeToString())

# Read img from file
dataset = tf.data.TFRecordDataset(<img_file>)
dataset = dataset.map(parse_img_fn)
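For readers on TensorFlow 2.x, where eager execution is the default, the same round trip might look roughly like the sketch below. It is an assumption-laden illustration and not part of the original answer: the helper make_example, the random 4x6 test image and the in-memory dataset (used here instead of a .tfrecords file) are all made up for the example, and the feature name 'encoded_img' is taken from the question.

```python
import tensorflow as tf  # TensorFlow 2.x, eager execution assumed

def make_example(png_bytes):
    """Wrap encoded PNG bytes in a tf.train.Example (hypothetical helper)."""
    return tf.train.Example(features=tf.train.Features(feature={
        "encoded_img": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[png_bytes])),
    }))

def parse_img_fn(serialized_example):
    """Parse one serialized Example and decode the PNG back into a uint8 array."""
    feature_map = {"encoded_img": tf.io.FixedLenFeature([], tf.string)}
    features = tf.io.parse_single_example(serialized_example, feature_map)
    return tf.io.decode_png(features["encoded_img"], channels=3)

# A small random RGB image stands in for a decoded and augmented picture
image = tf.cast(
    tf.random.uniform([4, 6, 3], maxval=256, dtype=tf.int32), tf.uint8)

png = tf.io.encode_png(image)  # scalar string tensor holding the PNG file contents
serialized = make_example(png.numpy()).SerializeToString()

# In a real pipeline the serialized string would be written with
# tf.io.TFRecordWriter and read back with tf.data.TFRecordDataset.
dataset = tf.data.Dataset.from_tensors(serialized).map(parse_img_fn)
decoded = next(iter(dataset))
```

Because PNG is lossless, decoded should contain exactly the same pixel values as image.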

A few important pieces of advice:

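Do not use numpy.tostring. It returns a HUGE representation, because every value is dumped as raw bytes (4 or 8 bytes per value if the array holds floats) and they are simply concatenated. No header, no compression, nothing. Try checking the file size :)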

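There is no need to go through a tf.Session back to Python for the augmentation itself. You can do all of that on the TF side. This way you end up with an input graph that you can reuse as part of your input pipeline.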
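To put rough numbers on the first piece of advice, the following standard-library sketch compares the size of a two-band 64x64 grayscale image stored as raw float64 values, as raw uint8 values, and as DEFLATE-compressed bytes (DEFLATE, via zlib, is the compression scheme PNG uses). The image size and the variable names are arbitrary choices for this illustration; array.array stands in for a numpy array so that the snippet has no third-party dependencies.

```python
import array
import zlib

width, height = 64, 64
half = width * height // 2

# Two horizontal bands: top half 0 (black), bottom half 255 (white)
pixels = [0] * half + [255] * half

as_float64 = array.array("d", pixels).tobytes()  # 8 bytes per pixel, no header
as_uint8 = bytes(pixels)                         # 1 byte per pixel, no header
compressed = zlib.compress(as_uint8, 9)          # DEFLATE at maximum compression

print(len(as_float64), len(as_uint8), len(compressed))
```

The float64 dump is eight times larger than the uint8 one, and the compressed version is far smaller than either for such a regular image. None of the three byte strings is a valid image file on its own, since none of them carries a header describing width, height or pixel type.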

Comments:

Answer 2:

The main tensorflow package has no encode_bmp, but if you import tensorflow_io (a companion package from the TensorFlow project) you will find an encode_bmp method there.

For documentation, see: https://www.tensorflow.org/io/api_docs/python/tfio/image/encode_bmp

Comments: