Slicing a TensorFlow FixedLengthRecordReader value

Posted 2023-02-23


【Title】slicing Tensorflow FixedLengthRecordReader value 【Posted】2016-10-28 22:19:39 【Question】:

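I am implementing a regression network that maps images to poses using the TensorFlow Python API, and I am trying to process the output of a FixedLengthRecordReader.

I am trying to adapt the cifar10 example with as few changes as possible for my purposes.

The cifar10 example reads the raw bytes, decodes them, and then splits them.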

result.key, value = reader.read(filename_queue)

# Convert from a string to a vector of uint8 that is record_bytes long.
record_bytes = tf.decode_raw(value, tf.uint8)

# The first bytes represent the label, which we convert from uint8->int32.
result.label = tf.cast(
    tf.slice(record_bytes, [0], [label_bytes]), tf.int32)

# The remaining bytes after the label represent the image, which we reshape
# from [depth * height * width] to [depth, height, width].
depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                         [result.depth, result.height, result.width])
# Convert from [depth, height, width] to [height, width, depth].
result.uint8image = tf.transpose(depth_major, [1, 2, 0])

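I am reading from a list of binary files in which each record is stored as (pose_data, image_data). Because my pose data is float32 and my image data is uint8, I would like to slice first and then cast. Unfortunately, the value returned by reader.read is a zero-dimensional string tensor, so slicing does not work.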

key, value = reader.read(filename_queue)
print(value.dtype)
print(value.get_shape())

<dtype: 'string'>
()

tf.decode_raw(value, dtype) returns a one-dimensional array, but it requires a dtype, and tf.string is not a valid type for it.
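To make the problem concrete, a record of the kind described could be built and inspected in plain Python as in the sketch below. The pose length (7 floats) and the image size (2 x 2 x 3) are made-up values for illustration only, and `struct` merely stands in for whatever actually wrote the files. Decoding the whole record as a single type gives back one field correctly and turns the other into noise.

```python
import struct

# Hypothetical sizes, chosen only for illustration.
N_POSE = 7               # number of float32 values in the pose
N_IMAGE = 2 * 2 * 3      # number of uint8 values in the image

pose = [0.5 * i for i in range(N_POSE)]
image = list(range(N_IMAGE))

# One record: pose as little-endian float32, followed by the image as uint8.
record = struct.pack("<%df" % N_POSE, *pose) + bytes(image)

# The fixed record length in bytes that a fixed-length reader would need.
record_bytes = 4 * N_POSE + N_IMAGE

# Decoding everything as uint8 recovers the image at the end, but the pose
# shows up as 4 * N_POSE meaningless byte values in front of it.
as_uint8 = list(record)
image_part = as_uint8[4 * N_POSE:]
```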

Is it possible to slice before decoding? Or do I have to decode -> cast back to string -> slice -> recast? Is there another way?


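【Answer 1】:

I found a workaround: decode twice and throw away half. It is not very efficient (if anyone has a better solution, I would be happy to hear it), but it seems to work.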

key, value = reader.read(filename_queue)
uint8_bytes = tf.decode_raw(value, tf.uint8)
uint8_data = uint8_bytes[:n_uint8_vals]
float32_bytes = tf.decode_raw(value, tf.float32)
float32_start_index = n_uint8_vals // 4
float32_data = float32_bytes[float32_start_index:]

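This requires n_uint8_vals to be a multiple of 4, because the float32 start index is computed as n_uint8_vals // 4. Note that the code assumes the uint8 values come first in the record.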
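The same idea can be tried outside TensorFlow. The sketch below uses Python's `array` module as a stand-in for `tf.decode_raw`; the field sizes and values are invented, and it assumes, like the code above, that the uint8 field comes first in the record and that `array("f")` uses 4-byte floats, as it does on common platforms.

```python
from array import array

n_uint8_vals = 8                      # must be a multiple of 4
image = bytes(range(n_uint8_vals))    # uint8 field, first in the record
pose = array("f", [1.0, 2.5, -3.0])   # float32 field, follows the image

record = image + pose.tobytes()

# First decode: the whole record as uint8, keep only the front.
uint8_all = array("B", record)
uint8_data = uint8_all[:n_uint8_vals]

# Second decode: the whole record as float32, keep only the back.
# The first n_uint8_vals // 4 floats are image bytes misread as floats.
float32_all = array("f", record)
float32_data = float32_all[n_uint8_vals // 4:]
```

If n_uint8_vals were not a multiple of 4, the float32 field would start in the middle of a decoded element and no slice of float32_all would recover it.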

【Comments】:

The link is broken; if you could paste the example here, I would be happy to take a look.

【Answer 2】:

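When your data has multiple types (the OP's question) and the fields do not line up (the more general case), neither the cifar10 example mentioned by the OP nor the decode-twice solution works.

If your data is, for example:

[float32][int16][int16]

decoding twice works. However, if your data is:

[int16][float32][int16]

it does not work, because tf.decode_raw does not accept an offset of half a float32.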
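The misalignment can be seen in a few lines of plain Python. This is only an illustration with arbitrary values, using `struct` rather than TensorFlow: the float32 field starts at byte 2, so a float32 decode of the whole record never lands on it.

```python
import struct

# Layout [int16][float32][int16], little-endian, no padding; values are arbitrary.
record = struct.pack("<hfh", 7, 1.5, -2)

# Byte offset at which the float32 field starts.
float_offset = struct.calcsize("<h")

# Decoding the whole 8-byte record as two float32 values reads bytes 0-3 and
# 4-7, so each element straddles two fields and neither one is the stored float.
whole_as_float32 = struct.unpack("<2f", record)
```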

What does work in this case is tf.substr(). The value returned from

result.key, value = reader.read(filename_queue)

is actually a string (a byte string, if you like) and can be split up as one.
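The slice-then-decode order can be sketched in plain Python as follows. This is a stand-in, not TensorFlow code: in a graph, the slices would presumably come from `tf.substr(value, pos, len)` and each piece would then go through `tf.decode_raw` with its own dtype. The layout and values are made up for illustration.

```python
import struct

# Layout [int16][float32][int16], little-endian, no padding; values are arbitrary.
record = struct.pack("<hfh", 7, 1.5, -2)

# Slice the byte string first, using the known byte offsets of each field...
head, middle, tail = record[0:2], record[2:6], record[6:8]

# ...then decode each piece with its own type.
first = struct.unpack("<h", head)[0]
value = struct.unpack("<f", middle)[0]
last = struct.unpack("<h", tail)[0]
```

Because every field is cut out by byte offset before decoding, no field needs to start at a multiple of another field's size.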

