Tensorflow: Creating a TensorFlow dataset using multi-dimensional input data with differing length. (Video Data)
Posted: 2022-01-15 06:12:20
【Question】: The problem I am having is part of my fourth-year university project. The project translates sign language. My current setup for the input data is a NumPy array of shape [n_videos], and each video in this list is a NumPy tensor of shape [n_frames, n_hands=2, n_hand_landmarks=21, n_points(x,y,z)=3].
The output data is simply an array of words, so for example a given video tensor could map to the phrase “
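The problem I am running into is that I get the following error when I try to create the dataset:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).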
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-56-bf28891dc793> in <module>
16 print(target_tensor_train.shape)
17
---> 18 dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
19 dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in from_tensor_slices(tensors, name)
779 Dataset: A `Dataset`.
780 """
--> 781 return TensorSliceDataset(tensors, name=name)
782
783 class _GeneratorState(object):
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, element, is_files, name)
4659 def __init__(self, element, is_files=False, name=None):
4660 """See `Dataset.from_tensor_slices()` for details."""
-> 4661 element = structure.normalize_element(element)
4662 batched_spec = structure.type_spec_from_value(element)
4663 self._tensors = structure.to_batched_tensor_list(batched_spec, element)
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/data/util/structure.py in normalize_element(element, element_signature)
127 dtype = getattr(spec, "dtype", None)
128 normalized_components.append(
--> 129 ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
130 return nest.pack_sequence_as(pack_as, normalized_components)
131
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py in wrapped(*args, **kwargs)
161 with Trace(trace_name, **trace_kwargs):
162 return func(*args, **kwargs)
--> 163 return func(*args, **kwargs)
164
165 return wrapped
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
1619
1620 if ret is None:
-> 1621 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1622
1623 if ret is NotImplemented:
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py in _default_conversion_function(***failed resolving arguments***)
50 def _default_conversion_function(value, dtype, name, as_ref):
51 del as_ref # Unused.
---> 52 return constant_op.constant(value, dtype, name=name)
53
54
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
269 ValueError: if called on a symbolic tensor.
270 """
--> 271 return _constant_impl(value, dtype, shape, name, verify_shape=False,
272 allow_broadcast=True)
273
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
281 with trace.Trace("tf.constant"):
282 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 283 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
284
285 g = ops.get_default_graph()
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
306 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
307 """Creates a constant on the current device."""
--> 308 t = convert_to_eager_tensor(value, ctx, dtype)
309 if shape is None:
310 return t
/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
104 dtype = dtypes.as_dtype(dtype).as_datatype_enum
105 ctx.ensure_initialized()
--> 106 return ops.EagerTensor(value, ctx.device_name, dtype)
107
108
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
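The code I am using is adapted from Chapter 18 of Manning's "Machine Learning with TensorFlow, Second Edition" textbook. I am using TensorFlow 2.
My code is shown below to demonstrate the shape of the data.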
all_data = np.load('people_data_1.0.npz', allow_pickle=True)
phrases = all_data['Phrases']
input_data = all_data['Data']
print(input_data.shape)
print([item.shape for item in input_data])
(20,)
[(43, 2, 21, 3), (75, 2, 21, 3), (56, 2, 21, 3), (45, 2, 21, 3), (77, 2, 21, 3), (81, 2, 21, 3), (93, 2, 21, 3), (76, 2, 21, 3), (71, 2, 21, 3), (69, 2, 21, 3), (63, 2, 21, 3), (73, 2, 21, 3), (76, 2, 21, 3), (98, 2, 21, 3), (101, 2, 21, 3), (47, 2, 21, 3), (67, 2, 21, 3), (46, 2, 21, 3), (48, 2, 21, 3), (74, 2, 21, 3)]
Once the output data has been tokenized and loaded, it looks like this:
[[ 1 4 3 13 2 0 0]
[ 1 4 3 14 15 2 0]
[ 1 4 3 11 2 0 0]
[ 1 4 3 7 2 0 0]
[ 1 4 3 8 2 0 0]
[ 1 4 3 9 2 0 0]
[ 1 5 6 10 3 2 0]
[ 1 5 6 12 2 0 0]
[ 1 16 3 17 18 19 2]
[ 1 20 21 2 0 0 0]
[ 1 4 3 11 2 0 0]
[ 1 4 3 7 2 0 0]
[ 1 4 3 8 2 0 0]
[ 1 4 3 9 2 0 0]
[ 1 5 6 10 3 2 0]
[ 1 4 3 7 2 0 0]
[ 1 4 3 8 2 0 0]
[ 1 4 3 9 2 0 0]
[ 1 5 6 10 3 2 0]
[ 1 5 6 12 2 0 0]]
i.e.
Target Language; index to word mapping
1 ----> <start>
4 ----> are
3 ----> you
7 ----> ill
2 ----> <end>
Then, when I check the shapes and dtypes of the input and output data, I get the following:
[print(i.shape, i.dtype) for i in input_data]
[print(o.shape, o.dtype) for o in target_tensor]
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
The code below is where the error occurs.
BUFFER_SIZE = len(input_tensor_train)
BATCH_SIZE = 5
dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
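I suspect this is because the input is a list of NumPy arrays of different sizes. I considered padding the end of the video data with zeros, as is done for the words, but I feel this would drastically increase the size of my data, and I am curious whether there is another way to solve this.
Any help on this matter, or a pointer to another way of handling this kind of input and output data, would be greatly appreciated.
Thanks, William.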
【Answer 1】: To create a dataset of videos with differing lengths, I suggest doing it like this:
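import tensorflow as tf

file_names = [str(i) for i in range(20)]

def dummy_read_file(name):
    # Stand-in for real file reading: returns a video with a random number of frames.
    length = tf.random.uniform(shape=[], minval=10, maxval=40, dtype=tf.int32)
    return tf.random.normal(shape=[length, 2, 21, 3])

dataset = tf.data.Dataset.from_tensor_slices(file_names)
# Each element is a dict holding the file name and the video read from it.
dataset = dataset.map(lambda file_name: {"file_name": file_name, "video": dummy_read_file(file_name)})
# Pad the videos in each batch of 4 to the longest video in that batch.
dataset = dataset.padded_batch(4)
for batch in dataset.as_numpy_iterator():
    print(batch["video"].shape)
# Example output (the frame counts are random, so they vary between runs):
# (4, 28, 2, 21, 3)
# (4, 24, 2, 21, 3)
# (4, 27, 2, 21, 3)
# (4, 23, 2, 21, 3)
# (4, 26, 2, 21, 3)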
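For data that is already loaded in memory, as in the question, the same padded batching can be fed from a Python generator instead of from file names. The following is only a minimal sketch under stated assumptions: the `videos` and `targets` arrays are random stand-ins for the question's data, `generate_pairs` is a hypothetical helper name, and `output_signature` requires TensorFlow 2.4 or later.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the question's data (assumption): a list of videos with
# different frame counts, plus already-padded int32 target sequences.
rng = np.random.default_rng(0)
frame_counts = [43, 75, 56, 45, 77, 81]
videos = [rng.normal(size=(n, 2, 21, 3)).astype(np.float32) for n in frame_counts]
targets = np.zeros((len(videos), 7), dtype=np.int32)

def generate_pairs():
    # Yield one (video, target) pair at a time, so TensorFlow never has to
    # convert the whole variable-length collection into a single tensor.
    for video, target in zip(videos, targets):
        yield video, target

dataset = tf.data.Dataset.from_generator(
    generate_pairs,
    output_signature=(
        # None marks the frame axis as variable-length.
        tf.TensorSpec(shape=(None, 2, 21, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(7,), dtype=tf.int32),
    ),
)

# Pad only up to the longest video inside each batch of 3.
dataset = dataset.padded_batch(3)

batch_shapes = [(v.shape, t.shape) for v, t in dataset.as_numpy_iterator()]
print(batch_shapes)
```

Because each video is converted on its own, the object-dtype outer array never reaches `from_tensor_slices`, which is the call that fails in the question.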
For better performance, keep the video lengths within each batch close to one another by replacing
dataset = dataset.padded_batch(4)
with the following:
...
dataset = dataset.apply(tf.data.experimental.bucket_by_sequence_length(
    element_length_func=lambda sample: tf.shape(sample["video"])[0],
    bucket_boundaries=[20, 30],
    bucket_batch_sizes=[5, 4, 3],
))
...
for batch in dataset.as_numpy_iterator():
    print(batch["video"].shape)
# (4, 27, 2, 21, 3)
# (5, 16, 2, 21, 3)
# (5, 19, 2, 21, 3)
# (4, 26, 2, 21, 3)
# (2, 11, 2, 21, 3)
Alternatively, in the latest TensorFlow versions, use the method
tf.data.Dataset.bucket_by_sequence_length
directly on the dataset.
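A rough sketch of the method form follows, assuming TensorFlow 2.6 or later, where `bucket_by_sequence_length` is available on `tf.data.Dataset` itself. The fixed `frame_counts` list and the `make_video` helper are illustrative assumptions that replace the random dummy reader so the bucketing is easy to follow.

```python
import tensorflow as tf

# Illustrative frame counts (assumption), chosen to fall into three buckets.
frame_counts = [12, 35, 18, 25, 33, 14, 27, 38]

def make_video(n_frames):
    # Hypothetical stand-in for reading a video with n_frames frames.
    video = tf.zeros(tf.stack([n_frames, 2, 21, 3]))
    # Make the static shape explicit so that padding knows the rank.
    return tf.ensure_shape(video, [None, 2, 21, 3])

dataset = tf.data.Dataset.from_tensor_slices(frame_counts)
dataset = dataset.map(lambda n: {"video": make_video(n)})

# Buckets: fewer than 20 frames, 20 to 29 frames, 30 frames or more.
dataset = dataset.bucket_by_sequence_length(
    element_length_func=lambda sample: tf.shape(sample["video"])[0],
    bucket_boundaries=[20, 30],
    bucket_batch_sizes=[2, 2, 2],
)

shapes = [batch["video"].shape for batch in dataset.as_numpy_iterator()]
print(shapes)
```

Each batch is padded only to the longest video in its own bucket. Incomplete batches are still emitted because `drop_remainder` defaults to `False`.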
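You can also try
tf.RaggedTensor
but I cannot recommend it. It can be unstable for very large tensors, such as an entire video dataset, and it is of little use for batching.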
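For reference only, this is roughly what the ragged representation looks like. It is a small sketch with three tiny stand-in videos (an assumption), not a recommendation to use it for the full dataset.

```python
import numpy as np
import tensorflow as tf

# Three small stand-in videos with different frame counts (assumption).
frame_counts = [4, 7, 5]
videos = [np.full((n, 2, 21, 3), i, dtype=np.float32)
          for i, n in enumerate(frame_counts)]

# Concatenate all frames along the frame axis, then record how many
# frames belong to each video.
ragged_videos = tf.RaggedTensor.from_row_lengths(
    values=np.concatenate(videos, axis=0),
    row_lengths=frame_counts,
)
print(ragged_videos.shape)

# Converting to a dense tensor pads every video to the longest one.
dense = ragged_videos.to_tensor()
print(dense.shape)
```

Note that `to_tensor()` pads everything to the single longest video, which is the cost that per-batch padding and bucketing avoid.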
To optimize further, precompute the video lengths and assign the videos to buckets before the files are actually loaded.
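One way to picture this precomputation is the plain-Python sketch below. It uses the frame counts printed in the question; the file names and the bucket boundaries are hypothetical, and treating each bucket as one padded group is a simplification of real batching.

```python
from bisect import bisect_right
from collections import defaultdict

# Frame counts copied from the shapes printed in the question.
frame_counts = [43, 75, 56, 45, 77, 81, 93, 76, 71, 69,
                63, 73, 76, 98, 101, 47, 67, 46, 48, 74]
# Hypothetical file names; real ones would come from the dataset's metadata.
file_names = [f"video_{i}.npy" for i in range(len(frame_counts))]
# Assumed boundaries: fewer than 60 frames, 60 to 79 frames, 80 or more.
bucket_boundaries = [60, 80]

# Group files by length using only the metadata, without loading any video.
buckets = defaultdict(list)
for name, n_frames in zip(file_names, frame_counts):
    bucket_id = bisect_right(bucket_boundaries, n_frames)
    buckets[bucket_id].append((name, n_frames))

def padded_frames(lengths):
    # Frames stored if every video in the group is padded to the longest one.
    return len(lengths) * max(lengths)

total_frames = sum(frame_counts)
global_padded = padded_frames(frame_counts)
bucketed_padded = sum(
    padded_frames([n for _, n in items]) for items in buckets.values()
)

print(total_frames, global_padded, bucketed_padded)
```

With these assumed boundaries, padding everything to the longest video stores 2020 frames for 1379 real ones, while padding within each bucket stores 1510, so most of the padding overhead the question worries about disappears.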