Merging TensorFlow dataset batches

Posted: 2021-12-20 03:21:47

Question:

Consider the following code:

import tensorflow as tf
import numpy as np
 
simple_features = np.array([
         [1, 1, 1],
         [2, 2, 2],
         [3, 3, 3],
         [4, 4, 4],
         [5, 5, 5],

])
 
simple_labels = np.array([
         [-1, -1],
         [-2, -2],
         [-3, -3],
         [-4, -4],
         [-5, -5],

])
 

simple_features1 = np.array([
         [1, 4, 1],
         [2, 2, 2],
         [3, 3, 3],
         [6, 4, 4],
         [5, 4, 5],

])
 
simple_labels1 = np.array([
         [8, -7],
         [-2, -2],
         [-3, 7],
         [-4, 9],
         [-5, -5],

])

def print_dataset(ds):
    for inputs, targets in ds:
        print("---Batch---")
        print("Feature:", inputs.numpy())
        print("Label:", targets.numpy())
        print("")
        
ds1 = tf.keras.preprocessing.timeseries_dataset_from_array(simple_features, simple_labels, sequence_length=4, batch_size=1)
print_dataset(ds1)

ds2 = tf.keras.preprocessing.timeseries_dataset_from_array(simple_features1, simple_labels1, sequence_length=4, batch_size=1)
print_dataset(ds2)

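The code above creates the features and labels. I would like to merge each pair of corresponding batches as follows. For example, the first batch of ds1 looks like this: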

---Batch---
Feature: [[[1 1 1]
  [2 2 2]
  [3 3 3]
  [4 4 4]]]
Label: [[-1 -1]]

...and the first batch of ds2 looks like this:

---Batch---
Feature: [[[1 4 1]
  [2 2 2]
  [3 3 3]
  [6 4 4]]]
Label: [[ 8 -7]]

The first batch of ds1 and the first batch of ds2 should be merged so that I get the following output:

---Batch---
Feature: [[[1 1 1 1 4 1]
  [2 2 2 2 2 2]
  [3 3 3 3 3 3]
  [4 4 4 6 4 4]]]
Label: [[-1 -1 8 -7]]
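
In shape terms, the desired merge joins two feature batches of shape (1, 4, 3) into one of shape (1, 4, 6), and two label batches of shape (1, 2) into one of shape (1, 4), so both are joined along the last axis. A minimal sketch of just that step, using the two batches printed above; NumPy arrays stand in for the TensorFlow tensors here, which is an assumption made only for illustration:

```python
import numpy as np

# First batch of ds1 and ds2, copied from the printed output above.
x1 = np.array([[[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]])  # shape (1, 4, 3)
x2 = np.array([[[1, 4, 1], [2, 2, 2], [3, 3, 3], [6, 4, 4]]])  # shape (1, 4, 3)
y1 = np.array([[-1, -1]])  # shape (1, 2)
y2 = np.array([[8, -7]])   # shape (1, 2)

# Join along the last axis so feature columns and label columns sit side by side.
merged_x = np.concatenate([x1, x2], axis=-1)
merged_y = np.concatenate([y1, y2], axis=-1)

print(merged_x.shape)  # (1, 4, 6)
print(merged_y.shape)  # (1, 4)
print(merged_y)        # [[-1 -1  8 -7]]
```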


Answer 1:

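You can pair up the two datasets with tf.data.Dataset.zip and use tf.concat to concatenate each pair of batches along the last axis: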

import tensorflow as tf
import numpy as np
 
simple_features = np.array([
         [1, 1, 1],
         [2, 2, 2],
         [3, 3, 3],
         [4, 4, 4],
         [5, 5, 5],
])
simple_labels = np.array([
         [-1, -1],
         [-2, -2],
         [-3, -3],
         [-4, -4],
         [-5, -5],
])
simple_features1 = np.array([
         [1, 4, 1],
         [2, 2, 2],
         [3, 3, 3],
         [6, 4, 4],
         [5, 4, 5],
])
simple_labels1 = np.array([
         [8, -7],
         [-2, -2],
         [-3, 7],
         [-4, 9],
         [-5, -5],
])

def print_dataset(ds):
    for inputs, targets in ds:
        print("---Batch---")
        print("Feature:", inputs.numpy())
        print("Label:", targets.numpy())
        print("")
        
ds1 = tf.keras.preprocessing.timeseries_dataset_from_array(simple_features, simple_labels, sequence_length=4, batch_size=1)
ds2 = tf.keras.preprocessing.timeseries_dataset_from_array(simple_features1, simple_labels1, sequence_length=4, batch_size=1)

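def merge(data1, data2):
    # Each element of the zipped dataset is a pair of (features, labels) tuples.
    x1, y1 = data1
    x2, y2 = data2
    # Concatenate features and labels along the last axis.
    return tf.concat([x1, x2], axis=-1), tf.concat([y1, y2], axis=-1)

# Pair up corresponding batches of ds1 and ds2, then merge each pair.
dataset = tf.data.Dataset.zip((ds1, ds2)).map(merge)
print_dataset(dataset)

Output:

---Batch---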
Feature: [[[1 1 1 1 4 1]
  [2 2 2 2 2 2]
  [3 3 3 3 3 3]
  [4 4 4 6 4 4]]]
Label: [[-1 -1  8 -7]]

---Batch---
Feature: [[[2 2 2 2 2 2]
  [3 3 3 3 3 3]
  [4 4 4 6 4 4]
  [5 5 5 5 4 5]]]
Label: [[-2 -2 -2 -2]]
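
If the two datasets are not needed separately, a possible alternative is to concatenate the raw arrays column-wise first and window the result once. The NumPy-only sketch below checks that both orders give the same windows for the feature arrays above. Note that `windows` is a hypothetical helper that imitates stride-1 sliding windows for this check; it is not a TensorFlow function.

```python
import numpy as np

simple_features = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]])
simple_features1 = np.array([[1, 4, 1], [2, 2, 2], [3, 3, 3], [6, 4, 4], [5, 4, 5]])

def windows(data, sequence_length):
    """Hypothetical helper: stack all stride-1 windows of `sequence_length` rows."""
    count = len(data) - sequence_length + 1
    return np.stack([data[i:i + sequence_length] for i in range(count)])

# Order A: window each array, then join the windows along the last axis.
window_then_concat = np.concatenate(
    [windows(simple_features, 4), windows(simple_features1, 4)], axis=-1
)

# Order B: join the raw arrays column-wise, then window once.
combined = np.concatenate([simple_features, simple_features1], axis=1)
concat_then_window = windows(combined, 4)

print(window_then_concat.shape)  # (2, 4, 6)
print(np.array_equal(window_then_concat, concat_then_window))  # True
```

Under that assumption, passing `combined` (and the two label arrays concatenated the same way) to a single timeseries_dataset_from_array call should yield the merged batches directly, without the zip and map steps. This only applies when both arrays have the same number of rows.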
