tf.data.Dataset: The `batch_size` argument must not be specified for the given input type

【Title】: tf.data.Dataset: The `batch_size` argument must not be specified for the given input type 【Posted】: 2020-03-16 09:53:57 【Question】:

I am running hyperparameter tuning of a Keras model with Talos on a Google Colab TPU. Note that I am using TensorFlow 1.15.0 and Keras 2.2.4-tf.

import os
import tensorflow as tf
import talos as ta
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split

def iris_model(x_train, y_train, x_val, y_val, params):

    # Specify a distributed strategy to use TPU
    resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    tf.contrib.distribute.initialize_tpu_system(resolver)
    strategy = tf.contrib.distribute.TPUStrategy(resolver)

    # Use the strategy to create and compile a Keras model
    with strategy.scope():
      model = Sequential()
      model.add(Dense(32, input_shape=(4,), activation=tf.nn.relu, name="relu"))
      model.add(Dense(3, activation=tf.nn.softmax, name="softmax"))
      model.compile(optimizer=Adam(learning_rate=0.1), loss=params['losses'])

    # Convert data type to use TPU
    x_train = x_train.astype('float32')
    x_val = x_val.astype('float32')

    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.cache()
    dataset = dataset.shuffle(1000, reshuffle_each_iteration=True).repeat()
    dataset = dataset.batch(params['batch_size'], drop_remainder=True)

    # Fit the Keras model on the dataset
    out = model.fit(dataset, batch_size=params['batch_size'], epochs=params['epochs'], validation_data=[x_val, y_val], verbose=0, steps_per_epoch=2)

    return out, model

# Load dataset
X, y = ta.templates.datasets.iris()

# Train and test set
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size=0.30, shuffle=False)

# Create the hyperparameter distributions
p = {'losses': ['logcosh'], 'batch_size': [128, 256, 384, 512, 1024], 'epochs': [10, 20]}

# Use Talos to scan the best hyperparameters of the Keras model
scan_object = ta.Scan(x_train, y_train, params=p, model=iris_model, experiment_name='test', x_val=x_val, y_val=y_val, fraction_limit=0.1)

After converting the training set to a dataset with tf.data.Dataset, I get the following error when fitting the model with out = model.fit:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-c812209b95d0> in <module>()
      8 
      9 # Use Talos to scan the best hyperparameters of the Keras model
---> 10 scan_object = ta.Scan(x_train, y_train, params=p, model=iris_model, experiment_name='test', x_val=x_val, y_val=y_val, fraction_limit=0.1)

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py in _validate_or_infer_batch_size(self, batch_size, steps, x)
   1813             'The `batch_size` argument must not be specified for the given '
   1814             'input type. Received input: {}, batch_size: {}'.format(
-> 1815                 x, batch_size))
   1816       return
   1817 

ValueError: The `batch_size` argument must not be specified for the given input type. Received input: <DatasetV1Adapter shapes: ((512, 4), (512, 3)), types: (tf.float32, tf.float32)>, batch_size: 512

Then, if I follow that instruction and do not pass the batch_size argument to model.fit, I get a different error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c812209b95d0> in <module>()
      8 
      9 # Use Talos to scan the best hyperparameters of the Keras model
---> 10 scan_object = ta.Scan(x_train, y_train, params=p, model=iris_model, experiment_name='test', x_val=x_val, y_val=y_val, fraction_limit=0.1)

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py in _distribution_standardize_user_data(self, x, y, sample_weight, class_weight, batch_size, validation_split, shuffle, epochs, allow_partial_batch)
   2307             strategy) and not drop_remainder:
   2308           dataset_size = first_x_value.shape[0]
-> 2309           if dataset_size % batch_size == 0:
   2310             drop_remainder = True
   2311 

TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'

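【Comments】:

It would really help if you could post the full stack trace for the last error, because this function seems to be called from many places in this file, so I cannot tell where you are: github.com/tensorflow/tensorflow/blob/r1.15/tensorflow/python/…

I just edited the question so you can see the stack trace. Thank you for your time and consideration.

【Answer 1】: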

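As I see it, the problem with your code is that the training and validation data are not in the same format. You are batching the training data but not the validation examples.

You can fix this by replacing the bottom half of your iris_model function with: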

def fix_data(x, y):
    # Build a shuffled, repeated, batched dataset; `params` comes from iris_model
    x = x.astype('float32')
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    ds = ds.cache()
    ds = ds.shuffle(1000, reshuffle_each_iteration = True)
    ds = ds.repeat()
    ds = ds.batch(params['batch_size'], drop_remainder = True)
    return ds

# Give the training and validation data the same format
train = fix_data(x_train, y_train)
val = fix_data(x_val, y_val)

# Fit the Keras model on the dataset
out = model.fit(x = train, epochs = params['epochs'],
                steps_per_epoch = 2,
                validation_data = val,
                validation_steps = 2)

At least this works for me, and your code runs without errors.
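The role of repeat() before batch() in the snippet above can be illustrated with a plain-Python stand-in. This is only a sketch, not tf.data itself: `batch_stream` is a hypothetical helper, and the figure of 105 training rows assumes the iris data has 150 rows and that test_size=0.30 holds out 45 of them.

```python
from itertools import cycle, islice

def batch_stream(samples, batch_size, repeat, drop_remainder=True):
    """Yield lists of `batch_size` items, mimicking repeat() followed by batch()."""
    source = cycle(samples) if repeat else iter(samples)
    while True:
        batch = list(islice(source, batch_size))
        # Stop on an empty batch, or on a short batch when remainders are dropped.
        if not batch or (drop_remainder and len(batch) < batch_size):
            return
        yield batch

# Assumption: 150 iris rows with test_size=0.30 leaves 105 training rows.
train_rows = list(range(105))

# Without repeat, a batch size of 128 never fills a single batch.
no_repeat = list(batch_stream(train_rows, 128, repeat=False))

# With repeat, the stream is endless, so take only steps_per_epoch=2 batches.
with_repeat = list(islice(batch_stream(train_rows, 128, repeat=True), 2))

print(len(no_repeat), [len(b) for b in with_repeat])
```

Under these assumptions, every batch size in the question's grid is larger than the training set, so dropping the remainder without repeating would leave nothing to train on. That is one reason to keep repeat() together with steps_per_epoch and validation_steps.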

【Comments】:

Thank you very much. Yes, I think that was my mistake.

【Answer 2】:

Could you replace these lines in your code:

    dataset = dataset.cache()
    dataset = dataset.shuffle(1000, reshuffle_each_iteration=True).repeat()
    dataset = dataset.batch(params['batch_size'], drop_remainder=True)

with these:

    dataset = dataset.repeat()
    dataset = dataset.batch(128, drop_remainder=True)
    dataset = dataset.prefetch(1)

Otherwise, the error is related to what you pass to tf.data.Dataset.from_tensor_slices.

【Comments】:

It still does not work. As you said, tf.data.Dataset is related to the error. However, the documentation says it must be used with a Cloud TPU: tensorflow.org/guide/tpu#input_datasets

【Answer 3】:

The second error, the one raised in _distribution_standardize_user_data, occurs when you do not pass batch_size to fit.

The code you are running for that function is here:

https://github.com/tensorflow/tensorflow/blob/r1.15/tensorflow/python/keras/engine/training.py#L2192

You did not post a traceback, but I bet it fails on line 2294, because that is the only place where batch_size is multiplied by something.

if shuffle:
  # We want a buffer size that is larger than the batch size provided by
  # the user and provides sufficient randomness. Note that larger
  # numbers introduce more memory usage based on the size of each
  # sample.
  ds = ds.shuffle(max(1024, batch_size * 8))

It looks like you can turn that off by setting shuffle=False.

fit(ds, shuffle=False,...)

Does that work?
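The operator named in the TypeError message hints at which expression received None. The sketch below uses plain Python only and does not call TensorFlow; `type_error_message` is a hypothetical helper, and the value 105 is an illustrative dataset size, not one taken from the traceback.

```python
def type_error_message(expression):
    """Run a zero-argument callable and return the TypeError message it raises."""
    try:
        expression()
    except TypeError as err:
        return str(err)
    return None

batch_size = None    # what the check sees when batch_size is not passed to fit
dataset_size = 105   # illustrative value only

# Same shape as `dataset_size % batch_size` in the question's traceback.
modulo_message = type_error_message(lambda: dataset_size % batch_size)

# Same shape as `batch_size * 8` in the shuffle branch quoted above.
multiply_message = type_error_message(lambda: batch_size * 8)

print(modulo_message)
print(multiply_message)
```

The traceback in the question names `%` with 'int' on the left, which matches the modulo expression rather than the multiplication, so turning off shuffling would not be expected to remove that particular error.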

【Comments】:

Thanks, but I still get the same error with shuffle=False. It fails on line 2309, not 2294.

@SamiBelkacem, that '

【Answer 4】:

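I am not sure whether the following meets your requirements, but it may be worth a try. All I did was remove repeat() from the dataset and remove batch_size=params['batch_size'] from model.fit.

If you are not prepared to give those up, please ignore this post.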

import os
import tensorflow as tf
import talos as ta
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def iris_model(x_train, y_train, x_val, y_val, params):

    # Specify a distributed strategy to use TPU
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    tf.config.experimental_connect_to_host(resolver.master())
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)

    with strategy.scope():
        model = Sequential()
        model.add(Dense(32, input_dim=4, activation=params['activation']))
        model.add(Dense(3, activation='softmax'))
        model.compile(optimizer=params['optimizer'], loss=params['losses'])

    # Convert the train set to a Dataset to use TPU
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.cache().shuffle(1000, reshuffle_each_iteration=True).batch(params['batch_size'], drop_remainder=True)

    out = model.fit(dataset, epochs=params['epochs'], validation_data=[x_val, y_val], verbose=0)

    return out, model

x, y = ta.templates.datasets.iris()

p = {'activation': ['relu', 'elu'],
     'optimizer': ['Nadam', 'Adam'],
     'losses': ['logcosh'],
     'batch_size': (20, 50, 5),
     'epochs': [10, 20]}

scan_object = ta.Scan(x, y, model=iris_model, params=p, fraction_limit=0.1, experiment_name='first_test')

【Comments】:

It does not work: TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

【Answer 5】:

From the GitHub code:

A ValueError will be raised if x is a generator or Sequence instance and batch_size is specified, as we expect users to provide batched datasets.

Try using batch_size = None
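The rule quoted above can be sketched as a small stand-in check. This is not the Keras implementation: `BatchedDataset` and `check_batch_size` are hypothetical names, and the only part borrowed from the thread is the wording of the error message.

```python
class BatchedDataset:
    """Hypothetical stand-in for an input that already yields batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size

def check_batch_size(x, batch_size=None):
    """Reject an explicit batch_size when the input is already batched."""
    if isinstance(x, BatchedDataset) and batch_size is not None:
        raise ValueError(
            'The `batch_size` argument must not be specified for the given '
            'input type. Received input: {}, batch_size: {}'.format(x, batch_size))
    return batch_size

ds = BatchedDataset(512)

# Accepted: the batch size is left to the dataset itself.
accepted = check_batch_size(ds)

# Rejected: the batch size is given both in the dataset and as an argument.
rejected = None
try:
    check_batch_size(ds, batch_size=512)
except ValueError as err:
    rejected = str(err)

print(accepted, rejected is not None)
```

The point of the sketch is that the batch size should live in exactly one place: either in the dataset (via batch()) or in the fit arguments for array inputs, not in both.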

【Comments】:

I get another error in _distribution_standardize_user_data(self, x, y, sample_weight, class_weight, batch_size, validation_split, shuffle, epochs, allow_partial_batch): TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

You should also set steps_per_epoch = None

It does not work, I get another error: ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. I think you can easily reproduce the error by copying the short program.
