Keras gives nan when training categorical LSTM sequence-to-sequence model

Posted 2023-02-16

Published: 2019-06-03 16:56:06

Question:

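I am trying to write a Keras model (using the TensorFlow backend) that uses an LSTM to predict labels for a sequence, as in a part-of-speech tagging task. The model I wrote returns nan as the loss for every training epoch and for every label prediction. I suspect my model is configured incorrectly, but I cannot figure out what I am doing wrong.

The complete program follows.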

from random import shuffle, sample
from typing import Tuple, Callable

from numpy import arange, zeros, array, argmax, newaxis


def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    model.add(TimeDistributed(Dense(labels)))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model


def labeled_sequences(n: int, sequence_sampler: Callable[[], Tuple[array, array]]) -> Tuple[array, array]:
    """
    Create training data for a sequence-to-sequence labeling model.

    The features are an array of size samples * time steps * 1.
    The labels are a one-hot encoding of time step labels of size samples * time steps * number of labels.

    :param n: number of sequence pairs to generate
    :param sequence_sampler: a function that returns two numeric sequences of equal length
    :return: feature and label sequences
    """
    from keras.utils import to_categorical

    xs, ys = sequence_sampler()
    assert len(xs) == len(ys)
    x = zeros((n, len(xs)), int)
    y = zeros((n, len(ys)), int)
    for i in range(n):
        xs, ys = sequence_sampler()
        x[i] = xs
        y[i] = ys
    x = x[:, :, newaxis]
    y = to_categorical(y)
    return x, y


def digits_with_repetition_labels() -> Tuple[array, array]:
    """
    Return a random list of 10 digits from 0 to 9. One digit appears twice. The rest are unique.
    Along with this list, return a list of 10 labels, where the label is 0 if the corresponding digit is unique and 1
    if it is repeated.

    :return: digits and labels
    """
    n = 10
    xs = arange(n)
    ys = zeros(n, int)
    shuffle(xs)
    i, j = sample(range(n), 2)
    xs[j] = xs[i]
    ys[i] = ys[j] = 1
    return xs, ys


def main():
    # Train
    x, y = labeled_sequences(1000, digits_with_repetition_labels)
    model = sequence_to_sequence_model(x.shape[1], y.shape[2])
    model.summary()
    model.fit(x, y, epochs=20, verbose=2)
    # Test
    x, y = labeled_sequences(5, digits_with_repetition_labels)
    y_ = model.predict(x, verbose=0)
    x = x[:, :, 0]
    for i in range(x.shape[0]):
        print(' '.join(str(n) for n in x[i]))
        print(' '.join([' ', '*'][int(argmax(n))] for n in y[i]))
        print(y_[i])


if __name__ == '__main__':
    main()

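My feature sequences are arrays of 10 digits from 0 to 9. The corresponding label sequences are arrays of 10 zeros and ones, where zero marks a unique digit and one marks a repeated digit. (The idea is to create a simple classification task that involves long-distance dependencies.)

Training looks like this: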

Epoch 1/20
 - 1s - loss: nan
Epoch 2/20
 - 0s - loss: nan
Epoch 3/20
 - 0s - loss: nan

All of the label array predictions look like this:

[[nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]
 [nan nan]]

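Clearly something is wrong.

The feature matrix passed to model.fit has dimensions samples × time steps × 1. The label matrix has dimensions samples × time steps × 2, where the 2 comes from the one-hot encoding of labels 0 and 1.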
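The shapes described here can be checked with a small dependency-free sketch. The helpers one_hot and shape are hypothetical, written only for illustration: one_hot mimics what keras.utils.to_categorical does to the label array, but on nested lists rather than NumPy arrays. The two sample sequences are made up.

```python
def one_hot(labels, num_classes):
    """Turn a samples x time-steps grid of integer labels into one-hot vectors."""
    return [[[1 if k == label else 0 for k in range(num_classes)]
             for label in sequence]
            for sequence in labels]


def shape(nested):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Two hypothetical samples of 10 time steps each; a label of 1 marks a repeated digit.
x = [[3, 7, 0, 9, 1, 7, 4, 2, 8, 5],
     [6, 0, 3, 9, 1, 2, 8, 5, 6, 4]]
y = [[0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]]

# Add a trailing feature axis, like x[:, :, newaxis] in the program above.
features = [[[digit] for digit in sequence] for sequence in x]
encoded = one_hot(y, num_classes=2)

print(shape(features))  # samples x time steps x 1
print(shape(encoded))   # samples x time steps x 2
```

Each time step therefore carries a single feature value on the input side and a two-element one-hot vector on the label side, which matches the (None, 10, 2) output shape in the model summary.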

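I am using a time-distributed dense layer to predict the sequence, following the Keras documentation and posts like this and this. As far as I can tell, the model topology defined in sequence_to_sequence_model above is correct. The model summary looks like this: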

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10, 16)            1152      
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 2)             34        
=================================================================
Total params: 1,186
Trainable params: 1,186
Non-trainable params: 0
_________________________________________________________________

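Stack Overflow questions like this one make it sound as though nan results indicate numerical problems: runaway gradients and the like. However, because I am working with a tiny dataset and every number returned from my model is nan, I suspect that I am not seeing a numerical problem but rather a problem with how I constructed the model.

Does the code above have the right model and data shapes for sequence-to-sequence learning? If so, why do I get nans everywhere?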

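Answer 1:

By default, a Dense layer has no activation, so it outputs raw linear scores. If you specify a softmax activation, each time step outputs a probability distribution, which is what categorical_crossentropy expects by default, and the nans go away. Change the corresponding line in the code above to the following.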

model.add(TimeDistributed(Dense(labels, activation='softmax')))
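As a rough illustration of why the activation matters, the sketch below computes categorical cross-entropy for a single time step with and without a softmax. It is a simplified pure-Python picture, not Keras's internal implementation; the functions softmax and cross_entropy and the raw_scores values are hypothetical and chosen only for this example.

```python
import math


def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs):
    """Categorical cross-entropy for one time step: -sum(t * log(p))."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)


raw_scores = [-0.7, 0.3]  # hypothetical linear Dense output for one time step
target = [1, 0]           # one-hot label for class 0

# With softmax, the outputs form a valid distribution and the loss is finite.
probs = softmax(raw_scores)
loss = cross_entropy(target, probs)
print(probs, loss)

# Without softmax, the loss takes the log of a negative number. math.log
# raises ValueError here; array libraries such as NumPy return nan instead.
try:
    cross_entropy(target, raw_scores)
    raw_scores_usable = True
except ValueError:
    raw_scores_usable = False
print(raw_scores_usable)
```

The softmax guarantees that every value passed to the logarithm lies strictly between 0 and 1, so the loss stays defined no matter what the layer weights are.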

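Comments:

But what about a regression model? I sometimes (not always) see the same problem when I run mine. The nans appear from the very first epoch, so exploding or vanishing gradients are not the cause.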
