Failed to train toy LSTM in tensorflow

Posted: 2018-09-04 19:52:44

Question:

I'm trying to get familiar with recurrent networks in tensorflow by working through a toy sequence-classification problem.

Data:

import numpy as np

half_len = 500
pos_ex = [1, 2, 3, 4, 5]  # Positive sequence.
neg_ex = [1, 2, 3, 4, 6]  # Negative sequence.
num_input = len(pos_ex)
data = np.concatenate((np.stack([pos_ex] * half_len), np.stack([neg_ex] * half_len)), axis=0)
labels = np.asarray([0, 1] * half_len + [1, 0] * half_len).reshape((2 * half_len, -1))
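
For reference, the resulting shapes (checked with a quick print; the expected output is in the comments):

print(data.shape)    # (1000, 5): 500 positive rows followed by 500 negative rows.
print(labels.shape)  # (1000, 2): one-hot labels, [0, 1] for positive, [1, 0] for negative.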

Model:

import tensorflow as tf

_, x_width = data.shape
# n_hidden and num_classes are set elsewhere in my script
# (num_classes = 2 here; the value of n_hidden is not shown).
X = tf.placeholder("float", [None, x_width])
Y = tf.placeholder("float", [None, num_classes])

# Output layer: maps the last LSTM output to class logits.
weights = tf.Variable(tf.random_normal([n_hidden, num_classes]))
bias = tf.Variable(tf.random_normal([num_classes]))


def lstm_model():
    from tensorflow.contrib import rnn
    # Split the [batch, x_width] input into x_width time steps of shape [batch, 1].
    x = tf.split(X, num_input, 1)
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)
    # Classify from the output at the final time step.
    return tf.matmul(outputs[-1], weights) + bias

Training:

logits = lstm_model()
prediction = tf.nn.softmax(logits)

# Define loss and optimizer (learning_rate is set elsewhere in my script; value not shown).
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Train...

My training accuracy fluctuates around 0.5, which puzzles me because the problem is so simple. (Note that the loss below flattens out near 0.693 ≈ ln 2, i.e. the network is just guessing uniformly between the two classes.)

Step 1, Minibatch Loss = 82.2726, Training Accuracy = 0.453
Step 25, Minibatch Loss = 6.7920, Training Accuracy = 0.547
Step 50, Minibatch Loss = 0.8528, Training Accuracy = 0.500
Step 75, Minibatch Loss = 0.6989, Training Accuracy = 0.500
Step 100, Minibatch Loss = 0.6929, Training Accuracy = 0.516

Changing the toy data to:

pos_ex = [1, 2, 3, 4, 5]
neg_ex = [1, 2, 3, 4, 100]

makes it converge to accuracy 1 immediately. Can anyone explain why this network fails on such a simple task? Thanks.

The code above is based on this tutorial.

Answer 1:

Have you tried lowering the learning rate? In your second example the values on the last coordinate are spread much further apart; in principle that should make no difference, but it does affect which learning rate works. If you normalize the data (rescale each coordinate to lie between -1 and 1, as sketched below) and pick an appropriate step size, you should solve both problems in the same number of steps.
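
A minimal sketch of that per-coordinate normalization (illustrative only; the helper name normalize_columns is made up for this answer):

import numpy as np

def normalize_columns(data):
    # Linearly rescale each coordinate (column) of `data` into [-1, 1].
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # Guard against constant columns.
    return 2.0 * (data - lo) / span - 1.0

Applied to the question's data this maps the constant columns to -1 and the last column (5 vs. 6, or 5 vs. 100) to -1 / +1, which removes the scale sensitivity.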

Edit: after playing with your toy example a little, the following works fine even without normalization:

import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

# Meta parameters

n_hidden = 10
num_classes = 2
learning_rate = 1e-2
input_dim = 5
num_input = 5

# inputs
X = tf.placeholder("float", [None, input_dim])
Y = tf.placeholder("float", [None, num_classes])

# Model
def lstm_model():
    # input layer
    x = tf.split(X, num_input, 1)

    # LSTM layer
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)

    # final layer - softmax
    weights = tf.Variable(tf.random_normal([n_hidden, num_classes]))
    bias = tf.Variable(tf.random_normal([num_classes]))
    return tf.matmul(outputs[-1], weights) + bias

# logits and prediction
logits = lstm_model()
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)


# -----------
# Train func
# -----------
def train(data, labels):
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        for i in range(1000):
            _, loss, onehot_pred = session.run([train_op, loss_op, prediction], feed_dict={X: data, Y: labels})
            acc = np.mean(np.argmax(onehot_pred, axis=1) == np.argmax(labels, axis=1))
            print('Iteration {} accuracy: {}'.format(i, acc))
            if acc == 1:
                print('---> Finished after {} iterations'.format(i + 1))
                break

# -----------
# Train 1
# -----------
# data generation
half_len = 500
pos_ex = [1, 2, 3, 4, 5] # Positive sequence.
neg_ex = [1, 2, 3, 4, 6] # Negative sequence.

data = np.concatenate((np.stack([pos_ex]*half_len), np.stack([neg_ex]*half_len)), axis=0)
labels = np.asarray([0, 1] * half_len + [1, 0] * half_len).reshape((2 * half_len, -1))

train(data,labels)

# -----------
# Train 2
# -----------
# data generation
half_len = 500
pos_ex = [1, 2, 3, 4, 5] # Positive sequence.
neg_ex = [1, 2, 3, 4, 100] # Negative sequence.

data = np.concatenate((np.stack([pos_ex]*half_len), np.stack([neg_ex]*half_len)), axis=0)
labels = np.asarray([0, 1] * half_len + [1, 0] * half_len).reshape((2 * half_len, -1))

train(data,labels)

The output is:

Iteration 0 accuracy: 0.5
Iteration 1 accuracy: 0.5
Iteration 2 accuracy: 0.5
Iteration 3 accuracy: 0.5
Iteration 4 accuracy: 0.5
Iteration 5 accuracy: 0.5
Iteration 6 accuracy: 0.5
Iteration 7 accuracy: 0.5
Iteration 8 accuracy: 0.5
Iteration 9 accuracy: 0.5
Iteration 10 accuracy: 1.0
---> Finished after 11 iterations

Iteration 0 accuracy: 0.5
Iteration 1 accuracy: 0.5
Iteration 2 accuracy: 0.5
Iteration 3 accuracy: 0.5
Iteration 4 accuracy: 0.5
Iteration 5 accuracy: 0.5
Iteration 6 accuracy: 0.5
Iteration 7 accuracy: 0.5
Iteration 8 accuracy: 0.5
Iteration 9 accuracy: 1.0
---> Finished after 10 iterations

Good luck!

Comments:

It was that simple... changing the learning rate fixed it, and normalization is a good point. Thanks for your help, and sorry for my inexperience.

@Gerry No problem, things sometimes look complicated even though they aren't... I'm now running into some annoying problems doing reinforcement learning on a problem of mine... it's painful... the small issues always cause the biggest trouble.
