TensorFlow multiple values for loss

Posted: 2018-06-25 16:46:07

【Question】

I'm working through this RNN tutorial to get a general idea of how to write an RNN using the lower-level TensorFlow API. I've gotten everything to work, but the value I get for total_loss differs depending on how I evaluate it within the session.

How are the losses below computed differently? Why does running the train step together with the other nodes in the graph (i.e., in the same run statement) produce a different loss value than running the train step and the other nodes separately (i.e., in separate run statements)?

Here is the graph:

X = tf.placeholder(tf.int32, [batch_size, num_steps], name = 'X')
Y = tf.placeholder(tf.int32, [batch_size, num_steps], name = 'Y')
initial_state = tf.zeros([batch_size, state_size])

X_one_hot = tf.one_hot(X, num_classes)
rnn_inputs = tf.unstack(X_one_hot, axis = 1)

Y_one_hot = tf.one_hot(Y, num_classes)
Y_one_hot_list = tf.unstack(Y_one_hot, axis = 1)

with tf.variable_scope('RNN_cell'):
    W = tf.get_variable('W', [num_classes + state_size, state_size])
    b = tf.get_variable('b', [state_size], initializer = tf.constant_initializer(0.0))

tf.summary.histogram('RNN_cell/weights', W)

# define the RNN cell
def RNNCell(rnn_input, state, activation = tf.tanh):
    with tf.variable_scope('RNN_cell', reuse = True):
        W = tf.get_variable('W', [num_classes + state_size, state_size])
        b = tf.get_variable('b', [state_size], initializer = tf.constant_initializer(0))
        H = activation(tf.matmul(tf.concat([rnn_input, state], axis = 1), W) + b)
    return H

# add RNN cells to the computational graph
state = initial_state
rnn_outputs = []
for rnn_input in rnn_inputs:
    state = RNNCell(rnn_input, state, tf.tanh)
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]

# set up the softmax output layer
with tf.variable_scope('softmax_output'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer = tf.constant_initializer(0.0))

tf.summary.histogram('softmax_output/weights', W)

logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]
probabilities = [tf.nn.softmax(logit) for logit in logits]
predictions = [tf.argmax(logit, 1) for logit in logits]

# set up loss function
losses = [tf.nn.softmax_cross_entropy_with_logits(labels = label, logits = logit) for 
         logit, label in zip(logits, Y_one_hot_list)]
total_loss = tf.reduce_mean(losses)

# set up the optimizer
train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

tf.summary.scalar('loss', total_loss)
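
For reference on the shapes involved: each element of losses is a per-timestep cross-entropy vector of shape [batch_size], and total_loss averages over both the batch and the num_steps timesteps, since tf.reduce_mean is applied to the stacked [num_steps, batch_size] tensor. A quick sanity check (assuming the hyperparameters used above, e.g. batch_size = 32):

print(len(losses), losses[0].shape)   # num_steps elements, each of shape (batch_size,)
print(total_loss.shape)               # () - a scalar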

This version of the session evaluates the training loss, takes a train_step, and then evaluates the loss again.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter( './RNN_Tutorial/temp1', sess.graph)
    summary = tf.summary.merge_all()

    for index, epoch in enumerate(gen_epochs(num_epochs, num_steps)):
        training_state = np.zeros((batch_size, state_size))
        for step, (x, y) in enumerate(epoch):
            training_loss1 = sess.run(total_loss, feed_dict = {X: x, Y: y, initial_state: training_state})
            sess.run(train_step, feed_dict = {X: x, Y: y, initial_state: training_state})
            training_loss2, summary_str = sess.run([total_loss, summary], feed_dict = {X: x, Y: y, initial_state: training_state})

            if step % 1 == 0:
                train_writer.add_summary(summary_str, global_step = step)
                print(step, training_loss1, training_loss2)

The output makes it look like the model isn't really learning. Here is a partial output; it doesn't really change over all 1000 iterations, it just stays around 0.65 - 0.7:

0 0.6757775 0.66556937
1 0.6581067 0.6867344
2 0.70850086 0.66878074
3 0.67115635 0.68184483
4 0.67868954 0.6858209
5 0.6853568 0.66989964
6 0.672376 0.6554015
7 0.66563135 0.6655373
8 0.660332 0.6666234
9 0.6514224 0.6536864
10 0.65912485 0.6518013

And here is the session where I run total_loss, losses, and final_state together with the train_step:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter( './RNN_Tutorial/temp1', sess.graph)
    summary = tf.summary.merge_all()

    for index, epoch in enumerate(gen_epochs(num_epochs, num_steps)):
        training_state = np.zeros((batch_size, state_size))
        for step, (x, y) in enumerate(epoch):
            training_loss1 = sess.run(total_loss, feed_dict = {X: x, Y: y, initial_state: training_state})
            tr_losses, training_loss_, training_state, _, summary_str = \
            sess.run([losses,
                      total_loss,
                      final_state,
                      train_step,
                      summary], feed_dict = {X: x, Y: y, initial_state: training_state})
            training_loss2 = sess.run(total_loss, feed_dict = {X: x, Y: y, initial_state: training_state})

            if step % 1 == 0:
                train_writer.add_summary(summary_str, global_step = step)
                print(step, training_loss1, training_loss_, training_loss2)

In this output, however, the total_loss computed before the train step and the total loss computed along with the train step decrease steadily and then level off around 0.53, while the loss computed after the train step (training_loss2) still fluctuates around 0.65 - 0.7, in the same way as in the first session. Here is another partial output:

900 0.50464576 0.50464576 0.6973026
901 0.51603603 0.51603603 0.7115394
902 0.5465342 0.5465342 0.74994177
903 0.50591564 0.50591564 0.69172275
904 0.54837495 0.54837495 0.7333309
905 0.51697487 0.51697487 0.674438
906 0.5259896 0.5259896 0.70118546
907 0.5242365 0.5242365 0.71549624
908 0.50699174 0.50699174 0.7007787
909 0.5292892 0.5292892 0.7045353
910 0.49432433 0.49432433 0.73515224

I thought the training loss would be the same for both versions of the session block. Why does using sess.run(total_loss, ...) followed by sess.run(train_step, ...) separately (i.e., the first version) produce different loss values than using sess.run([losses, total_loss, final_state, train_step], ...)?
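
A side note on fetch timing that explains part of the numbers above: when total_loss is fetched in the same sess.run call as train_step, the returned value is typically computed from the parameters as they were before that run's update, while a separate sess.run(total_loss, ...) issued after the train step sees the already-updated parameters. That is why training_loss1 and training_loss_ agree exactly in the second version while training_loss2 does not. A minimal toy sketch (independent of the graph above; all names here are hypothetical) illustrating the effect:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.get_variable('w', [1, 1], initializer = tf.zeros_initializer())
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
train_step = tf.train.AdamOptimizer(0.1).minimize(loss)

data = {x: np.ones((4, 1), np.float32), y: np.ones((4, 1), np.float32)}

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # loss fetched together with the train step: computed from the weights
    # as they were at the start of this run (before the Adam update)
    loss_with_step, _ = sess.run([loss, train_step], feed_dict = data)
    # loss fetched in a separate run afterwards: sees the updated weights
    loss_after_step = sess.run(loss, feed_dict = data)
    print(loss_with_step, loss_after_step)   # 1.0 vs roughly 0.81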

【Comments】

What exactly is the problem here? Can you clarify what issue you are running into?

【Answer 1】

Figured it out. The problem was running the session in the second for loop without fetching final_state and updating training_state = final_state. Without that, the model never learns the longer-range dependencies built into the generated data.
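
For reference, a minimal sketch of the corrected loop (reusing the placeholders and ops defined in the question's graph): final_state is fetched alongside the train step and fed back as the next batch's initial_state, so state is carried across truncation windows.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for index, epoch in enumerate(gen_epochs(num_epochs, num_steps)):
        # reset the carried state at the start of each epoch
        training_state = np.zeros((batch_size, state_size))
        for step, (x, y) in enumerate(epoch):
            # fetch final_state and reuse it as the next batch's initial_state
            training_loss_, training_state, _ = sess.run(
                [total_loss, final_state, train_step],
                feed_dict = {X: x, Y: y, initial_state: training_state})
            if step % 100 == 0:
                print(step, training_loss_)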

【Discussion】
