Huge loss value to NaN on regularization and dropout in a deep neural network

Posted 2017-12-17 06:02:28

Question:

I'm working through Udacity's Deep Learning course. One of the assignments is to implement regularization and dropout in a multi-layer neural network.

After implementing them, the minibatch loss is extremely high at step 0, becomes infinity at step 1, and is NaN for the rest of the output:

Offset at step 0: 0
Minibatch loss at step 0: 187359330304.000000
Minibatch accuracy: 10.2%
Validation accuracy: 10.0% 

Offset at step 1: 128
Minibatch loss at step 1: inf
Minibatch accuracy: 14.1%
Validation accuracy: 10.0% 

Offset at step 2: 256
Minibatch loss at step 2: nan
Minibatch accuracy: 7.8%
Validation accuracy: 10.0% 

Offset at step 3: 384
Minibatch loss at step 3: nan
Minibatch accuracy: 11.7%
Validation accuracy: 10.0% 

Here is all the relevant code. I believe it has something to do either with the way I set up the optimization (since that part was taken from the given assignment) or with my regularization, but I'm not sure where else it could be. I have also played with the number of nodes in the hidden layers (1024 > 300 > 60), but it does the same thing.

Here is my code:

batch_size = 128
num_nodes_1 = 768
num_nodes_2 = 1024
num_nodes_3 = 512
dropout_value = 0.5
beta = 0.01

graph = tf.Graph()
with graph.as_default():

    # Input placeholders and constant datasets
    tf_train_data = tf.placeholder(tf.float32, shape=(batch_size, image_size*image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_data = tf.constant(valid_dataset)
    tf_test_data = tf.constant(test_dataset)

    def gen_weights_biases(input_size, output_size):
        weights = tf.Variable(tf.truncated_normal([input_size, output_size]))
        biases = tf.Variable(tf.zeros([output_size]))
        return weights, biases

    weights_1, biases_1 = gen_weights_biases(image_size*image_size, num_nodes_1)
    weights_2, biases_2 = gen_weights_biases(num_nodes_1, num_nodes_2)
    weights_3, biases_3 = gen_weights_biases(num_nodes_2, num_nodes_3)
    weights_4, biases_4 = gen_weights_biases(num_nodes_3, num_labels)

    # Three hidden ReLU layers with dropout, then the output logits
    logits_1 = tf.matmul(tf_train_data, weights_1) + biases_1
    h_layer_1 = tf.nn.relu(logits_1)
    h_layer_1 = tf.nn.dropout(h_layer_1, dropout_value)

    logits_2 = tf.matmul(h_layer_1, weights_2) + biases_2
    h_layer_2 = tf.nn.relu(logits_2)
    h_layer_2 = tf.nn.dropout(h_layer_2, dropout_value)

    logits_3 = tf.matmul(h_layer_2, weights_3) + biases_3
    h_layer_3 = tf.nn.relu(logits_3)
    h_layer_3 = tf.nn.dropout(h_layer_3, dropout_value)

    logits_4 = tf.matmul(h_layer_3, weights_4) + biases_4

    # Cross-entropy loss plus an L2 term (applied here to the logits)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_4))
    regularization = tf.nn.l2_loss(logits_1) + tf.nn.l2_loss(logits_2) + tf.nn.l2_loss(logits_3) + tf.nn.l2_loss(logits_4)
    reg_loss = tf.reduce_mean(loss + regularization * beta)

    # SGD with exponential learning-rate decay
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(0.5, global_step, 750, 0.8)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(reg_loss, global_step=global_step)

    train_prediction = tf.nn.softmax(logits_4)

    # Predictions for the validation and test sets (no dropout)
    def make_prediction(input_data):
        p_logits_1 = tf.matmul(input_data, weights_1) + biases_1
        p_layer_1 = tf.nn.relu(p_logits_1)
        p_logits_2 = tf.matmul(p_layer_1, weights_2) + biases_2
        p_layer_2 = tf.nn.relu(p_logits_2)
        p_logits_3 = tf.matmul(p_layer_2, weights_3) + biases_3
        p_layer_3 = tf.nn.relu(p_logits_3)

        p_logits_4 = tf.matmul(p_layer_3, weights_4) + biases_4
        return tf.nn.relu(p_logits_4)

    valid_prediction = make_prediction(tf_valid_data)
    test_prediction = make_prediction(tf_test_data)

num_steps = 10001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized \n")

    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]

        feed_dict = {tf_train_data: batch_data, tf_train_labels: batch_labels}

        _, l, predictions = session.run([optimizer, reg_loss, train_prediction], feed_dict=feed_dict)

        if(step % 1 == 0):
            print("Offset at step %d: %d" % (step, offset))
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%% \n" % accuracy(valid_prediction.eval(), valid_labels))

    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Why is this happening, and how can I fix it?

Comments:

Answer 1:

The problem was the standard deviation of the weights. I'm not sure why this fixes it, and if someone could explain I would appreciate it. In any case, the fix was:

import math

def gen_weights_biases(input_size, output_size):
    # He-style initialization: stddev = sqrt(2 / fan_in)
    weights = tf.Variable(tf.truncated_normal([input_size, output_size], stddev=math.sqrt(2.0/(input_size))))
    biases = tf.Variable(tf.zeros([output_size]))
    return weights, biases

The beta rate also had to be lowered to 0.0001.
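To see why the default standard deviation blows up the loss, here is a minimal numerical sketch (not part of the original answer, and assuming 28x28 = 784 inputs and the layer sizes from the question). With stddev=1.0, each pre-activation is a sum of roughly fan_in unit-variance terms, so its standard deviation is about sqrt(fan_in); after a few 768/1024-wide layers the logits are enormous, and both the softmax cross-entropy and the L2 term on the logits overflow. The sqrt(2/fan_in) scaling keeps the activation scale roughly constant across layers.

import numpy as np

rng = np.random.RandomState(0)
layer_sizes = [784, 768, 1024, 512, 10]    # image_size*image_size, num_nodes_1..3, num_labels

def final_logit_std(stddev_fn):
    x = rng.randn(128, layer_sizes[0])     # a fake minibatch of 128 "images"
    for i, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        w = rng.randn(fan_in, fan_out) * stddev_fn(fan_in)
        x = x @ w
        if i < len(layer_sizes) - 2:       # ReLU on the hidden layers only
            x = np.maximum(x, 0.0)
    return x.std()

print("logit std with stddev=1.0:            %g" % final_logit_std(lambda n: 1.0))
print("logit std with stddev=sqrt(2/fan_in): %g" % final_logit_std(lambda n: np.sqrt(2.0 / n)))

With the default initialization the final logit standard deviation comes out roughly on the order of 10^5, while with the scaled initialization it stays close to 1, which is why the loss remains finite.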

Discussion:

Weight initialization is very important for good convergence. The initialization you are using now comes from arxiv.org/abs/1502.01852, where you can read more about it.

Yes. A poorly initialized neural network is very likely to explode. I'll add one more paper on initialization and why it matters: arxiv.org/abs/1511.06422
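For reference, the same sqrt(2/fan_in) scheme can also be expressed through a built-in initializer. This is a sketch, assuming TensorFlow 1.x where tf.variance_scaling_initializer and tf.get_variable are available; the name argument is added here only so each variable gets a unique name:

def gen_weights_biases(input_size, output_size, name):
    # scale=2.0 with mode="fan_in" reproduces the He et al. stddev = sqrt(2 / fan_in)
    he_init = tf.variance_scaling_initializer(scale=2.0, mode="fan_in", distribution="normal")
    weights = tf.get_variable(name + "_w", shape=[input_size, output_size], initializer=he_init)
    biases = tf.get_variable(name + "_b", shape=[output_size], initializer=tf.zeros_initializer())
    return weights, biases

# e.g. weights_1, biases_1 = gen_weights_biases(image_size*image_size, num_nodes_1, "layer1")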
