Huge loss value to NaN on regularization and dropout in a deep neural network
【Posted】: 2017-12-17 06:02:28
【Question】: I am working through Udacity's deep learning course. One of the assignments is to implement regularization and dropout in a multi-layer neural network.
After implementing them, my minibatch loss is extremely high at step 0, becomes infinite at step 1, and is NaN for the rest of the output:
Offset at step 0: 0
Minibatch loss at step 0: 187359330304.000000
Minibatch accuracy: 10.2%
Validation accuracy: 10.0%
Offset at step 1: 128
Minibatch loss at step 1: inf
Minibatch accuracy: 14.1%
Validation accuracy: 10.0%
Offset at step 2: 256
Minibatch loss at step 2: nan
Minibatch accuracy: 7.8%
Validation accuracy: 10.0%
Offset at step 3: 384
Minibatch loss at step 3: nan
Minibatch accuracy: 11.7%
Validation accuracy: 10.0%
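Here is all the relevant code. I suspect the cause is either the way I set up the optimization (which is taken from the given assignment) or my regularization, but I'm not sure where else the problem could be. I also experimented with the number of nodes in the hidden layers (1024 > 300 > 60), but the result is the same.
Here is my code (please excuse the indentation; it is correct in my own code):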
batch_size = 128
num_nodes_1 = 768
num_nodes_2 = 1024
num_nodes_3 = 512
dropout_value = 0.5
beta = 0.01
graph = tf.Graph()
with graph.as_default():
    tf_train_data = tf.placeholder(tf.float32, shape=(batch_size, image_size*image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_data = tf.constant(valid_dataset)
    tf_test_data = tf.constant(test_dataset)

    def gen_weights_biases(input_size, output_size):
        weights = tf.Variable(tf.truncated_normal([input_size, output_size]))
        biases = tf.Variable(tf.zeros([output_size]))
        return weights, biases

    weights_1, biases_1 = gen_weights_biases(image_size*image_size, num_nodes_1)
    weights_2, biases_2 = gen_weights_biases(num_nodes_1, num_nodes_2)
    weights_3, biases_3 = gen_weights_biases(num_nodes_2, num_nodes_3)
    weights_4, biases_4 = gen_weights_biases(num_nodes_3, num_labels)

    logits_1 = tf.matmul(tf_train_data, weights_1) + biases_1
    h_layer_1 = tf.nn.relu(logits_1)
    h_layer_1 = tf.nn.dropout(h_layer_1, dropout_value)
    logits_2 = tf.matmul(h_layer_1, weights_2) + biases_2
    h_layer_2 = tf.nn.relu(logits_2)
    h_layer_2 = tf.nn.dropout(h_layer_2, dropout_value)
    logits_3 = tf.matmul(h_layer_2, weights_3) + biases_3
    h_layer_3 = tf.nn.relu(logits_3)
    h_layer_3 = tf.nn.dropout(h_layer_3, dropout_value)
    logits_4 = tf.matmul(h_layer_3, weights_4) + biases_4

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_4))
    regularization = tf.nn.l2_loss(logits_1) + tf.nn.l2_loss(logits_2) + tf.nn.l2_loss(logits_3) + tf.nn.l2_loss(logits_4)
    reg_loss = tf.reduce_mean(loss + regularization * beta)

    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(0.5, global_step, 750, 0.8)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(reg_loss, global_step=global_step)

    train_prediction = tf.nn.softmax(logits_4)

    def make_prediction(input_data):
        p_logits_1 = tf.matmul(input_data, weights_1) + biases_1
        p_layer_1 = tf.nn.relu(p_logits_1)
        p_logits_2 = tf.matmul(p_layer_1, weights_2) + biases_2
        p_layer_2 = tf.nn.relu(p_logits_2)
        p_logits_3 = tf.matmul(p_layer_2, weights_3) + biases_3
        p_layer_3 = tf.nn.relu(p_logits_3)
        p_logits_4 = tf.matmul(p_layer_3, weights_4) + biases_4
        return tf.nn.relu(p_logits_4)

    valid_prediction = make_prediction(tf_valid_data)
    test_prediction = make_prediction(tf_test_data)

num_steps = 10001
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized \n")
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_data: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run([optimizer, reg_loss, train_prediction], feed_dict=feed_dict)
        if(step % 1 == 0):
            print("Offset at step %d: %d" % (step, offset))
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%% \n" % accuracy(valid_prediction.eval(), valid_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
Why does this happen, and how can I fix it?
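【Answer 1】: The problem was the standard deviation of the weights. I'm not sure why this fixes it, and I'd appreciate it if someone could explain. In any case, the fix is: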
import math

def gen_weights_biases(input_size, output_size):
    # He initialization: scale the standard deviation by the layer's fan-in
    weights = tf.Variable(tf.truncated_normal([input_size, output_size], stddev=math.sqrt(2.0 / input_size)))
    biases = tf.Variable(tf.zeros([output_size]))
    return weights, biases
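As for why the smaller standard deviation helps, here is a rough illustration rather than a reproduction of the original network. `tf.truncated_normal` defaults to `stddev=1.0`, so each pre-activation sums hundreds of roughly unit-variance terms, and the activation scale grows by about sqrt(fan_in / 2) with every ReLU layer. The pure-Python sketch below makes several simplifying assumptions: plain Gaussian weights instead of truncated ones, a hypothetical network of 256-unit layers, zero biases, and no dropout. The names `relu_layer`, `rms`, and `activation_scale` are illustrative and do not come from the original code.

```python
import math
import random

def relu_layer(inputs, fan_out, stddev, rng):
    """One dense layer with N(0, stddev^2) weights and zero biases, followed by ReLU."""
    outputs = []
    for _ in range(fan_out):
        pre_activation = sum(x * rng.gauss(0.0, stddev) for x in inputs)
        outputs.append(max(0.0, pre_activation))
    return outputs

def rms(values):
    """Root-mean-square magnitude of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def activation_scale(layer_sizes, stddev_for, seed=0):
    """Return the RMS activation after each layer for one random input vector."""
    rng = random.Random(seed)
    activations = [rng.gauss(0.0, 1.0) for _ in range(layer_sizes[0])]
    scales = []
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        activations = relu_layer(activations, fan_out, stddev_for(fan_in), rng)
        scales.append(rms(activations))
    return scales

sizes = [256, 256, 256, 256]  # hypothetical layer widths, not the question's
unit_scales = activation_scale(sizes, lambda fan_in: 1.0)
he_scales = activation_scale(sizes, lambda fan_in: math.sqrt(2.0 / fan_in))

print("stddev = 1.0           :", ["%.3g" % s for s in unit_scales])
print("stddev = sqrt(2/fan_in):", ["%.3g" % s for s in he_scales])
```

With unit standard deviation the scale should climb by roughly an order of magnitude per layer, while the fan-in-scaled version should stay close to 1. Any loss term built from those activations grows with them.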
The beta rate also had to be lowered to 0.0001.
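To get a feel for how much the initialization and beta together change the penalty term, here is a minimal sketch. `tf.nn.l2_loss(t)` returns `sum(t ** 2) / 2`, so the penalty scales with the square of whatever tensor it is given. Note that the question's code passes the pre-activations (`logits_1` to `logits_4`) to `tf.nn.l2_loss`; L2 regularization is more commonly applied to the weight matrices, and this sketch assumes that variant. The layer shape, the plain Gaussian weights, and the helper names `l2_loss` and `sample_weights` are hypothetical.

```python
import math
import random

def l2_loss(values):
    """Half the sum of squares, the same quantity tf.nn.l2_loss computes."""
    return sum(v * v for v in values) / 2.0

def sample_weights(fan_in, fan_out, stddev, rng):
    """Flattened fan_in x fan_out weight matrix drawn from N(0, stddev^2)."""
    return [rng.gauss(0.0, stddev) for _ in range(fan_in * fan_out)]

rng = random.Random(0)
fan_in, fan_out = 200, 100  # hypothetical layer shape

unit_weights = sample_weights(fan_in, fan_out, 1.0, rng)
he_weights = sample_weights(fan_in, fan_out, math.sqrt(2.0 / fan_in), rng)

penalty_before = 0.01 * l2_loss(unit_weights)   # beta and stddev from the question
penalty_after = 0.0001 * l2_loss(he_weights)    # beta and stddev from the answer

print("penalty with stddev=1.0, beta=0.01       : %.4g" % penalty_before)
print("penalty with He stddev,  beta=0.0001     : %.4g" % penalty_after)
```

Under these assumptions the first penalty is several orders of magnitude larger than the second, which would swamp a cross-entropy term of ordinary size.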
【Comments】:
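Weight initialization is very important for good convergence. The initialization you are using now comes from arxiv.org/abs/1502.01852, where you can read more about it.
Yes. A poorly initialized neural network is very likely to explode. I'll add one more paper on initialization and why it matters: arxiv.org/abs/1511.06422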
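The progression in the question's log (a very large finite loss, then inf, then nan) is what floating-point overflow typically looks like when gradient descent diverges. The toy sketch below is an assumption-laden stand-in, not the original model: it runs gradient descent on the one-dimensional quadratic f(w) = 0.5 * curvature * w^2, where a large `curvature` plays the role of badly scaled weights. The function name `descend` and all numeric values are hypothetical, and Python floats are 64-bit whereas the TensorFlow graph uses float32, so overflow would occur sooner there.

```python
import math

def descend(curvature, learning_rate, steps, w=1.0):
    """Gradient descent on f(w) = 0.5 * curvature * w * w; returns the loss at each step."""
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w * w)
        gradient = curvature * w
        w = w - learning_rate * gradient  # becomes inf - inf = nan once w overflows
    return losses

# Step size far too large for this curvature: |w| grows every step.
diverging = descend(curvature=1e6, learning_rate=0.5, steps=60)
# Step size matched to the curvature: the loss shrinks every step.
stable = descend(curvature=1e6, learning_rate=1e-7, steps=60)

first_inf = next(i for i, x in enumerate(diverging) if math.isinf(x))
first_nan = next(i for i, x in enumerate(diverging) if math.isnan(x))
print("diverging run: first inf at step %d, first nan at step %d" % (first_inf, first_nan))
print("stable run: loss %.3g -> %.3g" % (stable[0], stable[-1]))
```

Once a value reaches inf, the next update subtracts inf from inf, which yields nan, and every later step stays nan. That matches the shape of the reported output, though it does not prove this is the only cause.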