Loss seems to be proportional to the learning rate in Keras

Posted: 2021-11-27 19:00:37

Question:

I'm training a simple neural network with a single Dense layer on the MNIST dataset in Keras.

Here is the code:

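import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Flatten, Dense

# x_train, y_train, x_test, y_test: MNIST arrays prepared elsewhere (loading code not shown)
model = Sequential()
model.add(Input(shape=(28, 28)))
model.add(Flatten())
model.add(Dense(10, activation='sigmoid'))

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

history = model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=32, epochs=10)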

Here is the output with a learning rate of 0.01:

Epoch 1/10
1875/1875 [==============================] - 2s 946us/step - loss: 315.4696 - accuracy: 0.8432 - val_loss: 195.9139 - val_accuracy: 0.8957
Epoch 2/10
1875/1875 [==============================] - 2s 877us/step - loss: 263.0978 - accuracy: 0.8674 - val_loss: 233.7138 - val_accuracy: 0.8782
Epoch 3/10
1875/1875 [==============================] - 2s 889us/step - loss: 251.8907 - accuracy: 0.8730 - val_loss: 208.0299 - val_accuracy: 0.8906
Epoch 4/10
1875/1875 [==============================] - 2s 882us/step - loss: 246.9039 - accuracy: 0.8754 - val_loss: 229.8979 - val_accuracy: 0.8937
Epoch 5/10
1875/1875 [==============================] - 2s 876us/step - loss: 234.6116 - accuracy: 0.8786 - val_loss: 263.7991 - val_accuracy: 0.8682
Epoch 6/10
1875/1875 [==============================] - 2s 942us/step - loss: 239.2780 - accuracy: 0.8781 - val_loss: 217.1707 - val_accuracy: 0.8892
Epoch 7/10
1875/1875 [==============================] - 2s 943us/step - loss: 235.9433 - accuracy: 0.8805 - val_loss: 233.0448 - val_accuracy: 0.8926
Epoch 8/10
1875/1875 [==============================] - 2s 941us/step - loss: 237.9058 - accuracy: 0.8812 - val_loss: 229.1561 - val_accuracy: 0.8912
Epoch 9/10
1875/1875 [==============================] - 2s 888us/step - loss: 235.2525 - accuracy: 0.8826 - val_loss: 318.9307 - val_accuracy: 0.8683
Epoch 10/10
1875/1875 [==============================] - 2s 885us/step - loss: 238.1098 - accuracy: 0.8810 - val_loss: 275.0455 - val_accuracy: 0.8809

And here is the output with a learning rate of 0.03, with all other hyperparameters unchanged:

Epoch 1/10
1875/1875 [==============================] - 2s 1ms/step - loss: 931.7540 - accuracy: 0.8417 - val_loss: 618.5505 - val_accuracy: 0.8952
Epoch 2/10
1875/1875 [==============================] - 2s 945us/step - loss: 767.9313 - accuracy: 0.8701 - val_loss: 618.2877 - val_accuracy: 0.8940
Epoch 3/10
1875/1875 [==============================] - 2s 892us/step - loss: 756.3298 - accuracy: 0.8730 - val_loss: 847.1705 - val_accuracy: 0.8582
Epoch 4/10
1875/1875 [==============================] - 2s 956us/step - loss: 739.8559 - accuracy: 0.8748 - val_loss: 687.9159 - val_accuracy: 0.8901
Epoch 5/10
1875/1875 [==============================] - 2s 888us/step - loss: 731.3071 - accuracy: 0.8760 - val_loss: 693.1130 - val_accuracy: 0.8942
Epoch 6/10
1875/1875 [==============================] - 2s 877us/step - loss: 728.4488 - accuracy: 0.8787 - val_loss: 685.3834 - val_accuracy: 0.8841
Epoch 7/10
1875/1875 [==============================] - 2s 878us/step - loss: 712.8240 - accuracy: 0.8798 - val_loss: 640.9078 - val_accuracy: 0.8972
Epoch 8/10
1875/1875 [==============================] - 2s 890us/step - loss: 693.1299 - accuracy: 0.8811 - val_loss: 657.0080 - val_accuracy: 0.8902
Epoch 9/10
1875/1875 [==============================] - 2s 884us/step - loss: 700.5771 - accuracy: 0.8803 - val_loss: 739.0408 - val_accuracy: 0.8871
Epoch 10/10
1875/1875 [==============================] - 2s 897us/step - loss: 696.2348 - accuracy: 0.8833 - val_loss: 785.1879 - val_accuracy: 0.8762

I tried this several times, so it isn't random. I also tried RMSprop and got the same result.

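As I understand it, the learning rate should determine how much the loss decreases at each step (the decrease should be proportional to the learning rate), not scale the loss value itself.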
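To make that intuition concrete, here is a minimal plain-Python sketch (not Keras) of one gradient-descent step on the toy loss f(w) = sum(w_i^2). The toy loss and the starting weights `w0` are assumptions chosen only because the gradient 2w is easy to write down. The loss at the starting point is the same for every learning rate; only the size of the decrease changes.

```python
# Toy loss f(w) = sum(w_i ** 2), chosen only because its gradient is simply 2 * w.
def loss(w):
    return sum(x * x for x in w)

def grad(w):
    return [2.0 * x for x in w]

def sgd_step(w, lr):
    """One plain gradient-descent update: w <- w - lr * grad(w)."""
    return [x - lr * g for x, g in zip(w, grad(w))]

def decrease(w, lr):
    """How much a single step with learning rate lr lowers the loss."""
    return loss(w) - loss(sgd_step(w, lr))

w0 = [1.0, -2.0, 0.5]  # arbitrary starting weights (assumption)

for lr in (0.01, 0.03):
    # loss(w0) is the same for both runs; only the decrease depends on lr.
    print(f"lr={lr}: loss before={loss(w0):.4f}, decrease={decrease(w0, lr):.4f}")
```

In this toy case the decrease for lr=0.03 is about 2.94 times the decrease for lr=0.01, close to the factor of 3 that a first-order argument predicts, while the loss value at `w0` is not scaled at all.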

Does this have something to do with how Keras computes the loss function?
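One relevant detail, to the best of my understanding of the TensorFlow 2.x source: when `from_logits=False` and the output does not come from a softmax layer, `tf.keras.backend.categorical_crossentropy` first rescales each prediction vector so it sums to 1, clips it to [eps, 1 - eps] with eps = 1e-7, and then computes -sum(y * log p). The plain-Python sketch below mirrors those steps for a single sample so you can compare sigmoid and softmax outputs. It is an illustration, not the library's code; the helper names and the logits are made up.

```python
import math

EPSILON = 1e-7  # Keras' default fuzz factor (tf.keras.backend.epsilon())

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def keras_style_categorical_crossentropy(y_true, y_pred):
    """Sketch of the probability-input path: renormalize, clip, -sum(y * log p)."""
    total = sum(y_pred)
    probs = [p / total for p in y_pred]
    probs = [min(max(p, EPSILON), 1.0 - EPSILON) for p in probs]
    return -sum(t * math.log(p) for t, p in zip(y_true, probs))

logits = [2.0, -1.0, 0.5]  # made-up scores for 3 classes
y_true = [1.0, 0.0, 0.0]   # one-hot label: class 0

sig = [sigmoid(z) for z in logits]
soft = softmax(logits)
print("sigmoid outputs sum to", round(sum(sig), 4))   # not 1
print("softmax outputs sum to", round(sum(soft), 4))  # 1
print("loss on sigmoid outputs:", keras_style_categorical_crossentropy(y_true, sig))
print("loss on softmax outputs:", keras_style_categorical_crossentropy(y_true, soft))
```

Because of the clipping step, with a one-hot label a single sample's loss from this sketch cannot exceed -log(1e-7), roughly 16.12.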

Comments:

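Your loss is probably all over the place because the learning rate is too high. When the lr is too high, training can diverge: in gradient descent, a step that is too large can land you at a point with a higher gradient, which creates a positive feedback loop.

Answer 1: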

There are two problems with your code:

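    1. The learning_rate. It is definitely too high, which is why training diverges. M.Chak's observation is a good one: when a step lands you at a point with a higher gradient, you create a positive feedback loop, which explains the pattern you observed (multiplying learning_rate by k multiplies the loss by roughly k).
    2. You are using sigmoid with 10 classes for multi-class classification. In that case you must use model.add(Dense(10, activation='softmax')).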
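The feedback loop described in the comment and in the first point above can be seen on a one-dimensional toy loss f(w) = a * w^2 / 2, an assumption made for illustration rather than the MNIST model itself. Plain gradient descent gives w <- (1 - lr * a) * w, so once lr exceeds 2 / a every step overshoots to a point with a larger gradient and the loss grows geometrically instead of shrinking.

```python
def run_gd(lr, a=10.0, w=1.0, steps=5):
    """Gradient descent on f(w) = a * w**2 / 2; returns the loss after each step."""
    losses = []
    for _ in range(steps):
        g = a * w            # gradient of f at w
        w = w - lr * g       # equivalent to w *= (1 - lr * a)
        losses.append(a * w * w / 2)
    return losses

# With a = 10 the stability threshold is lr < 2 / a = 0.2.
for lr in (0.05, 0.15, 0.25):
    print(lr, [round(x, 4) for x in run_gd(lr)])
```

The quadratic is used because its curvature `a` gives the divergence threshold exactly: at lr=0.05 and lr=0.15 the loss shrinks every step, while at lr=0.25 it is multiplied by 2.25 each step.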

