Why does my learning rate decrease, even when loss is improving?

Posted: 2020-10-08 09:31:59

Question:

I am training my Keras model on a Google Colab TPU as follows:

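# Imports assume the tf.keras API; model, PSNRLoss, SSIMLoss, traindb and validdb are defined elsewhere.
from tensorflow.keras.callbacks import CSVLogger, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

adam = Adam(lr=0.002)
model.compile(loss='mse', metrics=[PSNRLoss, SSIMLoss], optimizer=adam)

# Save a checkpoint whenever the training loss reaches a new minimum.
checkpoint = ModelCheckpoint("model_{epoch:02d}.hdf5", monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
# Halve the learning rate after 5 epochs without a sufficient drop in the training loss.
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5,
                              patience=5, min_lr=0.00002)
csv_logger = CSVLogger('history.log')

callbacks_list = [checkpoint, reduce_lr, csv_logger]

model.fit(traindb, batch_size=1024,
          callbacks=callbacks_list, shuffle=True, epochs=100, verbose=2,
          validation_data=validdb)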

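During training, my learning rate is reduced by a factor of 0.5 even though the loss is still improving at the current learning rate. As you can see in the snippet below, the learning rate drops from 0.0020 to 0.0010 and then to 0.0005.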

Epoch 00011: loss improved from 0.00647 to 0.00646, saving model to ./models_x4/no_noise/dcscn_x2_11.hdf5
1939/1939 - 109s - PSNRLoss: 23.7280 - loss: 0.0065 - SSIMLoss: 0.3329 - val_PSNRLoss: 23.9022 - val_loss: 0.0066 - val_SSIMLoss: 0.3815 - lr: 0.0020
Epoch 12/100

Epoch 00012: loss improved from 0.00646 to 0.00645, saving model to ./models_x4/no_noise/dcscn_x2_12.hdf5
1939/1939 - 111s - PSNRLoss: 23.7245 - loss: 0.0065 - SSIMLoss: 0.3331 - val_PSNRLoss: 23.9397 - val_loss: 0.0066 - val_SSIMLoss: 0.3705 - lr: 0.0020
Epoch 13/100

Epoch 00013: loss improved from 0.00645 to 0.00644, saving model to ./models_x4/no_noise/dcscn_x2_13.hdf5
1939/1939 - 110s - PSNRLoss: 23.7300 - loss: 0.0064 - SSIMLoss: 0.3321 - val_PSNRLoss: 23.9827 - val_loss: 0.0065 - val_SSIMLoss: 0.3745 - lr: 0.0020
Epoch 14/100

Epoch 00014: loss improved from 0.00644 to 0.00643, saving model to ./models_x4/no_noise/dcscn_x2_14.hdf5
1939/1939 - 111s - PSNRLoss: 23.7279 - loss: 0.0064 - SSIMLoss: 0.3376 - val_PSNRLoss: 23.9079 - val_loss: 0.0066 - val_SSIMLoss: 0.3959 - lr: 0.0020
Epoch 15/100

Epoch 00015: loss improved from 0.00643 to 0.00634, saving model to ./models_x4/no_noise/dcscn_x2_15.hdf5
1939/1939 - 110s - PSNRLoss: 23.8356 - loss: 0.0063 - SSIMLoss: 0.3408 - val_PSNRLoss: 23.7063 - val_loss: 0.0067 - val_SSIMLoss: 0.3799 - lr: 0.0010
Epoch 16/100

Epoch 00016: loss did not improve from 0.00634
1939/1939 - 107s - PSNRLoss: 23.8173 - loss: 0.0063 - SSIMLoss: 0.3398 - val_PSNRLoss: 23.7282 - val_loss: 0.0067 - val_SSIMLoss: 0.3853 - lr: 0.0010
Epoch 17/100

Epoch 00017: loss did not improve from 0.00634
1939/1939 - 110s - PSNRLoss: 23.8199 - loss: 0.0063 - SSIMLoss: 0.3426 - val_PSNRLoss: 23.7202 - val_loss: 0.0067 - val_SSIMLoss: 0.4082 - lr: 0.0010
Epoch 18/100

Epoch 00018: loss did not improve from 0.00634
1939/1939 - 110s - PSNRLoss: 23.8138 - loss: 0.0063 - SSIMLoss: 0.3393 - val_PSNRLoss: 23.7523 - val_loss: 0.0066 - val_SSIMLoss: 0.4037 - lr: 0.0010
Epoch 19/100

Epoch 00019: loss improved from 0.00634 to 0.00634, saving model to ./models_x4/no_noise/dcscn_x2_19.hdf5
1939/1939 - 110s - PSNRLoss: 23.8189 - loss: 0.0063 - SSIMLoss: 0.3406 - val_PSNRLoss: 23.7188 - val_loss: 0.0067 - val_SSIMLoss: 0.4115 - lr: 0.0010
Epoch 20/100

Epoch 00020: loss improved from 0.00634 to 0.00634, saving model to ./models_x4/no_noise/dcscn_x2_20.hdf5
1939/1939 - 108s - PSNRLoss: 23.8176 - loss: 0.0063 - SSIMLoss: 0.3407 - val_PSNRLoss: 23.7692 - val_loss: 0.0066 - val_SSIMLoss: 0.3883 - lr: 0.0010
Epoch 21/100

Epoch 00021: loss improved from 0.00634 to 0.00627, saving model to ./models_x4/no_noise/dcscn_x2_21.hdf5
1939/1939 - 108s - PSNRLoss: 23.8889 - loss: 0.0063 - SSIMLoss: 0.3478 - val_PSNRLoss: 24.0306 - val_loss: 0.0064 - val_SSIMLoss: 0.3544 - lr: 5.0000e-04
Epoch 22/100

Epoch 00022: loss improved from 0.00627 to 0.00627, saving model to ./models_x4/no_noise/dcscn_x2_22.hdf5
1939/1939 - 109s - PSNRLoss: 23.8847 - loss: 0.0063 - SSIMLoss: 0.3466 - val_PSNRLoss: 24.0461 - val_loss: 0.0064 - val_SSIMLoss: 0.3679 - lr: 5.0000e-04

Thanks in advance :) Please suggest where I am going wrong. Should I monitor some other, more suitable value?

Answer 1:

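The ReduceLROnPlateau object has a parameter called min_delta, which is the threshold an improvement must exceed to count as a new optimum. The default value of min_delta is 0.0001. The "loss improved" messages in your log come from ModelCheckpoint, which compares losses without such a threshold. In your log the loss drops by only about 0.00001 per epoch, which is smaller than min_delta, so ReduceLROnPlateau ignores those improvements and treats the epochs as a plateau. Therefore, after patience epochs, the learning rate is reduced.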
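
To make the threshold concrete, here is a minimal pure-Python sketch of the plateau check for mode='min' with no cooldown. It is a simplified imitation, not the Keras source: the function name simulate_plateau and the loss sequence are hypothetical, chosen so that the loss falls by 0.00001 per epoch, roughly like the log in the question.

```python
def simulate_plateau(losses, lr=0.002, factor=0.5, patience=5,
                     min_delta=1e-4, min_lr=2e-5):
    """Return the learning rate in effect during each epoch.

    Simplified imitation of a reduce-on-plateau rule (mode='min',
    no cooldown). Hypothetical helper, not part of Keras.
    """
    best = float("inf")
    wait = 0
    lr_per_epoch = []
    for loss in losses:
        lr_per_epoch.append(lr)
        # An epoch only counts as an improvement if the loss beats the
        # best value seen so far by more than min_delta.
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr_per_epoch


# Hypothetical losses: each epoch is better than the last by 0.00001.
losses = [0.00650 - 0.00001 * i for i in range(8)]

default_run = simulate_plateau(losses, min_delta=1e-4)
small_delta_run = simulate_plateau(losses, min_delta=1e-6)

print("min_delta=1e-4:", default_run)
print("min_delta=1e-6:", small_delta_run)
```

With the default threshold the simulated learning rate is halved even though every epoch is slightly better than the previous one; with the smaller threshold it is left alone. In the real callback, passing a smaller value such as min_delta=1e-6 to ReduceLROnPlateau should have the same effect, though the right value depends on how small an improvement you still consider meaningful.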
