Validation loss keeps decreasing while training loss starts to increase after 3 epochs
Posted: 2021-04-13 22:48:00

Question: I am training an LSTM model with pretrained Word2Vec embeddings. After 3 epochs I started to observe that my training loss begins to increase while the validation loss keeps decreasing. The same goes for accuracy: training accuracy starts to drop while validation accuracy keeps improving. The numbers for comparison are below, together with my model parameters.
My learning rate is left at the default of 0.001, and now that the training loss has started to increase I cannot decide whether to keep training or to stop.

Thanks in advance.
# Imports assumed for this snippet (tf.keras); the original post omits them.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
#model.add(Embedding(maximum_words_number, e_dim, input_length=X.shape[1]))
model.add(Embedding(58137, 100, weights=[embeddings],
                    input_length=X_train.shape[1], trainable=False))  # frozen pretrained Word2Vec vectors
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))  # binary classification head
#opt = SGD(lr=0.05)
model.compile(loss='binary_crossentropy', optimizer='Nadam', metrics=['accuracy'])

epochs = 4
batch_size = 100
model_outcome = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size,
                          validation_split=0.2,
                          callbacks=[EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001)])
Train on 3931 samples, validate on 983 samples
Epoch 1/4
3931/3931 [==============================] - 10937s 3s/step - loss: 0.6331 - accuracy: 0.6459 - val_loss: 0.5066 - val_accuracy: 0.7792
Epoch 2/4
3931/3931 [==============================] - 14951s 4s/step - loss: 0.5024 - accuracy: 0.7698 - val_loss: 0.4381 - val_accuracy: 0.8006
Epoch 3/4
3931/3931 [==============================] - 12997s 3s/step - loss: 0.4374 - accuracy: 0.8204 - val_loss: 0.4065 - val_accuracy: 0.8769
Epoch 4/4
3931/3931 [==============================] - 12786s 3s/step - loss: 0.4823 - accuracy: 0.8031 - val_loss: 0.3392 - val_accuracy: 0.8911
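A quick way to compare these curves directly is to plot them from the History object that fit returns. A minimal sketch (matplotlib is an assumption here, not part of the original post):

import matplotlib.pyplot as plt

# model_outcome is the History returned by model.fit above;
# its .history dict holds per-epoch 'loss', 'val_loss', 'accuracy', 'val_accuracy'.
hist = model_outcome.history
plt.plot(hist['loss'], label='training loss')
plt.plot(hist['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()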
Question comments:
Could you try a batch_size of 32?

OK, could you explain your reasoning?

In practice, model quality, as measured by its ability to generalize, can degrade noticeably when larger batches are used. There is no hard theory behind it, but a batch_size of 32 is the usual starting point and tends to work well.

I see, thank you for the explanation. I will try it and see whether my results change.

This post discusses the effects of batch size in greater detail.

Answer 1:

I think your model is too small and is underfitting by the third epoch.
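A quick parameter count, not given in the thread but easy to verify, illustrates the point: a Keras LSTM layer has 4 * u * (i + u + 1) weights for u units on i-dimensional input, so the first LSTM contributes 4 * 10 * (100 + 10 + 1) = 4440 parameters, the second 4 * 10 * (10 + 10 + 1) = 840, and the Dense head 11. That is only 5291 trainable parameters on top of the frozen 58137 * 100 = 5,813,700-parameter embedding matrix.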
Comments:

My model has 10k instances and 45k features. I don't think that is too small.

@FatihEnes Two LSTMs with 10 units each should be quite small. Please add model.summary() to your question.
Thank you for the suggestion, but when I tried to increase the LSTM units I ran into an out-of-memory error and could not find a way around it. Do you have any suggestions? The first thing that came to mind was reducing the number of features, but I would rather not do that. I will also add model.summary() once the model has finished running.
@FatihEnes You can reduce the batch_size. How much memory do you have? Your model does not look big.
I have about 16 GB of memory, and yes, you are right about the batch_size. What would you suggest for the number of units and layers in this case? I am working on an NLP classification task with two label classes.
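For what it is worth, here is a sketch that combines the suggestions from this thread: a smaller batch size, more LSTM capacity, and the requested model.summary() call. The unit counts and epoch budget are illustrative guesses, not values anyone in the thread committed to:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(Embedding(58137, 100, weights=[embeddings],
                    input_length=X_train.shape[1], trainable=False))
model.add(LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))  # hypothetical larger capacity
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Nadam', metrics=['accuracy'])
model.summary()  # prints the per-layer parameter counts asked for above

model_outcome = model.fit(X_train, Y_train, epochs=10, batch_size=32,
                          validation_split=0.2,
                          callbacks=[EarlyStopping(monitor='val_loss', patience=2,
                                                   restore_best_weights=True)])

With restore_best_weights=True the weights from the best validation epoch are kept even if training runs past it, which also sidesteps the "keep training or stop" question from the original post.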