LSTM classification problem (Keras) - weird result
Posted: 2021-08-03 09:51:03

I'm training a network for a classification problem with 3 output classes. The input is a series of float numbers (the typical history lag is 50):
x_test shape: (1663, 7, 1)
[[[17.749]
[18.366]
[17.898]
...
[25.287]
[25.128]
[24.596]]]
y_train shape: (3879, 3)
y_test shape: (1663, 3)
[[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
...
[0. 0. 1.]
[0. 0. 1.]
[0. 0. 1.]]
So y is one-hot encoded.
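Labels in this shape are typically produced with keras.utils.to_categorical; a minimal sketch for reference (the integer labels y_raw are hypothetical, not shown in the question):

import numpy as np
from keras.utils import to_categorical

# hypothetical integer class labels (0, 1 or 2), one per sample
y_raw = np.array([0, 0, 2, 1])

# one-hot encode into shape (n_samples, 3), matching y_train/y_test above
y_encoded = to_categorical(y_raw, num_classes=3)
print(y_encoded)
# [[1. 0. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]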
The model:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import Adam

batch_size_to_train = 32
history_lag = 7
epoch_to_train = 100

model = Sequential()
opt = Adam(lr=0.001)
model.add(LSTM(units=history_lag, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(Dropout(0.02))
model.add(LSTM(units = history_lag, return_sequences = True))
model.add(Dropout(0.02))
model.add(LSTM(units = history_lag, return_sequences = True))
model.add(Dropout(0.02))
model.add(LSTM(units = history_lag))
model.add(Dropout(0.02))
model.add(Dense(units=3, activation='softmax'))
model.summary()
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    epochs=epoch_to_train,
                    batch_size=batch_size_to_train,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, batch_size=batch_size_to_train, verbose=1)
predicted = model.predict(x_test, verbose=1)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 7, 7) 252
_________________________________________________________________
dropout_1 (Dropout) (None, 7, 7) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 7, 7) 420
_________________________________________________________________
dropout_2 (Dropout) (None, 7, 7) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 7, 7) 420
_________________________________________________________________
dropout_3 (Dropout) (None, 7, 7) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 7) 420
_________________________________________________________________
dropout_4 (Dropout) (None, 7) 0
_________________________________________________________________
dense_1 (Dense) (None, 3) 24
=================================================================
Total params: 1,536
Trainable params: 1,536
Non-trainable params: 0
_________________________________________________________________
Train on 3879 samples, validate on 1663 samples
Epoch 1/100
3879/3879 [==============================] - 0s - loss: 1.0640 - acc: 0.4029 - val_loss: 1.0858 - val_acc: 0.4011
Epoch 2/100
3879/3879 [==============================] - 0s - loss: 1.0607 - acc: 0.3942 - val_loss: 1.0877 - val_acc: 0.4011
Epoch 3/100
...
Epoch 100/100
3879/3879 [==============================] - 0s - loss: 1.0543 - acc: 0.4112 - val_loss: 1.0880 - val_acc: 0.4011
1376/1663 [=======================>......] - ETA: 0s
Model evaluation for T Score=[1.0879698376016165, 0.4010823809878337]
Start prediction...
1440/1663 [========================>.....] - ETA: 0
Predicted
(1663, 3)
[[0.374 0.305 0.322]
[0.374 0.305 0.322]
[0.374 0.305 0.322]
...
[0.374 0.305 0.321]
[0.374 0.305 0.321]
[0.374 0.305 0.321]]
Y_test len=1663
Y_test shape=(1663, 3)
Y_test after reshape shape=(1663, 3)
I've saved Y_test and the predicted results to a CSV table; these are the first 20 records:
c1 c2 c3 pred_c1 pred_c2 pred_c3
1.0, 0.0, 0.0, 0.3736170828342438,0.3046604096889496,0.32172250747680664
0.0, 0.0, 1.0, 0.3736182451248169,0.30466771125793457,0.32171404361724854
0.0, 0.0, 1.0, 0.3736218512058258,0.304688423871994,0.32168975472450256
0.0, 0.0, 1.0, 0.3736271560192108,0.3047129809856415,0.3216598629951477
1.0, 0.0, 0.0, 0.37367793917655945,0.3045872747898102,0.32173481583595276
0.0, 1.0, 0.0, 0.3737723231315613,0.30433595180511475,0.321891725063324
1.0, 0.0, 0.0, 0.3739013969898224,0.3039909899234772,0.32210761308670044
0.0, 1.0, 0.0, 0.3740204870700836,0.30357825756073,0.3224012553691864
1.0, 0.0, 0.0, 0.37405434250831604,0.30318766832351685,0.3227579891681671
0.0, 0.0, 1.0, 0.37375959753990173,0.3039003908634186,0.3223400115966797
0.0, 1.0, 0.0, 0.37365707755088806,0.304235577583313,0.32210731506347656
0.0, 0.0, 1.0, 0.37363600730895996,0.30434468388557434,0.3220193386077881
0.0, 1.0, 0.0, 0.37363964319229126,0.30440154671669006,0.3219588100910187
1.0, 0.0, 0.0, 0.3736586272716522,0.30439913272857666,0.3219422399997711
0.0, 0.0, 1.0, 0.3736720085144043,0.3044174313545227,0.321910560131073
0.0, 1.0, 0.0, 0.3737075924873352,0.30425700545310974,0.32203540205955505
0.0, 1.0, 0.0, 0.37369728088378906,0.3042283356189728,0.32207438349723816
1.0, 0.0, 0.0, 0.373714417219162,0.304155558347702,0.3221299946308136
0.0, 1.0, 0.0, 0.3737255036830902,0.30413076281547546,0.3221437335014343
1.0, 0.0, 0.0, 0.3737127482891083,0.30416059494018555,0.3221266567707062
0.0, 0.0, 1.0, 0.37368619441986084,0.30419445037841797,0.3221193552017212
The problem is that I think I'm doing something wrong in the model architecture. The predictions are all basically the same, and val_acc stays at 0.4011 with almost no change during training.
Answer 1: Your model is not learning anything (but that doesn't necessarily mean something is wrong with the model itself).
In fact, given the near-perfect balance of the dataset, random predictions would score about 33% here. Yours scores 40% (so at least the test set is not perfectly balanced), and it is most likely predicting only one class (as you can see in your test output, it is fully biased toward the first class).
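A quick way to confirm that bias is to compare the class distribution of y_test with the argmax of the predictions. A minimal sketch, using the predicted and y_test arrays printed above:

import numpy as np

# distribution of true classes in the test set
true_counts = np.bincount(np.argmax(y_test, axis=1), minlength=3)
# distribution of classes the model actually predicts
pred_counts = np.bincount(np.argmax(predicted, axis=1), minlength=3)
print('true:', true_counts)  # roughly balanced across the 3 classes
print('pred:', pred_counts)  # here likely all 1663 samples land in class 0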
You could try a smaller learning rate (0.0001 would do as a starting point), since there is almost no difference between the results of the first epoch and the last.
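A minimal sketch of that change, recompiling the same model (the exact value is just a starting point):

from keras.optimizers import Adam

# recompile with a 10x smaller learning rate before retraining
opt = Adam(lr=0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])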
If that doesn't work, increasing the "lag" may help (in theory, a larger window can capture more information).
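A sketch of rebuilding the input windows with a larger lag; the raw 1-D array series and the helper below are assumptions, since the original windowing code isn't shown:

import numpy as np

def make_windows(series, lag):
    # slide a window of length `lag` over the series; the label for each
    # window comes from whatever rule produced the original 3 classes
    x = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    return x.reshape(-1, lag, 1)

history_lag = 50                       # e.g. the "typical" lag from the question
x = make_windows(series, history_lag)  # shape: (n_samples, 50, 1)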
If that still doesn't work, you may need to work on / prepare the dataset further (is it possible to add new features?).
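Purely as an illustration of adding a feature, one cheap option is a first-difference channel alongside the raw values (assuming x is the windowed input of shape (n_samples, lag, 1); whether this actually helps depends on the data):

import numpy as np

raw = x[:, :, 0]                                 # (n_samples, lag)
diff = np.diff(raw, axis=1, prepend=raw[:, :1])  # first differences, same shape
x_feat = np.stack([raw, diff], axis=-1)          # (n_samples, lag, 2)
# input_shape of the first LSTM then becomes (lag, 2)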
In general, if none of the above works at all, it's time to take a step back and think about whether you've framed the problem correctly.
Comments:
Thank you very much. I tried smaller and smaller learning rates and longer history lags, but the learning curves look almost the same. The data probably needs a different kind of model. What's interesting is that if I turn the task into a regression problem the data works well, but not for classification. I think I'll now take a well-known classification problem (e.g. Iris classification), run my NN on it and check the result. If the result is good and there's nothing wrong with the network, then there's something wrong with my data.

Thanks. If my answer helped you, could you kindly upvote it? I don't expect you to accept it, since unfortunately the smaller LR and the larger window didn't solve your problem.
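Along the lines of the Iris sanity check mentioned above, a minimal sketch that runs a comparable LSTM on sklearn's Iris dataset (the sklearn loading and the reshape are assumptions, not from the original thread):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam
from keras.utils import to_categorical

iris = load_iris()
# treat the 4 features as a length-4 "sequence" of 1 value each: (n, 4, 1)
X = iris.data.reshape(-1, 4, 1).astype('float32')
y = to_categorical(iris.target, num_classes=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = Sequential()
model.add(LSTM(8, input_shape=(4, 1)))
model.add(Dense(3, activation='softmax'))
model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_tr, y_tr, epochs=100, batch_size=16, validation_data=(X_te, y_te), verbose=0)
print(model.evaluate(X_te, y_te, verbose=0))  # accuracy well above 33% => the network itself is fine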