How to add Dropout in an Encoder-Decoder Seq2Seq model

【Posted】2021-06-06 01:46:49

【Question】:

I'm trying to build an encoder-decoder model for language translation, but val_acc fluctuates and never rises above 16%. I therefore decided to add Dropout to avoid overfitting, but I haven't been able to do so.

Please help me add dropout to my code, shown below:

from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens + 1, latent_dim, mask_zero=True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]


# Decoder
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens + 1, latent_dim, mask_zero=True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)

decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

【Comments】:

【Answer 1】:

What is your training accuracy? I assume it is high (> 80%), since you say the model is overfitting.

Now, if that is the case, i.e. the model really is overfitting, you can add dropout at several levels:

1. Before the final Dense layer:

   from tensorflow.keras.layers import Dropout

   decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                        initial_state=encoder_states)

   dropout = Dropout(rate=0.5)
   decoder_outputs = dropout(decoder_outputs)

   decoder_dense = Dense(num_decoder_tokens, activation='softmax')
   decoder_outputs = decoder_dense(decoder_outputs)

2. Inside the encoder and decoder LSTMs: see the dropout and recurrent_dropout arguments at https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM

3. After the embedding layers (dropout on the embedding output).
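Suggestions 2 and 3 above can be sketched as follows. This is a minimal sketch, not the asker's exact setup: the values of latent_dim and the token counts are placeholders, the dropout rates are illustrative, and SpatialDropout1D is one common way to apply dropout to embedding outputs.

```python
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     Dropout, SpatialDropout1D)
from tensorflow.keras.models import Model

# Placeholder hyperparameters -- substitute your own values.
latent_dim = 256
num_encoder_tokens = 5000
num_decoder_tokens = 5000

# Encoder: dropout on the embedding output and inside the LSTM.
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens + 1, latent_dim,
                    mask_zero=True)(encoder_inputs)
enc_emb = SpatialDropout1D(0.2)(enc_emb)  # embedding-level dropout
encoder_lstm = LSTM(latent_dim, return_state=True,
                    dropout=0.2, recurrent_dropout=0.2)
_, state_h, state_c = encoder_lstm(enc_emb)
encoder_states = [state_h, state_c]

# Decoder: same pattern, plus dropout before the final Dense.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(num_decoder_tokens + 1, latent_dim,
                    mask_zero=True)(decoder_inputs)
dec_emb = SpatialDropout1D(0.2)(dec_emb)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,
                    dropout=0.2, recurrent_dropout=0.2)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = Dense(num_decoder_tokens,
                        activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```

Note that recurrent_dropout disables the fast cuDNN kernel, so training will be slower on GPU; the plain dropout argument (applied to the layer inputs) does not have that cost.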

To choose where to add dropout, you need to figure out why the model is overfitting. Is the number of training samples too small? Is the vocabulary too small? Does the model's learning behavior look consistent across all inputs?
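A quick numeric check for the first question, whether the model is really overfitting, can be read off the metrics that Keras model.fit records. This is a sketch: overfit_gap is a hypothetical helper, and the metric names assume the model was compiled with metrics=['accuracy'].

```python
def overfit_gap(history_dict):
    """Final-epoch gap between training and validation accuracy.

    history_dict is the `history.history` dict returned by Keras
    `model.fit`; a large positive gap suggests overfitting.
    """
    train_acc = history_dict['accuracy'][-1]
    val_acc = history_dict['val_accuracy'][-1]
    return train_acc - val_acc

# Example with made-up numbers: 90% training vs. 20% validation accuracy
# at the last epoch gives a gap of 0.7 -- a strong sign of overfitting.
gap = overfit_gap({'accuracy': [0.5, 0.9], 'val_accuracy': [0.3, 0.2]})
```

If the gap is small and both curves are simply low and noisy (as a val_acc stuck around 16% might suggest), the problem may be underfitting or data issues rather than overfitting, and dropout alone won't fix it.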

Hope this helps. Good luck.

【Comments】:
