How to add an attention layer between two LSTM layers in Keras

Posted 2023-02-16


Title: How to add Attention layer between two LSTM layers in Keras
Posted: 2019-05-22 23:04:41

Question:

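I am trying to add an attention layer between an encoder LSTM (many-to-many) and a decoder LSTM (many-to-one).

However, my code seems to create the attention output for only a single input of the decoder LSTM.

How can I apply the attention layer to all inputs of the decoder LSTM, i.e. get an attention output of shape (None, 1440, 984)?

Here is the summary of the attention part of my model: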

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 1440, 5)      0
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 1440, 984)    1960128     input_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1440, 1)      985         bidirectional_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 1440)         0           dense_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 1440)         0           flatten_1[0][0]
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector)  (None, 984, 1440)    0           activation_1[0][0]
__________________________________________________________________________________________________
permute_1 (Permute)             (None, 1440, 984)    0           repeat_vector_1[0][0]
__________________________________________________________________________________________________
multiply_1 (Multiply)           (None, 1440, 984)    0           bidirectional_1[0][0]
                                                                 permute_1[0][0]
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 984)          0           multiply_1[0][0]
==================================================================================================
Total params: 1,961,113
Trainable params: 1,961,113
Non-trainable params: 0
__________________________________________________________________________________________________
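As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is a small sketch, assuming standard Keras LSTM and Dense layers and 492 units per LSTM direction (inferred from the 984-wide Bidirectional output, 984 / 2); the helper names `lstm_params` and `dense_params` are hypothetical.

```python
# Reproduce the parameter counts from the model summary.
# Assumption: 492 LSTM units per direction (984 / 2) and 5 input features.

def lstm_params(units: int, input_dim: int) -> int:
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim: int, units: int) -> int:
    # weight matrix plus one bias per output unit
    return input_dim * units + units

units, features = 492, 5
bidirectional = 2 * lstm_params(units, features)  # forward + backward LSTM
dense = dense_params(2 * units, 1)                # Dense(1) on the 984-wide output
total = bidirectional + dense

print(bidirectional, dense, total)  # 1960128 985 1961113
```

The remaining layers (Flatten, Activation, RepeatVector, Permute, Multiply, Lambda) have no weights, which is why the total equals the sum of these two.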

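Here is my code:

from keras.layers import (Input, LSTM, Bidirectional, Dense, Flatten, Activation,
                          RepeatVector, Permute, Multiply, Lambda)
from keras.models import Model
from keras import backend as K

# encoder: bidirectional LSTM returning the full sequence -> (None, 1440, 984)
_input = Input(shape=(self.x_seq_len, self.input_x_shape), dtype='float32')
activations = Bidirectional(LSTM(self.hyper_param['decoder_units'], return_sequences=True))(_input)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)                        # (None, 1440, 1)
attention = Flatten()(attention)                                            # (None, 1440)
attention = Activation('softmax')(attention)                                # weights over time steps
attention = RepeatVector(self.hyper_param['decoder_units'] * 2)(attention)  # (None, 984, 1440)
attention = Permute([2, 1])(attention)                                      # (None, 1440, 984)

sent_representation = Multiply()([activations, attention])                  # (None, 1440, 984)
# summing over the time axis (-2) collapses the sequence -> (None, 984)
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(self.hyper_param['decoder_units'] * 2,))(sent_representation)

attn = Model(inputs=_input, outputs=sent_representation)
model.add(attn)  # `model` is a Sequential model defined elsewhere
# decoder
model.add(LSTM(self.hyper_param['encoder_units'], return_sequences=False, input_shape=(None, self.hyper_param['decoder_units'] * 2)))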
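To see where the time axis goes, the same chain of operations can be reproduced in NumPy with small stand-in sizes. This is an illustrative sketch, assuming T=6 and F=4 instead of 1440 and 984, and a random, hypothetical Dense(1) kernel. The Multiply step still yields a (batch, T, F) sequence, the shape asked for above; it is the final sum over axis -2 that reduces it to a single (batch, F) vector, so the decoder LSTM receives one vector rather than a sequence.

```python
import numpy as np

# Small stand-in sizes (assumption): the real model uses T=1440, F=984.
B, T, F = 2, 6, 4
rng = np.random.default_rng(0)

activations = rng.standard_normal((B, T, F))  # Bidirectional LSTM output
w = rng.standard_normal((F, 1))               # hypothetical Dense(1) kernel
b = 0.0                                       # hypothetical Dense(1) bias

scores = np.tanh(activations @ w + b)         # Dense(1, 'tanh'): (B, T, 1)
scores = scores.reshape(B, T)                 # Flatten: (B, T)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True) # softmax over time: (B, T)
repeated = np.repeat(weights[:, None, :], F, axis=1)  # RepeatVector(F): (B, F, T)
permuted = repeated.transpose(0, 2, 1)        # Permute([2, 1]): (B, T, F)

weighted_seq = activations * permuted         # Multiply: (B, T, F)
context = weighted_seq.sum(axis=-2)           # Lambda K.sum(axis=-2): (B, F)

print(weighted_seq.shape, context.shape)      # (2, 6, 4) (2, 4)
```

Under these assumptions, one option for feeding every time step to the decoder is to pass the Multiply output (sent_representation before the Lambda) to the decoder LSTM and drop the sum; whether that is the right design depends on what the decoder is meant to attend to.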


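Answer 1:

Attention works by iteratively taking one decoder output value (the last hidden state) and using it as a "query" to "attend" to all the "values", which are simply the full list of encoder outputs.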

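So, input1 = the decoder hidden state from the previous time step: the "query"

input2 = all encoder hidden states: the "values"

output = the context: a weighted sum of all encoder hidden states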
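These three definitions amount to a few lines of NumPy. This is a minimal sketch, assuming dot-product scoring between the query and each encoder state (the answer does not name a score function; additive, Bahdanau-style scoring is another common choice), with the encoder states acting as both keys and values. The function name `attention_step` is hypothetical.

```python
import numpy as np

def attention_step(query: np.ndarray, encoder_states: np.ndarray):
    """One attention step (assumption: dot-product scoring).

    query:          previous decoder hidden state, shape (H,)
    encoder_states: all encoder hidden states, shape (T, H)
    Returns the attention weights (T,) and the context vector (H,).
    """
    scores = encoder_states @ query                  # one score per encoder step
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    context = weights @ encoder_states               # weighted sum of encoder states
    return weights, context

encoder_states = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])
query = np.array([1.0, 0.0])
weights, context = attention_step(query, encoder_states)
print(weights.round(3), context.round(3))
```

Encoder steps whose states align with the query (the first and third rows here) receive larger weights and dominate the context vector.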

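Using the context, the decoder's previous hidden state and the previously generated (translated) output, the decoder produces the next word and a new hidden state; the process above then repeats until an "EOS" (end-of-sequence) token is produced.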
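The loop described above can be sketched as a toy greedy decoder in NumPy. Everything here is hypothetical and for illustration only: the random weights stand in for a trained model, the BOS/EOS token ids are made up, the decoder state is initialised from the last encoder state, attention uses dot-product scoring, and the next word is picked greedily with argmax (beam search is a common alternative).

```python
import numpy as np

rng = np.random.default_rng(0)
H, V, T = 8, 5, 4   # hidden size, vocabulary size, source length (toy values)
BOS, EOS = 0, 1     # hypothetical special token ids
MAX_LEN = 10        # safety cap in case EOS is never produced

# Hypothetical, randomly initialised parameters standing in for a trained decoder.
embed = rng.standard_normal((V, H))
W_h = rng.standard_normal((3 * H, H)) * 0.1  # [context; prev hidden; prev token] -> hidden
W_out = rng.standard_normal((2 * H, V))      # [new hidden; context] -> vocabulary logits
encoder_states = rng.standard_normal((T, H))

def attend(query, states):
    # dot-product attention: softmax over encoder steps, then weighted sum
    scores = states @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ states                  # context vector, shape (H,)

def greedy_decode(encoder_states):
    hidden = encoder_states[-1]              # assumption: init from last encoder state
    token = BOS
    output = []
    for _ in range(MAX_LEN):
        context = attend(hidden, encoder_states)            # query = previous hidden state
        x = np.concatenate([context, hidden, embed[token]])
        hidden = np.tanh(x @ W_h)                           # new hidden state
        logits = np.concatenate([hidden, context]) @ W_out
        token = int(np.argmax(logits))                      # next word (greedy)
        output.append(token)
        if token == EOS:
            break
    return output

print(greedy_decode(encoder_states))
```

MAX_LEN guards against a decoder that never emits EOS, which an untrained model like this one may well do.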

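Your attention logic itself is fine (apart from the last line, which involves the decoder), but the rest of your code is missing. If you can share the complete code, I can help you track down the error.

For more details, see https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e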
