Concatenate layer shape error in sequence2sequence model with Keras attention

Posted 2023-03-12

【Title】: Concatenate layer shape error in sequence2sequence model with Keras attention 【Posted】: 2021-12-10 17:26:30 【Question】:

I am trying to implement a simple word-level sequence-to-sequence model with Keras in Colab. I am using the Keras Attention layer. Here is the model definition:

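# Imports assumed; the original post does not show them.
from tensorflow.keras.layers import (Input, Embedding, LSTM, Attention,
                                     Concatenate, TimeDistributed, Dense)
from tensorflow.keras.models import Model

embedding_size=200
UNITS=128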

encoder_inputs = Input(shape=(None,), name="encoder_inputs")

encoder_embs=Embedding(num_encoder_tokens, embedding_size, name="encoder_embs")(encoder_inputs)

#encoder lstm
encoder = LSTM(UNITS, return_state=True, name="encoder_LSTM") #(encoder_embs)
encoder_outputs, state_h, state_c = encoder(encoder_embs)

encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,), name="decoder_inputs")
decoder_embs = Embedding(num_decoder_tokens, embedding_size, name="decoder_embs")(decoder_inputs)

#decoder lstm
decoder_lstm = LSTM(UNITS, return_sequences=True, return_state=True, name="decoder_LSTM")
decoder_outputs, _, _ = decoder_lstm(decoder_embs, initial_state=encoder_states)

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])
decoder_outputs = TimeDistributed(Dense(units=num_decoder_tokens, 
                                  activation='softmax', name="decoder_denseoutput"))(decoder_concatenate)

model=Model([encoder_inputs, decoder_inputs], decoder_outputs, name="s2s_model")
model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

The model compiles without any problems. The input and output shapes for the encoder and decoder are:

Encoder training input shape:  (4000, 21)
Decoder training input shape:  (4000, 12)
Decoder training target shape:  (4000, 12, 3106)
--
Encoder test input shape:  (385, 21)

Here is the model.fit code:

model.fit([encoder_training_input, decoder_training_input], decoder_training_target,
      epochs=100,
      batch_size=32,
      validation_split=0.2,)

When I run the fit step, I get this error from the Concatenate layer:

ValueError: Dimension 1 in both shapes must be equal, but are 12 and 32. 
Shapes are [32,12] and [32,32]. for 'node s2s_model/concat_layer/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](s2s_model/decoder_LSTM/PartitionedCall:1,
s2s_model/attention_layer/MatMul_1, s2s_model/concat_layer/concat/axis)' with input shapes: [32,12,128], [32,32,128], [] and with computed input tensors: input[2] = <2>.

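So the first 32 is the batch_size, 128 is the output dimension of decoder_outputs and attention_out, and 12 is the number of tokens in the decoder input. I can't work out how to fix this error, and I don't think I can change the number of input tokens. Any suggestions?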

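【Comments】:

【Answer 1】:

Replace axis=-1 with axis=1 in the Concatenate layer. The example in the linked documentation should show why.

Your problem lies in the inputs passed to the concatenation. To concatenate two matrices of different shapes (or tensors, as TensorFlow calls them), you need to specify the correct axis. The shapes [32, 12, 128] and [32, 32, 128] differ in the second dimension, which is referred to as axis 1 because dimensions are numbered from 0. Concatenating along that axis produces the shape [32, (12+32), 128], increasing the number of elements in the second dimension.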

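When you specify the axis as -1 (the default), the Concatenate layer joins the inputs along the last dimension, which requires every other dimension to match. In your case that fails because the second dimension differs (12 vs. 32).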
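The shape rule described above can be checked with a few lines of plain Python. This is only a sketch: `concat_shape` is a hypothetical helper written for this illustration, not a Keras function, and it assumes the usual rule that all dimensions except the concatenation axis must be equal.

```python
def concat_shape(shapes, axis):
    """Return the shape produced by concatenating tensors of the given shapes.

    Hypothetical helper for illustration only. It applies the usual rule that
    every dimension except the concatenation axis must be equal.
    """
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all inputs must have the same rank")
    axis = axis % rank  # turn a negative axis such as -1 into a positive index
    for dim in range(rank):
        if dim != axis and len({s[dim] for s in shapes}) != 1:
            raise ValueError(
                f"dimension {dim} must be equal, got {[s[dim] for s in shapes]}"
            )
    result = list(shapes[0])
    result[axis] = sum(s[axis] for s in shapes)
    return tuple(result)


decoder_out = (32, 12, 128)    # shape reported in the error message
attention_out = (32, 32, 128)  # shape reported in the error message

print(concat_shape([decoder_out, attention_out], axis=1))  # (32, 44, 128)

try:
    concat_shape([decoder_out, attention_out], axis=-1)
except ValueError as err:
    print("axis=-1 fails:", err)

# Once both inputs have 12 time steps, axis=-1 stacks the feature dimension.
print(concat_shape([(32, 12, 128), (32, 12, 128)], axis=-1))  # (32, 12, 256)
```

Note that axis=1 yields 44 time steps, which no longer lines up with a decoder target that has 12 steps per sample, whereas axis=-1 keeps the 12 steps and only widens the feature dimension.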

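【Comments】:

Links sometimes break, so it is better to include some extra explanation as well.

Thanks for pointing that out, I'll tidy it up.

Thanks for your answer. I tried it and now I get this error: ValueError: Shapes (32, 12, 3106) and (32, 44, 3106) are incompatible

Set the axis back to -1 and swap the order in which you feed decoder_output and encoder_output into the attention layer. Reasoning: check this. The query shape [batch_size, Tq, dim] and the output shape [batch_size, Tq, dim] are aligned, so the attention output maps correctly onto the subsequent decoder layers.

【Answer 2】: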

Thanks to @Majitsima, the problem is solved. I swapped the inputs to the attention layer, so instead of

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

the inputs are

attention=Attention(name="attention_layer")
attention_out=attention([decoder_outputs, encoder_outputs])

together with

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])

Everything seems to work now. Thanks again @Majitsima, and I hope this helps!
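The order of the attention inputs matters because the output of dot-product attention has one row per query step. The sketch below shows this in plain Python for a single example. `dot_product_attention` and `shape` are hypothetical helpers written for this illustration, not the Keras implementation. It also assumes the encoder yields one vector per input step (21 rows here), which in Keras usually means setting return_sequences=True on the encoder LSTM. The posted code does not set it, so treat that per-step shape as an assumption of this sketch rather than something the post confirms.

```python
import math


def dot_product_attention(query, value):
    """Dot-product attention for a single example, using plain lists.

    query: Tq rows of length dim. value: Tv rows of length dim (also the key).
    Returns Tq rows of length dim. Illustrative sketch, not the Keras code.
    """
    output = []
    for q in query:
        # Score each value row against this query row.
        scores = [sum(qi * vi for qi, vi in zip(q, v)) for v in value]
        # Softmax over the Tv scores (shifted by the max for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value rows gives one output row per query row.
        dim = len(value[0])
        output.append(
            [sum(w * v[d] for w, v in zip(weights, value)) for d in range(dim)]
        )
    return output


def shape(rows):
    return (len(rows), len(rows[0]))


Tq, Tv, dim = 12, 21, 4  # 12 decoder steps, 21 encoder steps; dim kept small
decoder_seq = [[0.1 * (t + d) for d in range(dim)] for t in range(Tq)]
encoder_seq = [[0.05 * (t - d) for d in range(dim)] for t in range(Tv)]

# Decoder as query: the output keeps the decoder's 12 steps.
print(shape(dot_product_attention(decoder_seq, encoder_seq)))  # (12, 4)

# Encoder as query: the output follows the encoder's 21 steps instead.
print(shape(dot_product_attention(encoder_seq, decoder_seq)))  # (21, 4)
```

With the decoder sequence as the query, the attention output has the same number of time steps as decoder_outputs, so the two can be concatenated along the last axis.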
