Attention Mechanism / Tensorflow Tutorials

Posted: 2020-10-08 00:35:18

[Question]:

I am trying to improve a draft of my attention-mechanism code, in which I essentially iterate over the decoder steps, and at every step the LSTM decoder cell receives a context vector from the attention module:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Dense, LSTM, Bidirectional
from tensorflow.keras.models import Model

# n_a, n_s, past_period, preview_period, raw_dataset and one_step_attention
# are defined elsewhere in my code.
post_activation_LSTM_cell = layers.LSTM(n_s, return_state=True)
output_layer = Dense(1)

# initial hidden and cell state of the decoder LSTM
s0 = Input(shape=(n_s,), name='s0')
c0 = Input(shape=(n_s,), name='c0')
s = s0
c = c0

outputs = []

input_tensor = Input(shape=(past_period, raw_dataset.shape[-1]))

# encoder: bidirectional LSTM over the input sequence
h = Bidirectional(LSTM(n_a, return_sequences=True))(input_tensor)

for t in range(preview_period):

    # attention: context vector from encoder outputs h and previous decoder state s
    context = one_step_attention(h, s)

    # one decoder step; the LSTM output equals the new hidden state s
    s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])

    out = output_layer(s)

    outputs.append(out)

model = Model([input_tensor, s0, c0], outputs)
model.summary()
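
For reference, one_step_attention is not shown above. A minimal sketch of what such a function typically looks like (additive attention built from shared Keras layers, in the style of the Coursera sequence-models exercise; the layer sizes are illustrative assumptions, not the actual code behind this question — only past_period, n_a and n_s come from the snippet above):

from tensorflow.keras import layers

# Layers are created once, so the same attention parameters are shared across all decoder steps.
repeator = layers.RepeatVector(past_period)      # repeat s along the time axis
concatenator = layers.Concatenate(axis=-1)
densor1 = layers.Dense(10, activation='tanh')    # small "energy" MLP; 10 units is arbitrary
densor2 = layers.Dense(1, activation='relu')
activator = layers.Softmax(axis=1)               # attention weights over the time steps
dotor = layers.Dot(axes=1)                       # weighted sum of encoder outputs

def one_step_attention(h, s):
    # h: encoder outputs, shape (batch, past_period, 2*n_a)
    # s: previous decoder hidden state, shape (batch, n_s)
    s_rep = repeator(s)                           # (batch, past_period, n_s)
    concat = concatenator([h, s_rep])             # (batch, past_period, 2*n_a + n_s)
    e = densor1(concat)
    energies = densor2(e)                         # (batch, past_period, 1)
    alphas = activator(energies)                  # attention weights, sum to 1 over time
    context = dotor([alphas, h])                  # (batch, 1, 2*n_a) context vector
    return context

The decoder LSTM cell then consumes this (batch, 1, 2*n_a) context as a one-step input sequence, which is what the loop above expects.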

I find the implementation in the TensorFlow tutorial much cleaner, but I cannot see how the decoder gets a different context vector from Bahdanau attention at each output step. It looks like the decoder only receives a single context vector. What am I missing?

https://www.tensorflow.org/tutorials/text/nmt_with_attention

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        query_with_time_axis = tf.expand_dims(query, 1)

        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))

        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights


class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # output shape == (batch_size, vocab)
        x = self.fc(output)

        return x, state, attention_weights



[Comments]:

github.com/neqkir/attention-mechanism

[Answer 1]:

You are right that the decoder only gets a single context vector here: the call method of the decoder class implements just one step of the decoder.

Further on, the tutorial loops over the target sentence at training time, and uses another loop to sample at inference time.
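
For the inference side, a condensed sketch of that sampling loop (modelled on the tutorial's evaluate function; encoder, decoder, units, max_length_targ and the start/end token ids are assumed to be set up as in the tutorial, while the helper name greedy_decode is mine):

import tensorflow as tf

def greedy_decode(inputs, encoder, decoder, start_id, end_id, max_length_targ, units):
    # Encode the source sequence once.
    enc_out, enc_hidden = encoder(inputs, [tf.zeros((1, units))])

    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([start_id], 0)   # shape (1, 1): the <start> token
    result = []

    for t in range(max_length_targ):
        # Each call recomputes Bahdanau attention over enc_out with the
        # current dec_hidden, so the context vector is different at every step.
        predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_out)
        predicted_id = tf.argmax(predictions[0]).numpy()
        if predicted_id == end_id:
            break
        result.append(int(predicted_id))
        # Feed the prediction back in (no teacher forcing at inference time).
        dec_input = tf.expand_dims([predicted_id], 0)

    return result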

[Discussion]:

Thanks @Jindřich. In the training loop,

for t in range(1, targ.shape[1]):
    # passing enc_output to the decoder
    predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
    loss += loss_function(targ[:, t], predictions)
    # using teacher forcing
    dec_input = tf.expand_dims(targ[:, t], 1)

the decoder is essentially executed as an RNN cell at every step. Thanks a lot!
