How to manipulate encoder state in a multi-layer bidirectional with Attention Mechanism

Posted: 2019-06-11 10:36:33

Question:

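I am implementing a Seq2Seq model with a multi-layer bidirectional RNN and an attention mechanism. While following this tutorial, https://github.com/tensorflow/nmt, I got confused about how to correctly manipulate the encoder_state after the bidirectional layer.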

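Quoting the tutorial: "For multiple bidirectional layers, we need to manipulate the encoder_state a bit, see model.py, method _build_bidirectional_rnn() for more details." This is the relevant part of the code (https://github.com/tensorflow/nmt/blob/master/nmt/model.py, line 770):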

encoder_outputs, bi_encoder_state = (
            self._build_bidirectional_rnn(
            inputs=self.encoder_emb_inp,
            sequence_length=sequence_length,
            dtype=dtype,
            hparams=hparams,
            num_bi_layers=num_bi_layers,
            num_bi_residual_layers=num_bi_residual_layers))

if num_bi_layers == 1:
   encoder_state = bi_encoder_state
else:
   # alternatively concat forward and backward states
   encoder_state = []
   for layer_id in range(num_bi_layers):
      encoder_state.append(bi_encoder_state[0][layer_id])  # forward
      encoder_state.append(bi_encoder_state[1][layer_id])  # backward
   encoder_state = tuple(encoder_state)

This is what I have now:

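import tensorflow as tf  # TensorFlow 1.x; tf.contrib is used further down

def get_a_cell(lstm_size):
    lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
    # Dropout is disabled for now:
    # drop = tf.nn.rnn_cell.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    return lstm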


encoder_FW = tf.nn.rnn_cell.MultiRNNCell(
    [get_a_cell(num_units) for _ in range(num_layers)])
encoder_BW = tf.nn.rnn_cell.MultiRNNCell(
    [get_a_cell(num_units) for _ in range(num_layers)])


bi_outputs, bi_encoder_state = tf.nn.bidirectional_dynamic_rnn(
    encoder_FW, encoder_BW, encoderInput,
    sequence_length=x_lengths, dtype=tf.float32)
encoder_output = tf.concat(bi_outputs, -1)

encoder_state = []

for layer_id in range(num_layers):
    encoder_state.append(bi_encoder_state[0][layer_id])  # forward
    encoder_state.append(bi_encoder_state[1][layer_id])  # backward
encoder_state = tuple(encoder_state)

#DECODER -------------------

decoder_cell = tf.nn.rnn_cell.MultiRNNCell([get_a_cell(num_units) for _ in range(num_layers)])

# Create an attention mechanism
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units_attention, encoder_output, memory_sequence_length=x_lengths)

decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism,
    attention_layer_size=num_units_attention)

# Start from the wrapper's zero state and swap in the encoder state.
decoder_initial_state = decoder_cell.zero_state(
    batch_size, tf.float32).clone(cell_state=encoder_state)

The problem is that I get this error:

The two structures don't have the same nested structure.

First structure: type=AttentionWrapperState 
str=AttentionWrapperState(cell_state=(LSTMStateTuple(c=, h=), 
LSTMStateTuple(c=, h=)), attention=, time=, alignments=, alignment_history=
(), attention_state=)

Second structure: type=AttentionWrapperState 
str=AttentionWrapperState(cell_state=(LSTMStateTuple(c=, h=), 
LSTMStateTuple(c=, h=), LSTMStateTuple(c=, h=), LSTMStateTuple(c=, h=)), 
attention=, time=, alignments=, alignment_history=(), attention_state=)

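This makes some sense to me, because we are not including all layers of the output but (I guess) only the last layer, whereas for the state we are actually concatenating all layers.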
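A minimal sketch of the structure check behind this error, using a namedtuple as a stand-in for tf.nn.rnn_cell.LSTMStateTuple and strings in place of tensors (both are illustrative assumptions, and no TensorFlow is involved). With num_layers = 2, interleaving forward and backward states yields four entries, while a decoder built from two cells holds two, which matches the two structures in the message above.

```python
from collections import namedtuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple; strings replace real tensors.
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])


def fake_bi_state(num_layers):
    """Mimic the (forward_states, backward_states) pair of two stacked cells."""
    fw = tuple(LSTMStateTuple(f"fw_c{i}", f"fw_h{i}") for i in range(num_layers))
    bw = tuple(LSTMStateTuple(f"bw_c{i}", f"bw_h{i}") for i in range(num_layers))
    return fw, bw


def interleave(bi_state, num_layers):
    """Alternate forward and backward states, as done in the question."""
    out = []
    for layer_id in range(num_layers):
        out.append(bi_state[0][layer_id])  # forward
        out.append(bi_state[1][layer_id])  # backward
    return tuple(out)


num_layers = 2
encoder_state = interleave(fake_bi_state(num_layers), num_layers)

# A stacked decoder with num_layers cells holds one state per cell.
decoder_slots = num_layers

print(len(encoder_state), decoder_slots)
```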

As I expected, when I concatenate only the last layer's states, like this:

encoder_state = []
encoder_state.append(bi_encoder_state[0][num_layers-1])  # forward
encoder_state.append(bi_encoder_state[1][num_layers-1])  # backward
encoder_state = tuple(encoder_state)

it runs without errors.

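As far as I can tell, there is no part of their code where they transform the encoder state again before passing it to the attention layer. So how does their code work? And, more importantly, does my fix break the correct behavior of the attention mechanism?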
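One possible explanation, offered as an assumption about the tutorial code rather than something stated in this thread: in nmt/model.py the bidirectional encoder appears to be built with half as many layers per direction as the decoder has layers, so the interleaved tuple ends up with exactly one entry per decoder cell. The counts below sketch that arithmetic in plain Python; the helper name interleaved_state_count is hypothetical.

```python
def interleaved_state_count(num_bi_layers):
    """Entries produced by alternating forward and backward layer states."""
    return 2 * num_bi_layers


num_layers = 2  # number of decoder cells, as in the question

# Question's setup: each direction has num_layers cells.
question_count = interleaved_state_count(num_layers)

# Assumed tutorial-style setup: each direction has num_layers // 2 cells.
tutorial_count = interleaved_state_count(num_layers // 2)

# True means the tuple length matches the number of decoder cells.
print(question_count == num_layers, tutorial_count == num_layers)
```

Note also that in the question's code the attention memory is encoder_output, which is separate from the state passed to clone(...), so the choice of initial state presumably affects how the decoder is initialized rather than the attention computation itself.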

Answer 1:

Here is the problem:

Only the encoder is bidirectional, but you are giving a bidirectional state to the decoder, which is always unidirectional.

Solution:

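All you have to do is concatenate the forward and backward states of each layer, so that you are manipulating "unidirectional data" again: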

encoder_state = []

for layer_id in range(num_layers):
    state_fw = bi_encoder_state[0][layer_id]
    state_bw = bi_encoder_state[1][layer_id]

    # Merge the forward and backward states along the feature axis
    cell_state = tf.concat([state_fw.c, state_bw.c], 1)
    hidden_state = tf.concat([state_fw.h, state_bw.h], 1)

    # This state has the same structure as a unidirectional encoder state
    state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state, h=hidden_state)

    encoder_state.append(state)

encoder_state = tuple(encoder_state)
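To see what this merge does to the shapes, here is a small sketch in plain Python, with nested lists standing in for [batch, num_units] tensors and a namedtuple standing in for LSTMStateTuple (both are illustrative assumptions; the helpers concat_axis1, zeros and merge_bi_state are hypothetical names). The merged tuple has one entry per layer, and each c and h is twice as wide as before.

```python
from collections import namedtuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple.
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])


def zeros(batch, units):
    """Nested-list stand-in for a [batch, units] tensor."""
    return [[0.0] * units for _ in range(batch)]


def concat_axis1(a, b):
    """Concatenate two [batch, features] nested lists along the feature axis."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def merge_bi_state(bi_state, num_layers):
    """Merge forward and backward states layer by layer, as in the answer."""
    merged = []
    for layer_id in range(num_layers):
        state_fw = bi_state[0][layer_id]
        state_bw = bi_state[1][layer_id]
        merged.append(LSTMStateTuple(
            c=concat_axis1(state_fw.c, state_bw.c),
            h=concat_axis1(state_fw.h, state_bw.h)))
    return tuple(merged)


batch_size, num_units, num_layers = 3, 4, 2
fw = tuple(LSTMStateTuple(zeros(batch_size, num_units), zeros(batch_size, num_units))
           for _ in range(num_layers))
bw = tuple(LSTMStateTuple(zeros(batch_size, num_units), zeros(batch_size, num_units))
           for _ in range(num_layers))

merged = merge_bi_state((fw, bw), num_layers)

# Number of layers, batch size, and feature width of the merged state.
print(len(merged), len(merged[0].c), len(merged[0].c[0]))
```

Because each merged state is 2 * num_units wide, the decoder cells would presumably need to be created with that size as well (for example get_a_cell(2 * num_units)); this is an inference from the shapes and is not stated in the answer.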
