Multilayer Seq2Seq model with LSTM in Keras
Posted: 2018-11-27 15:55:41
Question: I have built a seq2seq model in Keras. I built a single-layer encoder and decoder, and they work fine. Now I want to extend it to a multilayer encoder and decoder. I am building it with the Keras functional API.
Training:
Encoder code:
encoder_input=Input(shape=(None,vec_dimension))
encoder_lstm=LSTM(vec_dimension,return_state=True,return_sequences=True)(encoder_input)
encoder_lstm=LSTM(vec_dimension,return_state=True)(encoder_lstm)
encoder_output,encoder_h,encoder_c=encoder_lstm
Decoder code:
encoder_state=[encoder_h,encoder_c]
decoder_input=Input(shape=(None,vec_dimension))
decoder_lstm= LSTM(vec_dimension,return_state=True,return_sequences=True)(decoder_input,initial_state=encoder_state)
decoder_lstm=LSTM(vec_dimension,return_state=True,return_sequences=True)(decoder_lstm)
decoder_output,_,_=decoder_lstm
For testing:
encoder_model=Model(inputs=encoder_input,outputs=encoder_state)
decoder_state_input_h=Input(shape=(None,vec_dimension))
decoder_state_input_c=Input(shape=(None,vec_dimension))
decoder_states_input=[decoder_state_input_h,decoder_state_input_c]
decoder_output,decoder_state_h,decoder_state_c =decoder_lstm #(decoder_input,initial_state=decoder_states_input)
decoder_states=[decoder_state_h,decoder_state_c]
decoder_model=Model(inputs=[decoder_input]+decoder_states_input,outputs=[decoder_output]+decoder_states)
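Now, when I increase the number of decoder layers, training works fine, but testing does not work and raises an error.
The actual problem is that, with multiple layers, I have moved initial_state to an intermediate layer (the first decoder layer), whereas before it was specified on the last (and only) layer. So when I call it during testing, it throws this error: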
RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_64:0", shape=(?, ?, 150), dtype=float32) at layer "input_64".The following previous layers were accessed without issue: []
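How should I pass initial_state=decoder_states_input, which belongs to the first (input) layer, so that it does not throw this error? That is, how do I pass initial_state=decoder_states_input to the first layer when the model's output is taken from the last layer?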
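The error means that decoder_model's outputs were traced back, through the training-time encoder_state, to a tensor that is not among its declared inputs (presumably encoder_input). The toy dependency check below, written in plain Python with a hypothetical Node class (an illustration only, not how Keras implements the check), shows why reusing the training-graph output fails and why calling the layer again on the new state Inputs gives a connected graph.

```python
# Toy model of the "Graph disconnected" check: every tensor records the
# tensors it was computed from, and a model can only be built if each output
# traces back exclusively to the model's declared inputs.
# Illustration only; Node is a hypothetical class, not part of Keras.

class Node:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

def source_inputs(node):
    """Return the names of the leaf (input) nodes that `node` depends on."""
    if not node.parents:
        return {node.name}
    found = set()
    for parent in node.parents:
        found |= source_inputs(parent)
    return found

def missing_inputs(model_inputs, model_outputs):
    """Leaf nodes the outputs need but the model does not declare as inputs."""
    declared = {n.name for n in model_inputs}
    needed = set()
    for out in model_outputs:
        needed |= source_inputs(out)
    return needed - declared

# Graph from the question: the decoder output was computed from
# decoder_input with initial_state coming from the encoder.
encoder_input = Node("encoder_input")
encoder_state = Node("encoder_state", [encoder_input])
decoder_input = Node("decoder_input")
state_h_in = Node("decoder_state_input_h")
state_c_in = Node("decoder_state_input_c")
train_output = Node("decoder_output", [decoder_input, encoder_state])

# Reusing the training-time output: encoder_input is needed but not provided.
print(missing_inputs([decoder_input, state_h_in, state_c_in], [train_output]))
# → {'encoder_input'}

# Calling the layer again on the new state Inputs yields a connected graph.
infer_output = Node("decoder_output_inf", [decoder_input, state_h_in, state_c_in])
print(missing_inputs([decoder_input, state_h_in, state_c_in], [infer_output]))
# → set()
```

In Keras terms, the second case corresponds to calling each decoder layer object again with initial_state set to the new Input tensors, which is what the single-layer code in the edit below and both answers do.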
EDIT:
In that code I tried to build a multilayer decoder LSTM, and that is what gives the error. With a single layer, the working code is:
Encoder (training):
encoder_input=Input(shape=(None,vec_dimension))
encoder_lstm =LSTM(vec_dimension,return_state=True)(encoder_input)
encoder_output,encoder_h,encoder_c=encoder_lstm
Decoder (training):
encoder_state=[encoder_h,encoder_c]
decoder_input=Input(shape=(None,vec_dimension))
decoder_lstm= LSTM(vec_dimension, return_state=True, return_sequences=True)
decoder_output,_,_=decoder_lstm(decoder_input,initial_state=encoder_state)
Decoder (testing):
decoder_output,decoder_state_h,decoder_state_c=decoder_lstm( decoder_input, initial_state=decoder_states_input)
decoder_states=[decoder_state_h,decoder_state_c]
decoder_model=Model(inputs=[decoder_input]+decoder_states_input,outputs=[decoder_output]+decoder_states)
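Comments on the question:
If I understand correctly, the code you provided works. Could you add the code that does not work, to clarify what the problem is?
Actually, that was the broken code, because I had added the multilayer decoder there. I have now added the working code for a single layer. How should that code be extended to multiple LSTM layers?
Did you manage to get it working? I have the same problem.
Answer 1:
EDIT: updated to use the functional API model in Keras instead of RNN.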
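import numpy as np  # used by decode_sequence below
from keras.models import Model
from keras.layers import Input, LSTM, Dense
# num_encoder_tokens, num_decoder_tokens and latent_dim are defined as in the Keras lstm_seq2seq example.
layers = [256,128] # leftover from an earlier LSTMCell/RNN version; not used below (both layers use latent_dim)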
encoder_inputs = Input(shape=(None, num_encoder_tokens))
e_outputs, h1, c1 = LSTM(latent_dim, return_state=True, return_sequences=True)(encoder_inputs)
_, h2, c2 = LSTM(latent_dim, return_state=True)(e_outputs)
encoder_states = [h1, c1, h2, c2]
decoder_inputs = Input(shape=(None, num_decoder_tokens))
out_layer1 = LSTM(latent_dim, return_sequences=True, return_state=True)
d_outputs, dh1, dc1 = out_layer1(decoder_inputs,initial_state= [h1, c1])
out_layer2 = LSTM(latent_dim, return_sequences=True, return_state=True)
final, dh2, dc2 = out_layer2(d_outputs, initial_state= [h2, c2])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(final)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
Here is the inference setup:
encoder_model = Model(encoder_inputs, encoder_states)
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_state_input_h1 = Input(shape=(latent_dim,))
decoder_state_input_c1 = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c,
decoder_state_input_h1, decoder_state_input_c1]
d_o, state_h, state_c = out_layer1(
decoder_inputs, initial_state=decoder_states_inputs[:2])
d_o, state_h1, state_c1 = out_layer2(
d_o, initial_state=decoder_states_inputs[-2:])
decoder_states = [state_h, state_c, state_h1, state_c1]
decoder_outputs = decoder_dense(d_o)
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs] + decoder_states)
decoder_model.summary()
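The inference decoder re-calls out_layer1 and out_layer2 on new Input tensors. Because the same layer objects (and therefore the same weights) are reused, stepping one token at a time while carrying all four state tensors should compute exactly what the training graph computes over the whole sequence. The NumPy sketch below checks this with a hand-written LSTM cell; the helper names, random weights and gate layout are assumptions made for illustration and are not Keras's internal implementation.

```python
import numpy as np

# Minimal NumPy LSTM cell with the usual input/forget/candidate/output gates.
# Sketch only: the weight layout is an arbitrary choice, not Keras's.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_cell(rng, input_dim, units):
    # One weight matrix for the concatenated [x, h] and one bias, covering 4 gates.
    W = rng.normal(scale=0.1, size=(input_dim + units, 4 * units))
    bias = np.zeros(4 * units)
    return W, bias

def lstm_step(cell, x, h, c):
    W, bias = cell
    z = np.concatenate([x, h]) @ W + bias
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run_layer(cell, xs, h, c):
    """Training-style: run one layer over the whole sequence (like return_sequences=True)."""
    outputs = []
    for x in xs:
        h, c = lstm_step(cell, x, h, c)
        outputs.append(h)
    return np.stack(outputs), h, c

rng = np.random.default_rng(0)
input_dim, units, steps = 5, 8, 6
cell1 = make_cell(rng, input_dim, units)
cell2 = make_cell(rng, units, units)
xs = rng.normal(size=(steps, input_dim))
# Initial states for each decoder layer (in seq2seq these come from the encoder).
h1, c1, h2, c2 = (rng.normal(size=units) for _ in range(4))

# Training graph: layer 1 over the full sequence, then layer 2 over its outputs.
seq1, _, _ = run_layer(cell1, xs, h1, c1)
full_out, _, _ = run_layer(cell2, seq1, h2, c2)

# Inference graph: one timestep through both layers, carrying 4 state tensors.
states = [h1, c1, h2, c2]
step_outs = []
for x in xs:
    h1_t, c1_t = lstm_step(cell1, x, states[0], states[1])
    h2_t, c2_t = lstm_step(cell2, h1_t, states[2], states[3])
    states = [h1_t, c1_t, h2_t, c2_t]
    step_outs.append(h2_t)

print(np.allclose(full_out, np.stack(step_outs)))  # → True
```

The per-timestep loop is also how a stack of cells inside a single recurrent layer is evaluated, which is why, with the same weights, such a stack and two stacked return_sequences=True layers compute the same function in this sketch.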
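Finally, if you are following the Keras seq2seq example, you have to change the prediction script, because it now has to manage more hidden states: the single-layer example has only two (h and c), whereas here there are 2 × the number of layers (one h and one c per layer).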
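The change to the prediction loop is about unpacking: decoder_model.predict now returns one output array followed by 2 × (number of layers) state arrays, and all of them must be fed back on the next step. The sketch below replaces the real model with a hypothetical stub, fake_decoder_predict, so it runs without Keras; only the list handling mirrors the answers.

```python
import numpy as np

# Hypothetical stand-in for decoder_model.predict: returns the output
# probabilities followed by one (h, c) pair per decoder layer.
def fake_decoder_predict(inputs, num_tokens=4):
    states = inputs[1:]  # inputs[0] is target_seq
    probs = np.full((1, 1, num_tokens), 1.0 / num_tokens)
    new_states = [s + 1.0 for s in states]  # pretend the states were updated
    return [probs] + new_states

num_layers, latent_dim = 2, 3
states_value = [np.zeros((1, latent_dim)) for _ in range(2 * num_layers)]
target_seq = np.zeros((1, 1, 4))

result = fake_decoder_predict([target_seq] + states_value)
print(len(result))  # → 5 (one output plus 2 * num_layers states)

# The single-layer unpacking no longer fits:
try:
    output_tokens, h, c = result
except ValueError as err:
    print("single-layer unpacking fails:", err)

# Depth-independent unpacking, as in the generalized decode_sequence of Answer 2:
output_tokens, states_value = result[0], result[1:]
print(len(states_value))  # → 4
```

Hard-coding the names (h, c, h1, c1), as Answer 1 does, works for a fixed depth; slicing result[1:] keeps the loop correct for any number of layers.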
# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c, h1, c1 = decoder_model.predict(
            [target_seq] + states_value) #######NOTICE THE ADDITIONAL HIDDEN STATES
        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.
        # Update states
        states_value = [h, c, h1, c1] #######NOTICE THE ADDITIONAL HIDDEN STATES
    return decoded_sentence
for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Target sentence:', target_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)
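Comments:
Thanks for the detailed answer. Could you help me confirm whether creating two LSTMCells and wrapping them in a single RNN layer is functionally the same as creating two LSTM layers where the first feeds into the second?
This won't work if you want to use CuDNN layers, because Keras does not provide equivalent cells for them.
I updated the answer to use the functional API with multiple LSTM layers. I tested it and it seems to translate well.
Thanks @JeremyWortz, I had been struggling with this for a month and finally solved it. It seems my main misstep was not accounting for the extra layer states during inference.
This is very helpful! I have modified your code so that it works for depth n instead of a fixed 2: it loops over a latent_dims array whose length defines the number of stacked LSTM layers. I'll be working with it until the end of the day tomorrow; after that I'll clean it up and post it! PS: your 'layers' array is not actually used, and both layers use the same latent_dim variable.
Answer 2:
I have generalized Jeremy Wortz's awesome answer to build the model from a list, latent_dims, so that it is len(latent_dims) layers deep instead of a fixed 2.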
Start with the latent_dims declaration:
# latent_dims defines the depth of the encoder/decoder (len(latent_dims) layers each) and the layer sizes.
# The loops below traverse it in reverse, so [a,b,c] produces a depth-3 encoder and a depth-3 decoder whose
# layers, from the input upwards, have sizes [c,b,a]; decoder layer k starts from the states of encoder layer k.
latent_dims = [1024, 512, 256]
Create the model for training:
# Define an input sequence and process it by going through a len(latent_dims)-layer deep encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
outputs = encoder_inputs
encoder_states = []
for j in range(len(latent_dims))[::-1]:
    # Only the top encoder layer (j == 0) returns just its last output; lower layers pass full sequences up.
    outputs, h, c = LSTM(latent_dims[j], return_state=True, return_sequences=bool(j))(outputs)
    encoder_states += [h, c]
# Set up the decoder, setting the initial state of each layer to the final state of the encoder layer
# at the same depth (and therefore of the same size).
decoder_inputs = Input(shape=(None, num_decoder_tokens))
outputs = decoder_inputs
output_layers = []
for j in range(len(latent_dims)):
    output_layers.append(
        LSTM(latent_dims[len(latent_dims) - j - 1], return_sequences=True, return_state=True)
    )
    outputs, dh, dc = output_layers[-1](outputs, initial_state=encoder_states[2*j:2*(j+1)])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
Inference is as follows:
# Define sampling models (modified for n-layer deep network)
encoder_model = Model(encoder_inputs, encoder_states)
d_outputs = decoder_inputs
decoder_states_inputs = []
decoder_states = []
for j in range(len(latent_dims))[::-1]:
    current_state_inputs = [Input(shape=(latent_dims[j],)) for _ in range(2)]
    temp = output_layers[len(latent_dims)-j-1](d_outputs, initial_state=current_state_inputs)
    d_outputs, cur_states = temp[0], temp[1:]
    decoder_states += cur_states
    decoder_states_inputs += current_state_inputs
decoder_outputs = decoder_dense(d_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
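Because the states returned by encoder_model.predict are passed straight into decoder_model.predict, the encoder's state list and the decoder's state Inputs must line up in order and size. The short pure-Python trace below (no Keras; it only replays the index arithmetic of the loops above with the same latent_dims) shows that both stacks grow 256 → 512 → 1024 from the input upwards, that each decoder layer receives the state slice of matching size, and that the inference Inputs are created in the same order as encoder_states.

```python
# Replays the index arithmetic of the training and inference loops above.
latent_dims = [1024, 512, 256]
n = len(latent_dims)

# Encoder loop: `for j in range(n)[::-1]`, each layer appends [h, c].
encoder_sizes = [latent_dims[j] for j in range(n)[::-1]]
encoder_state_sizes = [size for size in encoder_sizes for _ in range(2)]

# Decoder (training): layer j has size latent_dims[n - j - 1] and is given
# encoder_states[2*j : 2*(j+1)] as its initial state.
decoder_sizes = [latent_dims[n - j - 1] for j in range(n)]
for j, size in enumerate(decoder_sizes):
    assert encoder_state_sizes[2 * j: 2 * (j + 1)] == [size, size]

# Decoder (inference): `for j in range(n)[::-1]` creates two Inputs of size
# latent_dims[j] and feeds them to output_layers[n - j - 1].
inference_input_sizes = []
for j in range(n)[::-1]:
    assert decoder_sizes[n - j - 1] == latent_dims[j]
    inference_input_sizes += [latent_dims[j]] * 2

print(encoder_sizes)                                 # → [256, 512, 1024]
print(decoder_sizes)                                 # → [256, 512, 1024]
print(inference_input_sizes == encoder_state_sizes)  # → True
```

If any of the asserts failed, the LSTM layers would receive initial states of the wrong size, so this kind of check is a cheap way to validate a different latent_dims ordering before building the Keras models.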
Finally, a few modifications to Jeremy Wortz's decode_sequence function give the following:
def decode_sequence(input_seq, encoder_model, decoder_model):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = [] # Creating a list then using "".join() is usually much faster for string creation
    while not stop_condition:
        to_split = decoder_model.predict([target_seq] + states_value)
        output_tokens, states_value = to_split[0], to_split[1:]
        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, 0])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence.append(sampled_char)
        # Exit condition: either hit max length
        # or find stop character.
        if sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length:
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.
    return "".join(decoded_sentence)