How do I save an encoder-decoder model with TensorFlow?
Posted: 2022-01-04 18:35:54

Question: I have an encoder-decoder model that makes good predictions, but I'm struggling to save the hidden states of the layers so that the model can be reused.
The text below describes every step I took to train, test, save and load the model.
Imports
import pickle
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Input, TimeDistributed, Dense, Embedding
from tensorflow.keras.models import Model
Training
After preprocessing the data, I trained the encoder-decoder model shown below.
Training model code
embedding_size = 175
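# Tokenizer indices start at 1; the answer below uses len(tokenizer.word_index) + 1 so the largest index fits
vocab_size = len(tokenizer.word_index)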
encoder_inputs = Input(shape=(None,))
en_x = Embedding(vocab_size, embedding_size, mask_zero=True)(encoder_inputs)
# Encoder lstm
encoder = LSTM(512, return_state=True)
encoder_outputs, state_h, state_c = encoder(en_x)
# discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
# target word embeddings
dex = Embedding(vocab_size, embedding_size, mask_zero=True)
final_dex = dex(decoder_inputs)
# decoder lstm
decoder_lstm = LSTM(512, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(final_dex,
                                     initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(vocab_size, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)
# While training, the model takes English and French words and outputs the translated French words
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# rmsprop is preferred for nlp tasks
model.compile(optimizer='rmsprop', loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
model.fit([X_train, X_decoder], y_train,
          batch_size=32,
          epochs=50,
          validation_split=0.1)
Training model summary
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_2 (InputLayer)            [(None, None)]       0
__________________________________________________________________________________________________
input_3 (InputLayer)            [(None, None)]       0
__________________________________________________________________________________________________
embedding (Embedding)           (None, None, 175)    499800      input_2[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 175)    499800      input_3[0][0]
__________________________________________________________________________________________________
lstm (LSTM)                     [(None, 512), (None, 1409024     embedding[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [(None, None, 512),  1409024     embedding_1[0][0]
                                                                 lstm[0][1]
                                                                 lstm[0][2]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, 2856)   1465128     lstm_1[0][0]
==================================================================================================
Total params: 5,282,776
Trainable params: 5,282,776
Non-trainable params: 0
__________________________________________________________________________________________________
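The parameter counts in this summary can be reproduced by hand, which also shows that the model was built with a vocabulary size of 2856 (the last dimension of the TimeDistributed output). A small check, assuming the standard Keras formulas for Embedding, LSTM (with the default use_bias=True) and Dense:

```python
def embedding_params(vocab_size, embedding_size):
    # One trainable vector of length embedding_size per vocabulary index.
    return vocab_size * embedding_size


def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel and a bias.
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


vocab_size, embedding_size, units = 2856, 175, 512
total = (2 * embedding_params(vocab_size, embedding_size)  # encoder + decoder embeddings
         + 2 * lstm_params(embedding_size, units)          # encoder + decoder LSTMs
         + dense_params(units, vocab_size))                # TimeDistributed(Dense)
print(total)  # 5282776
```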
Inference
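After training, I created the following inference models (the training model uses teacher forcing, so it can't be used directly for prediction).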
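The question does not show how X_decoder and y_train were prepared. With teacher forcing, the decoder is typically fed the target sequence starting with <sos> and trained to predict the same sequence shifted one step to the left, so at every position it sees the true previous token rather than its own prediction. A minimal sketch, assuming padded integer sequences in which 0 is padding; teacher_forcing_pair and the ids 1 = <sos>, 2 = <eos> are hypothetical:

```python
import numpy as np


def teacher_forcing_pair(targets):
    """Split padded target sequences (each starting with <sos>) into
    decoder inputs and next-token labels for teacher forcing."""
    targets = np.asarray(targets)
    decoder_inputs = targets[:, :-1]   # <sos> w1 w2 ... (last position dropped)
    decoder_targets = targets[:, 1:]   # w1 w2 ... <eos> (<sos> dropped)
    return decoder_inputs, decoder_targets


# Hypothetical ids: 1 = <sos>, 2 = <eos>, 0 = padding.
targets = np.array([[1, 5, 7, 2, 0],
                    [1, 9, 2, 0, 0]])
dec_in, dec_out = teacher_forcing_pair(targets)
print(dec_in.tolist())   # [[1, 5, 7, 2], [1, 9, 2, 0]]
print(dec_out.tolist())  # [[5, 7, 2, 0], [9, 2, 0, 0]]
```

Because the question compiles with SparseCategoricalCrossentropy, integer labels of this shape can be passed to fit directly. At prediction time there is no ground-truth sequence to shift, which is why separate inference models that take one token plus the previous states are needed.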
Inference models
encoder_model = Model(encoder_inputs, encoder_states)
# Redefine the decoder: at prediction time it receives the following inputs from the encoder
decoder_state_input_h = Input(shape=(512,))
decoder_state_input_c = Input(shape=(512,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
final_dex2 = dex(decoder_inputs)
decoder_outputs2, state_h2, state_c2 = decoder_lstm(final_dex2, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2)
# sampling model will take encoder states and decoder_input (seed initially) and output the predictions. We don't care about decoder_states2
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs2] + decoder_states2)
Now I just need a function that makes predictions (see below). After some testing, my model turned out to have 97.2% accuracy on the test set.
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0] = tokenizer.word_index['<sos>']
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = []
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)
        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = tokenizer.index_word[sampled_token_index]
        decoded_sentence.append(sampled_char)
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '<eos>' or
                len(decoded_sentence) > 6):
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index
        # Update states
        states_value = [h, c]
    return decoded_sentence
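The loop in decode_sequence is greedy decoding: at each step the most probable token is fed back as the next decoder input together with the updated states, until <eos> or a length limit is reached. The same control flow, separated from Keras so it can be tested on its own; step_fn and toy_step are hypothetical stand-ins for one decoder_model.predict call, and max_len=7 mirrors the len(decoded_sentence) > 6 check:

```python
import numpy as np


def greedy_decode(step_fn, initial_state, sos_id, eos_id, max_len=7):
    """Feed the arg-max token back into the decoder until <eos> or max_len tokens.

    step_fn(token_id, state) must return (probabilities over the vocabulary,
    new_state); it stands in for one decoder_model.predict call.
    """
    token, state, decoded = sos_id, initial_state, []
    while True:
        probs, state = step_fn(token, state)
        token = int(np.argmax(probs))   # greedy choice, no random sampling
        decoded.append(token)
        if token == eos_id or len(decoded) >= max_len:
            return decoded


def toy_step(token, state):
    # Hypothetical 5-token vocabulary that always predicts token + 1 (4 = <eos>).
    probs = np.zeros(5)
    probs[min(token + 1, 4)] = 1.0
    return probs, state


print(greedy_decode(toy_step, None, sos_id=0, eos_id=4))  # [1, 2, 3, 4]
```

With the question's models, step_fn would build the (1, 1) target_seq from token, call decoder_model.predict([target_seq] + state), and return output_tokens[0, -1, :] together with [h, c].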
Saving the models
Then I saved the training model and both inference models. I also saved the tokenizer used to preprocess the data.
model.save('training_model.h5')
encoder_model.save('encoder_model.h5')
decoder_model.save('decoder_model.h5')
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
Loading the models
This is where I get stuck! To make predictions, I need to load the layers and states encoder_inputs, encoder_states, dex, decoder_inputs, decoder_lstm and decoder_dense.
Attempt 1
At first I tried simply loading encoder_model and decoder_model and then calling decode_sequence(), but the loaded models had 0% accuracy. Apparently the hidden states were not saved as I expected.
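An LSTM's hidden and cell states do not need to be saved: encoder_model recomputes h and c from every new input sequence, so what has to survive saving is the layer weights (plus the tokenizer used to build the inputs). One way to narrow down a 0% result is to compare weights before and after reloading, in the same session where the models were trained. A minimal helper; weights_match is a hypothetical name, and both arguments are assumed to come from model.get_weights():

```python
import numpy as np


def weights_match(original_weights, reloaded_weights):
    """Return True if two weight lists (e.g. from model.get_weights()) contain
    the same number of arrays with identical shapes and values."""
    if len(original_weights) != len(reloaded_weights):
        return False
    # np.array_equal also returns False when the shapes differ.
    return all(np.array_equal(a, b) for a, b in zip(original_weights, reloaded_weights))
```

For example, weights_match(encoder_model.get_weights(), tf.keras.models.load_model('encoder_model.h5').get_weights()). If the weights match, the problem is more likely in the inputs at prediction time, for instance a tokenizer whose word_index differs from the one used during training.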
Attempt 2
Then I tried loading the layers of the original training model and recreating the inference models. This is what I tried:
encoder_inputs = model.layers[0]
_, state_h, state_c = model.layers[4].output
encoder_states = [state_h, state_c]
decoder_inputs = model.layers[1]
decoder_lstm = model.layers[5]
Then I re-ran the code from the inference section.
This results in the following error:
ValueError: Input tensors to a Functional must come from `tf.keras.Input`. Received: <keras.engine.input_layer.InputLayer object at 0x16b7010a0> (missing previous layer metadata).
I'm not sure what to do now. Can anyone help?
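The ValueError is raised because model.layers[0] and model.layers[1] are InputLayer objects, while Model(...) expects the symbolic tensors they produce. A functional model, including one returned by tf.keras.models.load_model, exposes those tensors as model.inputs, and each layer's output tensors as layer.output. The sketch below rebuilds both inference models from a training model in this way, similar to the restore step in the Keras character-level sequence-to-sequence example. It assumes the layer names used in the answer (encoder_lstm, decoder_embeddings, decoder_lstm, dense); build_training_model and build_inference_models are hypothetical helpers, the former a stand-in for the training code so the sketch can run on its own:

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Input, TimeDistributed, Dense, Embedding
from tensorflow.keras.models import Model


def build_training_model(vocab_size, embedding_size=175, latent_dim=512):
    """Stand-in for the training model, using the layer names from the answer."""
    encoder_inputs = Input(shape=(None,), name="encoder_inputs")
    x = Embedding(vocab_size, embedding_size, mask_zero=True,
                  name="encoder_embeddings")(encoder_inputs)
    _, state_h, state_c = LSTM(latent_dim, return_state=True,
                               name="encoder_lstm")(x)
    decoder_inputs = Input(shape=(None,), name="decoder_inputs")
    y = Embedding(vocab_size, embedding_size, mask_zero=True,
                  name="decoder_embeddings")(decoder_inputs)
    y, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True,
                   name="decoder_lstm")(y, initial_state=[state_h, state_c])
    y = TimeDistributed(Dense(vocab_size, activation="softmax"), name="dense")(y)
    return Model([encoder_inputs, decoder_inputs], y, name="training_model")


def build_inference_models(training_model):
    """Rebuild encoder/decoder inference models from a (re)loaded training
    model, reusing its trained layers and therefore their weights."""
    # Model(...) needs tensors: take them from model.inputs, not model.layers[i].
    encoder_inputs, decoder_inputs = training_model.inputs
    # The encoder LSTM was called once; its output is [outputs, state_h, state_c].
    _, state_h, state_c = training_model.get_layer("encoder_lstm").output
    encoder_model = Model(encoder_inputs, [state_h, state_c])

    decoder_lstm = training_model.get_layer("decoder_lstm")
    # Fresh inputs for the states fed back at every decoding step.
    state_input_h = Input(shape=(decoder_lstm.units,))
    state_input_c = Input(shape=(decoder_lstm.units,))
    x = training_model.get_layer("decoder_embeddings")(decoder_inputs)
    x, h, c = decoder_lstm(x, initial_state=[state_input_h, state_input_c])
    probs = training_model.get_layer("dense")(x)
    decoder_model = Model([decoder_inputs, state_input_h, state_input_c],
                          [probs, h, c])
    return encoder_model, decoder_model
```

With the saved file, the intended use would be training_model = tf.keras.models.load_model('training_model.h5') followed by encoder_model, decoder_model = build_inference_models(training_model). For the question's model, which uses default names, the lookups would be "lstm", "embedding_1", "lstm_1" and "time_distributed" according to the summary.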
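Comments:
Could you add the complete working code for creating the model, along with the import statements?
@AniketBote Done :)
We can't help you if you don't say why it doesn't work and what you actually tried.
@Dr.Snoopy Sorry, I thought I had added enough information. I've updated my question to include everything I did and everything I tried. Would you mind taking another look? Thanks
You are mixing imports between keras and tf.keras, which is not supported (just look at the error, which mentions both tf.keras and keras).

Answer 1:
I came up with a solution! It's a bit hacky, but it works! Here are the steps I took to save and load the trained model.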
Step 1 - Save the tokenizer and the weights of each individual layer
import pickle
import numpy as np

# Save the tokenizer
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
# Save the weights of every layer that has any, in a file named after the layer
for layer in model.layers:
    weights = layer.get_weights()
    if weights:
        np.savez(f'{layer.name}.npz', weights)
Step 2 - Load the tokenizer and the layer weights
# load the tokenizer
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)
# load the weights
w_encoder_embeddings = np.load('encoder_embeddings.npz', allow_pickle=True)
w_decoder_embeddings = np.load('decoder_embeddings.npz', allow_pickle=True)
w_encoder_lstm = np.load('encoder_lstm.npz', allow_pickle=True)
w_decoder_lstm = np.load('decoder_lstm.npz', allow_pickle=True)
w_dense = np.load('dense.npz', allow_pickle=True)
Step 3 - Recreate the training model
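This is my model (see the question for details). Its layer names match the .npz file names loaded in step 2, so the model whose weights were saved in step 1 must have used the same names.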
embedding_size = 175
vocab_size = len(tokenizer.word_index) + 1
encoder_inputs = Input(shape=(None,), name="encoder_inputs")
encoder_embeddings = Embedding(vocab_size, embedding_size, mask_zero=True, name="encoder_embeddings")(encoder_inputs)
# Encoder lstm
encoder_lstm = LSTM(512, return_state=True, name="encoder_lstm")
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embeddings)
# discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,), name="decoder_inputs")
# target word embeddings
decoder_embeddings = Embedding(vocab_size, embedding_size, mask_zero=True, name="decoder_embeddings")
training_decoder_embeddings = decoder_embeddings(decoder_inputs)
# decoder lstm
decoder_lstm = LSTM(512, return_sequences=True, return_state=True, name="decoder_lstm")
decoder_outputs, _, _ = decoder_lstm(training_decoder_embeddings,
                                     initial_state=encoder_states)
decoder_dense = TimeDistributed(Dense(vocab_size, activation='softmax'), name="dense")
decoder_outputs = decoder_dense(decoder_outputs)
# While training, the model takes input and target words and outputs the target strings
loaded_model = Model([encoder_inputs, decoder_inputs], decoder_outputs, name="training_model")
Now apply the saved weights to the layers of this model:
# set the weights of the model
loaded_model.layers[2].set_weights(w_encoder_embeddings['arr_0'])
loaded_model.layers[3].set_weights(w_decoder_embeddings['arr_0'])
loaded_model.layers[4].set_weights(w_encoder_lstm['arr_0'])
loaded_model.layers[5].set_weights(w_decoder_lstm['arr_0'])
loaded_model.layers[6].set_weights(w_dense['arr_0'])
Step 4 - Create the inference models
encoder_model = Model(encoder_inputs, encoder_states)
# Redefine the decoder: at prediction time it receives the following inputs from the encoder
decoder_state_input_h = Input(shape=(512,))
decoder_state_input_c = Input(shape=(512,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
inference_decoder_embeddings = decoder_embeddings(decoder_inputs)
decoder_outputs2, state_h2, state_c2 = decoder_lstm(inference_decoder_embeddings, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2)
# The sampling model takes the encoder states and decoder_input (initially the seed token) and outputs the predictions (French word indices). We don't care about decoder_states2
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs2] + decoder_states2)
Voilà! I can now run inference with the previously trained model!