Load a scikit-learn pipeline containing a pre-trained Keras model from disk
【Posted】: 2021-01-12 12:59:33
【Question】: I have built a scikit-learn pipeline that uses an LSTM Keras model (wrapped in keras.wrappers.scikit_learn.KerasClassifier) as its final step. Once the pipeline has finished training, I save the whole pipeline to disk (see below). I cannot load the pipeline back into memory and then make predictions. scikit-learn pipelines and Keras models do not currently seem to play well together, which makes this tricky. Does anyone have experience with this?
TensorFlow: 2.3.1, Keras: 2.4.3, scikit-learn: 0.23.2
Code:
import pandas as pd
from model_lstm.config import config
import joblib
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from model_lstm.utils import data_management as dm


def save_fitted_pipeline(pipeline):
    model_path = config.TRAINED_MODEL_DIR / config.TRAINED_MODEL_FILE
    pipeline_path = config.TRAINED_MODEL_DIR / config.TRAINED_PIPELINE_FILE
    pipeline.named_steps["lstm_model"].model.save(model_path)
    pipeline.named_steps["lstm_model"].model = None
    joblib.dump(pipeline, pipeline_path)


def load_fitted_pipeline():
    model_path = config.TRAINED_MODEL_DIR / config.TRAINED_MODEL_FILE
    pipeline_path = config.TRAINED_MODEL_DIR / config.TRAINED_PIPELINE_FILE
    pipeline = joblib.load(pipeline_path)
    model_func = lambda: keras.models.load_model(model_path)
    wrapped_model = KerasClassifier(build_fn=model_func)
    pipeline.named_steps["lstm_model"] = wrapped_model
    pipeline.named_steps["lstm_model"].model = keras.models.load_model(model_path)
    return pipeline


def predict():
    lstm_pipeline = load_fitted_pipeline()
    data_path = config.DATA_DIR / config.TRAINING_DATA_FILE
    X_train, y_train = dm.load_data(data_path)
    pred = lstm_pipeline.predict(X_train)
Current error:
../model_lstm/predict.py:8: in predict
    pred = lstm_pipeline.predict(X_train)
../../../../anaconda/envs/sa_model_lstm/lib/python3.7/site-packages/sklearn/utils/metaestimators.py:119: in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
../../../../anaconda/envs/sa_model_lstm/lib/python3.7/site-packages/sklearn/pipeline.py:408: in predict
    return self.steps[-1][-1].predict(Xt, **predict_params)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x1a4256f690>
x = array([[ 0, 0, 0, ..., 125, 309, 310],
       [ 0, 0, 0, ..., 19, 3, 312],
       [ 0, 0...076],
       [ 0, 0, 0, ..., 2, 1077, 13],
       [ 0, 0, 0, ..., 1080, 160, 1081]], dtype=int32)
kwargs = {'batch_size': 128, 'verbose': 1}

    def predict(self, x, **kwargs):
        """Returns the class predictions for the given test data.

        Arguments:
            x: array-like, shape `(n_samples, n_features)`
                Test samples where `n_samples` is the number of samples
                and `n_features` is the number of features.
            **kwargs: dictionary arguments
                Legal arguments are the arguments
                of `Sequential.predict_classes`.

        Returns:
            preds: array-like, shape `(n_samples,)`
                Class predictions.
        """
        kwargs = self.filter_sk_params(Sequential.predict_classes, kwargs)
>       classes = self.model.predict_classes(x, **kwargs)
E       AttributeError: 'Functional' object has no attribute 'predict_classes'

../../../../anaconda/envs/sa_model_lstm/lib/python3.7/site-packages/tensorflow/python/keras/wrappers/scikit_learn.py:241: AttributeError
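【Answer 1】: This is my own question. For anyone interested, I managed to solve it myself. It turns out the problem had nothing to do with writing the pipeline to disk and reading it back. The keras.wrappers.scikit_learn.KerasClassifier wrapper only seems to work properly when the Keras model is an instance of Sequential rather than an instance of Model, which is what I had. As the traceback shows, the wrapper's predict calls self.model.predict_classes, a method that Sequential provides and my functional model did not. I converted my model to Sequential and everything worked. In fact, the saving and loading logic ended up simpler than what I showed in the code above.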
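If converting the model to Sequential is not an option, another route might be to give the functional model the one method the wrapper is missing. The sketch below is not the approach taken in the answer. It assumes that class labels can be derived from the output of predict() the way Sequential.predict_classes did in TensorFlow 2.3 (argmax for multi-class output, a 0.5 threshold for a single sigmoid unit). PredictClassesAdapter and FakeFunctionalModel are hypothetical names introduced only for illustration, and the stand-in model lets the example run without TensorFlow.

```python
import numpy as np


class PredictClassesAdapter:
    """Hypothetical helper: adds predict_classes() to a model that only has predict()."""

    def __init__(self, model):
        self._model = model

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so predict(), save(), etc.
        # are forwarded to the wrapped model.
        if name == "_model":
            # Avoid infinite recursion if the instance is not initialised yet.
            raise AttributeError(name)
        return getattr(self._model, name)

    def predict_classes(self, x, **kwargs):
        proba = np.asarray(self._model.predict(x, **kwargs))
        if proba.shape[-1] > 1:
            # Multi-class (softmax) output: take the most probable class.
            return proba.argmax(axis=-1)
        # Single sigmoid unit: threshold at 0.5.
        return (proba > 0.5).astype("int32")


class FakeFunctionalModel:
    """Stand-in for a functional Keras model: predict() but no predict_classes()."""

    def predict(self, x, batch_size=32, verbose=0):
        # Pretend softmax output, one row of class probabilities per sample.
        return np.array([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]])


adapted = PredictClassesAdapter(FakeFunctionalModel())
labels = adapted.predict_classes(np.zeros((2, 5)), batch_size=128, verbose=1)
print(labels)  # [1 0]
```

In the loading code from the question, the idea would be to assign `PredictClassesAdapter(keras.models.load_model(model_path))` to the `model` attribute of the "lstm_model" step. That wiring has not been tested against a real KerasClassifier here, so treat it as a starting point rather than a verified fix.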