Load a scikit-learn pipeline containing a pre-trained Keras model from disk

Posted: 2021-01-12 12:59:33

Question:

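I have built a scikit-learn pipeline whose final step is an LSTM Keras model wrapped in keras.wrappers.scikit_learn.KerasClassifier. Once the pipeline has finished training, I save the whole pipeline to disk (see below). I am unable to load the pipeline back into memory and then make predictions. scikit-learn pipelines and Keras models don't seem to work well together at the moment, which makes this tricky. Does anyone have experience with this?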

TensorFlow: 2.3.1, Keras: 2.4.3, scikit-learn: 0.23.2

Code:

import pandas as pd
from model_lstm.config import config
import joblib
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from model_lstm.utils import data_management as dm

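def save_fitted_pipeline(pipeline):
    model_path = config.TRAINED_MODEL_DIR / config.TRAINED_MODEL_FILE
    pipeline_path = config.TRAINED_MODEL_DIR / config.TRAINED_PIPELINE_FILE
    # Save the Keras model on its own, then detach it so that joblib only
    # has to pickle the rest of the pipeline.
    pipeline.named_steps["lstm_model"].model.save(model_path)
    # Note: this leaves the in-memory pipeline without its model.
    pipeline.named_steps["lstm_model"].model = None
    joblib.dump(pipeline, pipeline_path)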

def load_fitted_pipeline():
    model_path = config.TRAINED_MODEL_DIR / config.TRAINED_MODEL_FILE
    pipeline_path = config.TRAINED_MODEL_DIR / config.TRAINED_PIPELINE_FILE
    pipeline = joblib.load(pipeline_path)
    # named_steps builds a new Bunch on every access, so assigning a new
    # wrapper to it would not change the pipeline. Reattach the saved Keras
    # model to the unpickled wrapper instead, which keeps its fitted state.
    pipeline.named_steps["lstm_model"].model = keras.models.load_model(model_path)
    return pipeline

def predict():
    lstm_pipeline = load_fitted_pipeline()
    data_path = config.DATA_DIR / config.TRAINING_DATA_FILE
    X_train, y_train = dm.load_data(data_path)
    pred = lstm_pipeline.predict(X_train)
    return pred

Current error:

../model_lstm/predict.py:8: in predict
    pred = lstm_pipeline.predict(X_train)
../../../../anaconda/envs/sa_model_lstm/lib/python3.7/site-packages/sklearn/utils/metaestimators.py:119: in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
../../../../anaconda/envs/sa_model_lstm/lib/python3.7/site-packages/sklearn/pipeline.py:408: in predict
    return self.steps[-1][-1].predict(Xt, **predict_params)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x1a4256f690>
x = array([[   0,    0,    0, ...,  125,  309,  310],
       [   0,    0,    0, ...,   19,    3,  312],
       [   0,    0...076],
       [   0,    0,    0, ...,    2, 1077,   13],
       [   0,    0,    0, ..., 1080,  160, 1081]], dtype=int32)
kwargs = {'batch_size': 128, 'verbose': 1}

    def predict(self, x, **kwargs):
      """Returns the class predictions for the given test data.
    
      Arguments:
          x: array-like, shape `(n_samples, n_features)`
              Test samples where `n_samples` is the number of samples
              and `n_features` is the number of features.
          **kwargs: dictionary arguments
              Legal arguments are the arguments
              of `Sequential.predict_classes`.
    
      Returns:
          preds: array-like, shape `(n_samples,)`
              Class predictions.
      """
      kwargs = self.filter_sk_params(Sequential.predict_classes, kwargs)
>     classes = self.model.predict_classes(x, **kwargs)
E     AttributeError: 'Functional' object has no attribute 'predict_classes'

../../../../anaconda/envs/sa_model_lstm/lib/python3.7/site-packages/tensorflow/python/keras/wrappers/scikit_learn.py:241: AttributeError

Answer 1:

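Asker here. For anyone interested, I managed to solve this myself. It turns out the problem had nothing to do with writing the pipeline to disk and reading it back. The keras.wrappers.scikit_learn.KerasClassifier wrapper seems to work properly only when the Keras model is an instance of Sequential, not an instance of Model as mine was. I converted my model to Sequential and everything worked. In fact, the save and load logic ended up simpler than what I showed in the code above.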
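
The traceback is consistent with this: the wrapper's predict calls self.model.predict_classes, which its docstring attributes to Sequential and which the 'Functional' object does not have. If converting to Sequential is not practical, one possible alternative is to derive the class indices from the output of predict directly. The sketch below is an illustration under assumptions, not the wrapper's actual code: classes_from_probabilities is a hypothetical helper name, it assumes predict returns one row of scores per sample, and it mirrors what Sequential.predict_classes appears to do in this TensorFlow version (argmax when there are several output columns, a 0.5 threshold when there is one). It is written in plain Python so that it accepts nested lists as well as the 2-D NumPy arrays that Keras returns.

```python
def classes_from_probabilities(proba, threshold=0.5):
    """Turn per-sample scores into class indices (hypothetical helper).

    Assumes `proba` is iterable by rows, one row per sample, as returned
    by a Keras model's predict() for a classification head.
    """
    classes = []
    for row in proba:
        row = list(row)
        if len(row) > 1:
            # Several output units (softmax-style): take the index of the
            # largest score; ties resolve to the first maximum.
            classes.append(max(range(len(row)), key=row.__getitem__))
        else:
            # Single output unit (sigmoid-style): threshold the score.
            classes.append(int(row[0] > threshold))
    return classes


# Example with made-up scores for two samples and three classes.
sample = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(classes_from_probabilities(sample))  # [1, 0]
```

With a loaded functional model this would be called as classes_from_probabilities(model.predict(X)). Mapping the indices back to the original labels (for example through the classes_ attribute that the fitted wrapper stores) is left to the caller.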
