Problems while reloading pickled sklearn pipeline. CountVectorizer analyzer function not being imported

【Posted】: 2019-10-31 12:39:32 【Question】:

I am trying to pickle my text classification model and reload it in a Flask application.

I have a custom function, split_into_lemmas, that I use as the analyzer:

from textblob import TextBlob

def split_into_lemmas(message):
    # Python 2: decode the raw byte string to unicode before lowercasing
    message = unicode(message, 'utf8').lower()
    words = TextBlob(message).words
    # for each word, take its "base form" = lemma
    return [word.lemma for word in words]

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

count_vect = CountVectorizer(analyzer=split_into_lemmas, ngram_range=(1, 3), encoding='utf8', stop_words=None)
tfidf_transformer = TfidfTransformer()
text_clf = Pipeline([('vect', count_vect), ('tdif', tfidf_transformer), ('clf', best_svc)])

%%time
text_clf.fit(X=data['Condition'], y=data['condition_predict'])

I fit the model and save it by pickling:

_ = joblib.dump(text_clf, 'classification_pipeline.pkl')

On the other side, when I try to reload the pipeline:

import pandas as pd 
import pickle
from sklearn.feature_extraction.text import CountVectorizer
from textblob import TextBlob
from sklearn.externals import joblib

clf_pipeline = open('C:/Users/Falco/Desktop/directory/WRMD_paper/classification_pipeline.pkl','rb')
clf = joblib.load(clf_pipeline)

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-bb0859b3946a> in <module>()
      6 
      7 clf_pipeline = open('C:/Users/Falco/Desktop/directory/WRMD_paper/classification_pipeline.pkl','rb')
----> 8 clf = joblib.load(clf_pipeline)

C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\numpy_pickle.pyc in load(filename, mmap_mode)
    586         filename = getattr(fobj, 'name', '')
    587         with _read_fileobject(fobj, filename, mmap_mode) as fobj:
--> 588             obj = _unpickle(fobj)
    589     else:
    590         with open(filename, 'rb') as f:

C:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\numpy_pickle.pyc in _unpickle(fobj, filename, mmap_mode)
    524     obj = None
    525     try:
--> 526         obj = unpickler.load()
    527         if unpickler.compat_mode:
    528             warnings.warn("The file '%s' has been generated with a "

C:\ProgramData\Anaconda2\lib\pickle.pyc in load(self)
    862             while 1:
    863                 key = read(1)
--> 864                 dispatch[key](self)
    865         except _Stop, stopinst:
    866             return stopinst.value

C:\ProgramData\Anaconda2\lib\pickle.pyc in load_global(self)
   1094         module = self.readline()[:-1]
   1095         name = self.readline()[:-1]
-> 1096         klass = self.find_class(module, name)
   1097         self.append(klass)
   1098     dispatch[GLOBAL] = load_global

C:\ProgramData\Anaconda2\lib\pickle.pyc in find_class(self, module, name)
   1130         __import__(module)
   1131         mod = sys.modules[module]
-> 1132         klass = getattr(mod, name)
   1133         return klass
   1134 

AttributeError: 'module' object has no attribute 'split_into_lemmas'

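When I redeclare the function in the notebook, the model loads and runs fine. But when I save the notebook as a .py file and run it as a Flask app, it does not run and gives me the same error.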
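For context, this behavior is consistent with how pickle treats functions: it stores only the module and name where the function can be found, not the function body, so the name has to be resolvable at load time. The sketch below illustrates that with the standard library only. The module name `lemma_utils` and the whitespace-based tokenizer are stand-ins invented for the illustration, not part of the original code.

```python
import pickle
import sys
import types

# Build a throwaway in-memory module so the example is self-contained.
# "lemma_utils" is a hypothetical name used only for this illustration.
lemma_utils = types.ModuleType("lemma_utils")
exec(
    "def split_into_lemmas(message):\n"
    "    return message.lower().split()\n",
    lemma_utils.__dict__,
)
sys.modules["lemma_utils"] = lemma_utils
original = lemma_utils.split_into_lemmas

# Pickling a function records where to find it (module + name), not its body.
payload = pickle.dumps(original)
stores_name = b"split_into_lemmas" in payload
stores_body = b"lower" in payload

# Simulate a process in which the function was never defined.
del lemma_utils.split_into_lemmas
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

# Once the name exists again in that module, loading succeeds.
lemma_utils.split_into_lemmas = original
restored = pickle.loads(payload)
```

A function defined in a notebook cell belongs to `__main__`, so the pickle most likely points at `__main__.split_into_lemmas`. That would explain why redeclaring the function in the notebook helps, while a Flask process whose `__main__` lacks the function fails.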

Can someone help me save the pipeline properly so that I don't have to declare the function?

【Answer 1】:

When you reload the pickle, split_into_lemmas must also be defined (or importable) in the process that does the loading.
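One way to follow this advice without redefining the function in every script is to keep it in a small importable module that both the training code and the Flask app import. The sketch below is only an illustration under stated assumptions: the module name `text_utils` is hypothetical, a plain dict stands in for the fitted pipeline, and a whitespace tokenizer stands in for the TextBlob-based analyzer, so it runs with the standard library alone.

```python
import os
import pickle
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# 1. The analyzer lives in its own importable module (hypothetical name).
(workdir / "text_utils.py").write_text(textwrap.dedent('''
    def split_into_lemmas(message):
        # Simplified stand-in for the TextBlob-based analyzer.
        return message.lower().split()
'''))

# 2. "Training" side: import the function from the module, then pickle an
#    object that references it (a dict stands in for the sklearn pipeline).
sys.path.insert(0, str(workdir))
from text_utils import split_into_lemmas

model_path = workdir / "classification_pipeline.pkl"
with open(model_path, "wb") as f:
    pickle.dump({"analyzer": split_into_lemmas}, f)

# 3. "Serving" side: a separate process, as a Flask app would be. It never
#    defines the function itself; text_utils only has to be importable.
loader = textwrap.dedent('''
    import pickle, sys
    with open(sys.argv[1], "rb") as f:
        model = pickle.load(f)
    print(model["analyzer"]("Birds Flying"))
''')
result = subprocess.run(
    [sys.executable, "-c", loader, str(model_path)],
    env={**os.environ, "PYTHONPATH": str(workdir)},
    capture_output=True,
    text=True,
)
output = result.stdout.strip()
```

With the real pipeline, the model would need to be dumped again after switching to the imported function, because an existing pickle still points at the old location of the function.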
