Getting error on running the trained Machine Learning model
【Posted】: 2019-05-16 01:56:44
【Question】: I have a dataset with the columns "studentDetails" and "studentId". I trained my model on this dataset and saved it. When I train the model, save it, and then load the saved model to predict in the same session, it gives me the output successfully. But when I load the saved model on its own and use it to predict, it gives me the error "CountVectorizer - Vocabulary wasn't fitted".
Here is the code I am using:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import pickle
from sklearn.svm import LinearSVC
X_train, X_test, y_train, y_test = train_test_split(df['studentDetails'], df['studentId'], random_state = 0)
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
classificationModel = LinearSVC().fit(X_train_tfidf, y_train)
filename = 'finalized_model.sav'
pickle.dump(classificationModel, open(filename, 'wb'))
Now loading the model and predicting:
import pickle
from sklearn.feature_extraction.text import CountVectorizer
data_to_be_predicted="Alicia Scott is from United States"
filename = 'finalized_model.sav'
loaded_model = pickle.load(open(filename, 'rb'))
count_vect = CountVectorizer()
result = loaded_model.predict(count_vect.transform([data_to_be_predicted]))
print(result)
Output:
94120
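When I run only the second code snippet on its own, it gives me an error.
Error:
CountVectorizer - Vocabulary wasn't fitted
I just want to know why I get the error in the second case, since I did not redefine count_vect = CountVectorizer() anywhere in the first case, where I got the correct result.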
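【Answer 1】: The problem with the second snippet is that you are not using the fitted CountVectorizer. The one you create there is a new instance, so it has never been fitted and has no vocabulary.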
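To make the difference concrete, here is a minimal sketch contrasting a fresh CountVectorizer with a fitted one. The two-sentence corpus and the variable names (docs, fresh, fitted, fresh_failed) are made up for illustration and are not from the question's dataset; the exact wording of the error message may also differ between scikit-learn versions.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for this illustration only.
docs = ["Alicia Scott is from United States", "Bob Lee is from Canada"]

fresh = CountVectorizer()               # has never seen any data
fitted = CountVectorizer().fit(docs)    # vocabulary learned from docs

try:
    # A vectorizer without a vocabulary cannot transform text.
    fresh.transform(["Alicia Scott is from United States"])
    fresh_failed = False
except NotFittedError as exc:
    fresh_failed = True
    print("fresh vectorizer:", exc)

# The fitted vectorizer maps the text onto its learned vocabulary.
counts = fitted.transform(["Alicia Scott is from United States"])
print("fitted vectorizer output shape:", counts.shape)
```

The classifier only stores its learned weights; the mapping from words to feature columns lives in the vectorizer, which is why the vectorizer has to be saved as well.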
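I suggest calling fit instead of fit_transform: fit returns the fitted CountVectorizer itself, which you can then save just as you do with the model. (fit_transform also fits the vectorizer in place, so the essential step is pickling the fitted object, not the choice of method.)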
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import pickle
from sklearn.svm import LinearSVC
X_train, X_test, y_train, y_test = train_test_split(df['studentDetails'], df['studentId'], random_state = 0)
count_vect = CountVectorizer().fit(X_train)
X_train_counts = count_vect.transform(X_train)
tfidf_transformer = TfidfTransformer().fit(X_train_counts)
X_train_tfidf = tfidf_transformer.transform(X_train_counts)
classificationModel = LinearSVC().fit(X_train_tfidf, y_train)
filename = 'finalized_model.sav'
pickle.dump(classificationModel, open(filename, 'wb'))
# Save the fitted vectorizer and tf-idf transformer alongside the model
pickle.dump(count_vect, open('count_vect', 'wb'))
pickle.dump(tfidf_transformer, open('tfidf_transformer', 'wb'))
Now you can load all three of them when you want to predict:
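import pickle
data_to_be_predicted = "Alicia Scott is from United States"
filename = 'finalized_model.sav'
# Load the classifier and both fitted transformers saved during training
loaded_model = pickle.load(open(filename, 'rb'))
count_vect = pickle.load(open('count_vect', 'rb'))
tfidf_transformer = pickle.load(open('tfidf_transformer', 'rb'))
# Apply the same counts -> tf-idf transformation the model was trained on
X_counts = count_vect.transform([data_to_be_predicted])
result = loaded_model.predict(tfidf_transformer.transform(X_counts))
print(result)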
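As an alternative that is not part of the original answer, the three objects can be bundled into a single scikit-learn Pipeline, so that only one object has to be pickled and the preprocessing can never be forgotten at prediction time. The following is a minimal sketch under stated assumptions: the toy texts and labels stand in for df['studentDetails'] and df['studentId'], and the names text_clf, loaded_clf, and the step names are hypothetical. It serializes in memory with pickle.dumps; pickle.dump to a file works the same way.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for df['studentDetails'] / df['studentId'] (made-up values).
texts = [
    "Alicia Scott is from United States",
    "Bob Lee is from Canada",
    "Alicia Scott studies in United States",
    "Bob Lee studies in Canada",
]
labels = [94120, 10001, 94120, 10001]

# Chain vectorizer, tf-idf transformer and classifier into one estimator.
text_clf = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("svm", LinearSVC()),
])
text_clf.fit(texts, labels)  # fits every step in order

# Serialize the whole fitted pipeline as a single object.
blob = pickle.dumps(text_clf)

# Later (for example in a separate script): restore it and predict on raw text.
loaded_clf = pickle.loads(blob)
result = loaded_clf.predict(["Alicia Scott is from United States"])
print(result)
```

Because the pipeline accepts raw strings, the prediction code no longer needs to know which transformers were used during training.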
【Comments】:
Is the line count_vect = CountVectorizer() needed here?
Thanks, I forgot to remove it from the original code.