Python scikit svm "Vocabulary not fitted or provided"
【Title】: Python scikit svm "Vocabulary not fitted or provided" 【Posted】: 2020-06-13 19:25:25 【Question】: Using linear support vector classification (LinearSVC) from Python's scikit-learn, I get an error when I try to make a prediction:
import pickle
from sklearn import svm  # needed for svm.LinearSVC below
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem import PorterStemmer
from nltk import word_tokenize
import string
# Function to pass the list to the Tf-idf vectorizer
def returnPhrase(inputList):
    return inputList

# Pre-processing the sentence which we input to predict the emotion
def transformSentence(sentence):
    s = []
    sentence = sentence.replace('\n', '')
    sentTokenized = word_tokenize(sentence)
    s.append(sentTokenized)
    sWithoutPunct = []
    punctList = list(string.punctuation)
    curSentList = s[0]
    newSentList = []
    for word in curSentList:
        if word.lower() not in punctList:
            newSentList.append(word.lower())
    sWithoutPunct.append(newSentList)
    mystemmer = PorterStemmer()
    tokenziedStemmed = []
    for i in range(0, len(sWithoutPunct)):
        curList = sWithoutPunct[i]
        newList = []
        for word in curList:
            newList.append(mystemmer.stem(word))
        tokenziedStemmed.append(newList)
    return tokenziedStemmed

# Extracting the features for SVM
myVectorizer = TfidfVectorizer(analyzer='word', tokenizer=returnPhrase, preprocessor=returnPhrase,
                               token_pattern=None,
                               ngram_range=(1, 3))
# The SVM Model
curC = 2 # cost factor in SVM
SVMClassifier = svm.LinearSVC(C=curC)
filename = 'finalized_model.sav'
# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))
# Input sentence
with open('trial_truth_001.txt', 'r') as file:
    sent = file.read().replace('\n', '')
transformedTest = transformSentence(sent)
X_test = myVectorizer.transform(transformedTest).toarray()
Prediction = loaded_model.predict(X_test)
# Printing the predicted emotion
print(Prediction)
When I try to predict with LinearSVC, I get:
sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided
What am I missing here? Presumably it is the way I fit and transform the data.
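A minimal sketch of the failure, independent of the pickled model: a TfidfVectorizer that has only been constructed has no vocabulary yet, so calling transform on it raises NotFittedError. The token list and the `identity` helper are made up for illustration, and the exact message text can vary between scikit-learn versions.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    # Hypothetical stand-in for returnPhrase: documents are already token lists.
    return tokens


# Constructed but never fitted, as in the script above.
vectorizer = TfidfVectorizer(analyzer='word', tokenizer=identity,
                             preprocessor=identity, token_pattern=None)

tokens = [["i", "feel", "happi"]]  # made-up, already stemmed tokens

try:
    vectorizer.transform(tokens)
    raised = False
except NotFittedError as err:
    raised = True
    print(type(err).__name__, err)
```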
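【Comments】:
You need to use the vectorizer that the model was trained with. Your code creates a new TfidfVectorizer object and calls transform on it without ever fitting it, which raises this error. Save the fitted vectorizer when you train the model, then load and use that same vectorizer for prediction.
【Answer 1】:
I think you just need to change the line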
X_test = myVectorizer.transform(transformedTest).toarray()
to
X_test = myVectorizer.fit_transform(transformedTest).toarray()
【Comments】:
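Refitting on the test sentence, as in the fit_transform change above, learns a new vocabulary from that one sentence, so the resulting columns generally will not match the features the saved model was trained on. Below is a minimal sketch of the approach from the question comments instead: the vectorizer and classifier sit in one pipeline, so both are fitted and pickled together. The training sentences, labels, and file name are made up for illustration. As a further assumption, tokens are joined with spaces and split with `token_pattern=r"\S+"` rather than passed through the question's returnPhrase, so the pickle does not depend on a custom function being importable at load time.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data: already tokenized and stemmed, joined with spaces.
train_docs = [
    "i feel so happi today",
    "what a wonder sunni day",
    "i am sad and lone",
    "thi is a terribl gloomi day",
]
train_labels = ["joy", "joy", "sadness", "sadness"]

# One pipeline: the vectorizer is fitted with the classifier and saved with it.
model = make_pipeline(
    TfidfVectorizer(lowercase=False, token_pattern=r"\S+", ngram_range=(1, 3)),
    LinearSVC(C=2),
)
model.fit(train_docs, train_labels)

# Training script: persist the whole fitted pipeline (file name is made up).
path = os.path.join(tempfile.mkdtemp(), "model_with_vectorizer.pkl")
with open(path, "wb") as fh:
    pickle.dump(model, fh)

# Prediction script: load it and call predict; transform uses the fitted vocabulary.
with open(path, "rb") as fh:
    loaded = pickle.load(fh)

test_doc = ["i feel happi"]
prediction = loaded.predict(test_doc)
print(prediction)

# Feature counts: fitted vocabulary versus a vectorizer refitted on the test sentence.
n_trained = loaded.named_steps["tfidfvectorizer"].transform(test_doc).shape[1]
n_refit = TfidfVectorizer(lowercase=False, token_pattern=r"\S+",
                          ngram_range=(1, 3)).fit_transform(test_doc).shape[1]
print(n_trained, n_refit)
```

If the two printed counts differ, a model trained on the first feature space cannot score a matrix built in the second, which is why the fitted vectorizer has to travel with the model.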