How to set the time in TFIT
Answer A: TFIT is a Python-based testing framework for software testing. If you want to set the time in TFIT, you can use Python's built-in `time` module. Here is a simple example:
```python
import time
# Get the current time as a formatted string
current_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
# Print the current time
print('Current time:', current_time)
```
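The code above first calls `time.time()` to get the current timestamp and converts it to local time with `time.localtime()`. It then passes the result to `time.strftime()`, which formats it as a string in the given format (`'%Y-%m-%d %H:%M:%S'`). Finally, `print()` writes the time to the console.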
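To see exactly what each format code produces, it can help to format a fixed moment instead of the current time. This small illustration (plain Python, nothing TFIT-specific) formats the Unix epoch via `time.gmtime(0)`, which is in UTC and therefore gives the same result on every machine:

```python
import time

# time.gmtime(0) is the Unix epoch (1970-01-01 00:00:00) as a UTC struct_time
epoch = time.gmtime(0)
formatted = time.strftime('%Y-%m-%d %H:%M:%S', epoch)
print(formatted)  # 1970-01-01 00:00:00

# Other format codes work the same way, e.g. a compact date stamp
compact = time.strftime('%Y%m%d', epoch)
print(compact)  # 19700101
```

For the current time, swap `time.gmtime(0)` for `time.localtime()` as in the example above.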
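You can change the time format and the output method to suit your needs. For example, you can pass the current time as an argument to a test function so that the time appears in the test report.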
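A minimal sketch of that idea in plain Python, assuming nothing about TFIT's own API: `run_test_case` is a hypothetical helper (not part of TFIT or any library) that runs a check and stamps the result line with a time string. Accepting the timestamp as a parameter also makes the report line reproducible when testing the helper itself:

```python
import time


def run_test_case(name, check, timestamp=None):
    """Hypothetical helper: run `check` and return a report line stamped with the time."""
    if timestamp is None:
        # Default to the current local time, in the same format as above
        timestamp = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
    status = 'PASS' if check() else 'FAIL'
    return f'[{timestamp}] {name}: {status}'


# Pass a fixed timestamp explicitly, or omit it to use the current time
report_line = run_test_case('addition', lambda: 1 + 1 == 2, timestamp='2024-01-01 12:00:00')
print(report_line)  # [2024-01-01 12:00:00] addition: PASS
```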
NotFittedError: TfidfVectorizer - Vocabulary wasn't fitted
Posted: 2017-10-26 20:24:53
Question: I'm trying to build a sentiment analyzer with scikit-learn/pandas. Building and evaluating the model works, but trying to classify new sample text does not.
My code:
```python
import csv
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

infile = 'Sentiment_Analysis_Dataset.csv'
data = "SentimentText"
labels = "Sentiment"

class Classifier():
    def __init__(self):
        self.train_set, self.test_set = self.load_data()
        self.counts, self.test_counts = self.vectorize()
        self.classifier = self.train_model()

    def load_data(self):
        df = pd.read_csv(infile, header=0, error_bad_lines=False)
        train_set, test_set = train_test_split(df, test_size=.3)
        return train_set, test_set

    def train_model(self):
        classifier = BernoulliNB()
        targets = self.train_set[labels]
        classifier.fit(self.counts, targets)
        return classifier

    def vectorize(self):
        vectorizer = TfidfVectorizer(min_df=5,
                                     max_df=0.8,
                                     sublinear_tf=True,
                                     ngram_range=(1, 2),
                                     use_idf=True)
        counts = vectorizer.fit_transform(self.train_set[data])
        test_counts = vectorizer.transform(self.test_set[data])
        return counts, test_counts

    def evaluate(self):
        test_counts, test_set = self.test_counts, self.test_set
        predictions = self.classifier.predict(test_counts)
        print(classification_report(test_set[labels], predictions))
        print("The accuracy score is {:.2%}".format(accuracy_score(test_set[labels], predictions)))

    def classify(self, input):
        input_text = input
        input_vectorizer = TfidfVectorizer(min_df=5,
                                           max_df=0.8,
                                           sublinear_tf=True,
                                           ngram_range=(1, 2),
                                           use_idf=True)
        input_counts = input_vectorizer.transform(input_text)
        predictions = self.classifier.predict(input_counts)
        print(predictions)

myModel = Classifier()
text = ['I like this I feel good about it', 'give me 5 dollars']
myModel.classify(text)
myModel.evaluate()
```
Error:
```
Traceback (most recent call last):
  File "sentiment.py", line 74, in <module>
    myModel.classify(text)
  File "sentiment.py", line 66, in classify
    input_counts = input_vectorizer.transform(input_text)
  File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 1380, in transform
    X = super(TfidfVectorizer, self).transform(raw_documents)
  File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 890, in transform
    self._check_vocabulary()
  File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 278, in _check_vocabulary
    check_is_fitted(self, 'vocabulary_', msg=msg),
  File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/utils/validation.py", line 690, in check_is_fitted
    raise _NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: TfidfVectorizer - Vocabulary wasn't fitted.
```
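I'm not sure what the problem could be. In my `classify` method, I create a brand-new vectorizer to process the text I want to classify, separate from the vectorizer used to create the training and test data for the model.
Thanks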
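Comments on the question:
In any case, in your `classify` function you create a new vectorizer object and then call `transform` on it before it has been fitted.
Adding to @AryaMcCarthy's answer: the whole `classify` function in this class is misleading. The constructor allows input data to be passed in, so why pass it again in `classify`?
Another approach: here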
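The cause described in the first comment can be reproduced in a few lines. This sketch, using made-up toy documents, shows that calling `transform` on a freshly constructed `TfidfVectorizer` raises `NotFittedError`, while the same call on a vectorizer that has been fitted succeeds:

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents
train_docs = ['good movie', 'bad movie', 'great film', 'awful film']
new_docs = ['good film']

# A brand-new vectorizer has no vocabulary yet, so transform() fails
fresh = TfidfVectorizer()
try:
    fresh.transform(new_docs)
    raised = False
except NotFittedError:
    raised = True
print('Unfitted vectorizer raised NotFittedError:', raised)

# Fitting on the training data first builds the vocabulary and IDF weights
fitted = TfidfVectorizer()
fitted.fit(train_docs)
X_new = fitted.transform(new_docs)
print(X_new.shape)  # (1, 6): one document, six vocabulary terms
```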
Answer 1:
You can save both the model and the vectorizer and use them later. This is how I did it:
```python
import pickle

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.svm import LinearSVC

# Train the classification model
def train_model():
    df = pd.read_json('intent_data.json')
    X_train, X_test, y_train, y_test = train_test_split(df['Utterance'], df['Intent'], random_state=0)
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(X_train)
    tfidf_transformer = TfidfTransformer()
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
    model = LinearSVC().fit(X_train_tfidf, y_train)

    # Save the vectorizer
    with open('vectorizer.pickle', 'wb') as f:
        pickle.dump(count_vect, f)
    # Save the TF-IDF transformer as well: the model was trained on TF-IDF
    # features, not raw counts, so prediction must apply the same step
    with open('tfidf_transformer.pickle', 'wb') as f:
        pickle.dump(tfidf_transformer, f)
    # Save the model
    with open('classification.model', 'wb') as f:
        pickle.dump(model, f)

# Load the classification model from disk and use it for predictions
def classify_utterance(utt):
    # Load the vectorizer, the TF-IDF transformer, and the model
    with open('vectorizer.pickle', 'rb') as f:
        loaded_vectorizer = pickle.load(f)
    with open('tfidf_transformer.pickle', 'rb') as f:
        loaded_transformer = pickle.load(f)
    with open('classification.model', 'rb') as f:
        loaded_model = pickle.load(f)
    # Make a prediction: raw text -> counts -> TF-IDF -> model
    features = loaded_transformer.transform(loaded_vectorizer.transform([utt]))
    print(loaded_model.predict(features))
```
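An alternative worth considering (a different technique from the one in this answer) is to bundle the vectorizer and the classifier into a single scikit-learn `Pipeline`. Then there is only one object to pickle, and the preprocessing at prediction time cannot drift from what the model was trained on. The toy utterances and intents below are made up and stand in for `intent_data.json`; the sketch pickles to bytes in memory rather than to a file:

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for intent_data.json
utterances = ['book a flight', 'reserve a plane ticket',
              'what is the weather', 'will it rain today']
intents = ['travel', 'travel', 'weather', 'weather']

# One object holds both steps: TF-IDF vectorization, then the classifier
pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())
pipeline.fit(utterances, intents)

# Persist the whole pipeline (in memory here; pickle.dump to a file works the same way)
blob = pickle.dumps(pipeline)
restored = pickle.loads(blob)

# Raw text goes straight in: the restored pipeline vectorizes it with the fitted vocabulary
prediction = restored.predict(['book a ticket'])
print(prediction)
```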
Answer 2: Save the `vectorizer` as a `pickle` or `joblib` file, and load it when you need to make predictions.
```python
import pickle
pickle.dump(vectorizer, open("vectorizer.pickle", "wb"))  # Save vectorizer
vectorizer = pickle.load(open("vectorizer.pickle", "rb"))  # Load vectorizer
```
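The answer mentions joblib as an alternative to pickle. A minimal sketch with `joblib.dump` and `joblib.load` (joblib is installed as a scikit-learn dependency); the file name and the training sentences are placeholders, and a temporary directory keeps the example self-contained:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit a vectorizer on placeholder training sentences
vectorizer = TfidfVectorizer()
vectorizer.fit(['the cat sat on the mat', 'the dog ran in the park'])

# Save it to disk
path = os.path.join(tempfile.mkdtemp(), 'vectorizer.joblib')
joblib.dump(vectorizer, path)

# Later, e.g. in a separate process: load it and call transform() directly, no refitting
loaded = joblib.load(path)
X_new = loaded.transform(['the cat ran'])
print(X_new.shape)  # (1, 9): one document, nine vocabulary terms
```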
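Comments:
You saved the day!

Answer 3: You have fitted a vectorizer, but you throw it away, because it no longer exists once your `vectorize` function returns. Instead, save it in `vectorize` after fitting and transforming: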
```python
self._vectorizer = vectorizer
```
Then, in your `classify` function, don't create a new vectorizer. Instead, use the one that was fitted to the training data:
```python
input_counts = self._vectorizer.transform(input_text)
```
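Putting this answer together, a trimmed-down sketch of the corrected class might look like the following. To stay self-contained it takes hypothetical training texts and labels as constructor arguments instead of reading `Sentiment_Analysis_Dataset.csv`, and it drops `min_df=5`/`max_df=0.8`, which would discard every term on four toy sentences. The key change is that the vectorizer fitted during training is stored on `self`, and `classify` only calls `transform`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB


class Classifier:
    def __init__(self, texts, labels):
        # Fit the vectorizer once, on the training texts, and keep it on the instance
        self._vectorizer = TfidfVectorizer(sublinear_tf=True, ngram_range=(1, 2), use_idf=True)
        counts = self._vectorizer.fit_transform(texts)
        self.classifier = BernoulliNB().fit(counts, labels)

    def classify(self, input_text):
        # Reuse the fitted vectorizer: transform only, never create or refit one here
        input_counts = self._vectorizer.transform(input_text)
        return self.classifier.predict(input_counts)


# Hypothetical toy training data (1 = positive, 0 = negative)
train_texts = ['I like this', 'I feel good', 'this is bad', 'I hate it']
train_labels = [1, 1, 0, 0]

model = Classifier(train_texts, train_labels)
predictions = model.classify(['I like this I feel good about it', 'give me 5 dollars'])
print(predictions)
```

Words never seen during training (such as "dollars") are simply ignored by `transform`, so the second sentence no longer raises an error, although its prediction carries little information.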
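Comments:
What if you want to come back a week later and use it? My first thought was to pickle the vectorizer, but I got `can't pickle instancemethod objects`. So how can I save the vectorizer for long-term storage?
You should really post that as a separate question so it gets more visibility. Link to this one if needed.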