无法弄清楚如何定义我的 y_test
Posted
技术标签:
【中文标题】无法弄清楚如何定义我的 y_test【英文标题】:Not able to figure out how to define my y_test 【发布时间】:2020-03-26 16:01:00 【问题描述】:我是 Python 和 sklearn 的新手,如有任何帮助,将不胜感激。我之前唯一的经验是使用 mnist,我不确定如何在使用 csv 时定义 y_test。
我已经尝试了其他一些迭代,到目前为止没有任何效果。我没有包括进口和实用程序。
dataDir = '/content/drive/My Drive/Colab Notebooks/Final/dataQ2/' # Directory with input files
trainFile = 'q2train.csv' # Training examples
labelFile = 'q2label.csv' # Test label
validFile = 'q2valid.csv' # Valid Files
train = pd.read_csv(dataDir+trainFile) # Read training data
valid = pd.read_csv(dataDir+validFile) # Read test data
label = pd.read_csv(dataDir+labelFile) # Unlabeled file data
x_train = train[list(train)[1:]].values
x_test = valid[list(train)[1:]].values
# Specify output directories
modelDir = 'model/' # directory for saved models
outputDir = 'output/' #directory for output files
# Create Directories if needed:
os.makedirs(os.path.dirname(modelDir), exist_ok=True)
os.makedirs(os.path.dirname(outputDir), exist_ok=True)
#Display directory names
print('Models saved in %s' %modelDir)
print('Outputs saved in %s' %outputDir)
models = #dictoionary of SciKit-Learn classifiers with non-default parameters
models['NB'] = MultinomialNB()
models['DT'] = DecisionTreeClassifier()
models['RF'] = RandomForestClassifier(n_estimators=100)
models['KNN'] = KNeighborsClassifier(n_neighbors=10, algorithm='brute')
models['SVM'] = SVC(kernel='poly', gamma='auto')
models['LRM'] = LogisticRegression()
#Define function to evaluate classification accuracy
def evaluatePredictions(modelName, actual, predicted):
"""Returns classification accuracy
-Saves confusion matrix in outputDir
-Displays classification report
-Saves predicted classes in pandas data frame 'predictedDF'"""
acc = accuracy_score(actual, predicted) # accuracy
print("Accuracy with test data: %4.2f%%\n" %(100*acc))
print("CONFUSION MATRIX (Rows correspond to True Values):\n")
cm = confusion_matrix(actual, predicted) #confusion_matrix
cm = pd.DataFrame(cm) #convert to pandas data frame
print(cm) # print confusion matrix
cm.to_csv(outputDir+modelName+'confusionMatrix.csv') # save confusion matrix
print("\nCLASSIFICATION REPORT:\n")
print(classification_report(actual, predicted)) #classification report
return acc #returns accuracy
def displayDigits(images, labels, nCols=10):
"""Displays images with labels (nCols per row)
-images: list of vectors with 784 (28/28) grayscale values
-labels: list of labels for images"""
nRows = np.ceil(len(labels)/nCols).astype('int') # number of rows
plt.figure(figsize=(2*nCols,2*nRows)) #figure size
for i in range(len(labels)):
plt.subplot(nRows,nCols,i+1)
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(images[i].reshape(28, 28), interpolation='nearest')
plt.xlabel(str(labels[i]), fontsize=14)
plt.show()
return
def get_data(trainFile, test_prop=0.2, seed=2019): #I am pretty sure this is line is my issue.
"""returns data for training, testing, and data characteristics"""
data = data_sets[data_set_name]
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=test_prop,
random_state=seed)
nF = X.shape[1] # number of features
nC = len(np.unique(y)) # number of classes
nTrain, nTest = len(y_train), len(y_test)
print("\nData set: %s" %data_set_name)
print("\tNumber of features %d" %nF)
print("\tNumber of output classes = %d" %(nC))
print("\tNumber of training examples = %d" %(nTrain))
print("\tNumber of testing examples = %d" %(nTest))
return X_train, X_test, y_train, y_test, nF, nC, nTrain, nTest
#Train and test Scikit-Learn models
result = [] #stores accuracy and time for training models
predictedTest = pd.DataFrame()
predictedTest['label'] = y_test
for m in models_used:
model = models[m]
print("Training classifier:\n%s\n" %model)
#train model
st = time.time()
model.fit(x_train, y_train)
tTrain = time.time() - st
print("Time to train classifier: %4.2f seconds\n" %(tTrain))
#predict test examples with trained model
st = time.time() # start time for prediction
predicted = model.predict(x_test) #predict test labels with trained model
tTest = time.time() - st #time to predict test examples
print("Time to test classifier: %4.2f seconds\n" %(tTest))
#Save trained model
modelFile = modelDir + m + '.sav' #name for saved Scikit-Learn model file
pickle.dump(model, open(modelFile, 'wb')) #save model
print('Trained model saved as %s\n' %modelFile)
#evaluate prdeiction accuracy on test examples
acc = evaluatePredictions(m, y_test, predicted) # evaluate prediction accuracy
result.append([m, acc, tTrain, tTest]) #record results
predictedTest[m] = predicted #save predicted class
print(60*'='+'\n') #end training and testing for model
提前谢谢你。
【问题讨论】:
您能展示一下您的训练 csv (q2train.csv) 的样子吗?哪些特征进入特征矩阵“X”,哪些特征(目标)应该是预测“y”的向量? 【参考方案1】:我不确定我是否正确: 当你从预测有效得到答案时 您可以将其传输到 pandadata 框架,然后将其转换为 csv 文件,如下所示:
y_csv='answer':predict
y_csv=pd.DataFrame(data=y_csv)
y_csv.to_csv('Filename',index=False)
【讨论】:
以上是关于无法弄清楚如何定义我的 y_test的主要内容,如果未能解决你的问题,请参考以下文章
无法弄清楚如何在同一 ScrollView 中的 WebView 上方添加自定义视图
无法弄清楚如何让 CMake 为自定义 clang 驱动程序提取正确的标头