Getting Precision and Recall using sklearn
【Title】: Getting Precision and Recall using sklearn 【Posted】: 2018-07-04 05:38:24
【Question】: Using the code below, I have the accuracy. Now I am trying to
1) find the precision and recall for each fold (10 folds in total)
2) get the mean of the precision
3) get the mean of the recall
The output could look similar to print(scores) and print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)) below.
Any ideas?
import numpy as np
from sklearn import datasets
from sklearn import svm
# sklearn.cross_validation was removed in scikit-learn 0.20; cross_val_score lives in sklearn.model_selection
from sklearn.model_selection import StratifiedKFold, cross_val_score
iris = datasets.load_iris()
skf = StratifiedKFold(n_splits=10)
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=10)
print(scores) #[ 1. 0.93333333 1. 1. 0.86666667 1. 0.93333333 1. 1. 1.]
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)) # Accuracy: 0.97 (+/- 0.09)
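【Answer 1】: This is a little different, because the plain 'precision' and 'recall' scorers of cross_val_score assume binary targets and do not work on multiclass data such as iris. One option is to use precision_score and recall_score and run the cross-validation manually. The parameter average='micro' computes global precision/recall by pooling the true positives, false positives, and false negatives of all classes.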
import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import precision_score, recall_score

iris = datasets.load_iris()
skf = StratifiedKFold(n_splits=10)
clf = svm.SVC(kernel='linear', C=1)
X = iris.data
y = iris.target
precision_scores = []
recall_scores = []
# Fit on each training split and score the held-out fold
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    y_pred = clf.fit(X_train, y_train).predict(X_test)
    precision_scores.append(precision_score(y_test, y_pred, average='micro'))
    recall_scores.append(recall_score(y_test, y_pred, average='micro'))
# Per-fold scores, then mean and two standard deviations
print(precision_scores)
print("Precision: %0.2f (+/- %0.2f)" % (np.mean(precision_scores), np.std(precision_scores) * 2))
print(recall_scores)
print("Recall: %0.2f (+/- %0.2f)" % (np.mean(recall_scores), np.std(recall_scores) * 2))
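As an alternative to the manual loop, here is a minimal sketch that assumes a scikit-learn version providing cross_validate in sklearn.model_selection. The built-in scorer names 'precision_macro' and 'recall_macro' accept multiclass targets, so per-fold scores and their means can be read from the returned dict. Macro averaging is chosen here only for illustration and is not part of the answer above.

```python
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_validate

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
skf = StratifiedKFold(n_splits=10)

# One scorer name per metric; results come back under 'test_<name>'.
scoring = ['precision_macro', 'recall_macro']
cv_results = cross_validate(clf, iris.data, iris.target, cv=skf, scoring=scoring)

for name in scoring:
    fold_scores = cv_results['test_' + name]  # one value per fold
    print(name, fold_scores)
    print("%s: %0.2f (+/- %0.2f)" % (name, fold_scores.mean(), fold_scores.std() * 2))
```

Passing the StratifiedKFold object as cv makes the fold definition explicit, so it matches the one used in the manual loop.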
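【Comments】:
My precision, recall, and accuracy scores are all exactly the same: [1.0, 0.93333333333333335, 1.0, 1.0, 0.8666666666666667, 1.0, 0.93333333333333335, 1.0, 1.0, 1.0] average: 0.97 (+/- 0.09), which should not be the case. Why is this? How can we fix it?
I got the same result. I think it is due to the data: the iris dataset is too small and simple, so you could try a larger dataset.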
I tried this code on the actual dataset I am working with (an np.array of 2163, 8719), but I still get the same answers for precision, recall, and accuracy. Any ideas how to fix this?
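One likely explanation for the identical numbers, offered as a sketch rather than a diagnosis of the commenter's dataset: when every sample has exactly one label, each wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class). Micro-averaged precision and recall therefore both reduce to correct / total, which is accuracy, regardless of dataset size. The label arrays below are made up for illustration.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels for a 3-class problem (illustration only).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 0, 2, 2, 2]

acc = accuracy_score(y_true, y_pred)
micro_p = precision_score(y_true, y_pred, average='micro')
micro_r = recall_score(y_true, y_pred, average='micro')
macro_p = precision_score(y_true, y_pred, average='macro')
macro_r = recall_score(y_true, y_pred, average='macro')

print(acc, micro_p, micro_r)  # all three are the same value
print(macro_p, macro_r)       # unweighted per-class means, which can differ
```

Macro averaging weights every class equally, so it is a reasonable choice when per-class performance matters regardless of class size. Which average is appropriate depends on the problem.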
Changing the setting to average='macro' changed the precision and recall scores, but I am not sure whether it is the appropriate setting.
【Answer 2】:
import pandas as pd
import numpy as np
from sklearn.metrics import (confusion_matrix, recall_score, precision_score,
                             accuracy_score, f1_score, roc_auc_score)

def binary_classification_performance(y_test, y_pred):
    # For binary labels, confusion_matrix(...).ravel() returns tn, fp, fn, tp
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    accuracy = round(accuracy_score(y_pred = y_pred, y_true = y_test),2)
    precision = round(precision_score(y_pred = y_pred, y_true = y_test),2)
    recall = round(recall_score(y_pred = y_pred, y_true = y_test),2)
    # Local name f1 avoids shadowing the imported f1_score function
    f1 = round(f1_score(y_pred = y_pred, y_true = y_test),2)
    specificity = round(tn/(tn+fp),2)
    npv = round(tn/(tn+fn),2)
    # AUC is computed from hard labels here; pass probabilities or decision scores if available
    auc_roc = round(roc_auc_score(y_score = y_pred, y_true = y_test),2)
    result = pd.DataFrame({'Accuracy' : [accuracy],
                           'Precision (or PPV)' : [precision],
                           'Recall (sensitivity or TPR)' : [recall],
                           'f1 score' : [f1],
                           'AUC_ROC' : [auc_roc],
                           'Specificity (or TNR)': [specificity],
                           'NPV' : [npv],
                           'True Positive' : [tp],
                           'True Negative' : [tn],
                           'False Positive':[fp],
                           'False Negative':[fn]})
    return result

# y_test and y_pred are assumed to come from an already fitted binary classifier
binary_classification_performance(y_test, y_pred)
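A small check of the cell order that confusion_matrix(...).ravel() returns for binary labels, since the unpacking in the function depends on it. With labels 0 and 1, rows are true classes and columns are predicted classes, so the flattened order is tn, fp, fn, tp. The label arrays are invented for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels (illustration only).
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]

# Flattened row by row: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)

specificity = tn / (tn + fp)  # true negative rate
npv = tn / (tn + fn)          # negative predictive value
print(specificity, npv)
```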