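Multiple performance metrics with stratified cross validation
Posted: 2018-09-19 21:45:18
I have a small, imbalanced dataset that I want to test with different algorithms. For evaluation purposes I need several performance metrics (accuracy, precision, recall, F-score, support).
This is what I plan to do, but I am not happy with it, because there is probably a simpler solution: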
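import numpy as np
from sklearn.metrics import accuracy_score, classification_report
# Assumption: `score` is the common alias for precision_recall_fscore_support.
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.model_selection import StratifiedKFold

# X, Y (NumPy arrays) and the gradientBoost estimator are defined elsewhere.
skf = StratifiedKFold(n_splits=3, random_state=42, shuffle=True)
accuracy = []
for train_index, test_index in skf.split(X, Y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
    gradientBoost.fit(X_train, y_train)
    y_pred = gradientBoost.predict(X_test)
    accuracy.append(round(accuracy_score(y_test, y_pred), 2))
    # Per-class precision, recall, F-score and support for this fold
    precision, recall, fscore, support = np.round(score(y_test, y_pred), 2)
    print('precision: ' + str(precision))
    print('recall: ' + str(recall))
    print('fscore: ' + str(fscore))
    print('support: ' + str(support))
    print(classification_report(y_test, y_pred))
# Average the per-fold accuracies
meanAcc = np.mean(np.asarray(accuracy))
print('meanAcc: ', meanAcc)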
In theory I could average all of the metrics, just as I do for accuracy. Is there a simpler and/or more efficient way?
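For reference, one possibly simpler route is sklearn's cross_validate, which accepts several scorers and returns one score per fold for each of them. The sketch below is not from the thread. It assumes a synthetic imbalanced dataset in place of the asker's X and Y, and the weighted-average scorer strings are an illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumption: a synthetic, imbalanced stand-in for the asker's X and Y.
X, Y = make_classification(n_samples=200, n_features=10,
                           weights=[0.8, 0.2], random_state=42)

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
# Predefined scorer strings; each yields one score per fold.
scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]

cv_results = cross_validate(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    X, Y, cv=skf, scoring=scoring,
)

# cross_validate stores the per-fold scores under "test_<scorer name>".
mean_scores = {name: float(np.mean(cv_results["test_" + name]))
               for name in scoring}
for name, value in mean_scores.items():
    print(f"mean {name}: {value:.2f}")
```

Support is a per-class count rather than a score, so this sketch does not cover it. The per-class breakdown still needs something like classification_report inside a manual loop.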
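Edit:
I tried accuracy and recall_weighted as scorers. Unfortunately, only accuracy is shown in the plot, although the legend lists both accuracy and recall.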
# Initialize classifier
clf_gini = DecisionTreeClassifier(criterion="gini", random_state=42,
                                  max_depth=10, min_samples_leaf=8)
scoring = {'Accuracy': make_scorer(accuracy_score), 'Recall': 'recall_weighted'}
# return_train_score=True is required in recent scikit-learn versions,
# otherwise the mean_train_* / std_train_* keys used below are missing.
gs = GridSearchCV(DecisionTreeClassifier(criterion='entropy', random_state=42,
                                         min_samples_leaf=10),
                  param_grid={'max_depth': range(2, 30, 2)},
                  scoring=scoring, cv=3, refit='Accuracy',
                  return_train_score=True)
gs.fit(X_Distances, Y)
results = gs.cv_results_
plt.figure(figsize=(13, 13))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)
plt.xlabel("max_depth")
plt.ylabel("Score")
plt.grid()
# gca() reuses the axes that already carries the title and labels
ax = plt.gca()
ax.set_xlim(0, 32)
ax.set_ylim(0, 1)
# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_max_depth'].data, dtype=float)
for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))
    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]
    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)
    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))
plt.legend(loc="best")
plt.grid(False)
plt.show()
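Comments:
Why not just append to a list or a dict? I'm not sure what else you could do.
What do you mean by appending to a list? I currently append the accuracy to a list, and I could do the same for all metrics, yes.
Yes, append them to lists, just as you are doing for accuracy.
Answer 1:
We can use GridSearchCV for multi-metric evaluation: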
# Author: Raghav RV <rvraghav93@gmail.com>
# License: BSD
import numpy as np
from matplotlib import pyplot as plt
from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
Running GridSearchCV using multiple evaluation metrics
X, y = make_hastie_10_2(n_samples=8000, random_state=42)
# The scorers can either be one of the predefined metric strings or a scorer
# callable, like the one returned by make_scorer
scoring = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score)}
# Setting refit='AUC', refits an estimator on the whole dataset with the
# parameter setting that has the best cross-validated AUC score.
# That estimator is made available at ``gs.best_estimator_`` along with
# parameters like ``gs.best_score_``, ``gs.best_params_`` and
# ``gs.best_index_``
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={'min_samples_split': range(2, 403, 10)},
                  scoring=scoring, cv=5, refit='AUC',
                  return_train_score=True)  # train scores are plotted below
gs.fit(X, y)
results = gs.cv_results_
Plotting the result
plt.figure(figsize=(13, 13))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)
plt.xlabel("min_samples_split")
plt.ylabel("Score")
plt.grid()
# gca() reuses the axes that already carries the title and labels
ax = plt.gca()
ax.set_xlim(0, 402)
ax.set_ylim(0.73, 1)
# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_min_samples_split'].data, dtype=float)
for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))
    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]
    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)
    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))
plt.legend(loc="best")
plt.grid(False)
plt.show()
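Result:
Comments:
Looks good. I have a multiclass problem, so I chose recall_weighted and accuracy as scorers. Unfortunately, only accuracy is plotted. Any ideas? (I have updated my original post with my code.)
Answer 2:
The sklearn documentation suggests using one of its predefined scoring metrics to evaluate classification. Let's try accuracy and f1_weighted: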
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
X, y = make_classification(n_classes=10, n_informative=8, random_state=1)
# Display name -> predefined sklearn scoring string
scoring = {
    'Accuracy': 'accuracy',
    'F1 (weighted)': 'f1_weighted',
}
gs = GridSearchCV(RandomForestClassifier(max_depth=5, random_state=42, min_samples_leaf=10),
                  param_grid={'n_estimators': range(2, 101, 2)}, return_train_score=True,
                  scoring=scoring, cv=3, refit='Accuracy')
gs.fit(X, y)
results = gs.cv_results_
##################
plt.figure(figsize=(12, 8))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)
plt.xlabel("n_estimators")
plt.ylabel("Score")
#plt.grid()
ax = plt.gca()
ax.set_xlim(0, 101)
ax.set_ylim(0, 1)
# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_n_estimators'].data, dtype=float)
for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        print('plotting: {} ({})'.format(scorer, sample))
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))
    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]
    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)
    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))
plt.legend(loc="best")
plt.grid(False)
plt.show()
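Result:
Comments:
Thanks. A few minutes ago I found out that I only thought the additional metric was not being plotted: recall_weighted, for example, produces exactly the same curve as accuracy. When I switched to macro averaging, it became visible.
@user3667018, yes, I came to the same conclusion yesterday. I should have mentioned it in the answer...
OK, thanks a lot! One more question: is it reasonable to pass my whole dataset to GridSearchCV(cv=3)? I was thinking about overfitting and so on, but GridSearchCV() should take care of the cross-validation, shouldn't it?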
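The overlap of recall_weighted and accuracy noted in the comments is expected for single-label classification. Weighted recall is the sum over classes of (n_k / N) * (TP_k / n_k), which simplifies to (sum of TP_k) / N, and that is accuracy. A small check on random labels (made up here purely for illustration) shows the identity, while macro recall is computed differently and can diverge:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Assumption: random, imbalanced 4-class labels used only as an illustration.
rng = np.random.default_rng(0)
y_true = rng.choice(4, size=300, p=[0.6, 0.2, 0.1, 0.1])
y_pred = rng.choice(4, size=300, p=[0.6, 0.2, 0.1, 0.1])

acc = accuracy_score(y_true, y_pred)
# Per-class recall weighted by class support
rec_weighted = recall_score(y_true, y_pred, average="weighted")
# Unweighted mean of per-class recall
rec_macro = recall_score(y_true, y_pred, average="macro")

print("accuracy:        ", acc)
print("recall_weighted: ", rec_weighted)
print("recall_macro:    ", rec_macro)
```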
@user3667018, no. Your model must not see the test dataset during training, otherwise you get "data leakage"...
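A minimal sketch of what that advice could look like in practice: hold out a stratified test set first, tune on the training part only, then evaluate once on the held-out part. The dataset, the split ratio and the parameter grid are assumptions chosen for illustration, not values from the thread.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic 3-class data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=42)

# Stratified hold-out split; the test part is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={"max_depth": range(2, 10, 2)},
                  scoring="accuracy", cv=3)
gs.fit(X_train, y_train)  # cross-validation happens inside the training part

print(gs.best_params_)
# With the default refit=True, predict() uses the best estimator refitted
# on all of X_train.
print(classification_report(y_test, gs.predict(X_test)))
```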
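How do I handle this when I ultimately want to perform stratified cross-validation? I don't have a single train/test split. After parameter tuning, I train a new model with the best parameters found and run cross-validation on it.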
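One common pattern for this situation, which the thread itself does not mention, is nested cross-validation: the tuned GridSearchCV object is itself passed to cross_validate, so every outer test fold stays unseen by the tuning that runs on the corresponding outer training folds. The sketch below assumes synthetic data, and the fold counts, grid and scorer names are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic 3-class data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=42)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tuning
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)  # evaluation

tuned_tree = GridSearchCV(DecisionTreeClassifier(random_state=42),
                          param_grid={"max_depth": range(2, 10, 2)},
                          scoring="accuracy", cv=inner_cv)

# Each outer fold reruns the whole grid search on its training part only.
nested = cross_validate(tuned_tree, X, y, cv=outer_cv,
                        scoring=["accuracy", "recall_macro", "f1_macro"])

for key in ("test_accuracy", "test_recall_macro", "test_f1_macro"):
    print(key, round(float(nested[key].mean()), 2))
```

The scores estimate how the whole tune-then-fit procedure generalizes, not the performance of one fixed parameter setting.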