How to calculate ROC_AUC score having 3 classes

Posted 2023-03-12

Asked: 2019-10-07 04:42:42

Question:

I have a dataset with 3 class labels (0, 1, 2). I tried to plot a ROC curve, and did so by using the pos_label parameter.

fpr, tpr, thresholds = metrics.roc_curve(Ytest, y_pred_prob, pos_label = 0)

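By changing pos_label to 0, 1, and 2, I get 3 graphs. Now I am having trouble calculating the AUC score. How can I average the 3 graphs, plot a single graph from them, and then calculate the ROC_AUC score? I get an error with this call:

metrics.roc_auc_score(Ytest, y_pred_prob)

ValueError: multiclass format is not supported

Please help me.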

# store the predicted probabilities for class 0
y_pred_prob = cls.predict_proba(Xtest)[:, 0]
#first argument is true values, second argument is predicted probabilities
fpr, tpr, thresholds = metrics.roc_curve(Ytest, y_pred_prob, pos_label = 0)
plt.plot(fpr, tpr)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve classifier')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.grid(True)

# store the predicted probabilities for class 1
y_pred_prob = cls.predict_proba(Xtest)[:, 1]
#first argument is true values, second argument is predicted probabilities
fpr, tpr, thresholds = metrics.roc_curve(Ytest, y_pred_prob, pos_label = 1)
plt.plot(fpr, tpr)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve classifier')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')

plt.grid(True)

# store the predicted probabilities for class 2
y_pred_prob = cls.predict_proba(Xtest)[:, 2]
#first argument is true values, second argument is predicted probabilities
fpr, tpr, thresholds = metrics.roc_curve(Ytest, y_pred_prob, pos_label = 2)
plt.plot(fpr, tpr)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve classifier')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')

plt.grid(True)

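The code above generates 3 ROC curves, because the problem is multiclass.

I want to obtain a single ROC curve from the 3 above by averaging them, and then get one roc_auc score from it.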

Comments:

If you want a faster response, you may want to ask somewhere more relevant to this topic, such as Data Science Stack Exchange.

Answer 1:

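For multiclass problems, it is often useful to compute the AUROC for each class. For example, here is an excerpt from some code I use to compute the AUROC for each class separately, where label_meanings is a list of strings describing what each label is, and the arrays are formatted so that each row is a different example and each column corresponds to a different label: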

import sklearn.metrics

for label_number in range(len(label_meanings)):
    which_label = label_meanings[label_number]  # descriptive string for the label
    true_labels = true_labels_array[:, label_number]  # 0/1 indicator column for this label
    pred_probs = pred_probs_array[:, label_number]  # predicted probability for this label
    # AUROC and AP (sliding across multiple decision thresholds)
    fpr, tpr, thresholds = sklearn.metrics.roc_curve(y_true=true_labels,
                                                     y_score=pred_probs,
                                                     pos_label=1)
    auc = sklearn.metrics.auc(fpr, tpr)
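The excerpt expects the true labels as a 0/1 indicator array with one column per class, which the question's integer vector Ytest is not. One possible way to build such arrays is sketched below. It assumes the iris dataset as a stand-in for the asker's data and a LogisticRegression model named clf; both are illustrative choices, not part of the original thread.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumption: iris (3 classes) stands in for the asker's dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Assumption: any classifier with predict_proba would do here.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# One indicator column per class, in the same order as clf.classes_.
true_labels_array = label_binarize(y_test, classes=clf.classes_)

# One probability column per class, also ordered like clf.classes_.
pred_probs_array = clf.predict_proba(X_test)

# Hypothetical descriptive names, one per column.
label_meanings = [f"class {c}" for c in clf.classes_]
```

With these three names defined, the loop in the excerpt can run unchanged, because column k of both arrays refers to the same class.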

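If you want to plot an average ROC curve across the three classes: the code at https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html includes a section that calculates the average AUC so that you can make the plot (if you have three classes, it plots the average AUC over the three classes).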
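A rough sketch of the macro-averaging idea used in that kind of example: interpolate each class's one-vs-rest ROC curve onto a common grid of false positive rates, average the true positive rates, and integrate the result. The dataset (iris), the model, and the names mean_fpr and mean_tpr are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumption: iris (3 classes) stands in for the asker's dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_onehot = label_binarize(y_test, classes=clf.classes_)  # (n_samples, 3)
y_score = clf.predict_proba(X_test)                      # (n_samples, 3)

# Common grid of false positive rates shared by all classes.
mean_fpr = np.linspace(0.0, 1.0, 1001)
mean_tpr = np.zeros_like(mean_fpr)

for k in range(len(clf.classes_)):
    # One-vs-rest ROC curve for class k.
    fpr_k, tpr_k, _ = roc_curve(y_onehot[:, k], y_score[:, k])
    # Interpolate this curve onto the common grid and accumulate.
    mean_tpr += np.interp(mean_fpr, fpr_k, tpr_k)

mean_tpr /= len(clf.classes_)

# Area under the averaged curve.
macro_auc = auc(mean_fpr, mean_tpr)
print(f"macro-average AUC: {macro_auc:.3f}")
```

The averaged curve can then be drawn with plt.plot(mean_fpr, mean_tpr), in the same way as the per-class curves in the question.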

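If you only need the average AUC across the three classes: after computing the AUC for each class separately, you can average the three numbers to get an overall AUC.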
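As a minimal sketch of that averaging, the snippet below computes one one-vs-rest AUC per class and takes their unweighted mean. It also shows that, assuming scikit-learn 0.22 or later, roc_auc_score can return the same number directly when given the full probability matrix together with multi_class="ovr". The dataset and model are stand-ins chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: iris (3 classes) stands in for the asker's dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Keep all probability columns, not just one.
y_score = clf.predict_proba(X_test)  # shape (n_samples, 3)

# One-vs-rest AUC for each class: class c against everything else.
per_class_auc = [
    roc_auc_score(y_test == c, y_score[:, k])
    for k, c in enumerate(clf.classes_)
]
mean_auc = float(np.mean(per_class_auc))

# Same quantity in one call (needs scikit-learn >= 0.22).
ovr_auc = roc_auc_score(y_test, y_score, multi_class="ovr", average="macro")

print("per-class AUC:", np.round(per_class_auc, 3))
print("mean of per-class AUC:", round(mean_auc, 3))
print("roc_auc_score (ovr, macro):", round(ovr_auc, 3))
```

Note that the call in the question passed a single probability column; the multiclass form of roc_auc_score expects the whole (n_samples, n_classes) matrix.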

If you want more background on AUROC and how it is computed in the single-class versus multiclass case, you can see this article, Measuring Performance: AUC (AUROC).

Answer 2:

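Key points about multiclass AUC:

You cannot compute one common AUC directly for all classes. You have to compute the AUC for each class separately, just as recall and precision are different for each class in multiclass classification.

The simplest way to compute the AUC for individual classes:

We choose a classifier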

from sklearn.linear_model import LogisticRegression

LRE = LogisticRegression(solver='lbfgs')

LRE.fit(X_train, y_train)

I make a list of the class labels

d = y_test.unique()

class_name = list(d.flatten())

class_name

Now compute the AUC for each class separately

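from sklearn import metrics

for p in class_name:
    # column of predict_proba that holds the probability of class p
    col = list(LRE.classes_).index(p)
    fpr, tpr, thresholds = metrics.roc_curve(y_test,
                     LRE.predict_proba(X_test)[:, col], pos_label = p)
    auroc = round(metrics.auc(fpr, tpr), 2)
    print('LRE', p, '--AUC--->', auroc)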
