Is this the correct use of sklearn's classification report for multi-label classification?

Posted 2023-03-12

[Original title]: Is this the correct use of sklearn classification report for multi-label classification reports? [Asked]: 2021-09-23 06:01:43 [Question]:

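I am training a neural network with tf-keras. It is a multi-label classification problem in which each sample belongs to multiple classes [1,0,1,0..etc]. The final model lines (just for clarity) are: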

model.add(tf.keras.layers.Dense(9, activation='sigmoid'))#final layer

model.compile(loss='binary_crossentropy', optimizer=optimizer, 
                metrics=[tf.keras.metrics.BinaryAccuracy(), 
                tfa.metrics.F1Score(num_classes=9, average='macro',threshold=0.5)])

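I need to generate precision, recall, and F1 scores for this model (I already get the F1 score reported during training). For that I am using sklearn's classification_report, but I need to confirm that I am using it correctly in the multi-label setting.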

import numpy as np
from sklearn.metrics import classification_report

pred = model.predict(x_test)
pred_one_hot = np.around(pred)  # rounds the sigmoid outputs to 0/1, giving a binary indicator matrix of predictions

print(classification_report(one_hot_ground_truth, pred_one_hot))

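This works fine, and I get a full report for every class, including F1 scores that match the F1Score metric from TensorFlow Addons (for macro F1). Sorry for the long post, but what I am unsure about is:

In the multi-label setting, is it correct that the predictions need to be one-hot encoded? If I pass in the raw prediction scores (sigmoid probabilities), an error is raised...

Thanks.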

[Answer 1]:

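classification_report can be used correctly for binary, multi-class, and multi-label classification.

In the multi-class case, the labels are not one-hot encoded. They only need to be indices or labels.

You can see that the two snippets below produce the same output:

Example with indices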

from sklearn.metrics import classification_report
import numpy as np

labels = np.array(['A', 'B', 'C'])


y_true = np.array([1, 2, 0, 1, 2, 0])
y_pred = np.array([1, 2, 1, 1, 1, 0])
print(classification_report(y_true, y_pred, target_names=labels))

Example with labels

from sklearn.metrics import classification_report
import numpy as np

labels = np.array(['A', 'B', 'C'])

y_true = labels[np.array([1, 2, 0, 1, 2, 0])]
y_pred = labels[np.array([1, 2, 1, 1, 1, 0])]
print(classification_report(y_true, y_pred))

Both return

              precision    recall  f1-score   support

           A       1.00      0.50      0.67         2
           B       0.50      1.00      0.67         2
           C       1.00      0.50      0.67         2

    accuracy                           0.67         6
   macro avg       0.83      0.67      0.67         6
weighted avg       0.83      0.67      0.67         6
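If multi-class targets happen to be stored one-hot encoded (exactly one 1 per row), one possible way to get back to indices is `argmax` along the class axis. The sketch below is only an illustration: `y_true_one_hot` and `y_score` are made-up arrays, chosen so that the resulting indices match the example above.

```python
import numpy as np
from sklearn.metrics import classification_report

labels = ['A', 'B', 'C']

# Hypothetical one-hot encoded multi-class targets (exactly one 1 per row)
y_true_one_hot = np.array([[0, 1, 0],
                           [0, 0, 1],
                           [1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1],
                           [1, 0, 0]])

# Hypothetical per-class scores, e.g. softmax outputs of a model
y_score = np.array([[0.10, 0.80, 0.10],
                    [0.20, 0.20, 0.60],
                    [0.30, 0.50, 0.20],
                    [0.10, 0.70, 0.20],
                    [0.20, 0.50, 0.30],
                    [0.90, 0.05, 0.05]])

# Convert both to class indices: one index per sample
y_true = y_true_one_hot.argmax(axis=1)
y_pred = y_score.argmax(axis=1)

print(classification_report(y_true, y_pred, target_names=labels))

# The same report as a nested dict, convenient for programmatic checks
report = classification_report(y_true, y_pred, target_names=labels, output_dict=True)
```

This only applies when each sample has exactly one class. For multi-label targets, `argmax` would discard labels and should not be used.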

In the multi-label setting, classification_report can be used as in the following example:

from sklearn.metrics import classification_report
import numpy as np

labels =['A', 'B', 'C']

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 1]])

print(classification_report(y_true, y_pred, target_names=labels))
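To connect this with the sigmoid outputs from the question, here is a minimal sketch of thresholding probabilities into a binary indicator matrix before calling classification_report. The `y_prob` array is made up and stands in for `model.predict(x_test)`, and the 0.5 threshold is an assumption that mirrors the threshold used in the question's F1Score metric. Passing the raw probabilities directly is expected to fail, because sklearn sees a continuous target alongside a multilabel-indicator one.

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

labels = ['A', 'B', 'C']

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 1]])

# Hypothetical sigmoid outputs, standing in for model.predict(x_test)
y_prob = np.array([[0.9, 0.2, 0.4],
                   [0.1, 0.8, 0.7],
                   [0.6, 0.9, 0.8]])

# Raw probabilities are rejected: the targets are binary indicators,
# but the predictions are continuous values
try:
    classification_report(y_true, y_prob)
    raised = False
except ValueError as err:
    raised = True
    print("Raw probabilities rejected:", err)

# Threshold each label independently (0.5 is an assumed cut-off)
y_pred = (y_prob > 0.5).astype(int)

print(classification_report(y_true, y_pred, target_names=labels))

# Macro F1 computed separately, for comparison with a training-time metric
macro_f1 = f1_score(y_true, y_pred, average='macro')
print("macro F1:", macro_f1)
```

With these made-up values the thresholded predictions equal the `y_pred` matrix used in the example above. Strictly speaking the matrix is a multi-hot (binary indicator) encoding rather than one-hot, since a row may contain several 1s.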

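[Comments]:

Thanks - what about the multi-label case? Are one-hot labels acceptable? If not, what is the correct way to supply multi-label targets? The sklearn docs suggest this is the right approach, but I am not 100% clear.

I edited my post and added an example of how to use classification_report for the multi-label case.

Thanks very much! Yes, that clarifies it. One last question: each row = 1 sample, correct? So for y_true, [1,0,1] = sample 1, [0,1,0] = sample 2 and [1,1,1] = sample 3?

Yes, each row in y_true and y_pred corresponds to 1 sample.

Perfect - thanks for confirming. Answer accepted :)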
