Precision, recall, F1 score equal with sklearn

【Title】: Precision, recall, F1 score equal with sklearn 【Posted】: 2017-05-28 06:01:26 【Question】:

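I am trying to compare different distance calculation methods and different voting systems in the k-nearest neighbours algorithm. My current problem is that, no matter what I do, the precision_recall_fscore_support method from scikit-learn yields exactly the same result for precision, recall and fscore. Why is that? I have tried it on different datasets (iris, glass and wine). What am I doing wrong? The code so far: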

#!/usr/bin/env python3
from collections import Counter
from data_loader import DataLoader
from sklearn.metrics import precision_recall_fscore_support as pr
import random
import math
import ipdb

def euclidean_distance(x, y):
    return math.sqrt(sum([math.pow((a - b), 2) for a, b in zip(x, y)]))

def manhattan_distance(x, y):
    # abs() must be applied to each difference, not to the whole list
    return sum(abs(a - b) for a, b in zip(x, y))

def get_neighbours(training_set, test_instance, k):
    names = [instance[4] for instance in training_set]
    training_set = [instance[0:4] for instance in training_set]
    distances = [euclidean_distance(test_instance, training_set_instance) for training_set_instance in training_set]
    distances = list(zip(distances, names))
    print(list(filter(lambda x: x[0] == 0.0, distances)))
    # sort in place; a bare sorted(...) call returns a new list and discards it
    distances.sort(key=lambda x: x[0])
    return distances[:k]

def plurality_voting(nearest_neighbours):
    classes = [nearest_neighbour[1] for nearest_neighbour in nearest_neighbours]
    count = Counter(classes)
    return count.most_common()[0][0]

def weighted_distance_voting(nearest_neighbours):
    distances = [(1/nearest_neighbour[0], nearest_neighbour[1]) for nearest_neighbour in nearest_neighbours]
    index = distances.index(min(distances))
    return nearest_neighbours[index][1]

def weighted_distance_squared_voting(nearest_neighbours):
    # parenthesise the square: 1 / x[0]*x[0] evaluates as (1 / x[0]) * x[0]
    distances = list(map(lambda x: 1 / (x[0] * x[0]), nearest_neighbours))
    index = distances.index(min(distances))
    return nearest_neighbours[index][1]

def main():
    data = DataLoader.load_arff("datasets/iris.arff")
    dataset = data["data"]
    # random.seed(42)
    random.shuffle(dataset)
    train = dataset[:100]
    test = dataset[100:150]
    classes = [instance[4] for instance in test]
    predictions = []
    for test_instance in test:
        prediction = weighted_distance_voting(get_neighbours(train, test_instance[0:4], 15))
        predictions.append(prediction)
    print(pr(classes, predictions, average="micro"))

if __name__ == "__main__":
    main()
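
Note that both weighted voting functions above return the label of a single neighbour instead of accumulating a weight per class. As a minimal sketch of inverse-distance voting, assuming the same `(distance, label)` pairs that `get_neighbours` returns: the function name `inverse_distance_vote` and the parameters `power` and `eps` are hypothetical and not part of the original code.

```python
from collections import defaultdict

def inverse_distance_vote(nearest_neighbours, power=1, eps=1e-9):
    """Return the label whose neighbours carry the largest total weight.

    nearest_neighbours: iterable of (distance, label) pairs.
    power: 1 for 1/d weights, 2 for 1/d^2 weights.
    eps: small constant (an assumption) that avoids division by zero
         when a test point coincides with a training point.
    """
    weights = defaultdict(float)
    for distance, label in nearest_neighbours:
        weights[label] += 1.0 / (distance ** power + eps)
    # The class with the highest accumulated weight wins the vote.
    return max(weights, key=weights.get)

# Made-up neighbours: one close "a" and two slightly farther "b" points.
neighbours = [(1.0, "a"), (1.5, "b"), (1.5, "b")]
print(inverse_distance_vote(neighbours, power=1))
print(inverse_distance_vote(neighbours, power=2))
```

With these made-up neighbours the two weightings disagree, which is the kind of difference the comparison in the question is meant to expose.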

【Question discussion】:

【Answer 1】:

The problem is that you are using the "micro" average.

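As stated here:

As written in the documentation: "Note that for "micro"-averaging in a multiclass setting will produce equal precision, recall and F, while "weighted" averaging may produce an F-score that is not between precision and recall." http://scikit-learn.org/stable/modules/model_evaluation.html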
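
To see why this happens, here is a small pure-Python sketch (not scikit-learn's implementation) that counts true positives, false positives and false negatives by hand. In a single-label multiclass problem every misclassified sample is one false positive for the predicted class and one false negative for the true class, so the pooled totals are equal and micro precision equals micro recall. The toy labels and the helper names `count_outcomes`, `micro_average` and `macro_average` are made up for illustration.

```python
from collections import Counter

def count_outcomes(y_true, y_pred):
    """Per-class true positives, false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # the predicted class gains a false positive
            fn[truth] += 1   # the true class gains a false negative
    return tp, fp, fn

def safe_div(num, den):
    return num / den if den else 0.0

def micro_average(y_true, y_pred):
    """Pool the counts over all classes, then compute the scores once."""
    tp, fp, fn = count_outcomes(y_true, y_pred)
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = safe_div(tp_sum, tp_sum + fp_sum)
    recall = safe_div(tp_sum, tp_sum + fn_sum)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1

def macro_average(y_true, y_pred):
    """Compute precision and recall per class, then take unweighted means."""
    tp, fp, fn = count_outcomes(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]

print("micro (precision, recall, f1):", micro_average(y_true, y_pred))
print("macro (precision, recall):    ", macro_average(y_true, y_pred))
```

The micro values all coincide with plain accuracy (4 correct out of 6 here), while the macro values differ from each other because they average per-class ratios instead of pooling the counts.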

However, if you drop a majority label using the labels parameter, then micro-averaging differs from accuracy, and precision differs from recall.
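
A short sketch of that effect with `precision_recall_fscore_support`, assuming scikit-learn is installed. The toy labels are made up, and "a" plays the role of the majority label that is left out through `labels`.

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up multiclass labels; "a" is the most frequent class.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]

# Micro average over every label: precision, recall and F-score coincide.
all_labels = precision_recall_fscore_support(y_true, y_pred, average="micro")

# Micro average restricted to "b" and "c": counts for "a" are excluded,
# so pooled false positives and false negatives no longer match.
subset = precision_recall_fscore_support(
    y_true, y_pred, labels=["b", "c"], average="micro"
)

print("all labels:", all_labels[:3])
print("b and c   :", subset[:3])
```

If the goal is to compare classifiers per class rather than overall, passing `average="macro"` or `average=None` (per-class arrays) to the same function is another option.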

【Discussion】:
