In scikit's precision_recall_curve, why does thresholds have a different dimension from recall and precision?

Posted 2023-03-12


Asked: 2015-10-16 19:23:09

Question:

I want to see how precision and recall vary with the threshold (not just with each other).

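from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve

# X_train, y_train, X_test and y_test are assumed to be defined already.
model = RandomForestClassifier(n_estimators=500, n_jobs=-1)
model.fit(X_train, y_train)
probas = model.predict_proba(X_test)[:, 1]  # probability of the positive class
precision, recall, thresholds = precision_recall_curve(y_test, probas)
print(len(precision))
print(len(thresholds))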

283  
282

As a result, I can't plot them together. Any clues as to why this happens?

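Comments:

To split N elements into 3 groups you need 2 thresholds, and that is the reason. This generalizes: n bins need n-1 decision functions (thresholds, in this case).

Answer 1: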

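For this problem, the last precision and recall values should be ignored. The last precision and recall values are always 1. and 0. respectively, and they have no corresponding threshold.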
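To see where the extra element comes from, here is a minimal pure-Python sketch. It is a simplified illustration, not scikit-learn's actual implementation; the function name `pr_curve_sketch` and the sample labels and scores are made up for this example. It computes one precision/recall pair per distinct score and then appends the final (1., 0.) point, which has no threshold.

```python
def pr_curve_sketch(y_true, scores):
    """Simplified illustration only; not scikit-learn's implementation."""
    thresholds = sorted(set(scores))  # one threshold per distinct score
    total_pos = sum(y_true)           # assumes at least one positive label
    precision, recall = [], []
    for t in thresholds:
        predicted = [s >= t for s in scores]  # positive if score >= threshold
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        precision.append(tp / sum(predicted))
        recall.append(tp / total_pos)
    # Extra end point with no threshold: nothing is predicted positive.
    precision.append(1.0)
    recall.append(0.0)
    return precision, recall, thresholds


# Hypothetical toy data.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
precision, recall, thresholds = pr_curve_sketch(y_true, scores)
print(len(precision), len(recall), len(thresholds))  # prints "5 5 4"
```

Every threshold yields one precision/recall pair, and the appended end point is what makes `precision` and `recall` one element longer than `thresholds`.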

For example, here is one solution:

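import matplotlib.pyplot as plt

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # precisions and recalls have one more element than thresholds;
    # drop the final (1., 0.) point, which has no threshold.
    plt.figure(figsize=(8, 5))
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    plt.xlabel("Threshold")
    plt.legend()
    plt.show()

plot_precision_recall_vs_threshold(precision, recall, thresholds)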

These values are there so that, when you plot precision against recall, the curve starts on the y-axis (x=0).
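A small sketch of that point, using hypothetical hard-coded values rather than output from the question's model: with the extra element, the precision-vs-recall curve reaches recall = 0, while `zip` over the three sequences drops the extra element on its own, because it stops at the shortest input.

```python
# Illustrative values only (hypothetical, not taken from the question's data).
precision = [0.5, 0.67, 0.5, 1.0, 1.0]
recall = [1.0, 1.0, 0.5, 0.5, 0.0]
thresholds = [0.1, 0.35, 0.4, 0.8]

# Points of the precision-vs-recall curve as (x=recall, y=precision).
curve = list(zip(recall, precision))
min_x_with_extra = min(x for x, _ in curve)
min_x_without_extra = min(x for x, _ in curve[:-1])
print(min_x_with_extra, min_x_without_extra)  # prints "0.0 0.5"

# Rows for a threshold plot: zip stops at the shortest input,
# so the final point (which has no threshold) is left out.
rows = list(zip(thresholds, precision, recall))
print(len(rows))  # prints "4"
```

Without the last element, the leftmost point of the curve would sit at recall 0.5 in this example instead of on the y-axis.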
