Scoring in GridSearchCV

Posted 2023-02-23

[Title] Scoring in GridSearchCV [Posted] 2019-03-03 13:05:36 [Question]:

I have just started using GridSearchCV in Python, but I am confused about what scoring does in it. In some places I have seen code like this:

scorers = {
    'precision_score': make_scorer(precision_score),
    'recall_score': make_scorer(recall_score),
    'accuracy_score': make_scorer(accuracy_score)
}

grid_search = GridSearchCV(clf, param_grid, scoring=scorers, refit=refit_score,
                       cv=skf, return_train_score=True, n_jobs=-1)

What is the purpose of using these values (precision, recall, accuracy) in scoring?

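Does the grid search use them to give us parameters optimized for these scores? For example, to get the best precision score, does it find the best parameters for that, or something along those lines?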

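It computes precision, recall, and accuracy for each candidate parameter setting and reports the results. If that is true, does it then choose the best parameters based on precision, recall, or accuracy? Is this understanding correct?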

[Answer 1]:

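Your assumption is basically correct. The scorers dictionary makes the grid search evaluate every parameter combination on each scoring metric, so you can find the best parameters for each metric.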
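A minimal sketch of this idea follows. The synthetic dataset from make_classification, the LogisticRegression estimator, and the small grid over C are all assumptions made for illustration; none of them come from the question. With refit=False the search only records scores, and the best candidate per metric is read from cv_results_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic data and a small grid; both are illustrative assumptions.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

scorers = {
    "precision_score": make_scorer(precision_score),
    "recall_score": make_scorer(recall_score),
    "accuracy_score": make_scorer(accuracy_score),
}

skf = StratifiedKFold(n_splits=5)
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring=scorers,
    refit=False,  # no single metric chosen, so no best_* attributes are set
    cv=skf,
)
grid_search.fit(X, y)

# Every candidate is scored on every metric; pick the top candidate per metric.
results = grid_search.cv_results_
best_per_metric = {
    name: results["params"][int(np.argmax(results[f"mean_test_{name}"]))]
    for name in scorers
}
for name, params in best_per_metric.items():
    print(name, params)
```

The winning value of C may differ from one metric to the next, which is why the search cannot pick a single best setting on its own.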

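However, with multiple metrics you cannot have the grid search automatically refit and return best_estimator_ unless you choose which score to use for refit. Otherwise it raises the following error: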

ValueError: For multi-metric scoring, the parameter refit must be set to a scorer 
key to refit an estimator with the best parameter setting on the whole data and make
the best_* attributes available for that metric. If this is not needed, refit should 
be set to False explicitly. True was passed.
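The behaviour can be sketched as below, again assuming a synthetic dataset, a LogisticRegression estimator, and a hypothetical helper named build_search that exists only for this illustration. Passing refit=True with several scorers fails, while naming one scorer key makes the best_* attributes available for that metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV

# Illustrative data and grid (assumptions, not from the original post).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
scorers = {
    "precision_score": make_scorer(precision_score),
    "recall_score": make_scorer(recall_score),
}


def build_search(refit):
    """Hypothetical helper: same search, only the refit argument varies."""
    return GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid,
        scoring=scorers,
        refit=refit,
        cv=5,
    )


# refit=True is ambiguous with several metrics, so fit() raises ValueError.
try:
    build_search(refit=True).fit(X, y)
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)

# Naming one scorer key resolves the ambiguity.
search = build_search(refit="precision_score").fit(X, y)
print(search.best_params_)
print(search.best_score_)  # mean cross-validated precision of the selected candidate
```

The scores for the other metrics are still recorded in search.cv_results_; only the choice of the refitted model depends on the refit key.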

[Comments]:

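OK, so what I understand is that if I pass refit='precision_score', it will give me the best parameters for the best precision score.

Absolutely right! One addition: after fitting the grid search, you can access all the fits and scores with lr_grid.cv_results_, or more readably with pd.DataFrame(lr_grid.cv_results_).

Thank you very much :) Your confirmation helped a lot.

[Answer 2]: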

What is the purpose of using these values (precision, recall, accuracy) in scoring?

In case your question also includes "what are precision, recall, and accuracy, and why use them?"...

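Accuracy = (number of correct predictions) / (total predictions)
Precision = (true positives) / (true positives + false positives)
Recall = (true positives) / (true positives + false negatives)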

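A true positive is a prediction of true that is actually true, a false positive is a prediction of true that is actually false, and a false negative is a prediction of false that is actually true.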
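As a small worked example of these definitions, the sketch below counts true and false positives and negatives on ten made-up labels (the y_true and y_pred values are invented for illustration) and checks the hand-computed ratios against scikit-learn's metric functions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 1 is the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(accuracy, accuracy_score(y_true, y_pred))     # 0.8 from both
print(precision, precision_score(y_true, y_pred))   # 0.75 from both
print(recall, recall_score(y_true, y_pred))         # 0.75 from both
```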

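Recall and precision are useful metrics when working with imbalanced datasets (i.e., there are many samples with label "0" but far fewer with label "1").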
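To see why, consider a made-up dataset with 95 samples labelled 0 and 5 labelled 1, and a trivial model that always predicts 0 (both are assumptions chosen for illustration). Accuracy looks excellent even though the model never finds a single positive sample.

```python
from sklearn.metrics import accuracy_score, recall_score

# Made-up imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print(accuracy)  # 0.95, which looks good
print(recall)    # 0.0, none of the positives were found
```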

Recall and precision also lead to slightly more complex scoring metrics, such as F1_score (and Fbeta_score), which are also very useful.
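As a rough illustration, the sketch below computes F1 and F-beta by hand from precision and recall on made-up labels and compares them with scikit-learn's f1_score and fbeta_score. The choice of beta=2, which weights recall more heavily than precision, is an arbitrary assumption for the example.

```python
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

# Made-up labels for illustration: 1 is the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

p = precision_score(y_true, y_pred)  # 2 true positives out of 3 predicted positives
r = recall_score(y_true, y_pred)     # 2 true positives out of 4 actual positives

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * p * r / (p + r)

# F-beta generalises F1; beta > 1 favours recall, beta < 1 favours precision.
beta = 2
fbeta_manual = (1 + beta**2) * p * r / (beta**2 * p + r)

print(f1_manual, f1_score(y_true, y_pred))
print(fbeta_manual, fbeta_score(y_true, y_pred, beta=beta))
```

Either metric can be added to a scorers dictionary in the same way as the others, for example with make_scorer(fbeta_score, beta=2).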

Here is a great article that explains how recall and precision work.
