Choosing hyperparameters with GridSearchCV (scikit-learn)

Posted 2023-04-17

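Reference A: When building a model we want to pin down the best parameters as far as we can. Take KNN as an example: implemented by hand, we can use a for-loop to look for the highest score and read off the parameters that produced it.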
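The hand-written search described above might look like the following minimal sketch. The dataset (iris), the train/test split, and the search ranges are assumptions made for illustration; the original example's code is not shown in the text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris and a fixed split stand in for the data of the original example.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

best_score, best_params = -1.0, None
for weights in ("uniform", "distance"):
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights=weights)
        knn.fit(X_train, y_train)
        score = knn.score(X_test, y_test)  # mean accuracy on the held-out split
        if score > best_score:
            # Keep the first combination that reaches the highest score so far
            best_score = score
            best_params = {"n_neighbors": k, "weights": weights}

print(best_params, best_score)
```

Every extra hyperparameter adds another nested loop, which is the bookkeeping the text calls tedious.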

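Searching this way is tedious, though, and weights brings further related parameters with it, so writing the loops yourself quickly gets cumbersome. scikit-learn already packages this search as GridSearchCV, which we can simply call.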
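A sketch of the same search with GridSearchCV follows. The grid is an assumption: in particular, searching p (the Minkowski power) only alongside weights="distance" is one guess at what the text means by weights having further parameters. The dataset and split are likewise illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris and a fixed split, chosen only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A list of dicts defines separate sub-grids; p is searched only in the second one.
param_grid = [
    {"weights": ["uniform"], "n_neighbors": list(range(1, 11))},
    {"weights": ["distance"], "n_neighbors": list(range(1, 11)), "p": [1, 2, 3]},
]

grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)  # cross-validates every candidate, then refits the best

print(grid_search.best_params_)           # best parameter combination found
print(grid_search.best_score_)            # its mean cross-validated accuracy
print(grid_search.score(X_test, y_test))  # accuracy of the refit model on held-out data
```

Unlike the for-loop version, candidates are compared by cross-validation on the training data, so the test split is left for a final check.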

How to use GridSearchCV output for a scikit prediction?

[Asked] 2016-05-25 03:21:48

[Question]:

In the following code:

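# Imports the snippet relies on (not shown in the original question)
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Forest used only to rank features for the selection step
rf_feature_imp = RandomForestClassifier(100)
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5)

# Final classifier
clf = RandomForestClassifier(5000)

model = Pipeline([
    ('fs', feat_selection),
    ('clf', clf),
])

# Keys follow the <step name>__<parameter> convention.
# Note: 'auto' for max_features was deprecated in scikit-learn 1.1 and later
# removed; drop it from the lists on recent versions.
params = {
    'fs__threshold': [0.5, 0.3, 0.7],
    'fs__estimator__max_features': ['auto', 'sqrt', 'log2'],
    'clf__max_features': ['auto', 'sqrt', 'log2'],
}

gs = GridSearchCV(model, params, ...)  # remaining arguments elided in the original question
gs.fit(X, y)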

What should be used for prediction?

gs? gs.best_estimator_? Or gs.best_estimator_.named_steps['clf']?

What is the difference between these three?

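[Answer 1]:

gs.predict(X_test) is equivalent to gs.best_estimator_.predict(X_test) (with the default refit=True, which is what makes best_estimator_ available). With either one, X_test passes through your entire pipeline and the predictions are returned.

gs.best_estimator_.named_steps['clf'].predict(), however, is only the last stage of the pipeline. To use it, the feature selection step must already have been applied, so it only works if you have first run your data through gs.best_estimator_.named_steps['fs'].transform().

Three equivalent ways of generating predictions are shown below:

Using gs directly.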

pred = gs.predict(X_test)

Using best_estimator_.

pred = gs.best_estimator_.predict(X_test)

Calling each step of the pipeline separately.

X_test_fs = gs.best_estimator_.named_steps['fs'].transform(X_test)
pred = gs.best_estimator_.named_steps['clf'].predict(X_test_fs)
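To check the equivalence of the three approaches end to end, here is a small self-contained sketch. It is not the asker's exact setup: the forests are shrunk to 20 trees, the thresholds are replaced by the "mean" and "median" strings so that at least one feature is always selected, and the train/test split and random_state values are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Assumption: small forests and string thresholds keep the sketch fast and valid.
model = Pipeline([
    ("fs", SelectFromModel(RandomForestClassifier(n_estimators=20, random_state=0))),
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])
params = {
    "fs__threshold": ["mean", "median"],
    "clf__max_features": ["sqrt", "log2"],
}

gs = GridSearchCV(model, params, cv=3)  # refit=True by default
gs.fit(X_train, y_train)

# 1. The search object delegates to the refit best pipeline.
pred_gs = gs.predict(X_test)

# 2. The best pipeline itself.
pred_best = gs.best_estimator_.predict(X_test)

# 3. Each fitted step by hand: select features, then classify.
fs = gs.best_estimator_.named_steps["fs"]
clf = gs.best_estimator_.named_steps["clf"]
pred_steps = clf.predict(fs.transform(X_test))

print(np.array_equal(pred_gs, pred_best), np.array_equal(pred_gs, pred_steps))
```

All three paths go through the same fitted objects, which is why the predictions agree; calling clf.predict on untransformed X_test would instead fail or mislead whenever the selector dropped any feature.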

[Discussion]:

Thanks a lot! Is there any official documentation that says this?
