Fitting customized LGBM parameters in sklearn pipeline

Posted 2023-03-12

【Title】Fitting customized LGBM parameters in sklearn pipeline 【Posted】2021-09-01 23:56:21 【Question】:

I am developing a binary classifier with LightGBM. My classifier is defined as follows:

# sklearn version, for the sake of calibration
bst_ = LGBMClassifier(**search_params, **static_params, n_estimators = 1500)

bst_.fit(X = X_train, y = y_train, sample_weight = TRAIN_WEIGHTS,
         eval_set = (X_test, y_test), eval_sample_weight = [TEST_WEIGHTS],
         eval_metric = my_scorer,
         early_stopping_rounds = 150, 
         callbacks = [lgb.reset_parameter(learning_rate = lambda current_round: learning_rate_decay(current_round, 
                                                                                                    base_learning_rate = learning_rate,
                                                                                                    decay_power = decay_power))],
         categorical_feature = cat_vars)

Here **search_params are hyperparameters optimized by Optuna, and **static_params are predefined parameters such as 'objective' or 'random_state'.

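In addition, I use sample_weight to weight each training example, a custom metric function my_scorer (passed as eval_metric), early stopping, and a decaying learning rate defined as follows: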

def learning_rate_decay(current_iter, base_learning_rate = 0.05, decay_power = 0.99):
    lr = base_learning_rate  * np.power(decay_power, current_iter)
    return lr if lr > 1e-3 else 1e-3
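To make the schedule above concrete, here is a small pure-Python sketch that mirrors it and computes the round at which the 1e-3 floor takes over. It assumes the function's default values (base rate 0.05, decay 0.99); the learning_rate and decay_power actually used in the post are not shown, and the `floor` argument and `first_floored_round` helper are hypothetical names added only for illustration.

```python
import math


def learning_rate_decay(current_iter, base_learning_rate=0.05, decay_power=0.99, floor=1e-3):
    """Pure-Python mirror of the schedule above: exponential decay with a lower bound."""
    lr = base_learning_rate * decay_power ** current_iter
    return lr if lr > floor else floor


def first_floored_round(base_learning_rate=0.05, decay_power=0.99, floor=1e-3):
    """Smallest round n with base * decay**n <= floor, i.e. n >= ln(floor / base) / ln(decay)."""
    return math.ceil(math.log(floor / base_learning_rate) / math.log(decay_power))


n = first_floored_round()
# Print the learning rate at a few rounds, including the rounds around the floor.
for r in (0, 100, n - 1, n, 1499):
    print(r, round(learning_rate_decay(r), 6))
```

With these assumed defaults the rate stops decaying well before the 1500 configured rounds, so most late rounds would run at the constant floor value.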

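Since I want the model to output probabilities, I would like to use isotonic regression as the last step of the prediction pipeline. I know I can use the following code: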

# Calibrate 
calibrated_clf = CalibratedClassifierCV(
    base_estimator=bst_,
    method = 'isotonic',
    cv="prefit"
)
calibrated_clf.fit(X_train, y_train)

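But according to this, I should not use "prefit" on the training dataset. I would like to build a pipeline that preserves all the arguments passed to my classifier's .fit (for example callbacks), like this: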

calibrated_clf = CalibratedClassifierCV(
    base_estimator=bst_,
    method='isotonic',
    cv=5
)
calibrated_clf.fit(X_train, y_train)
calibrated_clf.set_params(**customized_lgbm_params)

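But obviously this does not work, because these are arguments specific to fitting the model to data, not estimator parameters.

My question is: how can I define a pipeline that includes everything I already pass to LGBMClassifier.fit (for example early stopping, sample weights, callbacks)?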
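The failure described above can be reproduced in a few lines. This is a minimal sketch, assuming scikit-learn is installed; LogisticRegression is only a stand-in for the LightGBM model, and the point is that set_params accepts constructor parameters but rejects names that belong to .fit.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# Stand-in base model; the post uses LGBMClassifier instead.
calibrated_clf = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)

# A constructor parameter is accepted by set_params.
calibrated_clf.set_params(cv=3)

# A fit-time argument is not a parameter of the estimator, so set_params refuses it.
try:
    calibrated_clf.set_params(early_stopping_rounds=150)
    rejected = False
except ValueError:
    rejected = True

print("fit-time argument rejected:", rejected)
```

So anything that LightGBM expects as an argument of .fit has to reach the inner fit call some other way than set_params.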

【Answer 1】:

Use sklearn's Pipeline:

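from lightgbm import LGBMClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("classifier", CalibratedClassifierCV(
        # Pass a new, unfitted estimator; bst_ is an instance and cannot be called.
        base_estimator=LGBMClassifier(**search_params),  # named `estimator` in scikit-learn >= 1.2
        method='isotonic',
        cv=5)
    )
])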

【Comments】:

Thanks, but this does not work: I need the arguments that are available in LGBMClassifier's .fit function, such as callbacks. In your solution they are never used.
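One possible way around the limitation raised in this comment is a thin wrapper that stores the fit-time arguments as a constructor parameter and forwards them on every fit call, so that each cross-validation clone made by CalibratedClassifierCV receives them. The sketch below is an illustration under assumptions, not a method from the thread: `FitParamsClassifier` is a hypothetical class, and scikit-learn's GradientBoostingClassifier with its fit-only `monitor` callback stands in for LGBMClassifier and its callbacks so that the example runs without LightGBM.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


class FitParamsClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: keeps fit-time keyword arguments as a constructor parameter."""

    def __init__(self, estimator, fit_params=None):
        self.estimator = estimator
        self.fit_params = fit_params

    def fit(self, X, y, sample_weight=None):
        params = dict(self.fit_params or {})
        if sample_weight is not None:
            # Per-fold weights arrive already sliced by CalibratedClassifierCV.
            params["sample_weight"] = sample_weight
        self.estimator_ = clone(self.estimator).fit(X, y, **params)
        self.classes_ = self.estimator_.classes_
        return self

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)

    def predict(self, X):
        return self.estimator_.predict(X)


monitor_calls = []


def monitor(i, est, local_vars):
    """Fit-only callback of the stand-in model; plays the role of LightGBM callbacks."""
    monitor_calls.append(i)
    return False  # never request early stopping


# Synthetic data and weights stand in for X_train, y_train and TRAIN_WEIGHTS.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
weights = np.where(y == 1, 2.0, 1.0)

wrapped = FitParamsClassifier(
    GradientBoostingClassifier(n_estimators=10, random_state=0),
    fit_params={"monitor": monitor},
)
# The estimator is passed positionally because its keyword name differs across versions.
calibrated_clf = CalibratedClassifierCV(wrapped, method="isotonic", cv=3)
calibrated_clf.fit(X, y, sample_weight=weights)
proba = calibrated_clf.predict_proba(X)
print(len(monitor_calls), proba.shape)
```

For the LightGBM model from the question, fit_params would presumably hold the same keywords passed to bst_.fit (eval_set, eval_sample_weight, eval_metric, callbacks, categorical_feature). Note that anything stored there is reused unchanged in every fold, so a fixed held-out eval_set fits this pattern, while arguments aligned row-by-row with X_train (other than sample_weight) would not be re-sliced.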
