How to nest multiple regression functions of Sklearn?

Posted 2023-03-12

【Published】: 2021-12-17 08:38:12 【Question】:

I am trying to implement on my own a nested regression model that I got as output from TPOT. The TPOT output is:

RandomForestRegressor(XGBRegressor(XGBRegressor(**args1), **args2), **args3)

My code so far:

from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor

xgb1 = XGBRegressor(**args1)
xgb2 = XGBRegressor(**args2)
rf = RandomForestRegressor(**args3)

I am not sure how to combine them correctly, in the order given by the TPOT output.

【Answer 1】:

The TPOT classifier and regressor provide a scikit-learn pipeline object that already does this for you.

If you look at the TPOT API, both TPOTClassifier and TPOTRegressor expose an attribute, fitted_pipeline_, which contains the best scikit-learn pipeline that TPOT could find. An example of a scikit-learn pipeline:

make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    XGBRegressor(learning_rate=0.1, max_depth=4, min_child_weight=14, n_estimators=100, n_jobs=1, objective="reg:squarederror", subsample=1.0, verbosity=0)
)

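You can dump it for later loading so that you do not have to retrain your model, or you can use the export function built into the TPOT classifier and regressor to write the optimized pipeline out as Python code, which you can then use to refit your model: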

tpot.export('tpot_digits_pipeline.py')
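The "dump it for later loading" option can be sketched roughly as follows. This is a minimal illustration, assuming joblib is used for serialization (the answer does not name a tool). A small scikit-learn pipeline stands in for tpot.fitted_pipeline_, and the data, the file name fitted_pipeline.joblib, and the variable names are all hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data, used only so the sketch runs on its own.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] ** 2 + X[:, 1]

# Stand-in for tpot.fitted_pipeline_ (assumption: any fitted scikit-learn
# pipeline can be saved and restored the same way).
fitted_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "fitted_pipeline.joblib")
    joblib.dump(fitted_pipeline, model_path)    # save the fitted pipeline
    restored_pipeline = joblib.load(model_path)  # load it back, no retraining

# The restored pipeline should reproduce the original predictions.
same_predictions = np.allclose(
    fitted_pipeline.predict(X), restored_pipeline.predict(X)
)
```

A pipeline saved this way is generally only safe to load with the same library versions it was saved with, which is one reason exporting the pipeline as Python code can be the more portable choice.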

If for some reason you only have the output you posted in the question, you can recreate the scikit-learn pipeline like this:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

exported_pipeline = make_pipeline(
  RandomForestRegressor(XGBRegressor(XGBRegressor(<replace with actual arg list>), <replace with actual arg list>), <replace with actual arg list>)
)

exported_pipeline.fit(training_features, training_target)
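Note that the nested call above is TPOT's notation rather than something scikit-learn accepts directly: the first positional parameter of RandomForestRegressor is n_estimators, not another estimator. In TPOT's notation, an inner estimator feeds its predictions to the outer one as an extra feature, and classic TPOT versions export this, as far as I know, with StackingEstimator from tpot.builtins. The sketch below imitates that idea using scikit-learn only. StackingStep is a hypothetical helper written for this example, GradientBoostingRegressor is a stand-in for XGBRegressor, and the data and hyperparameters are made up; substitute XGBRegressor(**args1), XGBRegressor(**args2) and RandomForestRegressor(**args3) for a real reconstruction.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.pipeline import make_pipeline


class StackingStep(TransformerMixin, BaseEstimator):
    """Hypothetical helper: fit an estimator, then prepend its predictions
    to the input features so the next pipeline step can use them."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Fit a clone so the estimator passed in stays untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        X = np.asarray(X)
        predictions = self.estimator_.predict(X).reshape(-1, 1)
        return np.hstack([predictions, X])


# Synthetic data, used only so the sketch runs on its own.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

# Innermost estimator first, outermost estimator last.
pipeline = make_pipeline(
    StackingStep(GradientBoostingRegressor(random_state=0)),  # stand-in for XGBRegressor(**args1)
    StackingStep(GradientBoostingRegressor(random_state=1)),  # stand-in for XGBRegressor(**args2)
    RandomForestRegressor(n_estimators=50, random_state=0),   # stand-in for RandomForestRegressor(**args3)
)
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Each stacking step adds one column, so the final random forest is trained on the four original features plus two prediction columns.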
