Subclassing XGBoostRegressor for scikit-learn estimators receives "TypeError: super() takes no keyword arguments."

Posted 2023-02-23

Asked: 2022-01-24 00:16:18

Question:

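I tried to subclass XGBRegressor to create a custom scikit-learn compatible estimator with GridSearchCV embedded in it. I keep getting a TypeError saying "super() takes no keyword arguments."

In the code below, the first part is the procedural version of the second part. The second part is what I intend to do, but it fails: I want to create a new class for the XGBoost regressor that uses GridSearchCV as its cross-validator.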

from xgboost.sklearn import XGBRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt

# procedural version
X, y = make_regression(n_samples=20, n_features=3, random_state=42)
parameters = {'n_estimators': [10, 20], 'max_depth': [3, 4]}
tunned_regr = GridSearchCV(XGBRegressor(), parameters)
tunned_regr.fit(X, y)

pred_y = tunned_regr.predict(X)
fig, ax = plt.subplots(figsize=(10,6))
plt.scatter(range(len(X)), pred_y, label="predicted")
plt.scatter(range(len(X)), y, label="true")
plt.legend()

# the new xgboost regressor with gridsearchCV embedded
class XGBR(XGBRegressor):
    def __init__(self, objective='reg:linear'):
        super(XGBR, self).__init__(objective=objective)

    def fit(self, X, y):
        parameters = {'n_estimators': [10, 20], 'max_depth': [3, 4]}
        self.regr = GridSearchCV(super(XGBR, self), parameters)
        self.regr.fit(X, y)
        return self
    
    def predict(self, X):
        return self.regr.predict(X)

Run xgbr = XGBR(); xgbr.fit(X, y) and you should see this error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/test.py in <module>
     13         return self.regr.predict(X)
     14 xgbr = XGBR()
---> 15 xgbr.fit(X, y)

/test.py in fit(self, X, y)
      7         parameters = {'n_estimators': [10, 20], 'max_depth': [3, 4]}
      8         self.regr = GridSearchCV(super(XGBR, self), parameters)
----> 9         self.regr.fit(X, y)
     10         return self
     11 

~/.local/lib/python3.9/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    803         n_splits = cv_orig.get_n_splits(X, y, groups)
    804 
--> 805         base_estimator = clone(self.estimator)
    806 
    807         parallel = Parallel(n_jobs=self.n_jobs, pre_dispatch=self.pre_dispatch)

~/.local/lib/python3.9/site-packages/sklearn/base.py in clone(estimator, safe)
     80     for name, param in new_object_params.items():
     81         new_object_params[name] = clone(param, safe=False)
---> 82     new_object = klass(**new_object_params)
     83     params_set = new_object.get_params(deep=False)
     84 

TypeError: super() takes no keyword arguments

Comments:

I believe this is actually not about super() itself, but about how it is being used to call __init__().

Answer 1:

This line looks suspicious to me:

        self.regr = GridSearchCV(super(XGBR, self), parameters)

I suspect you meant to write the following instead:

        self.regr = GridSearchCV(self, parameters)

In the procedural version of your code, you write

tunned_regr = GridSearchCV(XGBRegressor(), parameters)

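so you pass an instance of the XGBRegressor class as the first argument to the GridSearchCV constructor. In your class, XGBR is a subclass of XGBRegressor, so self is an instance of XGBR and therefore also an instance of XGBRegressor.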
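The error message itself can be reproduced without xgboost or scikit-learn. The traceback shows that clone() rebuilds the estimator by calling its class with the estimator's parameters (klass(**new_object_params)). A super(XGBR, self) object is a proxy whose class is the built-in super type, so that call becomes super(**params). The following is a minimal sketch of that mechanism; Base, Child, and naive_clone are hypothetical names made up for illustration, and naive_clone only imitates the one step of clone() visible in the traceback.

```python
class Base:
    """Hypothetical stand-in for XGBRegressor."""

    def __init__(self, depth=3):
        self.depth = depth

    def get_params(self):
        return {"depth": self.depth}


class Child(Base):
    """Hypothetical stand-in for XGBR."""


def naive_clone(estimator):
    # Imitates the step shown in the traceback: rebuild the object
    # by calling its class with its own parameters.
    klass = estimator.__class__
    params = estimator.get_params()
    return klass(**params)


child = Child(depth=5)
proxy = super(Child, child)

# Cloning the real instance works and yields a new Child.
clone_ok = naive_clone(child)

# The proxy still answers get_params(), but its class is `super`,
# so the rebuild step calls super(depth=5) and raises TypeError.
error = None
try:
    naive_clone(proxy)
except TypeError as exc:
    error = exc

print(type(clone_ok).__name__, proxy.__class__, repr(error))
```

Passing a real estimator instance instead of the super proxy avoids this particular TypeError, because the class being called is then an actual estimator class.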

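However, after spending more time looking at your code and your question, I am not sure inheritance is a good fit here.

There is a common maxim in software development: "favor composition over inheritance." Inheritance is useful in some situations, but it is often used where it is not the best approach, and I think this is one of those cases.

Is an XGBR also an XGBRegressor? Can you use an instance of the XGBR class anywhere you can use an XGBRegressor? If the answer to either question is no, do not use inheritance.

The following version of your class uses composition instead: it creates the XGBRegressor inside the fit() method. You create it and use it in exactly the same way as before: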

class XGBR:
    def __init__(self, objective='reg:linear'):
        self.objective = objective

    def fit(self, X, y):
        parameters = {'n_estimators': [10, 20], 'max_depth': [3, 4]}
        self.regr = GridSearchCV(XGBRegressor(objective=self.objective), parameters)
        self.regr.fit(X, y)
        return self
    
    def predict(self, X):
        return self.regr.predict(X)

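For now I have chosen to create the XGBRegressor inside the call to fit(). If XGBRegressor is slow to create, you might prefer to create it in __init__. If you do that, however, you also need to make sure that the same XGBRegressor can be used to analyze multiple datasets, and that the analysis of any dataset is not affected by datasets the XGBRegressor has seen before. That may or may not be a problem; I don't know.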
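One way to address that concern is to store an unfitted template estimator in __init__ and search over a clone of it in fit(). The traceback in the question shows that GridSearchCV already clones the estimator it is given, so the explicit clone() here is only a precaution. The sketch below rests on several assumptions: the names TunedRegressor, estimator, param_grid, and regr_ are hypothetical, and DecisionTreeRegressor is used as a stand-in for XGBRegressor so that the example runs with scikit-learn alone.

```python
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor  # stand-in for XGBRegressor


class TunedRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper that holds a template estimator by composition."""

    def __init__(self, estimator=None, param_grid=None, cv=3):
        # Only store the arguments; scikit-learn expects __init__ to do no work.
        self.estimator = estimator
        self.param_grid = param_grid
        self.cv = cv

    def fit(self, X, y):
        base = self.estimator
        if base is None:
            base = DecisionTreeRegressor(random_state=0)
        grid = self.param_grid
        if grid is None:
            grid = {"max_depth": [3, 4]}
        # Search over a fresh copy so the template itself is never fitted.
        self.regr_ = GridSearchCV(clone(base), grid, cv=self.cv)
        self.regr_.fit(X, y)
        return self

    def predict(self, X):
        return self.regr_.predict(X)


X, y = make_regression(n_samples=20, n_features=3, random_state=42)
template = DecisionTreeRegressor(random_state=0)
model = TunedRegressor(estimator=template, param_grid={"max_depth": [3, 4]})
model.fit(X, y)
preds = model.predict(X)
print(model.regr_.best_params_)
```

Because every call to fit() builds a new GridSearchCV around a new clone, fitting on a second dataset starts from the same unfitted template rather than from state left by the first dataset.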

Finally, a disclaimer: I am not a data scientist, and I have not tested this code.
