Python sklearn NotFittedError after XGBoost .fit has been called


Posted: 2020-10-19 00:02:45

Question:

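I am trying to use sklearn's plot_partial_dependence function on a fitted XGBoost model, i.e. after .fit has been called. However, I keep getting this error:

NotFittedError: This XGBRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.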

Here are the steps I took, as a complete example using a dummy dataset:

import numpy as np
# dummy dataset
from sklearn.datasets import make_regression
X_train, y_train = make_regression(n_samples = 1000, n_features = 10)


# Import xgboost
import xgboost as xgb

# Initialize the model 
model_xgb_1 = xgb.XGBRegressor(max_depth = 5, 
                               learning_rate = 0.01, 
                               n_estimators = 100, 
                               objective = 'reg:squarederror', 
                               booster = 'gbtree') 

# Fit the model 
# Not assigning to a new variable 
model_xgb_1.fit(X_train, y_train)

# Just to check that .predict can be called and works
# without error 
print(np.sum(model_xgb_1.predict(X_train)))
# the above works ok and prints the output

#This next step throws an error:
from sklearn.inspection import plot_partial_dependence
plot_partial_dependence(model_xgb_1, X_train, [0])

Output:

662.3468

NotFittedError: This XGBRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Update

Workaround when booster = 'gblinear':

# CHANGE 1/2: Use booster = 'gblinear'
# as no coef are returned for the case of 'gbtree' 
model_xgb_1 = xgb.XGBRegressor(max_depth = 5, 
                               learning_rate = 0.01, 
                               n_estimators = 100, 
                               objective = 'reg:squarederror', 
                               booster = 'gblinear') 

# Fit the model 
# Not assigning to a new variable 
model_xgb_1.fit(X_train, y_train)

# Just to check that .predict can be called and works
# without error 
print(np.sum(model_xgb_1.predict(X_train)))
# the above works ok and prints the output


# Calling plot_partial_dependence(model_xgb_1, X_train, [0]) at this point
# still raises NotFittedError, so the call is made only after CHANGE 2/2.

# CHANGE 2/2
# Add the following:
model_xgb_1.coef__ = model_xgb_1.coef_
model_xgb_1.intercept__ = model_xgb_1.intercept_

# Now call plot_partial_dependence --- It works ok
from sklearn.inspection import plot_partial_dependence
plot_partial_dependence(model_xgb_1, X_train, [0])
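A possible explanation for why CHANGE 2/2 helps: when it is given no explicit attribute list, sklearn's check_is_fitted looks for instance attributes whose names end with an underscore, and a property defined on the class does not count. The sketch below illustrates this with a hypothetical stand-in estimator (PropertyOnlyRegressor and passes_fitted_check are made-up names, not part of XGBoost or sklearn). It assumes the installed XGBoost version stores its fitted state the same way, which is not verified here.

```python
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class PropertyOnlyRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical estimator (not XGBoost): fitted state lives in a
    private attribute and is exposed only through a property."""

    def fit(self, X, y):
        # Private name: it does not end with an underscore
        self._mean = sum(y) / len(y)
        return self

    @property
    def coef_(self):
        # Defined on the class, so it never appears in vars(instance)
        return self._mean

    def predict(self, X):
        return [self._mean for _ in X]


def passes_fitted_check(estimator):
    """Return True if sklearn's check_is_fitted accepts the estimator."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


est = PropertyOnlyRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
print(passes_fitted_check(est))  # fit was called, yet the check rejects it

# Same trick as CHANGE 2/2: store a real instance attribute whose
# name ends with an underscore
est.coef__ = est.coef_
print(passes_fitted_check(est))  # the check now accepts the estimator
```

If this is indeed the cause, any instance attribute with a trailing underscore would satisfy the check; coef__ and intercept__ are simply convenient names.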

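Comments:

xgboost may not properly account for the way sklearn checks whether a model is fitted. If so, this might be resolved by using a newer version of xgboost.

Answer 1:

To avoid this error, do not assign the fitted model to a variable.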

# Import xgboost
import xgboost as xgb

# Initialize the model 
model_xgb_1 = xgb.XGBRegressor(max_depth = max_depth, 
                               learning_rate = shrinkage, 
                               n_estimators = nTrees, 
                               objective = 'reg:squarederror', 
                               booster = 'gbtree') 

# Fit the model 
model_xgb_1.fit(X_train, y_train)

# Just to check that .predict can be called and works
# without error 
model_xgb_1.predict(X_train)
# the above works ok and prints the output

#This next step throws an error:
from sklearn.inspection import plot_partial_dependence
plot_partial_dependence(model_xgb_1, X_train, [0])

Comments:

I made the change, but it does not resolve the error. I will also edit the question to reflect it.

Answer 2:
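from sklearn.ensemble import VotingRegressor
from sklearn.inspection import PartialDependenceDisplay

# XGB is your XGBRegressor; x_train, y_train and features come from your data
XGB_v = VotingRegressor([("reg", XGB)])
# The wrapper itself must be fitted before it can be inspected
XGB_v.fit(x_train, y_train)
XGB_RMR = PartialDependenceDisplay.from_estimator(
    XGB_v, x_train, features,
    feature_names=["a"], line_kw={"color": "blue"}
)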

This should help you solve the problem.
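One way to see why the wrapper approach can work: VotingRegressor.fit clones and refits the inner model and stores the copies in estimators_, an attribute with a trailing underscore on the wrapper, so the wrapper passes the fitted check even when the inner model alone does not. The sketch below uses a hypothetical stand-in model (PrivateStateRegressor, not XGBoost) and the non-plotting partial_dependence function, so it needs neither xgboost nor matplotlib. It assumes a scikit-learn version where kind="average" returns a dict-like result.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import VotingRegressor
from sklearn.exceptions import NotFittedError
from sklearn.inspection import partial_dependence


class PrivateStateRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical least-squares model (a stand-in, not XGBoost) that
    keeps its fitted weights in a private attribute only."""

    def fit(self, X, y):
        X1 = np.column_stack([np.asarray(X, dtype=float), np.ones(len(X))])
        self._w = np.linalg.lstsq(X1, np.asarray(y, dtype=float), rcond=None)[0]
        return self

    def predict(self, X):
        X1 = np.column_stack([np.asarray(X, dtype=float), np.ones(len(X))])
        return X1 @ self._w


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

# Direct call on the fitted inner model: rejected by the fitted check
inner = PrivateStateRegressor().fit(X, y)
try:
    partial_dependence(inner, X, features=[0], kind="average")
    direct_call_ok = True
except NotFittedError:
    direct_call_ok = False
print("direct call accepted:", direct_call_ok)

# Wrapped call: fitting the wrapper sets estimators_ on it
wrapped = VotingRegressor([("reg", PrivateStateRegressor())]).fit(X, y)
result = partial_dependence(wrapped, X, features=[0], kind="average")
print(result["average"].shape)
```

Because the wrapper refits a clone, the model that gets inspected is a retrained copy rather than the original fitted object, which matters if training is expensive or not deterministic.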
