GridSearchCV 在管道中将 fit_params 传递给 XGBRegressor 会产生“ValueError:需要超过 1 个值才能解包”
Posted
技术标签:
【中文标题】GridSearchCV 在管道中将 fit_params 传递给 XGBRegressor 会产生“ValueError:需要超过 1 个值才能解包”【英文标题】:GridSearchCV passing fit_params to XGBRegressor in a pipeline yields "ValueError: need more than 1 value to unpack" 【发布时间】:2018-12-26 09:53:09 【问题描述】:将 fit_params 传递到包含 XGBRegressor 的管道中会返回错误,无论内容如何
训练数据集已经过一次热编码,并被拆分以供在管道中使用
train_X, val_X, train_y, val_y = train_test_split(final_train, y, random_state = 0)
创建一个 Imputer -> XGBRegressor 管道。设置XGBRegressor的参数和拟合参数
pipe = Pipeline(steps=[("Imputer", Imputer()),
("XGB", XGBRegressor())])
xgb_hyperparams = 'XGB__n_estimators': [1000, 2000, 3000],
'XGB__learning_rate': [0.01, 0.03, 0.05, 0.07],
'XGB__max_depth': [3, 4, 5]
fit_parameters = 'XGB__early_stopping_rounds': 5,
'XGB__eval_metric': 'mae',
'XGB__eval_set': [(val_X, val_y)],
'XGB__verbose': False
grid_search = GridSearchCV(pipe,
xgb_hyperparams,
#fit_params=fit_parameters,
scoring='neg_mean_squared_error',
cv=5,
n_jobs=1,
verbose=3)
grid_search.fit(train_X, train_y, fit_params=fit_parameters)
这会产生以下输出:
Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV] XGB__learning_rate=0.01, XGB__n_estimators=1000, XGB__max_depth=3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-76-0751db18c046> in <module>()
----> 1 grid_search.fit(train_X, train_y, fit_params=fit_parameters)
/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_search.pyc in fit(self, X, y, groups, **fit_params)
638 error_score=self.error_score)
639 for parameters, (train, test) in product(candidate_params,
--> 640 cv.split(X, y, groups)))
641
642 # if one choose to see train score, "out" will contain train score info
/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
456 estimator.fit(X_train, **fit_params)
457 else:
--> 458 estimator.fit(X_train, y_train, **fit_params)
459
460 except Exception as e:
/usr/local/lib/python2.7/site-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
246 This estimator
247 """
--> 248 Xt, fit_params = self._fit(X, y, **fit_params)
249 if self._final_estimator is not None:
250 self._final_estimator.fit(Xt, y, **fit_params)
/usr/local/lib/python2.7/site-packages/sklearn/pipeline.pyc in _fit(self, X, y, **fit_params)
195 if step is not None)
196 for pname, pval in six.iteritems(fit_params):
--> 197 step, param = pname.split('__', 1)
198 fit_params_steps[step][param] = pval
199 Xt = X
ValueError: need more than 1 value to unpack
【问题讨论】:
您使用的是哪个版本的 xgboost? 感谢回复,版本为:0.72.1 我看过了,如果不将分类器从管道中分离出来,我就无法让它工作,对不起。希望其他人会出现并提出适当的修复建议。 【参考方案1】:我认为问题不在于 xgboost。这是您将 fit_params
传递给 fit 方法的方式中的一个错误。你需要的是grid_search.fit(train_X, train_y, **fit_parameters)
【讨论】:
【参考方案2】:我遇到了同样的问题,尝试在管道中使用 GridSearch LightGBM,所以我发现您应该以特定于 sklearn Pipeline 的方式命名要传递给 LightGBM 的参数。请注意,我将clf__categorical_feature
作为categorical_feature
参数传递给LGBMClassifier
fit
函数。我想这种方法也应该适用于 XGBRegressor 和其他机器学习模型。
这是我的代码:
import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
gridParams =
'clf__max_bin': [20, 30, 50, 70, 100, 150, 200, 250, 300],
'clf__objective' : ['binary'],
'clf__metric':['auc'],
'clf__num_leaves': [16, 35, 65, 128, 256],
'clf__max_depth' : [4, 5, 6, 7, 8],
'clf__learning_rate': [0.05, 0.01, 0.1, 0.03, 0.001, 0.005],
'clf__min_data_in_leaf': [100, 200, 300, 500, 700, 800, 1000, 1500],
'clf__min_sum_hessian_in_leaf': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
'clf__bagging_fraction': [0.1, 0.3, 0.5, 0.7, 0.9],
'clf__bagging_freq': [2, 4, 5, 6, 8, 10],
'clf__feature_fraction': [0.1,0.3,0.5,0.7,0.9],
'clf__lambda_l1': [0, 0.1, 0.5, 0.6, 1., 2., 5., 6., 7., 8., 9.],
'clf__lambda_l2': [0, 0.1, 0.5, 0.6, 1., 2., 5., 6., 7., 8., 9.],
'clf__num_iterations': [100, 200, 500, 700, 1000, 1500, 2000, 3000, 3500, 5000],
'clf__random_state' : [422],
'clf__boosting_type' : ['gbdt', 'dart', 'random_forest'],
'clf__is_unbalance': [True]
pipeline = Pipeline([('hook', RemoveColumnsHook(cols_to_remove=['start_month', 'start_day'])),
('clf', lgb.LGBMClassifier())])
gs = RandomizedSearchCV(
estimator=pipeline,
param_distributions=gridParams,
n_iter=1500,
scoring='roc_auc',
cv=TimeSeriesSplit(n_splits=5),
refit=True,
random_state=314,
verbose=True,
n_jobs=8)
gs.fit(X_train, y_train, clf__categorical_feature=category_names) #
print('Best score reached: with params: '.format(gs.best_score_, gs.best_params_))
【讨论】:
【参考方案3】:你必须使用**xgb_hyperparams,像这样:
gbm.fit(
X_train_preprocess,
y_train,
**param_fit_grid,
)
【讨论】:
以上是关于GridSearchCV 在管道中将 fit_params 传递给 XGBRegressor 会产生“ValueError:需要超过 1 个值才能解包”的主要内容,如果未能解决你的问题,请参考以下文章
如何使用 GridSearchCV 在嵌套管道中测试预处理组合?
工作管道上的 GridSearchCV 返回 ValueError
如何将最佳参数(使用 GridSearchCV)从管道传递到另一个管道