同时进行特征选择和超参数调整
Posted
技术标签:
【中文标题】同时进行特征选择和超参数调整【英文标题】:Simultaneous feature selection and hyperparameter tuning 【发布时间】:2021-08-29 16:02:06 【问题描述】:我正在尝试对 sklearn SVC 模型进行超参数调整和特征选择。
我尝试了下面的代码,但我收到了一个错误。
clf = Pipeline([('anova', SelectPercentile(f_classif)),
('svc', SVC( probability = True))])
score_means = list()
score_params = list()
percentiles = (1, 3, 6, 10, 15, 20, 30, 40, 60, 80, 100)
params =
"C": np.logspace(-3, 17, 21),
"gamma": np.logspace(-20, 1, 21),
'class_weight' : [None, 'balanced']
halving_search = HalvingGridSearchCV(estimator = clf,
param_grid = params,
scoring = 'neg_brier_score',
factor = 2,
verbose = 2,
cv = 2)
for percentile in percentiles:
clf.set_params(anova__percentile=percentile)
this_scores = halving_search.fit(x_train, y_train)
score_means.append(this_scores.best_score_)
score_params.append(this_scores.best_params)
使用与 HalvingGridSearchCV 分开的 cross_val_score 运行管道代码是可行的,但我想,以找出哪种特征和超参数组合产生最佳模型。
当我运行上面的代码时,我得到以下错误:
Traceback (most recent call last):
File "<ipython-input-83-cf714445297c>", line 4, in <module>
this_scores = halving_search.fit(x_train, y_train)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search_successive_halving.py", line 213, in fit
super().fit(X, y=y, groups=None, **fit_params)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 841, in fit
self._run_search(evaluate_candidates)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search_successive_halving.py", line 320, in _run_search
more_results=more_results)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 809, in evaluate_candidates
enumerate(cv.split(X, y, groups))))
File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 1041, in __call__
if self.dispatch_one_batch(iterator):
File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 572, in __init__
self.results = batch()
File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 263, in __call__
for func, args, kwargs in self.items]
File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 263, in <listcomp>
for func, args, kwargs in self.items]
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 581, in _fit_and_score
estimator = estimator.set_params(**cloned_parameters)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 150, in set_params
self._set_params('steps', **kwargs)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\metaestimators.py", line 54, in _set_params
super().set_params(**params)
File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\base.py", line 233, in set_params
(key, self))
ValueError: Invalid parameter C for estimator Pipeline(steps=[('anova', SelectPercentile(percentile=1)),
('svc', SVC(probability=True))]). Check the list of available parameters with `estimator.get_params().keys()`.
看起来 halvingsearch 正在尝试将管道作为 C 的输入传递。
【问题讨论】:
【参考方案1】:您想对Pipeline
对象执行网格搜索。在为管道的不同步骤定义参数时,必须使用<step>__<parameter>
语法:
params =
"svc__C": np.logspace(-3, 17, 21),
"svc__gamma": np.logspace(-20, 1, 21),
"svc__class_weight" : [None, 'balanced']
请参阅user guide 了解更多信息。
【讨论】:
以上是关于同时进行特征选择和超参数调整的主要内容,如果未能解决你的问题,请参考以下文章