Setting the parameters of an imputer within a three-level pipeline
【Posted】: 2020-07-19 15:27:59
【Question】: I am new to data science, and I am using pipelines to organize my code.
The snippet I am trying to organize is as follows:
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBRegressor

# numerical_cols, categorical_cols, X_train and y_train are defined earlier (not shown)

### Preprocessing ###
# Preprocessing for numerical data
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer()),
    ('scaler', StandardScaler())
])
# Preprocessing for categorical data
# (in scikit-learn >= 1.2 the `sparse` argument is named `sparse_output`)
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse=False))
])
# Bundle preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])
### Model ###
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000, learning_rate=0.05)
### Processing ###
# Bundle preprocessing and modeling code in a pipeline
my_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', model)
])
parameters = {}
# => How to set the parameters for one of the parts of the numerical_transformer pipeline?
# GridSearch
CV = GridSearchCV(my_pipeline, parameters, scoring='neg_mean_absolute_error', n_jobs=1)
CV.fit(X_train, y_train)
How can I change the parameters of the imputer inside the numerical_transformer pipeline?
Thanks,
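【Comments】:
Which specific parameter of the imputer do you want to tune?
Hi, the strategy parameter. I would like to try 'mean', 'median' and 'most_frequent'.
I tried something like this: parameters['preprocessor__numerical_transformer__strategy'] = ['mean', 'median', 'most_frequent'], but it did not work.
What about preprocessor__transformers__cat__imputer__strategy?
If I need to change the numerical transformer, why would I change the categorical one?
This seems to work: parameters['preprocessor__num__imputer__strategy'] = ['most_frequent']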
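A likely reason the first attempt failed is that scikit-learn builds nested parameter names from the step names given as strings ('preprocessor', 'num', 'imputer'), not from Python variable names such as numerical_transformer. The sketch below lists the valid names with get_params(). The column names "age" and "city" and the LinearRegression stand-in for the XGBoost model are assumptions made only to keep the example self-contained.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Same three-level structure as in the question; the column names are hypothetical.
numerical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numerical_transformer, ["age"]),
    ("cat", categorical_transformer, ["city"]),
])
# LinearRegression is a stand-in for the XGBRegressor used in the question.
my_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", LinearRegression()),
])

# Every key returned here is a name that GridSearchCV will accept.
valid_names = my_pipeline.get_params().keys()
strategy_names = sorted(
    name for name in valid_names if name.endswith("imputer__strategy")
)
print(strategy_names)

# The name tried in the comments uses the variable name, so it is not listed.
attempted = "preprocessor__numerical_transformer__strategy"
print(attempted in valid_names)
```

Each level contributes one segment joined by a double underscore: the outer pipeline step, the ColumnTransformer entry name, the inner pipeline step, and finally the estimator's own argument.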
【Answer 1】:
After @desernaut pointed me in the right direction, this is the answer:
parameters['preprocessor__num__imputer__strategy'] = ['most_frequent', 'mean', 'median']
Thanks, @desernaut!
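For illustration, here is a minimal end-to-end sketch of a grid search over that parameter name. The synthetic data, the column names "age" and "city", the cv=3 setting, and the LinearRegression stand-in for XGBRegressor are all assumptions chosen so that the example runs without XGBoost or the original dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numerical column with gaps, one categorical column.
rng = np.random.default_rng(0)
n = 30
age = rng.normal(40, 10, n)
y = 2.0 * age + rng.normal(0, 1, n)
age[::5] = np.nan  # introduce missing values for the imputer to fill
city = [["a", "b", "c"][i] for i in rng.integers(0, 3, n)]
X = pd.DataFrame({"age": age, "city": pd.Series(city, dtype=object)})

numerical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numerical_transformer, ["age"]),
    ("cat", categorical_transformer, ["city"]),
])
# LinearRegression stands in for the XGBRegressor from the question.
my_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", LinearRegression()),
])

# Only the numerical imputer is tuned; the 'cat' branch keeps its fixed strategy.
parameters = {
    "preprocessor__num__imputer__strategy": ["most_frequent", "mean", "median"],
}
search = GridSearchCV(
    my_pipeline, parameters, scoring="neg_mean_absolute_error", cv=3, n_jobs=1
)
search.fit(X, y)
print(search.best_params_)
```

Because the key starts with preprocessor__num, the search changes only the imputer in the numerical branch, which addresses the concern raised in the comments about touching the categorical one.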