sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable
【Title】: sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable 【Posted】: 2020-09-16 14:42:18 【Question】: I am trying to use pipelines to feed an ensemble voting classifier, because I want the ensemble learner to use models trained on different feature sets. To do this, I followed the tutorial at [1].
Below is the code I have developed so far.
y = df1.index
x = preprocessing.scale(df1)
phy_features = ['A', 'B', 'C']
phy_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
phy_processer = ColumnTransformer(transformers=[('phy', phy_transformer, phy_features)])
fa_features = ['D', 'E', 'F']
fa_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
fa_processer = ColumnTransformer(transformers=[('fa', fa_transformer, fa_features)])
pipe_phy = Pipeline(steps=[('preprocessor', phy_processer ),('classifier', SVM)])
pipe_fa = Pipeline(steps=[('preprocessor', fa_processer ),('classifier', SVM)])
ens = VotingClassifier(estimators=[pipe_phy, pipe_fa])
cv = KFold(n_splits=10, random_state=None, shuffle=True)
for train_index, test_index in cv.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    ens.fit(x_train,y_train)
    print(ens.score(x_test, y_test))
However, when I run the code, the line ens.fit(x_train,y_train) fails with the error TypeError: argument of type 'ColumnTransformer' is not iterable.
Below is the full stack trace I received.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py", line 112, in <module>
    ens.fit(x_train,y_train)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 265, in fit
    return super().fit(X, transformed_y, sample_weight)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 65, in fit
    names, clfs = self._validate_estimators()
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py", line 228, in _validate_estimators
    self._validate_names(names)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in _validate_names
    invalid_names = [name for name in names if '__' in name]
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in <listcomp>
    invalid_names = [name for name in names if '__' in name]
TypeError: argument of type 'ColumnTransformer' is not iterable
Below are the values in the names list when the error occurs.
1- ColumnTransformer(transformers=[('phy',
                                    Pipeline(steps=[('imputer',
                                                     SimpleImputer(strategy='median')),
                                                    ('scaler', StandardScaler())]),
                                    ['HR', 'RMSSD', 'SCL'])])
2- ColumnTransformer(transformers=[('fa',
                                    Pipeline(steps=[('imputer',
                                                     SimpleImputer(strategy='median')),
                                                    ('scaler', StandardScaler())]),
                                    ['Squality', 'Sneutral', 'Shappy'])])
What is causing this, and how can I fix it?
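【Answer 1】: The estimators parameter of VotingClassifier should be a list of (name, estimator) pairs, for example
ens = VotingClassifier(estimators=[('phy', pipe_phy),
                                   ('fa', pipe_fa)])
(In your code the pipelines are passed bare, so the validation unpacks each Pipeline as if it were a pair and treats its first step, the ColumnTransformer, as the name. The check '__' in name then fails because a ColumnTransformer is not iterable.)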
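As a rough end-to-end illustration of the fix, here is a minimal sketch with named pipelines. The synthetic DataFrame, the target y, the helper make_pipe and the choice of SVC for the question's SVM are all assumptions made for the example, not part of the original code. Note that the question's preprocessing.scale(df1) returns a NumPy array, so selecting columns by name would likely fail next; the sketch keeps the data as a DataFrame and indexes it with .iloc.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for df1 (assumption: six numeric columns A..F).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 6)), columns=list("ABCDEF"))
y = (X["A"] + X["D"] > 0).astype(int)  # hypothetical binary target


def make_pipe(name, columns):
    """Hypothetical helper: impute and scale `columns`, then fit an SVM."""
    numeric = Pipeline(steps=[("imputer", SimpleImputer(strategy="median")),
                              ("scaler", StandardScaler())])
    pre = ColumnTransformer(transformers=[(name, numeric, columns)])
    return Pipeline(steps=[("preprocessor", pre), ("classifier", SVC())])


# Each entry is a (name, estimator) pair, as VotingClassifier expects.
ens = VotingClassifier(estimators=[("phy", make_pipe("phy", ["A", "B", "C"])),
                                   ("fa", make_pipe("fa", ["D", "E", "F"]))])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_index, test_index in cv.split(X):
    # .iloc keeps X a DataFrame, so the column names survive the split.
    ens.fit(X.iloc[train_index], y.iloc[train_index])
    scores.append(ens.score(X.iloc[test_index], y.iloc[test_index]))
print(scores)
```

The default hard voting only needs predict from each pipeline, so SVC works without probability=True.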
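【Answer 2】: I did manage to get the code running with a workaround, though it is a bit ugly.
The library appears to be searching the ColumnTransformer object for the substring '__', which it cannot do.
Since this name check has no significant effect on my use case, I commented out the following snippet in sklearn\utils\metaestimators.py.
invalid_names = [name for name in names if '__' in name]
if invalid_names:
    raise ValueError('Estimator names must not contain __: got '
                     '{0!r}'.format(invalid_names))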
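To see why that check receives an estimator instead of a string, the unpacking can be reproduced outside the library. This is only a sketch: it assumes the validation splits the entries roughly as zip(*estimators), which matches the names values shown in the question, and pipe_a and pipe_b are hypothetical two-step pipelines.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe_a = Pipeline(steps=[("scaler", StandardScaler()), ("classifier", SVC())])
pipe_b = Pipeline(steps=[("scaler", StandardScaler()), ("classifier", SVC())])

# Bare pipelines: zip(*...) iterates each Pipeline, which yields its step
# estimators, so the first steps land in `names`.
names, clfs = zip(*[pipe_a, pipe_b])
bare_name_types = [type(n).__name__ for n in names]

# The substring check fails on an estimator object.
try:
    [n for n in names if "__" in n]
    raised = False
except TypeError:
    raised = True

# (name, estimator) pairs: the names are strings and the check passes.
names_ok, clfs_ok = zip(*[("a", pipe_a), ("b", pipe_b)])
invalid = [n for n in names_ok if "__" in n]

print(bare_name_types, raised, names_ok, invalid)
```

With string names the check runs cleanly, so supplying (name, estimator) pairs should make editing the installed library unnecessary.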