StandardScaler with Pipelines and GridSearchCV
【Posted】: 2019-04-14 14:00:03
【Question】: I put a StandardScaler into a pipeline, and the results of CV_mlpregressor.predict(x_test) look strange. I think I have to recover the values from the StandardScaler, but I still can't figure out how.
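from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scale the features first, then fit the MLP on the scaled features
pipe_MLPRegressor = Pipeline([('scaler', StandardScaler()),
                              ('MLPRegressor', MLPRegressor(random_state=42))])
# param_grid must be a dict (or a list of dicts); keys use the <step name>__<parameter> form
grid_params_MLPRegressor = [{
    'MLPRegressor__solver': ['lbfgs'],
    'MLPRegressor__max_iter': [100, 200, 300, 500],
    'MLPRegressor__activation': ['relu', 'logistic', 'tanh'],
    'MLPRegressor__hidden_layer_sizes': [(2,), (4,), (2, 2), (4, 4), (4, 2), (10, 10), (2, 2, 2)],
}]
CV_mlpregressor = GridSearchCV(estimator=pipe_MLPRegressor,
                               param_grid=grid_params_MLPRegressor,
                               cv=5, return_train_score=True, verbose=0)
# x_train, y_train and x_test come from the asker's own data (not shown)
CV_mlpregressor.fit(x_train, y_train)
CV_mlpregressor.predict(x_test)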
Result:
array([ 2.67564153e+04, 1.90010572e+04, 9.62702942e+04, 3.98791931e+04,
1.48889808e+03, 7.08980726e+03, 3.86311279e+02, 7.05602301e+04,
4.06858486e+03, 4.29186303e+04, 3.86701735e+03, 6.30228075e+04,
6.78276925e+04, -5.91956287e+02, -7.37680434e+02, 3.07485001e+04,
4.81417953e+03, 5.18697686e+03, 1.61221952e+04, 1.33794944e+04,
-1.48375101e+03, 1.80891807e+04, 1.39740243e+04, 6.57156849e+04,
3.32962481e+04, 5.71332087e+05, 1.79130092e+03, 5.25642370e+04,
2.08111172e+04, 4.31060127e+04])
Thanks in advance.
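One way to check whether predictions need to be "un-scaled" is to reproduce what the pipeline does by hand. The sketch below is only an illustration under assumptions: it uses synthetic data from make_regression instead of the asker's data, and a single MLPRegressor configuration instead of the full grid. It shows that the scaler in the pipeline is fitted on the features only, so predict() already returns values in the units of the target.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration; not the asker's dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("MLPRegressor", MLPRegressor(hidden_layer_sizes=(4,),
                                               solver="lbfgs",
                                               max_iter=500,
                                               random_state=42))])
pipe.fit(X, y)

# Reproduce predict() by hand: scale X with the fitted scaler, then call the model
scaler = pipe.named_steps["scaler"]
model = pipe.named_steps["MLPRegressor"]
manual_pred = model.predict(scaler.transform(X))
pipe_pred = pipe.predict(X)

# The scaler holds one mean per feature column; y was never passed through it
same = np.allclose(pipe_pred, manual_pred)
print("pipeline and manual predictions match:", same)
print("scaler.mean_ shape:", scaler.mean_.shape)
```

Because the target never goes through the StandardScaler, there is nothing to invert on the prediction side. If the numbers look odd, the next thing to examine is the data itself and how the predictions compare with the true values.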
【Comments】:
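Regarding the predicted response values: they do not need to lie in the range [-1, 1]. Only the explanatory variables are scaled. You should compare the predictions for the test data with the true test values and look at the results, as @sukhbinder did.
【Answer 1】:
@Lian, I think everything you did is correct. Please check your data. I ran an experiment with an sklearn dataset, and it works as expected.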
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

x, y = load_boston(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=6784)

pipe_MLPRegressor = Pipeline([('scaler', StandardScaler()),
                              ('MLPRegressor', MLPRegressor(random_state=42))])
# param_grid must be a dict (or a list of dicts)
grid_params_MLPRegressor = [{
    'MLPRegressor__solver': ['lbfgs'],
    'MLPRegressor__max_iter': [100, 200, 300, 500],
    'MLPRegressor__activation': ['relu', 'logistic', 'tanh'],
    'MLPRegressor__hidden_layer_sizes': [(2,), (4,), (2, 2), (4, 4), (4, 2), (10, 10), (2, 2, 2)],
}]
CV_mlpregressor = GridSearchCV(estimator=pipe_MLPRegressor,
                               param_grid=grid_params_MLPRegressor,
                               cv=5, return_train_score=True, verbose=0)
CV_mlpregressor.fit(xtrain, ytrain)
ypred = CV_mlpregressor.predict(xtest)
# Put true values (first column) next to predictions (second column)
print(np.c_[ytest, ypred])
This prints:
array([[ 29.9 , 30.79749986],
[ 22.5 , 24.52180656],
[ 22.6 , 18.9567779 ],
[ 28.7 , 22.17189123],
[ 13.8 , 19.16797811],
[ 21.2 , 24.63527335],
[ 11.3 , 13.58962076],
[ 23. , 18.33693455],
[ 12.7 , 15.52294714],
[ 23.3 , 26.65083451],
[ 25.3 , 24.04219813],
[ 22.6 , 19.81454969],
[ 36.2 , 22.16994764],
[ 17.9 , 11.1221789 ],
[ 18.5 , 17.84162452],
[ 16.8 , 22.99832673],
[ 20.3 , 20.22598426],
[ 23.9 , 26.80997945],
[ 17.6 , 16.08188321],
[ 23.2 , 18.5995955 ],
[ 48.3 , 43.37911488],
[ 19.1 , 22.36379857],
【Comments】:
Thanks for the reply, I will check my database!
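The side-by-side comparison of true and predicted values suggested in this thread can also be summarized with error metrics. The following is a minimal sketch under assumptions: it swaps in load_diabetes, which ships with current scikit-learn releases, for the dataset used in the answer, and it searches a much smaller grid so that it runs quickly. The variable names are illustrative only.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (an assumption; not the data from the question or the answer)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("MLPRegressor", MLPRegressor(solver="lbfgs", random_state=42))])

# Reduced grid for illustration; keys follow the <step name>__<parameter> form
param_grid = {
    "MLPRegressor__hidden_layer_sizes": [(2,), (4,)],
    "MLPRegressor__max_iter": [200, 500],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# With the default refit=True, predict() uses the best pipeline refitted on all training data
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("best parameters:", search.best_params_)
print(f"MAE: {mae:.2f}  R^2: {r2:.3f}")
```

The mean absolute error is expressed in the units of the target, so it can be read directly against the range of y_test. A very large error or a strongly negative R^2 would point at the data or the model fit, not at a missing inverse transform.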