Grid search takes 30+ minutes. Is there any way to reduce this? (Jupyter Azure)

Posted 2023-03-12


【Original title】: Grid-search taking takes 30+ minutes is there any way of reducing this? (Jupyter Azure)
【Posted】: 2018-12-13 06:29:36
【Question】:

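I am running a grid search for an SVR model with a time-series split. My problem is that the grid search takes 30+ minutes, which is too long. The dataset is fairly large (17,800 data points), but this still seems excessive. Is there any way to reduce the running time? My code is: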

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler

# Both arrays are reshaped to column vectors for the scalers.
X_feature = X_feature.reshape(-1, 1)
y_label = y_label.reshape(-1, 1)

# One dictionary per kernel, so gamma is searched only for rbf
# and degree only for poly.
param = [{'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5],
          'C': [1, 10, 100, 1000]},
         {'kernel': ['poly'], 'C': [1, 10, 100, 1000], 'degree': [1, 2, 3, 4]}]

reg = SVR(C=1)
timeseries_split = TimeSeriesSplit(n_splits=3)
clf = GridSearchCV(reg, param, cv=timeseries_split, scoring='neg_mean_squared_error')

# Scale the feature and the target to [0, 1].
X = MinMaxScaler(feature_range=(0, 1)).fit(X_feature)
scaled_X = X.transform(X_feature)

y = MinMaxScaler(feature_range=(0, 1)).fit(y_label)
scaled_y = y.transform(y_label)

# SVR expects a 1-D target, so flatten the scaled column vector.
clf.fit(scaled_X, scaled_y.ravel())

My scaled y data is:

 [0.11321139]
 [0.07218848]
 ...
 [0.64844211]
 [0.4926122 ]
 [0.4030334 ]]

My scaled X data is:

[[0.2681013 ]
 [0.03454225]
 [0.02062136]
 ...
 [0.92857565]
 [0.64930691]
 [0.20325924]]


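【Answer 1】:

Depending on the size of the data and on the estimator, this can take a long time. Alternatively, you can try breaking the search into smaller parts and search over only one kernel at a time: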

param_rbf = {'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5],
             'C': [1, 10, 100, 1000]}

Then use it like this:

clf = GridSearchCV(reg, param_rbf, cv=timeseries_split, scoring='neg_mean_squared_error')

Similarly, run the search separately for each of the other kernels, using a different parameter dictionary for each:

params_poly = {'kernel': ['poly'], 'C': [1, 10, 100, 1000], 'degree': [1, 2, 3, 4]}
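As a rough sketch of how the two per-kernel searches could be run and compared, something like the following might work. The data here is a small synthetic stand-in (an assumption, not the asker's data), and `max_iter` is added only to keep the toy run short; it is not part of the original code.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic stand-in for the scaled data in the question (assumption).
rng = np.random.default_rng(0)
scaled_X = rng.random((120, 1))
scaled_y = 0.5 + 0.5 * np.sin(2 * np.pi * scaled_X[:, 0])

param_rbf = {'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5],
             'C': [1, 10, 100, 1000]}
params_poly = {'kernel': ['poly'], 'C': [1, 10, 100, 1000], 'degree': [1, 2, 3, 4]}

timeseries_split = TimeSeriesSplit(n_splits=3)
results = {}
for name, grid in [('rbf', param_rbf), ('poly', params_poly)]:
    # max_iter caps the solver only so this toy example finishes quickly.
    reg = SVR(max_iter=50_000)
    clf = GridSearchCV(reg, grid, cv=timeseries_split,
                       scoring='neg_mean_squared_error')
    clf.fit(scaled_X, scaled_y)
    results[name] = (clf.best_score_, clf.best_params_)

# Scores are negated MSE, so the largest value is the best.
best_kernel = max(results, key=lambda k: results[k][0])
print(best_kernel, results[best_kernel])
```

Splitting the grid this way does not reduce the total number of fits; it only lets you look at the result for one kernel before committing time to the next.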

I know this is not exactly a solution, just a few suggestions to help you cut down the time where possible.

Also, set the verbose option to True. This will show you the progress of the search.
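To get a feel for how much work the search involves before starting it, the number of fits can be counted up front. This is a small sketch using the two-kernel grid from the question; the count should correspond to the candidates and fits that the verbose output reports.

```python
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit

# The two-kernel grid from the question.
param = [
    {'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5], 'C': [1, 10, 100, 1000]},
    {'kernel': ['poly'], 'C': [1, 10, 100, 1000], 'degree': [1, 2, 3, 4]},
]

n_candidates = len(ParameterGrid(param))               # 16 rbf + 16 poly
n_splits = TimeSeriesSplit(n_splits=3).get_n_splits()  # folds per candidate
total_fits = n_candidates * n_splits

print(n_candidates, n_splits, total_fits)  # 32 3 96
```

With the default refit=True there is one more fit at the end, on the full data with the best parameters.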

Also, setting n_jobs=-1 does not necessarily reduce the running time. See this answer for reference.

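【Comments】:

That would be faster since the code is split up, but I think it will take a while regardless, given the amount of data. Thank you very much! @Mohammed Kashif

【Answer 2】: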

Use GridSearchCV(..., n_jobs=-1) to use all available CPU cores in parallel.
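A minimal sketch of what that call might look like, using a small synthetic dataset in place of the asker's data (an assumption) and only the rbf part of the grid to keep it short:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic stand-in for the scaled data in the question (assumption).
rng = np.random.default_rng(0)
scaled_X = rng.random((150, 1))
scaled_y = 0.5 + 0.5 * np.sin(2 * np.pi * scaled_X[:, 0])

param = {'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5],
         'C': [1, 10, 100, 1000]}

clf = GridSearchCV(
    SVR(),
    param,
    cv=TimeSeriesSplit(n_splits=3),
    scoring='neg_mean_squared_error',
    n_jobs=-1,   # run the individual fits on all available cores
    verbose=1,   # print a short progress summary
)
clf.fit(scaled_X, scaled_y)
print(clf.best_params_)
```

How much this helps depends on how many cores the machine actually exposes; on a hosted notebook with a single core there may be little or no gain.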

You can also use RandomizedSearchCV.
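For illustration, a randomized search over roughly the same ranges could look like the sketch below. The data is synthetic, and the log-uniform distributions and n_iter=8 are choices made for this example, not values from the question.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic stand-in for the scaled data in the question (assumption).
rng = np.random.default_rng(0)
scaled_X = rng.random((150, 1))
scaled_y = 0.5 + 0.5 * np.sin(2 * np.pi * scaled_X[:, 0])

# Sample C and gamma from the same ranges the original grid covers.
param_dist = {
    'kernel': ['rbf'],
    'C': loguniform(1, 1000),
    'gamma': loguniform(1e-5, 1e-2),
}

search = RandomizedSearchCV(
    SVR(),
    param_dist,
    n_iter=8,          # only 8 sampled candidates instead of the full grid
    cv=TimeSeriesSplit(n_splits=3),
    scoring='neg_mean_squared_error',
    random_state=0,    # makes the sampled candidates reproducible
)
search.fit(scaled_X, scaled_y)
print(search.best_params_)
```

The running time scales with n_iter rather than with the size of the grid, so the cost can be set directly, at the price of possibly missing the best combination.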

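【Comments】:

Just wanted to ask: does n_jobs=-1 still apply if I am using Jupyter Azure? Just curious, because the code has been running for 25 minutes and nothing has happened. @Asif.Khan

n_jobs = -1 means that all processors will be used.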
