How can I extract the alpha that gives me the lowest Mean Squared Error from the lasso regression that loops through multiple alphas?

Posted 2023-03-12

Asked: 2021-12-10 14:34:07

Question:

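I'm trying to find the combination of x variables, their respective exponents, and alpha that gives the lowest mean squared error.

I used scikit-learn's Lasso regression, but so far I can only determine the minimum MSE and the combination of variables that produced it. I'm not sure how to extract the alpha that produced it, or how to see whether the variables in that combination have any exponents associated with them.

What I have so far: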

outcomes from the Best Lasso Regression Model:
minimum Avg Test MSE: 9172.38
The Combination of Variables: ['Date', 'Cargo_size', 'Parcel_size', 'Rest', 'Sub']

import numpy as np
from itertools import combinations
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import PolynomialFeatures

# data (a DataFrame of predictors) and y (the target) are defined elsewhere

# every non-empty combination of the candidate predictors
x_combos = []
for n in range(1, 9):
    combos = combinations(['Date', 'Cargo_size', 'Parcel_size', 'Rest', 'Age',
                           'Sub', 'X_coord', 'Y_coord'], n)
    x_combos.extend(combos)

lasso_models = {}
alphas = 10**np.linspace(10, -2, 100) * .5

for n in range(0, len(x_combos)):
    combo_list = list(x_combos[n])
    x = data[combo_list]
    poly = PolynomialFeatures(3)
    poly_x = poly.fit_transform(x)
    model = Lasso(max_iter=100000, normalize=True)
    for a in alphas:
        model.set_params(alpha=a)
        model.fit(poly_x, y)
    # cross_validate runs once per combination, after the alpha loop,
    # so only the last alpha in alphas is actually scored
    lasso_cv_scores = cross_validate(model, poly_x, y, cv=10,
                                     scoring=('neg_mean_squared_error', 'r2'),
                                     return_train_score=True, return_estimator=True)
    lasso_models[str(combo_list)] = np.mean(lasso_cv_scores['test_neg_mean_squared_error'])

print("outcomes from the Best Lasso Regression Model:")
min_mse = abs(max(lasso_models.values()))
print("minimum Avg Test MSE:", min_mse.round(2))
for possibles, i in lasso_models.items():
    if i == -min_mse:
        print("The Combination of Variables:", possibles)

Answer 1:

You can do this with GridSearchCV, an object that performs an exhaustive search over specified parameter values for an estimator.

For example:

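from sklearn.model_selection import GridSearchCV

# model, alphas and poly_x come from the question's code; y_train is the target
param_grid = {'alpha': alphas}
cv = GridSearchCV(model,
                  n_jobs=-1,
                  param_grid=param_grid,
                  cv=5, return_train_score=True
                 ).fit(poly_x, y_train)
cv.best_params_  # dict holding the best alpha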

This snippet runs a hyperparameter search over alpha and returns the best set of parameters. You can also use cv.best_estimator_ to get the best fitted model.
