Translate cross_validation algorithm to model_selection

【Posted】: 2019-01-10 06:20:36 【Question】:

In 2016 I ran a lasso regression model with the following code:

#Import required packages 
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
from scipy import stats
from sklearn.cross_validation import train_test_split
from sklearn.linear_model import LassoLarsCV

# split data into train and test sets
pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, target, test_size=.4, random_state=123)
#%
# specify the lasso regression model
model=LassoLarsCV(cv=10, precompute=False).fit(pred_train,tar_train)
#%
# print variable names and regression coefficients
dict(zip(predictors.columns, model.coef_))
#regcoef.to_csv('variable+regresscoef.csv')
#%%
# plot coefficient progression
m_log_alphas = -np.log10(model.alphas_)
ax = plt.gca()
plt.plot(m_log_alphas, model.coef_path_.T)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k',
            label='alpha CV')
plt.ylabel('Regression Coefficients')
plt.xlabel('-log(alpha)')
plt.title('Regression Coefficients Progression for Lasso Paths')
#%
# plot mean square error for each fold
m_log_alphascv = -np.log10(model.cv_alphas_)
plt.figure()
plt.plot(m_log_alphascv, model.cv_mse_path_, ':')
plt.plot(m_log_alphascv, model.cv_mse_path_.mean(axis=-1), 'k',
         label='Average across the folds', linewidth=2)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k',
            label='alpha CV')
plt.legend()
plt.xlabel('-log(alpha)')
plt.ylabel('Mean squared error')
plt.title('Mean squared error on each fold')
#%       
# MSE from training and test data
from sklearn.metrics import mean_squared_error
train_error = mean_squared_error(tar_train, model.predict(pred_train))
test_error = mean_squared_error(tar_test, model.predict(pred_test))
print('training data MSE')
print(train_error)
print('test data MSE')
print(test_error)
#%
# R-square from training and test data
rsquared_train=model.score(pred_train,tar_train)
rsquared_test=model.score(pred_test,tar_test)
print('training data R-square')
print(rsquared_train)
print('test data R-square')
print(rsquared_test)

Now I want to run it again, and I get the following warning:

DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved.

How can I rewrite this code to use model_selection?

【Answer 1】:

The only thing I see here that uses the cross_validation module is train_test_split.

So just change your import from:

from sklearn.cross_validation import train_test_split

to:

from sklearn.model_selection import train_test_split

and you're good to go.
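To see the change end to end, here is a minimal sketch of the same split-and-fit steps with the new import. The question does not show how `predictors` and `target` were built, so this sketch substitutes synthetic data from `make_regression`; the column names `x0` to `x7` are placeholders, not the asker's variables. It also assumes that a scikit-learn release recent enough to require `model_selection` exposes the per-fold errors as `mse_path_` rather than the `cv_mse_path_` name used in the plotting code, so it falls back between the two.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsCV
# New location of train_test_split (replaces sklearn.cross_validation).
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's `predictors` and `target`
# (assumption: the real data is a DataFrame of features and a Series).
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=123)
predictors = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])
target = pd.Series(y, name="target")

# Same call as in the question; only the import above changed.
pred_train, pred_test, tar_train, tar_test = train_test_split(
    predictors, target, test_size=0.4, random_state=123)

# Fit the lasso model with 10-fold cross-validation, as before.
model = LassoLarsCV(cv=10, precompute=False).fit(pred_train, tar_train)

# Variable names and regression coefficients.
coefs = dict(zip(predictors.columns, model.coef_))
print(coefs)

# Per-fold MSE: prefer the newer attribute name, fall back to the old one.
mse_path = getattr(model, "mse_path_", None)
if mse_path is None:
    mse_path = model.cv_mse_path_
print(mse_path.shape)
```

If the plotting section of the original script raises an AttributeError on `cv_mse_path_`, swapping in `mse_path` as computed here is probably the only other change needed.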
