Compute a kernel ridge regression in R for model selection

Posted: 2016-01-29 17:54:43

Question:

I have a data frame df:

df<-structure(list(P = c(794.102395099402, 1299.01021921817, 1219.80731174175, 
1403.00786976395, 742.749487463385, 340.246973543409, 90.3220586792255, 
195.85557320714, 199.390867672674, 191.4970921278, 334.452413539092, 
251.730350291822, 235.899165861309, 442.969718728163, 471.120193046119, 
458.464154601097, 950.298132134912, 454.660729622624, 591.212003320456, 
546.188716055825, 976.994105334083, 1021.67000560164, 945.965200876724, 
932.324768081307, 3112.60002304117, 624.005047807736, 0, 937.509240627289, 
892.926195849975, 598.564015734103, 907.984807726741, 363.400837339461, 
817.629824627294, 2493.75851182081, 451.149000503123, 1028.41455932241, 
615.640039284434, 688.915621065535, NaN, 988.21297, NaN, 394.7, 
277.7, 277.7, 492.7, 823.6, 1539.1, 556.4, 556.4, 556.4), T = c(11.7087701201175, 
8.38748953516909, 9.07065637842101, 9.96978059247473, 2.87026334756687, 
-1.20497751697385, 1.69057148825093, 2.79168506923385, -1.03659741363293, 
-2.44619473778322, -1.0414166493637, -0.0616510891024765, -2.19566614081763, 
2.101408628412, 1.30197334094966, 1.38963309876057, 1.11283280896495, 
0.570385633957982, 1.05118063842584, 0.816991857384802, 8.95069454902333, 
6.41067954598958, 8.42110173395973, 13.6455092557636, 25.706509843239, 
15.5098014530832, 6.60783204117648, 6.27004335176393, 10.0769600264915, 
3.05237224011361, 7.52869186722913, 11.2970127691776, 6.60356510073103, 
7.3210245298803, 8.4723724171517, 21.6988324356057, 7.34952593890056, 
6.04325232771032, NaN, 25.990913731, NaN, 1.5416666667, 15.1416666667, 
15.1416666667, 0.825, 4.3666666667, 7.225, -2.075, -2.075, -2.075
), A = c(76.6, 52.5, 3.5, 15, 71.5, 161.833333333333, 154, 72.5, 
39, 40, 23, 14.5, 5.5, 78, 129, 73.5, 100, 10, 3, 29.5, 65, 44, 
68.5, 56.5, 101, 52.1428571428571, 66.5, 1, 106, 36.6, 21.2, 
10, 135, 46.5, 17.5, 35.5, 86, 70.5, 65, 97, 30.5, 96, 79, 11, 
162, 350, 42, 200, 50, 250), Y = c(1135.40733061247, 2232.28817154825, 
682.15711101488, 1205.97307573068, 1004.2559099408, 656.537378609781, 
520.796355544007, 437.780508459633, 449.167726897157, 256.552344558528, 
585.618137514404, 299.815636674633, 230.279491515383, 1051.74875971674, 
801.07750760983, 572.337961145761, 666.132923644351, 373.524159859929, 
128.198042456082, 528.555426408071, 1077.30188477292, 1529.43757814094, 
1802.78658590423, 1289.80342084379, 3703.38329098125, 1834.54460388103, 
1087.48954802548, 613.15010408836, 1750.11457900004, 704.123482171384, 
1710.60321283154, 326.663507855032, 1468.32489464969, 1233.05517321796, 
852.500007182098, 1246.5605930537, 1186.31346316832, 1460.48566379373, 
2770, 3630, 3225, 831, 734, 387, 548.8, 1144, 1055, 911, 727, 
777)), .Names = c("P", "T", "A", "Y"), row.names = c(NA, -50L
), class = "data.frame")

I would like to do model selection using kernel ridge regression. I have already done it with a simple stepwise regression analysis (see below), but I now want to do it with kernel ridge regression.

    library(caret)

    Step <- train(Y ~ P + T + A, data = df,
                  preProcess = c("center", "scale"),
                  method = "lmStepAIC",
                  trControl = trainControl(method = "repeatedcv", repeats = 10),
                  na.action = na.omit)

Does anyone know how I could compute a kernel ridge regression for model selection?

Comments:

Have a look at the CVST package and the constructKRRLearner() function

@etienne Yes, I had a look, but I don't really understand how to implement it. Have you used it?

Not really. I had hoped the documentation would help, but it wasn't much use.

Yes, exactly how to implement it still isn't very clear.

Answer 1:

Using the CVST package that etienne linked, you can train and predict with a kernel ridge regression learner:

library(CVST)

## Assuming df is already in your environment
d = constructData(x=df[,1:3], y=df$Y) ## Structure data in CVST format
krr_learner = constructKRRLearner()   ## Build the base learner
params = list(kernel='rbfdot', sigma=100, lambda=0.01) ## Function params; documentation defines lambda as '.1/getN(d)'

krr_trained = krr_learner$learn(d, params)

## Now to predict, format your test data, 'dTest', the same way as you did in 'd'
pred = krr_learner$predict(krr_trained, dTest)

The somewhat painful part of CVST is the intermediate data-preparation step, which requires you to call the constructData function. This is an example adapted from page 7 of the documentation.
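As a concrete illustration of that step (not part of the original answer): assuming a hypothetical held-out data frame dfTest with the same P, T and A columns, the test data would be wrapped with constructData in exactly the same way before calling predict:

## dfTest is a hypothetical held-out data frame with the same columns as df
dTest = constructData(x = dfTest[, c("P", "T", "A")], y = dfTest$Y)
pred = krr_learner$predict(krr_trained, dTest)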

It is worth mentioning that when I ran this code on your example data, I got the following error about an exactly singular system:

Lapack routine dgesv: system is exactly singular: U[1,1] = 0
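Since the original goal is model selection, it is also worth noting that CVST ships grid-search helpers, constructParams() and CV(), which can pick sigma and lambda by cross-validation. The sketch below is only an illustration under the assumption that these helpers behave as in the package vignette; the grid values are arbitrary rather than recommendations, and the NaN rows are dropped first because they cannot enter the kernel matrix:

library(CVST)

df_cc <- na.omit(df)  ## keep complete cases only
d     <- constructData(x = as.matrix(df_cc[, c("P", "T", "A")]), y = df_cc$Y)
krr   <- constructKRRLearner()

## Illustrative grid of kernel widths and ridge penalties
param_grid <- constructParams(kernel = "rbfdot",
                              sigma  = 10^(-3:3),
                              lambda = c(0.01, 0.05, 0.1, 0.2) / getN(d))

## Plain k-fold cross-validation over the grid; CV() returns the best parameter set(s)
opt <- CV(d, krr, param_grid, fold = 5)

## Refit on all complete cases with the selected parameters
best_model <- krr$learn(d, opt[[1]])

Predictions for new data would then go through krr$predict(best_model, dTest) exactly as above.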
