caret error if doParallel and recipes is used
Posted: 2018-08-28 04:46:34
[Question]:
When I use caret together with the new recipes package (i.e. caret::train.recipe()), I get an error message if doParallel is also used to register a parallel backend. Below is a reproducible example (a shortened version of the official recipes example from the caret documentation):
### Error only when I use doParallel
library(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)
###
library(caret)
library(recipes)
library(dplyr)
library(QSARdata)
data(AquaticTox)
tox <- AquaticTox_moe2D
ncol(tox)
tox$Activity <- AquaticTox_Outcome$Activity
tox <- tox %>%
  select(-Molecule) %>%
  mutate(manufacturability = 1/moe2D_Weight) %>%
  mutate(manufacturability = manufacturability/sum(manufacturability))
# Weighted root mean squared error
wt_rmse <- function(pred, obs, wts, na.rm = TRUE) {
  sqrt(weighted.mean((pred - obs)^2, wts, na.rm = na.rm))
}
# Summary function: adds the weighted RMSE to caret's default statistics
model_stats <- function(data, lev = NULL, model = NULL) {
  stats <- defaultSummary(data, lev = lev, model = model)
  res <- wt_rmse(pred = data$pred,
                 obs = data$obs,
                 wts = data$manufacturability)
  c(wRMSE = res, stats)
}
tox_recipe <- recipe(Activity ~ ., data = tox) %>%
  add_role(manufacturability, new_role = "performance var")
tox_recipe
tox_ctrl <- trainControl(method = "cv", summaryFunction = model_stats, allowParallel = TRUE)
set.seed(888)
tox_svm <- train(tox_recipe, tox,
                 method = "svmRadial",
                 metric = "wRMSE",
                 maximize = FALSE,
                 tuneLength = 10,
                 trControl = tox_ctrl)
tox_svm
If I comment out the first three lines (the doParallel setup), this runs without problems. Otherwise I get:
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) :
unable to find variable "optimism_boot"
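The same error occurred in older versions of caret (without recipes being used), see: https://github.com/topepo/caret/issues/706
It has been fixed in the sense that it no longer appears when doParallel is not used, or when doParallel is used without a recipe, e.g. via train.formula. I am using the latest versions of all packages: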
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.11 iterators_1.0.9 foreach_1.4.4 kernlab_0.9-25 bindrcpp_0.2 QSARdata_1.3
[7] recipes_0.1.2 broom_0.4.3 dplyr_0.7.4 caret_6.0-78 ggplot2_2.2.1 lattice_0.20-35
loaded via a namespace (and not attached):
[1] tidyselect_0.2.4 purrr_0.2.4 reshape2_1.4.3 splines_3.4.4 colorspace_1.3-2 stats4_3.4.4
[7] yaml_2.1.18 survival_2.41-3 prodlim_1.6.1 rlang_0.2.0 ModelMetrics_1.1.0 pillar_1.2.1
[13] withr_2.1.2 foreign_0.8-69 glue_1.2.0 bindr_0.1.1 plyr_1.8.4 dimRed_0.1.0
[19] lava_1.6 robustbase_0.92-8 stringr_1.3.0 timeDate_3043.102 munsell_0.4.3 gtable_0.2.0
[25] codetools_0.2-15 psych_1.7.8 class_7.3-14 DEoptimR_1.0-8 Rcpp_0.12.16 scales_0.5.0
[31] ipred_0.9-6 CVST_0.2-1 mnormt_1.5-5 stringi_1.1.7 RcppRoll_0.2.2 ddalpha_1.3.1.1
[37] grid_3.4.4 tools_3.4.4 magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2 tidyr_0.8.0
[43] DRR_0.0.3 pkgconfig_2.0.1 MASS_7.3-49 Matrix_1.2-12 lubridate_1.7.2 gower_0.1.2
[49] assertthat_0.2.0 R6_2.2.2 rpart_4.1-13 sfsmisc_1.1-2 nnet_7.3-12 nlme_3.1-131.1
[55] compiler_3.4.4
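The error also occurs on a second PC running Windows 10. Any help or suggestions are much appreciated!
[Comments]:
Install the devel version from GitHub (it goes to CRAN today).
[Answer 1]:
Install the development version: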
devtools::install_github('topepo/caret/pkg/caret')
It is going to CRAN today.
[Comments]:
More details here: https://github.com/topepo/caret/issues/860
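The question itself notes that the error disappears when no parallel backend is registered, which suggests a stopgap until the fixed release is installed: run the resampling sequentially. The following is a minimal sketch and not part of the original answer. It assumes doParallel and foreach are installed, and the two-worker cluster is an arbitrary choice for illustration.

```r
# Stopgap sketch (assumption: the error only shows up on parallel workers,
# as reported in the question). Not taken from the original answer.
library(doParallel)  # also attaches foreach and parallel

# Set up a parallel backend, as in the question (2 workers chosen arbitrarily)
cl <- makeCluster(2)
registerDoParallel(cl)
workers_parallel <- getDoParWorkers()

# Shut the cluster down and register the sequential backend instead,
# so that subsequent foreach-based code runs in the main R process
stopCluster(cl)
registerDoSEQ()
workers_sequential <- getDoParWorkers()
backend_name <- getDoParName()
```

After this, a train() call should run in a single process, at the cost of longer run time. Setting allowParallel = FALSE in trainControl() while leaving the backend registered is another option that should have the same effect for that call.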