R: Plot trees from h2o.randomForest() and h2o.gbm()
【Posted】: 2016-08-29 06:25:49
【Question】:
I am looking for an efficient way to plot trees from h2o's RF and GBM models in RStudio, H2O's Flow, or a local HTML page, similar to the image in the link below. Specifically, how do you plot trees for the fitted-model objects rf1 and gbm1 produced by the code below, perhaps by parsing the output of h2o.download_pojo(rf1) or h2o.download_pojo(gbm1)?
# # The following two commands remove any previously installed H2O packages for R.
# if ("package:h2o" %in% search()) detach("package:h2o", unload=TRUE)
# if ("h2o" %in% rownames(installed.packages())) remove.packages("h2o")
# # Next, we download packages that H2O depends on.
# pkgs <- c("methods","statmod","stats","graphics","RCurl","jsonlite","tools","utils")
# for (pkg in pkgs)
# if (! (pkg %in% rownames(installed.packages()))) install.packages(pkg)
#
#
# # Now we download, install h2o package
# install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/R")))
library(h2o)
h2o.init(nthreads = -1, max_mem_size = "2G")
h2o.removeAll() ##clean slate - just in case the cluster was already running
## Load data - available to download from link below
## https://www.dropbox.com/s/gu8e2o0mzlozbu4/SampleData.csv?dl=0
df <- h2o.importFile(path = normalizePath("../SampleData.csv"))
splits <- h2o.splitFrame(df, c(0.4, 0.3), seed = 1234)
train <- h2o.assign(splits[[1]], "train.hex")
valid <- h2o.assign(splits[[2]], "valid.hex")
test <- h2o.assign(splits[[3]], "test.hex") ## third split (remaining 30%), not a copy of valid
predictor_col_start_pos <- 2
predictor_col_end_pos <- 169
predicted_col_pos <- 1
rf1 <- h2o.randomForest(training_frame = train, validation_frame = valid,
                        x = predictor_col_start_pos:predictor_col_end_pos, y = predicted_col_pos,
                        model_id = "rf_covType_v1", ntrees = 2000, stopping_rounds = 10,
                        score_each_iteration = TRUE, seed = 2001)
gbm1 <- h2o.gbm(training_frame = train, validation_frame = valid,
                x = predictor_col_start_pos:predictor_col_end_pos, y = predicted_col_pos,
                model_id = "gbm_covType2", seed = 2002, ntrees = 20,
                learn_rate = 0.2, max_depth = 10, stopping_rounds = 2, stopping_tolerance = 0.01,
                score_each_iteration = TRUE)
## Next step would be to plot trees for the fitted models rf1 and gbm1
# print each model's POJO (Plain Old Java Object) to the screen
h2o.download_pojo(rf1)
h2o.download_pojo(gbm1)
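【Comments】:
A reproducible example has been provided.
Thanks for adding a reproducible example. We can now migrate this to Stack Overflow. If you wait a moment, it should arrive there shortly.
Is there a way to plot it, that is, to visualize the final tree graphically?
??? Is that right? PUSH
I am also interested in a solution here.
【Answer 1】:
I think this may be the solution you are looking for: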
library(h2o)
h2o.init()
df = h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
model = h2o.gbm(model_id = "model",
training_frame = df,
x = c("Year", "Month", "DayofMonth", "DayOfWeek", "UniqueCarrier"),
y = "IsDepDelayed",
max_depth = 3,
ntrees = 5)
h2o.download_mojo(model, getwd(), FALSE)
Now download the latest stable H2O release from http://www.h2o.ai/download/ and run the PrintMojo tool from the command line.
java -cp h2o.jar hex.genmodel.tools.PrintMojo --tree 0 -i model.zip -o model.gv
dot -Tpng model.gv -o model.png
open model.png
More information: http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html
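The two command-line steps above could also be driven from the same R session. The following is only a minimal sketch: print_mojo_args() and dot_args() are hypothetical helper names (not part of the h2o package), and it assumes java, Graphviz's dot, and h2o.jar are reachable from the working directory and that the file paths contain no spaces.

```r
# Hypothetical helper (not part of h2o): build the argument vector for the
# PrintMojo call shown above, so it can be handed to system2().
print_mojo_args <- function(mojo_zip, out_gv, tree = 0, h2o_jar = "h2o.jar") {
  c("-cp", h2o_jar, "hex.genmodel.tools.PrintMojo",
    "--tree", as.character(tree),  # tree index as passed to --tree above
    "-i", mojo_zip,                # MOJO archive written by h2o.download_mojo()
    "-o", out_gv)                  # Graphviz file to create
}

# Hypothetical helper: argument vector for the Graphviz `dot` call.
dot_args <- function(in_gv, out_png) {
  c("-Tpng", in_gv, "-o", out_png)
}

# Usage (commented out because it needs java, dot, h2o.jar and model.zip):
# system2("java", print_mojo_args("model.zip", "model.gv", tree = 0))
# system2("dot", dot_args("model.gv", "model.png"))
```

Keeping the argument vectors in small functions makes it easy to loop over several tree indices and render one image per tree.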
【Comments】:
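Source code: github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/main/java/…
How would you use color/shape to indicate the classification at the terminal nodes?
【Answer 2】:
The new Tree API introduced in 3.22.0.1 (October 2018) changed the whole game for visualizing H2O trees. Detailed code examples, including the general workflow, can be found in the article "Finally, You Can Plot H2O Decision Trees in R".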
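As a rough illustration of what working with the Tree API might look like, the sketch below converts the parallel node vectors of a fetched tree into Graphviz DOT text. tree_vectors_to_dot() is a hypothetical helper, not part of h2o. It assumes the child vectors hold 0-based node indices with -1 meaning "no child", and it labels edges only as left/right rather than asserting a split direction. This is not necessarily the approach taken in the linked article.

```r
# Hypothetical helper (not part of h2o): turn parallel node vectors into
# Graphviz DOT lines. Assumes 0-based child indices and -1 for "no child".
tree_vectors_to_dot <- function(left_children, right_children,
                                features, thresholds, predictions) {
  lines <- "digraph tree {"
  for (i in seq_along(left_children)) {
    id <- as.integer(i - 1)
    l <- as.integer(left_children[i])
    r <- as.integer(right_children[i])
    if (l == -1 && r == -1) {
      # Leaf node: show the prediction in a box.
      lines <- c(lines, sprintf("  n%d [shape=box, label=\"prediction: %s\"];",
                                id, format(predictions[i])))
    } else {
      # Split node: categorical splits may have no numeric threshold.
      label <- if (is.na(thresholds[i])) features[i] else
        sprintf("%s, threshold %s", features[i], format(thresholds[i]))
      lines <- c(lines, sprintf("  n%d [label=\"%s\"];", id, label))
      if (l != -1) lines <- c(lines, sprintf("  n%d -> n%d [label=\"left\"];", id, l))
      if (r != -1) lines <- c(lines, sprintf("  n%d -> n%d [label=\"right\"];", id, r))
    }
  }
  c(lines, "}")
}

# Usage (commented out because it needs a running H2O cluster and a fitted
# binomial or regression model named `model`; multinomial models also need
# the tree_class argument):
# tree <- h2o.getModelTree(model = model, tree_number = 1)
# dot <- tree_vectors_to_dot(tree@left_children, tree@right_children,
#                            tree@features, tree@thresholds, tree@predictions)
# writeLines(dot, "tree.gv")  # then render with: dot -Tpng tree.gv -o tree.png
```

Because the helper takes plain vectors, it can be tried on hand-made input before a cluster is available.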