Extracting predictors from a ctree object

【Title】extracting predictors from ctree object 【Posted】2013-07-16 19:15:10 【Question】:

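I have looked at the BinaryTree class methods and at "How to extract tree structure from ctree function?" (which helped me understand the S4 object structure and its slots), but it is still unclear to me how to get the predictors that a ctree object actually ends up splitting on. For rpart, I would use something like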

 extract_preds <- function( tt ) {
   # rows of tt$frame labelled '<leaf>' are terminal nodes; the rest name a split variable
   leaves <- tt$frame$var == '<leaf>'
   as.character( unique( tt$frame$var[ !leaves ] ) )
 }

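Is a similar shortcut available, or do I have to write a recursive function that walks the ctree object and extracts the predictors? Either that, or a regular expression over the printed output? Thanks.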

Update: my attempt based on baydoganm's code is below. I still need to figure out how to update res correctly through the recursion.

 library(party)

 ctree_preds <- function(tr, vnames) {
    res <- character(0)
    traverse <- function(treenode, vnames, res) {
        if (treenode$terminal) {
            return(res)
        } else {
            res <- c(res, vnames[treenode$psplit$variableID])
            traverse(treenode$left , vnames, res)
            traverse(treenode$right, vnames, res)
        }
    }
    # Open problem: res is only modified inside traverse(), so the outer res stays empty
    traverse(tr, vnames, res)
    return(unique(res))
 }

 airq <- subset(airquality, !is.na(Ozone))
 airct <- ctree(Ozone ~ ., data = airq,
                         controls = ctree_control(maxsurrogate = 3))
 plot(airct)

 ctree_preds(airct@tree,names(airq)[-1])

【Comments】:

You have to traverse the tree.

【Answer 1】:

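Below is a script I wrote to traverse the tree stored in a ctree object. I use the same example as the party package, i.e. the airct tree fitted to the airquality data.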

require(party)
data(airquality)

# Print every node, visiting the left subtree before the right one
traverse <- function(treenode) {
    if (treenode$terminal) {
        bas <- paste("Current node is terminal node with", treenode$nodeID, 'prediction', treenode$prediction)
        print(bas)
        return(0)
    } else {
        bas <- paste("Current node", treenode$nodeID, "Split var. ID:", treenode$psplit$variableName, "split value:", treenode$psplit$splitpoint, 'prediction', treenode$prediction)
        print(bas)
    }
    traverse(treenode$left)
    traverse(treenode$right)
}

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
controls = ctree_control(maxsurrogate = 3))
plot(airct)

traverse(airct@tree)

This function, traverse, simply walks the tree in depth-first order. The order of traversal can be changed by changing the recursive part.
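As an illustration of changing the visiting order, here is a minimal sketch of a level-by-level (breadth-first) walk. It replaces the recursion with a queue rather than modifying the recursive calls, so it is a different technique from the traverse function above. The name traverse_bfs and the hand-built toy tree are hypothetical; the toy tree only mimics the node fields used in the answer's code (nodeID, terminal, left, right), so no fitted model is needed to run it.

```r
# Hypothetical helper: visit nodes level by level using a queue (a plain list)
traverse_bfs <- function(root) {
  queue <- list(root)
  visited <- integer(0)
  while (length(queue) > 0) {
    node <- queue[[1]]          # take the node at the front of the queue
    queue <- queue[-1]
    visited <- c(visited, node$nodeID)
    if (!node$terminal) {
      # children go to the back, so a whole level is finished first
      queue <- c(queue, list(node$left, node$right))
    }
  }
  visited
}

# Toy stand-in for a tree (an assumption, not output of ctree)
toy <- list(
  nodeID = 1L, terminal = FALSE,
  left = list(nodeID = 2L, terminal = FALSE,
              left  = list(nodeID = 3L, terminal = TRUE),
              right = list(nodeID = 4L, terminal = TRUE)),
  right = list(nodeID = 5L, terminal = TRUE)
)

bfs_order <- traverse_bfs(toy)
print(bfs_order)
```

On a real fit the call would presumably be traverse_bfs(airct@tree), assuming the nodes expose the same fields as in the answer.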

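Also, if you want to return other node characteristics, I suggest inspecting the structure of the ctree object.

Edit: minor code revision.

【Comments】: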
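For the original goal of getting a vector of predictor names, one possible approach is to let each recursive call return what it found and combine the results on the way back up, since R passes arguments by value and an outer vector cannot be updated from inside the callee. The sketch below is not the answer author's code: collect_preds is a hypothetical name, and the toy tree is hand-built with only the fields that appear in the code above (terminal, psplit$variableName, left, right).

```r
# Hypothetical helper: return split-variable names instead of mutating a vector
collect_preds <- function(node) {
  if (node$terminal) {
    return(character(0))        # leaves contribute no predictor
  }
  c(node$psplit$variableName,
    collect_preds(node$left),
    collect_preds(node$right))
}

# Toy stand-in for a fitted tree (an assumption, not output of ctree)
leaf <- list(terminal = TRUE)
toy <- list(
  terminal = FALSE, psplit = list(variableName = "Temp"),
  left  = list(terminal = FALSE, psplit = list(variableName = "Wind"),
               left = leaf, right = leaf),
  right = list(terminal = FALSE, psplit = list(variableName = "Temp"),
               left = leaf, right = leaf)
)

preds <- unique(collect_preds(toy))
print(preds)
```

With a real model this would presumably be called as unique(collect_preds(airct@tree)), assuming the nodes carry psplit$variableName as in the traverse function.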

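I ran the code above and the node variableID is not printed. Since the goal is a vector of predictor names, I am adapting your code to do that (see the question). What I am struggling with is how to update res through the recursion, in a way similar to C's address-of operator.

I am not sure whether you are using the same party package, but the code above works for me. If you want to print the variable names, just change treenode$psplit$variableID to treenode$psplit$variableName. However, I am not sure whether this is what you are asking. I have also updated the code slightly.

I used the function capture.output to store the values in a data.frame for further processing.

【Answer 2】: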

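The ctree2sas() function in the mlmeta R package converts a fitted ctree model into SAS code. It can easily be adapted to other languages, and it is generally instructive about the internals of the object.

【Comments】:

【Answer 3】: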
split <- 
c(cart@tree$psplit$splitpoint , cart@tree$right$psplit$splitpoint , cart@tree$left$psplit$splitpoint , cart@tree$right$right$psplit$splitpoint , cart@tree$right$left$psplit$splitpoint , cart@tree$left$right$psplit$splitpoint , cart@tree$left$left$psplit$splitpoint , cart@tree$right$right$right$psplit$splitpoint , cart@tree$right$right$left$psplit$splitpoint , cart@tree$right$left$right$psplit$splitpoint , cart@tree$right$left$left$psplit$splitpoint , cart@tree$left$right$right$psplit$splitpoint , cart@tree$left$right$left$psplit$splitpoint , cart@tree$left$left$right$psplit$splitpoint , cart@tree$left$left$left$psplit$splitpoint , cart@tree$right$right$right$right$psplit$splitpoint , cart@tree$right$right$right$left$psplit$splitpoint , cart@tree$right$right$left$right$psplit$splitpoint , cart@tree$right$right$left$left$psplit$splitpoint , cart@tree$right$left$right$right$psplit$splitpoint , cart@tree$right$left$right$left$psplit$splitpoint , cart@tree$right$left$left$right$psplit$splitpoint , cart@tree$right$left$left$left$psplit$splitpoint , cart@tree$left$right$right$right$psplit$splitpoint , cart@tree$left$right$right$left$psplit$splitpoint , cart@tree$left$right$left$right$psplit$splitpoint , cart@tree$left$right$left$left$psplit$splitpoint , cart@tree$left$left$right$right$psplit$splitpoint , cart@tree$left$left$right$left$psplit$splitpoint , cart@tree$left$left$left$right$psplit$splitpoint , cart@tree$left$left$left$left$psplit$splitpoint , cart@tree$left$left$left$left$left$psplit$splitpoint , cart@tree$left$left$left$left$right$psplit$splitpoint , cart@tree$left$left$left$right$left$psplit$splitpoint , cart@tree$left$left$left$right$right$psplit$splitpoint , cart@tree$left$left$right$left$left$psplit$splitpoint , cart@tree$left$left$right$left$right$psplit$splitpoint , cart@tree$left$left$right$right$left$psplit$splitpoint , cart@tree$left$left$right$right$right$psplit$splitpoint , cart@tree$left$right$left$left$left$psplit$splitpoint , 
cart@tree$left$right$left$left$right$psplit$splitpoint , cart@tree$left$right$left$right$left$psplit$splitpoint , cart@tree$left$right$left$right$right$psplit$splitpoint , cart@tree$left$right$right$left$left$psplit$splitpoint , cart@tree$left$right$right$left$right$psplit$splitpoint , cart@tree$left$right$right$right$left$psplit$splitpoint , cart@tree$left$right$right$right$right$psplit$splitpoint , cart@tree$right$left$left$left$left$psplit$splitpoint , cart@tree$right$left$left$left$right$psplit$splitpoint , cart@tree$right$left$left$right$left$psplit$splitpoint , cart@tree$right$left$left$right$right$psplit$splitpoint , cart@tree$right$left$right$left$left$psplit$splitpoint , cart@tree$right$left$right$left$right$psplit$splitpoint , cart@tree$right$left$right$right$left$psplit$splitpoint , cart@tree$right$left$right$right$right$psplit$splitpoint , cart@tree$right$right$left$left$left$psplit$splitpoint , cart@tree$right$right$left$left$right$psplit$splitpoint , cart@tree$right$right$left$right$left$psplit$splitpoint , cart@tree$right$right$left$right$right$psplit$splitpoint , cart@tree$right$right$right$left$left$psplit$splitpoint , cart@tree$right$right$right$left$right$psplit$splitpoint , cart@tree$right$right$right$right$left$psplit$splitpoint , cart@tree$right$right$right$right$right$psplit$splitpoint)

split <- split[order(split)]
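The enumeration above spells out every path by hand, up to five levels deep. A recursive helper can collect the same values for a tree of any depth. This is a minimal sketch, not the answer author's method: collect_splitpoints is a hypothetical name, it assumes every split is numeric, and the toy tree is hand-built with only the fields used in this thread (terminal, psplit$splitpoint, left, right).

```r
# Hypothetical helper: gather the split point of every inner node
collect_splitpoints <- function(node) {
  if (is.null(node) || node$terminal) {
    return(numeric(0))          # nothing to collect at a leaf
  }
  c(node$psplit$splitpoint,
    collect_splitpoints(node$left),
    collect_splitpoints(node$right))
}

# Toy stand-in for cart@tree (made-up values, not output of ctree)
leaf <- list(terminal = TRUE)
toy <- list(
  terminal = FALSE, psplit = list(splitpoint = 82),
  left  = list(terminal = FALSE, psplit = list(splitpoint = 6.9),
               left = leaf, right = leaf),
  right = list(terminal = FALSE, psplit = list(splitpoint = 77),
               left = leaf, right = leaf)
)

split_points <- sort(collect_splitpoints(toy))
print(split_points)
```

On the object from this answer the call would presumably be sort(collect_splitpoints(cart@tree)).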

【Comments】:

以上是关于从 ctree 对象中提取预测变量的主要内容,如果未能解决你的问题,请参考以下文章

从 varImp 中提取预测变量名称

如何获取所有终端节点 - r 中的权重和响应预测“ctree”

如何在 R 中为模型构建一个大的正则公式?

利用多元线性回归法,从大量数据中提取五个因变量来预测一个自变量—Jason niu

ctree()的终端节点如何提取拆分规则

深度学习时间序列预测如何构建矩阵