Simple decision tree in R: strange results from the caret package

I want to fit a simple decision tree to the following dataset with the caret package. Here is the data:

> library(caret)
> mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
> mydata$rank <- factor(mydata$rank)
  # create dummy variables
> X <- predict(dummyVars(~ ., data = mydata), mydata)
> head(X)

  admit gre  gpa rank.1 rank.2 rank.3 rank.4
      0 380 3.61      0      0      1      0
      1 660 3.67      0      0      1      0
      1 800 4.00      1      0      0      0
      1 640 3.19      0      0      0      1
      0 520 2.93      0      0      0      1
      1 760 3.00      0      1      0      0
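
Note that by default dummyVars keeps one dummy column per factor level, so rank.1 through rank.4 always sum to 1 and are linearly dependent. Trees don't mind this, but it matters for linear models; caret's real fullRank argument drops a reference level. A minimal sketch:

> # fullRank = TRUE drops one level per factor (rank.1 becomes the reference)
> X_fr <- predict(dummyVars(~ ., data = mydata, fullRank = TRUE), mydata)
> head(X_fr)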

Split into a training set and a test set:

> trainset <- data.frame(X[1:300,])
> testset <- data.frame(X[301:400,])
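
A caveat on this split: taking the first 300 rows assumes the file is in random order; if it is sorted in any way, the class balance of the test set can differ from the training set. A stratified random split avoids that. A sketch using caret's createDataPartition, the same helper the answer below uses (the seed value is arbitrary):

> set.seed(1)  # arbitrary seed, just to make the split reproducible
> idx <- createDataPartition(factor(X[, "admit"]), p = 0.75, list = FALSE)
> trainset <- data.frame(X[idx, ])
> testset  <- data.frame(X[-idx, ])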

Now fit the decision tree:

> tree <- train(factor(admit) ~ ., data = trainset, method = "rpart")
> tree

CART 

300 samples
  6 predictor
  2 classes: '0', '1' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 300, 300, 300, 300, 300, 300, ... 
Resampling results across tuning parameters:

  cp          Accuracy   Kappa    
  0.01956522  0.6856163  0.1865179
  0.03260870  0.6888378  0.1684015
  0.08695652  0.7080434  0.1079462

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.08695652.
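
Worth noting in this output: accuracy was highest at the largest tried cp, so caret kept cp = 0.08695652. With a complexity penalty that large, rpart can decline to make any split at all, leaving a root-only tree that always predicts the majority class. Instead of relying on caret's default three-value grid, you can hand train an explicit cp grid and a cross-validation scheme. A minimal sketch (the grid values are illustrative, not prescriptive):

> ctrl <- trainControl(method = "cv", number = 10)        # 10-fold CV instead of 25 bootstraps
> grid <- expand.grid(cp = seq(0.001, 0.05, by = 0.005))  # illustrative cp values
> tree2 <- train(factor(admit) ~ ., data = trainset,
+                method = "rpart", trControl = ctrl, tuneGrid = grid)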

The variable importances I get are all NaN! Why?

> varImp(tree)$importance

       Overall
gre        NaN
gpa        NaN
rank.1     NaN
rank.2     NaN
rank.3     NaN
rank.4     NaN
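
A quick way to check what happened is to print the fitted rpart object that caret stores in the train result. If it shows a single root node and no splits, there are no raw importance scores, and varImp's rescaling of an all-zero vector yields NaN, which is the most plausible reading of this output:

> tree$finalModel  # a lone "root" line means the tree never split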

And at prediction time, the decision tree outputs only a single class, class 0. Why? What is wrong with my code? Thanks in advance.

> y_pred <- predict(tree, newdata = testset)
> y_test <- factor(testset$admit)
> confusionMatrix(y_pred, y_test)

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 65 35
         1  0  0

               Accuracy : 0.65
                 95% CI : (0.5482, 0.7427)
    No Information Rate : 0.65
    P-Value [Acc > NIR] : 0.5458

                  Kappa : 0

 Mcnemar's Test P-Value : 9.081e-09

            Sensitivity : 1.00
            Specificity : 0.00
         Pos Pred Value : 0.65
         Neg Pred Value :  NaN
             Prevalence : 0.65
         Detection Rate : 0.65
   Detection Prevalence : 1.00
      Balanced Accuracy : 0.50

       'Positive' Class : 0
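
This is consistent with a root-only tree: every test row receives the same label. Asking for class probabilities instead of hard labels makes that easier to see; a short sketch:

> head(predict(tree, newdata = testset, type = "prob"))
> # with no splits, every row shows identical probabilities
> # (the class proportions of the training set)
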
Answer

I can't answer your question directly, but I can show you the approach I use to fit a decision tree.

library(data.table)
library(tidyverse)
library(caret)
library(rpart)
library(rpart.plot)

# Reading data into data.table
mydata <- fread("https://stats.idre.ucla.edu/stat/data/binary.csv")

# converting rank and admit to factors
mydata$rank  <- as.factor(mydata$rank)
mydata$admit <- as.factor(mydata$admit)

# creating train and test data
t_index  <- createDataPartition(mydata$admit, p=0.75, list=FALSE)
trainset <- mydata[t_index,]
testset  <- mydata[-t_index,]

# calculating the model using rpart
model <- rpart(admit ~ .,
               data = trainset,
               parms = list(split="information"),
               method = "class")

# plotting the decision tree
model %>%
  rpart.plot(digits = 4)

# get confusion matrix
model %>%
  predict(testset, type = "class") %>%
  table(testset$admit) %>%
  confusionMatrix()

Maybe this helps a bit.
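
One small addition to this answer: createDataPartition draws a random split, so the confusion matrix will differ from run to run. Fixing a seed first makes the result reproducible (the seed value is arbitrary):

set.seed(42)  # arbitrary seed, for reproducibility
t_index <- createDataPartition(mydata$admit, p = 0.75, list = FALSE)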
