confusionMatrix - Error: `data` and `reference` should be factors with the same levels

Posted 2023-03-12

[Posted]: 2021-03-18 03:20:20

[Question]:

I created a logistic regression model:

Banks_Logit <- glm(Banks$Financial.Condition ~ ., data = Banks, family = "binomial")
options(scipen = 999)

summary(Banks_Logit)
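An aside on the model formula above (not part of the original question): because the left-hand side is written as `Banks$Financial.Condition` rather than the bare column name, R's formula machinery does not recognise it as the `Financial.Condition` column, so `.` expands to every column of `Banks`, including `Financial.Condition` itself and the `Obs` row index. Predictions that are 0 or 1 to machine precision, as in the output below, are consistent with the response leaking into the predictors. A minimal sketch, assuming a simulated stand-in for `Banks` (random values; only the column names follow the question):

```r
set.seed(1)

# Hypothetical stand-in for the question's Banks data (same column names, random values)
Banks <- data.frame(
  Obs                 = 1:40,
  Financial.Condition = rbinom(40, 1, 0.5),
  TotCap.Assets       = runif(40, 0, 12),
  TotExp.Assets       = runif(40, 0.05, 0.2),
  TotLns.Lses.Assets  = runif(40, 0.5, 1.1)
)

# Response written as Banks$...: "." still pulls in Financial.Condition (and Obs)
# as predictors. glm() is expected to warn about fitted probabilities of 0 or 1,
# so warnings are suppressed for this demonstration only.
leaky <- suppressWarnings(
  glm(Banks$Financial.Condition ~ ., data = Banks, family = binomial)
)
names(coef(leaky))

# Bare column name on the left: "." excludes the response; "- Obs" drops the ID column
Banks_Logit <- glm(Financial.Condition ~ . - Obs, data = Banks, family = binomial)
names(coef(Banks_Logit))
```

Writing the response as a bare column name and removing the index column with `-` keeps `.` convenient; listing the three ratio columns explicitly would work just as well.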

Then:

library(gains)  # gains() comes from the gains package
pred <- predict(Banks_Logit, Banks)
gain <- gains(Banks$Financial.Condition, pred, groups = 20)

plot(c(0, gain$cume.pct.of.total * sum(Banks$Financial.Condition)) ~
       c(0, gain$cume.obs),
     xlab = "Observations", ylab = "Cumulative", main = "Model Performance", type = "l")
lines(c(0, sum(Banks$Financial.Condition)) ~ c(0, dim(Banks)[1]), lty = 2)

library(caret)
confusionMatrix(ifelse(pred >0.5, 1,0), Banks$Financial.Condition)

Error: `data` and `reference` should be factors with the same levels.
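The message comes from the default method of `caret::confusionMatrix()`, which requires both arguments to be factors: `ifelse()` returns a plain numeric vector and `Financial.Condition` is an integer column, so neither qualifies. Wrapping each in `factor()` is not always enough on its own, because a prediction vector that happens to contain only one class gets only one level. A minimal base-R sketch, using hypothetical probabilities and labels in place of the question's `pred` and `Banks$Financial.Condition`, that sets the levels explicitly:

```r
# Hypothetical predicted probabilities and true 0/1 labels (stand-ins for the question's objects)
prob   <- c(0.91, 0.78, 0.64, 0.55, 0.97)
actual <- c(1L, 1L, 0L, 1L, 0L)

# ifelse() gives a numeric vector, which confusionMatrix() rejects
class(ifelse(prob > 0.5, 1, 0))           # "numeric"

# factor() without 'levels' keeps only the values that occur: here just "1"
levels(factor(ifelse(prob > 0.5, 1, 0)))  # "1"

# Explicit levels give both vectors the same levels, in the same order
lv         <- c("0", "1")
pred_class <- factor(ifelse(prob > 0.5, 1, 0), levels = lv)
ref_class  <- factor(actual, levels = lv)

tab <- table(Prediction = pred_class, Reference = ref_class)
tab

# With caret installed, the original kind of call now accepts these factors
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(pred_class, ref_class))
}
```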

Here are the predictions:

        1                        2                        3                        4                        5                        6                        7                        8                        9                       10 
0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999138624584560 0.9999999999891036051025 0.9999999999995110577800 0.9999999999999997779554 
                      11                       12                       13                       14                       15                       16                       17                       18                       19                       20 
0.0000000000176082421301 0.0000000000352379135751 0.0000000000431425778626 0.0000000000000002220446 0.0000000000002227450487 0.0000000000000002220446 0.0000000000000002220446 0.0000000000000002220446 0.0000000000000002220446 0.0000000000000002220446 



str(pred)
 Named num [1:20] 1 1 1 1 1 ...
 - attr(*, "names")= chr [1:20] "1" "2" "3" "4" ...



This is the dataset (`str(Banks)`):
'data.frame':   20 obs. of  5 variables:
 $ Obs                : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Financial.Condition: int  1 1 1 1 1 1 1 1 1 1 ...
 $ TotCap.Assets      : num  9.7 1 6.9 5.8 4.3 9.1 11.9 8.1 9.3 1.1 ...
 $ TotExp.Assets      : num  0.12 0.11 0.09 0.1 0.11 0.13 0.1 0.13 0.16 0.16 ...
 $ TotLns.Lses.Assets : num  0.65 0.62 1.02 0.67 0.69 0.74 0.79 0.63 0.72 0.57 ...

[Comments]:

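I think I just misunderstood the levels thing.

Both the thresholded predictions and Banks$Financial.Condition have to be factors (see ?confusionMatrix, and ?factor if factors are confusing). If you only need a table, table(ifelse(pred > 0.5, 1, 0), Banks$Financial.Condition) will do.

[Answer 1]: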

Some example data:

Banks = data.frame(Obs = 1:100,Financial.Condition=rbinom(100,1,0.5),
TotCap.Assets = runif(100),
TotExp.Assets = runif(100),TotLns.Lses.Assets = runif(100))

You can simply pass a table to confusionMatrix to get the other statistics as well:

library(caret)

Banks_Logit<- glm(Banks$Financial.Condition ~ .,data = Banks, family="binomial")
pred <-predict(Banks_Logit,Banks)
confusionMatrix(table(ifelse(pred >0.5, 1,0), Banks$Financial.Condition))

Confusion Matrix and Statistics

   
     0  1
  0 36 33
  1  8 23
                                          
               Accuracy : 0.59            
                 95% CI : (0.4871, 0.6874)
    No Information Rate : 0.56            
    P-Value [Acc > NIR] : 0.3084356       
                                          
                  Kappa : 0.2158          
                                          
 Mcnemar's Test P-Value : 0.0001781       
                                          
            Sensitivity : 0.8182          
            Specificity : 0.4107          
         Pos Pred Value : 0.5217          
         Neg Pred Value : 0.7419          
             Prevalence : 0.4400          
         Detection Rate : 0.3600          
   Detection Prevalence : 0.6900          
      Balanced Accuracy : 0.6144          
                                          
       'Positive' Class : 0
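Instead of passing a table, the predictions can also be converted to factors so that the `confusionMatrix(data, reference)` form works. Two details worth knowing, both from the R and caret documentation: `predict()` on a `glm` returns log-odds by default (`type = "link"`), so a 0.5 cut-off on that scale corresponds to a probability of about 0.62, while `type = "response"` returns probabilities; and with two levels caret treats the first level (`"0"` here, as in the output above) as the positive class unless `positive` is given. A sketch on simulated data shaped like the example above; the seed, the `. - Obs` formula and `positive = "1"` are assumptions:

```r
set.seed(42)

# Simulated data in the same shape as the example above (values are random)
Banks <- data.frame(
  Obs                 = 1:100,
  Financial.Condition = rbinom(100, 1, 0.5),
  TotCap.Assets       = runif(100),
  TotExp.Assets       = runif(100),
  TotLns.Lses.Assets  = runif(100)
)

# Assumed formula: bare response name, Obs (a row index) excluded
Banks_Logit <- glm(Financial.Condition ~ . - Obs, data = Banks, family = binomial)

# type = "response" returns probabilities; the default type = "link" returns log-odds
prob <- predict(Banks_Logit, Banks, type = "response")

# Both factors get the same, explicitly ordered levels
lv         <- c("0", "1")
pred_class <- factor(ifelse(prob > 0.5, 1, 0), levels = lv)
ref_class  <- factor(Banks$Financial.Condition, levels = lv)

if (requireNamespace("caret", quietly = TRUE)) {
  # positive = "1" makes sensitivity and PPV refer to class 1 instead of the first level
  print(caret::confusionMatrix(pred_class, ref_class, positive = "1"))
} else {
  print(table(Prediction = pred_class, Reference = ref_class))
}
```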
