confusionMatrix - Error: `data` and `reference` should be factors with the same levels
Posted: 2021-03-18 03:20:20

I created a logistic regression model:
Banks_Logit<- glm(Banks$Financial.Condition ~ .,data = Banks, family="binomial")
options(scipen=999)
summary(Banks_Logit)
Then:
pred <-predict(Banks_Logit,Banks)
gain <-gains(Banks$Financial.Condition,pred,groups=20)
plot(c(0,gain$cume.pct.of.total*sum(Banks$Financial.Condition))~
c(0,gain$cume.obs),
xlab = "Observations", ylab = "Cumulative", main="Model Performance", type="l")
lines(c(0,sum(Banks$Financial.Condition))~c(0,dim(Banks)[1]),lty=2)
library(caret)
confusionMatrix(ifelse(pred >0.5, 1,0), Banks$Financial.Condition)
This fails with:
Error: `data` and `reference` should be factors with the same levels.
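The error is about the arguments, not the model: `ifelse()` returns a plain numeric vector and `Financial.Condition` is stored as an integer (see the `str(Banks)` output), while caret's default `confusionMatrix()` method expects two factors. A bare `factor()` call is not always enough either, because it takes its levels from the values that happen to occur. The base-R sketch below uses made-up vectors (hypothetical, not the question's data) in which every prediction is above 0.5:

```r
# Hypothetical values: every predicted probability is above 0.5
pred_prob <- c(0.97, 0.81, 0.66, 0.73)
observed  <- c(1L, 0L, 1L, 0L)

pred_class <- ifelse(pred_prob > 0.5, 1, 0)
class(pred_class)            # "numeric", not a factor

# A bare factor() only keeps the levels that actually occur
levels(factor(pred_class))   # "1"
levels(factor(observed))     # "0" "1"

# Fixing the levels explicitly makes both factors compatible
pred_f <- factor(pred_class, levels = c(0, 1))
obs_f  <- factor(observed,   levels = c(0, 1))
identical(levels(pred_f), levels(obs_f))  # TRUE
```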
These are the predictions:
1 2 3 4 5 6 7 8 9 10
0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999999997779554 0.9999999999138624584560 0.9999999999891036051025 0.9999999999995110577800 0.9999999999999997779554
11 12 13 14 15 16 17 18 19 20
0.0000000000176082421301 0.0000000000352379135751 0.0000000000431425778626 0.0000000000000002220446 0.0000000000002227450487 0.0000000000000002220446 0.0000000000000002220446 0.0000000000000002220446 0.0000000000000002220446 0.0000000000000002220446
str(pred)
Named num [1:20] 1 1 1 1 1 ...
- attr(*, "names")= chr [1:20] "1" "2" "3" "4" ...
This is the dataset (`str(Banks)`):
'data.frame': 20 obs. of 5 variables:
$ Obs : int 1 2 3 4 5 6 7 8 9 10 ...
$ Financial.Condition: int 1 1 1 1 1 1 1 1 1 1 ...
$ TotCap.Assets : num 9.7 1 6.9 5.8 4.3 9.1 11.9 8.1 9.3 1.1 ...
$ TotExp.Assets : num 0.12 0.11 0.09 0.1 0.11 0.13 0.1 0.13 0.16 0.16 ...
$ TotLns.Lses.Assets : num 0.65 0.62 1.02 0.67 0.69 0.74 0.79 0.63 0.72 0.57 ...
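Comments:
I think you've just misunderstood the levels part. The thresholded predictions (`ifelse(pred > 0.5, 1, 0)`) and `Banks$Financial.Condition` must both be factors with the same levels (see `?confusionMatrix`, and `?factor` if factors are unfamiliar). If you only need a table, `table(ifelse(pred > 0.5, 1, 0), Banks$Financial.Condition)` will do.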
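A minimal sketch of the factor route suggested in that comment. It simulates a toy `Banks` data frame (hypothetical, in the spirit of the answer's example data), thresholds the predictions, and gives both arguments the same explicit levels before calling `confusionMatrix()`. Two choices go beyond the original code and are assumptions worth checking against your own needs: `predict(..., type = "response")` is used because `predict()` on a `glm` returns log-odds by default, so `> 0.5` on that scale is not a 0.5 probability cut-off; and `positive = "1"` tells caret which class is the positive one (by default it uses the first factor level, which is why the answer's output reports `'Positive' Class : 0`).

```r
library(caret)

# Hypothetical toy data shaped like the question's Banks data frame
set.seed(42)
Banks <- data.frame(
  Financial.Condition = rbinom(100, 1, 0.5),
  TotCap.Assets       = runif(100),
  TotExp.Assets       = runif(100),
  TotLns.Lses.Assets  = runif(100)
)

# Name the response as a column of `data`; `.` then expands to the other columns
Banks_Logit <- glm(Financial.Condition ~ ., data = Banks, family = "binomial")

# type = "response" returns probabilities instead of log-odds
prob <- predict(Banks_Logit, Banks, type = "response")

# Both arguments must be factors with exactly the same levels
pred_class <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
actual     <- factor(Banks$Financial.Condition, levels = c(0, 1))

cm <- confusionMatrix(data = pred_class, reference = actual, positive = "1")
print(cm)
```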
Answer 1:
Some example data:
Banks = data.frame(Obs = 1:100,Financial.Condition=rbinom(100,1,0.5),
TotCap.Assets = runif(100),
TotExp.Assets = runif(100),TotLns.Lses.Assets = runif(100))
You can instead pass a table to `confusionMatrix` to get the additional statistics:
library(caret)
Banks_Logit<- glm(Banks$Financial.Condition ~ .,data = Banks, family="binomial")
pred <-predict(Banks_Logit,Banks)
confusionMatrix(table(ifelse(pred >0.5, 1,0), Banks$Financial.Condition))
Confusion Matrix and Statistics
0 1
0 36 33
1 8 23
Accuracy : 0.59
95% CI : (0.4871, 0.6874)
No Information Rate : 0.56
P-Value [Acc > NIR] : 0.3084356
Kappa : 0.2158
Mcnemar's Test P-Value : 0.0001781
Sensitivity : 0.8182
Specificity : 0.4107
Pos Pred Value : 0.5217
Neg Pred Value : 0.7419
Prevalence : 0.4400
Detection Rate : 0.3600
Detection Prevalence : 0.6900
Balanced Accuracy : 0.6144
'Positive' Class : 0
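The table route has a related pitfall: `table()` on two numeric vectors only creates rows and columns for values that actually occur, so if the model predicts a single class the result is not square, while caret's table method expects a square table. Wrapping both vectors in `factor(..., levels = c(0, 1))` keeps the table 2 x 2. A minimal base-R sketch with hypothetical vectors:

```r
# Hypothetical values: the model predicts class 1 for every observation
pred_prob  <- c(0.91, 0.77, 0.64, 0.88, 0.59)
observed   <- c(1, 0, 1, 1, 0)
pred_class <- ifelse(pred_prob > 0.5, 1, 0)

# Without fixed levels there is no row for predicted "0": the table is 1 x 2
tab_raw <- table(pred_class, observed)
dim(tab_raw)

# With fixed levels the empty row is kept and the table is 2 x 2
tab <- table(Predicted = factor(pred_class, levels = c(0, 1)),
             Observed  = factor(observed,   levels = c(0, 1)))
dim(tab)
tab
```

The resulting `tab` can then be passed to `confusionMatrix()` in the same way as in the answer.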