mnLogLoss error with a multinomial classifier using caret/gbm
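[Title]: error with mnLogloss for multinomial classifier using caret/gbm [Posted]: 2020-10-12 20:52:06 [Question]: I am trying to fit a multinomial classifier. It seems to work, and I can produce a plot of logLoss being minimized against boosting iterations, but I cannot extract the error value. This is the error I get when I run the mnLogLoss function: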
Error in mnLogLoss(predicted, lev = predicted$label) :
'data' should have columns consistent with 'lev'
The data has been partitioned into two sets:
- training
- testing
In both, the column "label" contains the ground truth.
library(caret)      # provides train(), trainControl() and mnLogLoss()
library(MLmetrics)
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3, verboseIter = FALSE,
                           savePredictions = TRUE, classProbs = TRUE, summaryFunction = mnLogLoss)
gbmGrid1 <- expand.grid(.interaction.depth = (1:3), .n.trees = (1:10)*20, .shrinkage = 0.01, .n.minobsinnode = 3)
system.time(
  gbmFit1 <- train(label ~ ., data = training, method = "gbm", trControl = fitControl,
                   verbose = 1, metric = "logLoss", tuneGrid = gbmGrid1)
)
gbmPredictions <- predict(gbmFit1, testing)
predicted <- cbind(gbmPredictions, testing)
# This is the call that raises the error shown above
mnLogLoss(predicted, lev = levels(predicted$label))
[Answer 1]: For mnLogLoss, the vignette says:
data: a data frame with columns ‘obs’ and ‘pred’ for the observed
and predicted outcomes. For metrics that rely on class
probabilities, such as ‘twoClassSummary’, columns should also
include predicted probabilities for each class. See the
‘classProbs’ argument to ‘trainControl’.
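As a rough illustration of that requirement, the base-R sketch below mimics the kind of column check that would produce the error in the question. The helper name has_class_columns and both toy data frames are hypothetical and are not part of caret; the sketch only assumes that the check compares lev against the column names of data, as the error message suggests.

```r
# Hypothetical helper (not part of caret): TRUE when every class level
# has a matching probability column in `data`.
has_class_columns <- function(data, lev) {
  all(lev %in% colnames(data))
}

# Toy frame shaped like the one in the question: hard predictions bound
# to the test set, with no per-class probability columns.
question_like <- data.frame(
  gbmPredictions = factor(c("a", "b", "a"), levels = c("a", "b")),
  label          = factor(c("a", "b", "b"), levels = c("a", "b")),
  X1             = c(0.1, 0.5, 0.9)
)

# Toy frame in the documented shape: obs, pred, and one probability
# column per class level.
documented_shape <- data.frame(
  obs  = factor(c("a", "b", "b"), levels = c("a", "b")),
  pred = factor(c("a", "b", "a"), levels = c("a", "b")),
  a    = c(0.7, 0.2, 0.6),
  b    = c(0.3, 0.8, 0.4)
)

has_class_columns(question_like, levels(question_like$label))
has_class_columns(documented_shape, levels(documented_shape$obs))
```

The first call returns FALSE because the frame has no columns named "a" or "b"; the second returns TRUE.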
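So it is not asking for the training data; the data argument is simply the input to the metric. The error in the question arises because the data frame passed in has no columns named after the class levels. Here is an example with some simulated data: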
library(caret)
# Simulated data: a two-class label and 50 random numeric columns
df = data.frame(label=factor(sample(c("a","b"),100,replace=TRUE)),
                matrix(runif(500),ncol=50))
training = df[1:50,]
# Note: testing reuses the same rows as training here
testing = df[1:50,]
fitControl <- trainControl(method = "repeatedcv", number=10, repeats=3, verboseIter = FALSE,
                           savePredictions = TRUE, classProbs = TRUE, summaryFunction= mnLogLoss)
gbmGrid1 <- expand.grid(.interaction.depth = (1:3), .n.trees = (1:10)*20, .shrinkage = 0.01, .n.minobsinnode = 3)
gbmFit1 <- train(label~., data = training, method = "gbm", trControl=fitControl,
                 verbose = 1, metric = "logLoss", tuneGrid = gbmGrid1)
We put obs and pred together, and the last two columns are the predicted probabilities for each class:
predicted <- data.frame(obs=testing$label,
pred=predict(gbmFit1, testing),
predict(gbmFit1, testing,type="prob"))
head(predicted)
  obs pred         a         b
1   b    a 0.5506054 0.4493946
2   b    a 0.5107631 0.4892369
3   a    b 0.4859799 0.5140201
4   b    a 0.5090264 0.4909736
5   b    b 0.4545746 0.5454254
6   a    a 0.6211514 0.3788486
mnLogLoss(predicted, lev = levels(predicted$obs))
logLoss
0.6377392
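To see what this number represents, here is a minimal base-R sketch of multinomial log loss: the negative mean log of the probability assigned to the observed class. The function name manual_logloss and the clipping constant eps are assumptions made for illustration; caret's own implementation may handle edge cases differently, so treat this as a sanity check rather than a replacement.

```r
# Hypothetical helper (not part of caret): multinomial log loss computed
# from an `obs` column and one probability column per class level.
manual_logloss <- function(data, lev, eps = 1e-15) {
  probs <- as.matrix(data[, lev, drop = FALSE])
  # Clip probabilities away from 0 and 1 so log() stays finite.
  probs <- pmin(pmax(probs, eps), 1 - eps)
  # For each row, pick the probability assigned to the observed class.
  idx <- cbind(seq_len(nrow(probs)), match(as.character(data$obs), lev))
  -mean(log(probs[idx]))
}

# The six rows printed by head(predicted) above
first_six <- data.frame(
  obs = factor(c("b", "b", "a", "b", "b", "a"), levels = c("a", "b")),
  a   = c(0.5506054, 0.5107631, 0.4859799, 0.5090264, 0.4545746, 0.6211514),
  b   = c(0.4493946, 0.4892369, 0.5140201, 0.4909736, 0.5454254, 0.3788486)
)

manual_logloss(first_six, lev = levels(first_six$obs))
```

Because this uses only the first six rows, the result will differ from the logLoss reported for the full test set. Applied to the complete predicted data frame, it should come out close to the mnLogLoss value.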