libsvm / e1071: Getting non-binary prediction value for binary class?
Posted: 2014-10-01 09:22:28

Question: In my data, the last column holds the status of each sample, either diseased (1) or disease-free (0). The goal is to classify the test samples as diseased (1) or free (0), but the predictions come out as "0.2189325" and "0.1674805" instead of 0 or 1.
sample.train.data <- structure(list(V1 = c(0.0504799681418526, 0.0674893975400467),
V2 = c(0.375190991689635, 2.62836587379837e-07), V3 = c(0,
0), V4 = c(0, 0), V5 = c(0, 0.123349117705797), V6 = c(0,
0), V7 = c(0.0575526864592394, 4.0318003466356e-08), V8 = c(0,
0), V9 = c(0, 0.0819121309767076), V10 = c(0.0837245737400836,
5.8652477615664e-08), V11 = c(0, 0), V12 = c(0, 0), V13 = c(0,
0), V14 = c(0, 0), V15 = c(0, 0), V16 = c(0, 0), V17 = c(0,
0), V18 = c(0.0115973088249164, 8.12438769013043e-09), V19 = c(0,
0), V20 = c(0, 0), V21 = c(0, 0.0642970332370127), V22 = c(0,
0), V23 = c(0, 0), V24 = c(0, 0), V25 = c(0, 0), V26 = c(0,
0), V27 = c(0, 0), V28 = c(0, 0), V29 = c(0, 0), V30 = c(0,
0), V31 = c(0, 0.100087661334886), V32 = c(0, 0), V33 = c(0,
0), V34 = c(0.132277333556899, 9.2665665514059e-08), V35 = c(0.00157299602821123,
1.1019478536923e-09), V36 = c(0.121318235645494, 0.162196905737495
), V37 = c(0, 0), V38 = c(0.0661915890298985, 0.088495112621564
), V39 = c(0.10009431688377, 0.133821501722926), V40 = c(0,
0.039928021903824), V41 = c(0, 0), V42 = c(0, 0), V43 = c(0,
0), V44 = c(0, 0), V45 = c(0, 0.105729116180691), V46 = c(0,
0), V47 = c(0, 0), V48 = c(0, 0), V49 = c(0, 0), V50 = c(0,
0.0230295773750142), V51 = c(0, 0.00966395996496688), V52 = c(0,
0), V53 = c(0, 0), V54 = c(0, 1)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
"V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21",
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V30",
"V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", "V39",
"V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", "V48",
"V49", "V50", "V51", "V52", "V53", "V54"), row.names = 1:2, class = "data.frame")
sample.test.data <- structure(list(V1 = c(0, 0.0502553931936882), V2 = c(0.32474835570625,
0.373521844489033), V3 = c(0, 0), V4 = c(0, 0), V5 = c(0.0798572088141946,
0.09185084822725), V6 = c(0, 0), V7 = c(0, 0), V8 = c(0.0913439079721602,
4.76496954607063e-08), V9 = c(0, 0), V10 = c(0.0724682048784116,
0.0833521004105655), V11 = c(0, 0), V12 = c(0, 0.00380492674778399
), V13 = c(0, 0), V14 = c(0.0300930020345612, 1.56980625668248e-08
), V15 = c(0.022461356489053, 1.17170024810405e-08), V16 = c(0.037002165179523,
0.0425594671846318), V17 = c(0, 0), V18 = c(0.0100381060711198,
5.23639406184491e-09), V19 = c(0, 0), V20 = c(0, 0), V21 = c(0,
0), V22 = c(0, 0), V23 = c(0, 0), V24 = c(0, 0), V25 = c(0, 0.0150866858339266
), V26 = c(0, 0.0282083101023333), V27 = c(0, 0), V28 = c(0,
0), V29 = c(0, 0), V30 = c(0, 0), V31 = c(0, 0.0745294069522065
), V32 = c(0, 0), V33 = c(0, 0), V34 = c(0.114493278147107, 0.131688859030858
), V35 = c(0, 0), V36 = c(0.105007578581866, 5.47773710537665e-08
), V37 = c(0, 0), V38 = c(0, 0), V39 = c(0.0866371142792093,
0.0996490179492987), V40 = c(0.0258497218465435, 1.34845486806539e-08
), V41 = c(0, 0), V42 = c(0, 0), V43 = c(0, 0), V44 = c(0, 0.00549299131535034
), V45 = c(0, 0), V46 = c(0, 0), V47 = c(0, 0), V48 = c(0, 0),
V49 = c(0, 0), V50 = c(0, 0), V51 = c(0, 0), V52 = c(0, 0
), V53 = c(0, 0), V54 = c(0, 0)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
"V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21",
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V30",
"V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", "V39",
"V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", "V48",
"V49", "V50", "V51", "V52", "V53", "V54"), row.names = 81:82, class = "data.frame")
library(e1071)  # provides svm() and its predict() method
disease.col <- paste("V", ncol(sample.train.data), sep = '')
f <- paste(disease.col, " ~ . ", sep="")
svm.model <- svm(as.formula(f), data=sample.train.data, cost=100, gamma=1)
svm.pred <- predict(svm.model, sample.test.data[, -ncol(sample.test.data)])
comp.table <- table(pred=svm.pred, true = sample.test.data[, ncol(sample.test.data)])
print(comp.table)
Output:
                  true
pred               0
  0.16748052821151 1
  0.21893247843041 1
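As you can see, the predicted outputs are 0.167 and 0.218, whereas a sample can only be classified as 0 or 1, which is also how the training data given to the SVM is labeled.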
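Note: the data above is only an excerpt. The real training set has 80 samples and the test set has 20; here the training and test data contain two samples each. Also, the warning message produced when creating svm.model comes from this excerpt and does not occur with the actual data.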
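I have tried different cost and gamma values for the SVM model and different combinations of the data, and even when the test data includes the sample status (0, 1), I still get similar results. I would be grateful if someone could tell me what I am doing wrong.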
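Comments on the question:
A similar question was answered in another post. Here is the link: ***.com/a/37697836/4861626

Answer 1:
Your response variable should be a factor; that is what triggers classification behavior. In your example, that would be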
sample.train.data$V54<-factor(sample.train.data$V54)
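This converts V54 from numeric to a factor. You can then run the rest of the code exactly as before, and predict() will return class labels (0 or 1) instead of continuous values.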
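A minimal sketch of why the conversion matters, using made-up synthetic data rather than the poster's dataset (the `make_data()` helper, column names and sample sizes are hypothetical) and assuming the e1071 package is installed. When `type` is not given, `svm()` fits eps-regression for a numeric response and C-classification for a factor response, so the identical call returns continuous values in the first case and class labels in the second. Passing `type = "C-classification"` explicitly is shown as an alternative that leaves the 0/1 column numeric.

```r
library(e1071)

set.seed(42)
# Hypothetical helper: two features plus a 0/1 status stored as numeric,
# mimicking the numeric V54 column in the question
make_data <- function(n) {
  d <- data.frame(V1 = runif(n), V2 = runif(n))
  d$V3 <- as.numeric(d$V1 + d$V2 > 1)
  d
}
train <- make_data(40)
test  <- make_data(10)

# 1) Numeric response: svm() defaults to eps-regression, predictions are continuous
reg.model <- svm(V3 ~ ., data = train, cost = 100, gamma = 1)
reg.pred  <- predict(reg.model, test[, -ncol(test)])

# 2) Factor response: svm() defaults to C-classification, predictions are labels
train.f    <- train
train.f$V3 <- factor(train.f$V3, levels = c(0, 1))
cls.model  <- svm(V3 ~ ., data = train.f, cost = 100, gamma = 1)
cls.pred   <- predict(cls.model, test[, -ncol(test)])

# 3) Alternative: keep the numeric 0/1 column but request classification explicitly
cls2.model <- svm(V3 ~ ., data = train, type = "C-classification",
                  cost = 100, gamma = 1)
cls2.pred  <- predict(cls2.model, test[, -ncol(test)])

print(head(reg.pred))                             # continuous values
print(table(pred = cls.pred, true = test$V3))     # rows labeled 0 / 1
```

Calling `print()` on a fitted model also reports the chosen "SVM-Type", which is a quick way to check whether a model was fitted as a regression or a classifier.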