Bad results when testing libsvm in MATLAB

Posted 2023-03-13


【Title】Bad results when testing libsvm in matlab 【Posted】2013-03-12 22:53:03 【Question】

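Can someone help me with this? I want to check whether this classification already works well, so I used the training data as the test data (test data = training data). If the classification is good, it should give 100% accuracy (acc). Here is the code I found on this site: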

data= [170           66           ;
160            50           ;
170            63           ;
173            61           ;
168            58           ;
184            88           ;
189            94           ;
185            88           ]

labels=[-1;-1;-1;-1;-1;1;1;1];

numInst = size(data,1);
numLabels = max(labels);

testVal = [1 2 3 4 5 6 7 8];
trainLabel = labels(testVal,:);
trainData = data(testVal,:);
testData = data(testVal,:);
testLabel = labels(testVal,:);
numTrain = 8; numTest = 8;

%# train one-against-all models
model = cell(numLabels,1);
for k=1:numLabels
    model{k} = svmtrain(double(trainLabel==k), trainData, '-c 1 -t 2 -g 0.2 -b 1');
end

%# get probability estimates of test instances using each model
prob = zeros(numTest,numLabels);
for k=1:numLabels
    [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
    prob(:,k) = p(:,model{k}.Label==1);    %# probability of class==k
end


%# predict the class with the highest probability
[~,pred] = max(prob,[],2);
acc = sum(pred == testLabel) ./ numel(testLabel)    %# accuracy
C = confusionmat(testLabel, pred)                   %# confusion matrix

This is the output:

optimization finished, #iter = 16  
nu = 0.645259 obj = -2.799682, 
rho = -0.437644 nSV = 8, nBSV = 1 Total nSV = 8 
Accuracy = 100% (8/8) (classification)

acc =

    0.3750


C =

     0     5
     0     3

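I don't understand why there are two accuracy values, or why they differ: the first is 100% and the second is 0.375. Is my code wrong? It should be 100%, not 37.5%. Can you help me fix this code?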


【Answer 1】

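If you are using libsvm, you should rename its MEX files, because MATLAB already has an SVM toolbox with a function called svmtrain. However, your code runs, so it seems you did rename them, just not in the code you posted.

The second value is wrong, and I don't know why. However, I can tell you that if you use test_Data = training_Data, you will almost always get 100% accuracy. That result doesn't really mean anything, because the algorithm may be overfitting and your results won't show it. Test your algorithm on new data; that will give you the real accuracy.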
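A minimal sketch of the held-out evaluation suggested above, in plain MATLAB with no toolbox functions. It reuses the question's data and labels; the 75/25 split ratio is an arbitrary choice, not something from the answer. With only 8 samples a random split can leave one class out of the test set entirely, so treat this as an illustration rather than a reliable evaluation protocol.

```matlab
% Question's data: one [height weight] row per sample, plus its class label
data = [170 66; 160 50; 170 63; 173 61; 168 58; 184 88; 189 94; 185 88];
labels = [-1;-1;-1;-1;-1;1;1;1];

n = size(data, 1);
idx = randperm(n);                 % random permutation of the row indices
nTrain = round(0.75 * n);          % 75% for training (arbitrary ratio)
trainIdx = idx(1:nTrain);
testIdx  = idx(nTrain+1:end);      % these rows are never seen during training

trainData = data(trainIdx, :);  trainLabel = labels(trainIdx);
testData  = data(testIdx, :);   testLabel  = labels(testIdx);
% Train on (trainLabel, trainData); report accuracy only on (testLabel, testData).
```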


【Answer 2】

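Is this the code you are using? I think your svmtrain call is invalid. It should be svmtrain(MAT, VECT, ...), where MAT is a data matrix and VECT is a vector holding the label of each row of MAT. The remaining arguments are string-value pairs: a string identifier followed by its corresponding value.

When I run your code (Linux, R2011a), I get an error at the svmtrain call. Running svmtrain(trainData, double(trainLabel==k)) gives valid output (for that line). Then again, you don't seem to be using pure MATLAB, since your svmpredict call is not native MATLAB but comes from LIBSVM's MATLAB bindings...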

【Comments】

Yes, sorry, I didn't mention that I'm using libsvm. I am using libsvm here.

【Answer 3】

Swap the argument order in C = confusionmat(testLabel, pred):

C=confusionmat(pred,testLabel)

Or use this:

[ConMat,order] = confusionmat(pred,testLabel)

to display the confusion matrix together with the class order.
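To see what swapping the arguments does, here is a sketch that builds the confusion matrix by hand with accumarray (a stand-in for confusionmat, so no toolbox is needed), using the question's test labels and the all-ones prediction that the question's code ends up with. Rows are true classes and columns are predicted classes; swapping the two inputs only transposes the matrix, so the accuracy read off the diagonal stays at 0.375.

```matlab
testLabel = [-1;-1;-1;-1;-1;1;1;1];   % true labels from the question
pred = ones(8, 1);                    % what max(prob,[],2) returns when prob has one column

order = unique([testLabel; pred]);    % sorted class order, shared by rows and columns
[~, r] = ismember(testLabel, order);  % row index    = true class
[~, c] = ismember(pred, order);       % column index = predicted class
K = numel(order);
C = accumarray([r c], 1, [K K])          % counts per (true, predicted) pair
Cswapped = accumarray([c r], 1, [K K])   % same as swapping the inputs, i.e. C.'

acc = trace(C) / sum(C(:))            % diagonal = correct predictions
```

C comes out as [0 5; 0 3], the same matrix shown in the question.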


【Answer 4】

The problem is in this line:

[~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');

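p does not contain the predicted labels; it contains probability estimates that each label is correct. LIBSVM's svmpredict already computes the accuracy correctly for you, which is why the debug output shows 100%. The fix is simple: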

[p,~,~] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');

According to the README of LIBSVM's MATLAB bindings:

The function 'svmpredict' has three outputs. The first one,
predictd_label, is a vector of predicted labels. The second output,
accuracy, is a vector including accuracy (for classification), mean
squared error, and squared correlation coefficient (for regression).
The third is a matrix containing decision values or probability
estimates (if '-b 1' is specified). If k is the number of classes
in training data, for decision values, each row includes results of 
predicting k(k-1)/2 binary-class SVMs. For classification, k = 1 is a
special case. Decision value +1 is returned for each testing instance,
instead of an empty vector. For probabilities, each row contains k values
indicating the probability that the testing instance is in each class.
Note that the order of classes here is the same as 'Label' field
in the model structure.
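The last sentence of the README quote is what the question's p(:,model{k}.Label==1) relies on: the columns of the probability matrix follow model.Label, not a fixed order. The sketch below uses made-up stand-in values for model.Label and the probability matrix (they are not real libsvm output) to show that selecting the column by label gives the same result whichever order the classes were stored in.

```matlab
% Hypothetical stand-ins for libsvm output; all values are invented for illustration.
% Case 1: the classes were stored as [0; 1].
Label1 = [0; 1];
p1 = [0.9 0.1;              % one row per test instance,
      0.3 0.7];             % one column per entry of Label1
pos1 = p1(:, Label1 == 1)   % probability of class 1 (second column here)

% Case 2: same probabilities, but the classes were stored as [1; 0].
Label2 = [1; 0];
p2 = [0.1 0.9;
      0.7 0.3];
pos2 = p2(:, Label2 == 1)   % probability of class 1 (first column here)
```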


【Answer 5】

Sorry, but all of the answers are completely wrong!! The main error in the code is:

numLabels = max(labels);

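because it returns 1, whereas it should return 2 (which it does when the labels are positive), so that svmtrain/svmpredict loop twice. With numLabels = 1, only one model is trained, prob has a single column, and max(prob,[],2) returns 1 for every row; comparing those 1s against labels of -1/+1 matches only the three +1 rows, which is the 3/8 = 0.375 in the question.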
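The 0.375 can be reproduced without libsvm at all. In this sketch the probability matrix is filled with random numbers as a placeholder for the svmpredict output; its values do not matter, only the fact that it has a single column when numLabels = max(labels) evaluates to 1.

```matlab
labels = [-1;-1;-1;-1;-1;1;1;1];
numLabels = max(labels)                  % 1, so the one-against-all loop runs only once

prob = rand(numel(labels), numLabels);   % placeholder for the svmpredict probabilities
[~, pred] = max(prob, [], 2);            % argmax over a single column is always 1
acc = sum(pred == labels) / numel(labels)   % only the three +1 rows match: 3/8
```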

Anyway, change the line labels=[-1;-1;-1;-1;-1;1;1;1]; to labels=[2;2;2;2;2;1;1;1]; and it will run correctly ;)
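Relabeling by hand works because class k then has label k, so the index returned by max equals the label. A more general variant (not part of the answer above, just one possible approach) is to map whatever labels you have onto 1..K with unique and map the predictions back afterwards. The prediction vector below is hypothetical and only illustrates the mapping.

```matlab
labels = [-1;-1;-1;-1;-1;1;1;1];

% classes = sorted distinct labels; y(i) is the position of labels(i) in classes
[classes, ~, y] = unique(labels);
y = y(:);                          % make sure y is a column vector like labels
numLabels = numel(classes)         % 2, whatever the label values are

% In the question's loops, use double(y(testVal)==k) where the code now uses
% double(trainLabel==k) / double(testLabel==k).
% After [~,pred] = max(prob,[],2), pred holds indices into classes:
pred = [1;1;1;1;1;2;2;2];          % hypothetical prediction, for illustration only
predLabel = classes(pred);         % back to the original -1/+1 labels
acc = mean(predLabel == labels)
```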

