Putting permuted data into LibSVM precomputed kernel


[Title]: Putting permuted data into LibSVM precomputed kernel [Posted]: 2014-06-21 14:21:55 [Question]:

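I am currently doing very simple SVM classification. I use a precomputed kernel in LibSVM that combines RBF with DTW.

When I compute the similarity (kernel) matrix, everything seems to work fine... until I permute the data and then compute the kernel matrix.

An SVM is of course invariant to permutations of its input data. In the MATLAB code below, the relevant line is marked with '<- !!!!!!!!!!!'.

My csv files have the format LABEL, val1, val2, ..., valN, and all csv files are stored in the folder dirName. The string array therefore contains the entries '10_0.csv 10_1.csv .... 11_7.csv, 11_8.csv' (unpermuted), or the same entries in some other order when permuted.

I also tried permuting the vector of sample serial numbers, but that made no difference.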
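As a small illustration of what permutation invariance means for a precomputed kernel, here is a minimal sketch. It assumes toy scalar samples and uses a plain absolute difference as a stand-in for the DTW distance (neither is from the original post). Reordering the samples should only reorder the rows and columns of the kernel matrix by the same permutation.

```matlab
% Toy 1-D samples stand in for the time series; the absolute difference
% stands in for dtwWrapper (an assumption, not the original DTW code).
x = [0.1; 0.4; 0.35; 0.8; 0.05];
sigma = 0.5;

% Pairwise distances and RBF kernel, K(i,j) = exp(-d(i,j)^2 / (2*sigma^2)).
D = abs(bsxfun(@minus, x, x'));
K = exp(-(D.^2) ./ (2 * sigma^2));

% Permute the samples (fixed permutation) and recompute the kernel.
p = [3 1 5 2 4];
xp = x(p);
Dp = abs(bsxfun(@minus, xp, xp'));
Kp = exp(-(Dp.^2) ./ (2 * sigma^2));

% The recomputed kernel equals the original one with rows and columns
% reordered by the same permutation.
sameUpToPermutation = max(max(abs(Kp - K(p, p)))) < 1e-12;
disp(sameUpToPermutation);
```

If the labels are reordered with the same permutation, the training problem is the same one with its samples listed in a different order.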

function [SimilarityMatrixTrain, SimilarityMatrixTest, trainLabels, testLabels, PermSimilarityMatrixTrain, PermSimilarityMatrixTest, permTrainLabels, permTestLabels] = computeDistanceMatrix(dirName, verificationClass, trainFrac)
fileList = getAllFiles(dirName);
fileList = fileList(1:36);
trainLabels = [];
testLabels = [];
trainFiles = {};
testFiles = {};
permTrainLabels = [];
permTestLabels = [];
permTrainFiles = {};
permTestFiles = {};

n = 0;
sigma = 0.01;

trainFiles = fileList(1:2:end);
testFiles = fileList(2:2:end);

rng(3);
permTrain = randperm(length(trainFiles))
%rng(3); <- !!!!!!!!!!!
permTest = randperm(length(testFiles));

permTrainFiles = trainFiles(permTrain)
permTestFiles = testFiles(permTest);

noTrain = numel(trainFiles);
noTest = numel(testFiles);

SimilarityMatrixTrain = eye(noTrain);
PermSimilarityMatrixTrain = eye(noTrain);
SimilarityMatrixTest = eye(noTest);
PermSimilarityMatrixTest = eye(noTest);

% UNPERM
%Train
for i = 1 : noTrain
    x = csvread(trainFiles{i});
    label = x(1);
    trainLabels = [trainLabels, label];
    for j = 1 : noTrain
        y = csvread(trainFiles{j});
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        SimilarityMatrixTrain(i, j) = rbfValue;
        n=n+1
    end
end

SimilarityMatrixTrain = [(1:size(SimilarityMatrixTrain, 1))', SimilarityMatrixTrain];

%Test
for i = 1 : noTest
    x = csvread(testFiles{i});
    label = x(1);
    testLabels = [testLabels, label];
    for j = 1 : noTest
        y = csvread(testFiles{j});
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        SimilarityMatrixTest(i, j) = rbfValue;
        n=n+1
    end
end

SimilarityMatrixTest = [(1:size(SimilarityMatrixTest, 1))', SimilarityMatrixTest];

% PERM
%Train
for i = 1 : noTrain
    x = csvread(permTrainFiles{i});
    label = x(1);
    permTrainLabels = [permTrainLabels, label];
    for j = 1 : noTrain
        y = csvread(permTrainFiles{j});
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        PermSimilarityMatrixTrain(i, j) = rbfValue;
        n=n+1
    end
end

PermSimilarityMatrixTrain = [(1:size(PermSimilarityMatrixTrain, 1))', PermSimilarityMatrixTrain];

%Test
for i = 1 : noTest
    x = csvread(permTestFiles{i});
    label = x(1);
    permTestLabels = [permTestLabels, label];
    for j = 1 : noTest
        y = csvread(permTestFiles{j});
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        PermSimilarityMatrixTest(i, j) = rbfValue;
        n=n+1
    end
end

PermSimilarityMatrixTest = [(1:size(PermSimilarityMatrixTest, 1))', PermSimilarityMatrixTest];

mdlU = svmtrain(trainLabels', SimilarityMatrixTrain, '-t 4 -c 0.5');
mdlP = svmtrain(permTrainLabels', PermSimilarityMatrixTrain, '-t 4 -c 0.5');

[pclassU, xU, yU] = svmpredict(testLabels', SimilarityMatrixTest, mdlU);
[pclassP, xP, yP] = svmpredict(permTestLabels', PermSimilarityMatrixTest, mdlP);

xU    
xP

end

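I would be very grateful for any answers!

Regards, Benjamin

[Comments]:

Well, I was not sure whether this site is the right place for my question, so I decided to post it on stats.stackexchange.com as well (stats.stackexchange.com/questions/96452/…). Feel free to answer here or there. Dear moderators: if this is not appropriate, feel free to delete my post. Thanks a lot!

[Answer 1]: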

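After cleaning up the code and having a colleague look at it, we (he) finally found the error. Of course, the test matrix has to be computed from the training and the test samples: the SVM predicts the test data by summing, over the training vectors, the products of their alpha values with the corresponding kernel values (the alphas are zero for non-support vectors). I hope this clarifies the problem for anyone else. For more clarity, see the modified code below. That said, a sharp-eyed reader can also see the test matrix being computed from training and test vectors in, for example, "using precomputed kernels with libsvm". If you have any further comments, questions, or hints, feel free to add comments and/or answers to this post!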
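The prediction rule described in parentheses is the standard SVM decision function. Written in the usual notation (the symbols are not from the original post, and $N_{\mathrm{train}}$ is the number of training samples):

```latex
f(x) = \sum_{i=1}^{N_{\mathrm{train}}} \alpha_i \, y_i \, K(x_i, x) + b
```

Each test sample $x$ therefore needs its kernel value against every training sample $x_i$, so the test matrix has one row per test sample and one column per training sample.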

function [tacc, testacc, mdl, SimilarityMatrixTrain, SimilarityMatrixTest, trainLabels, testLabels] = computeSimilarityMatrix(dirName)
fileList = getAllFiles(dirName);
fileList = fileList(1:72);
trainLabels = [];
testLabels = [];
trainFiles = {};
testFiles = {};
n = 0;
sigma = 0.01;

trainFiles = fileList(1:2:end);
testFiles = fileList(2:5:end);

noTrain = size(trainFiles);
noTest = size(testFiles);

permTrain = randperm(noTrain(1));
permTest = randperm(noTest(1));

trainFiles = trainFiles(permTrain);
testFiles = testFiles(permTest);

%Train
for i = 1 : noTrain(1)
    x = csvread(trainFiles{i});
    label = x(1);
    trainlabel = label;
    trainLabels = [trainLabels, label];
    for j = 1 : noTrain(1)
        y = csvread(trainFiles{j});
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma.^2));
        SimilarityMatrixTrain(i, j) = rbfValue;
    end
end

SimilarityMatrixTrain = [(1:size(SimilarityMatrixTrain, 1))', SimilarityMatrixTrain];

%Test
for i = 1 : noTest(1)
    x = csvread(testFiles{i});
    label = x(1);
    testlabel = label;
    testLabels = [testLabels, label];
    for j = 1 : noTrain(1)
        y = csvread(trainFiles{j});
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma.^2));
        SimilarityMatrixTest(i, j) = rbfValue;

    end
end

SimilarityMatrixTest = [(1:size(SimilarityMatrixTest, 1))', SimilarityMatrixTest];

mdlU = svmtrain(trainLabels', SimilarityMatrixTrain, '-t 4 -c 1000 -q');
fprintf('TEST: '); [pclassU, xU, yU] = svmpredict(testLabels', SimilarityMatrixTest, mdlU);
fprintf('TRAIN: ');[pclassT, xT, yT] = svmpredict(trainLabels', SimilarityMatrixTrain, mdlU);

tacc = xT(1);
testacc = xU(1);
mdl = mdlU;

end
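To make the shape of the test matrix concrete, here is a minimal sketch that does not call LibSVM at all. It assumes toy scalar data, an absolute difference in place of the DTW distance, and made-up dual coefficients and bias (all hypothetical, chosen only for illustration). It shows that the test kernel is nTest-by-nTrain and that reordering the training set leaves the decision values unchanged as long as the columns and coefficients move together.

```matlab
% Toy data; the absolute difference stands in for the DTW distance
% (an assumption, not the original dtwWrapper).
xTrain = [0.1; 0.4; 0.35; 0.8];
xTest  = [0.2; 0.7; 0.5];
sigma  = 0.5;
rbf = @(a, b) exp(-(bsxfun(@minus, a, b').^2) ./ (2 * sigma^2));

% Rows index test samples, columns index training samples.
KTest = rbf(xTest, xTrain);          % 3-by-4, not 3-by-3

% Hypothetical signed dual coefficients (alpha_i * y_i) and bias, made up
% for illustration; zeros correspond to non-support vectors.
coef = [0.7; 0; -0.7; 0];
b = 0.1;
f = KTest * coef + b;                % one decision value per test sample

% Reorder the training set: kernel columns and coefficients are permuted
% together, so the decision values stay the same.
p = [4 2 1 3];
fPerm = rbf(xTest, xTrain(p)) * coef(p) + b;
unchanged = max(abs(f - fPerm)) < 1e-12;

disp(size(KTest));
disp(unchanged);
```

A test-versus-test matrix, as in the code from the question, has columns that do not correspond to any training sample, which is consistent with the accuracy there depending on the order of the files.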

Regards, Benjamin
