Learning LibSVM: A Complete Chinese-English User Manual, or Getting to Know the README File

Posted 2023-02-28 WillWinwin

This post covers:
(1) a brief introduction to LibSVM
(2) a close reading and translation of the README file that ships with LibSVM

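1. What is LibSVM?
LIBSVM is a simple, easy-to-use, fast and effective software package for SVM pattern recognition and regression, developed by Professor Chih-Jen Lin and colleagues at National Taiwan University. It provides precompiled executables for Windows as well as the source code, which makes it easy to improve, modify, and port to other operating systems. The software needs relatively little parameter tuning: it supplies many default parameters, and these defaults are enough to solve many problems. It also provides cross validation. The software can solve C-SVM, ν-SVM, ε-SVR and ν-SVR problems, including multi-class pattern recognition based on the one-against-one method.

(Source: Baidu Baike)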

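2. Translating and understanding the LibSVM README

Authors put what they most want readers to see into the README, so if you want to understand a project in depth, start by understanding its README thoroughly.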

Libsvm is a simple, easy-to-use, and efficient software for SVM
classification and regression. It solves C-SVM classification, nu-SVM
classification, one-class-SVM, epsilon-SVM regression, and nu-SVM
regression. It also provides an automatic model selection tool for
C-SVM classification. This document explains the use of libsvm.

Libsvm is available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm
Please read the COPYRIGHT file before using libsvm.

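In other words, Libsvm is a simple, easy-to-use and efficient package for SVM classification and regression. It handles C-SVM classification, nu-SVM classification, one-class SVM, epsilon-SVM regression and nu-SVM regression, and it also ships an automatic model selection tool for C-SVM classification. This document explains how to use libsvm.
Libsvm can be downloaded from the URL above.
Read the COPYRIGHT file before using libsvm.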

Table of Contents

Quick Start
Installation and Data Format
`svm-train’ Usage
`svm-predict’ Usage
`svm-scale’ Usage
Tips on Practical Use
Examples
Precomputed Kernels
Library Usage
Java Version
Building Windows Binaries
Additional Tools: Sub-sampling, Parameter Selection, Format checking, etc.
MATLAB/OCTAVE Interface
Python Interface
Additional Information

Quick Start

If you are new to SVM and if the data is not large, please go to
`tools’ directory and use easy.py after installation. It does
everything automatic – from data scaling to parameter selection.

Usage: easy.py training_file [testing_file]

More information about parameter selection can be found in
`tools/README.’

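Quick Start

If you are new to SVM and your data set is not large, go to the tools directory after installation and run easy.py. It automates everything from data scaling to parameter selection.
Usage: easy.py training_file [testing_file]
More information about parameter selection is in `tools/README'.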

Installation and Data Format

On Unix systems, type `make' to build the `svm-train' and `svm-predict'
programs. Run them without arguments to show the usages of them.

On other systems, consult `Makefile' to build them (e.g., see 'Building Windows binaries' in this file) or use the pre-built binaries (Windows binaries are in the directory `windows').

The format of training and testing data file is:

label index1:value1 index2:value2 …
.
.
.

Each line contains an instance and is ended by a '\n' character. For classification, label is an integer indicating the class label (multi-class is supported). For regression, label is the target value which can be any real number. For one-class SVM, it's not used so can be any number. The pair index:value gives a feature (attribute) value: index is an integer starting from 1 and value is a real number. The only exception is the precomputed kernel, where index starts from 0; see the section of precomputed kernels. Indices must be in ASCENDING order. Labels in the testing file are only used to calculate accuracy or errors. If they are unknown, just fill the first column with any numbers.
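
The data format described above can be illustrated with a small parser. This is only a sketch: the function name `parse_libsvm_line` is made up for this post, and treating "ascending" as strictly ascending is an assumption rather than something the README spells out.

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])  # class label or regression target
    features = {}
    previous_index = None
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)
        # Assumption: indices must be strictly ascending within a line.
        if previous_index is not None and index <= previous_index:
            raise ValueError("indices must be in ascending order")
        features[index] = float(value_text)
        previous_index = index
    return label, features


label, features = parse_libsvm_line("+1 1:0.5 3:-1 7:0.25")
print(label, features)
```

Features that do not appear on a line (indices 2, 4, 5 and 6 here) are simply absent from the dictionary, which is how the sparse format represents zeros.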

A sample classification data included in this package is `heart_scale'. To check if your data is in a correct form, use `tools/checkdata.py' (details in `tools/README').

Type `svm-train heart_scale', and the program will read the training data and output the model file `heart_scale.model'. If you have a test set called heart_scale.t, then type `svm-predict heart_scale.t heart_scale.model output' to see the prediction accuracy. The `output' file contains the predicted class labels.

For classification, if training data are in only one class (i.e., all labels are the same), then `svm-train' issues a warning message: `Warning: training data in only one class. See README for details,' which means the training data is very unbalanced. The label in the training data is directly returned when testing.

There are some other useful programs in this package.

svm-scale:

This is a tool for scaling input data file.

svm-toy:

This is a simple graphical interface which shows how SVM
separates data in a plane. You can click in the window to
draw data points. Use the "change" button to choose class
1, 2 or 3 (i.e., up to three classes are supported), the "load"
button to load data from a file, the "save" button to save data to
a file, the "run" button to obtain an SVM model, and the "clear"
button to clear the window.

You can enter options at the bottom of the window; the syntax of
options is the same as `svm-train'.

Note that "load" and "save" consider dense data format both in
classification and the regression cases. For classification,
each data point has one label (the color) that must be 1, 2,
or 3 and two attributes (x-axis and y-axis values) in
[0,1). For regression, each data point has one target value
(y-axis) and one attribute (x-axis values) in [0, 1).

Type `make' in respective directories to build them.

You need Qt library to build the Qt version.
(available from http://www.trolltech.com)

You need GTK+ library to build the GTK version.
(available from http://www.gtk.org)

The pre-built Windows binaries are in the `windows'
directory. We use Visual C++ on a 32-bit machine, so the
maximal cache size is 2GB.

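Installation and Data Format

On Unix systems, type make to build the svm-train and svm-predict programs. Running them without arguments prints their usage.
On other systems, consult the Makefile to build them (for example, see "Building Windows Binaries" in this document) or use the pre-built binaries (the Windows binaries are in the windows directory).
The format of the training and testing data files is:
label index1:value1 index2:value2 …
.
.
.
Each line contains one instance and ends with a '\n' (newline) character. For classification, label is an integer giving the class label (multi-class is supported). For regression, label is the target value, which can be any real number. For one-class SVM it is not used, so it can be any number. Except when a precomputed kernel is used (covered in a later section), each index:value pair gives a feature (attribute) value: index is an integer starting from 1 and value is a real number. Indices must be in ascending order. Labels in the testing file are only used to compute accuracy or errors; if they are unknown, fill the first column with any number.
A sample classification data set included in the package is heart_scale. Use tools/checkdata.py to check whether your data is correctly formatted (see tools/README for details).
Type svm-train heart_scale and the program reads the training data and writes the model file heart_scale.model. If you have a test set called heart_scale.t, type svm-predict heart_scale.t heart_scale.model output to see the prediction accuracy. The output file contains the predicted class labels.
The package also contains some other useful programs:
svm-scale:
Scales your input data file.
svm-toy:
A simple graphical interface that shows how SVM separates data in a plane. Click in the window to draw data points. Use the "change" button to choose class 1, 2 or 3 (that is, up to three classes are supported), the "load" button to load data from a file, the "save" button to save data to a file, the "run" button to obtain an SVM model, and the "clear" button to clear the window.
You can enter options at the bottom of the window; the option syntax is the same as for svm-train.
Note that "load" and "save" use the dense data format for both classification and regression. For classification, each data point has one label (the color), which must be 1, 2 or 3, and two attributes (x and y values) in [0,1). For regression, each data point has one target value (y-axis) and one attribute (x-axis) in [0,1).
Type make in the respective directories to build them.
You need the Qt library to build the Qt version (available from http://www.trolltech.com).
You need the GTK+ library to build the GTK version (available from http://www.gtk.org).
The pre-built Windows binaries are in the windows directory. They were built with Visual C++ on a 32-bit machine, so the maximum cache size is 2GB.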

`svm-train’ Usage

Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
0 – C-SVC (multi-class classification)
1 – nu-SVC (multi-class classification)
2 – one-class SVM
3 – epsilon-SVR (regression)
4 – nu-SVR (regression)
-t kernel_type : set type of kernel function (default 2)
0 – linear: u’*v
1 – polynomial: (gamma*u’*v + coef0)^degree
2 – radial basis function: exp(-gamma*|u-v|^2)
3 – sigmoid: tanh(gamma*u’*v + coef0)
4 – precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n: n-fold cross validation mode
-q : quiet mode (no outputs)

In the -g option, num_features (written as k in older versions of this README) is the number of attributes in the input data.

option -v randomly splits the data into n parts and calculates cross
validation accuracy/mean squared error on them.

See libsvm FAQ for the meaning of outputs.
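
To make the four kernel formulas listed under -t concrete, here is a minimal pure-Python sketch. The function names are chosen for this post and are not part of libsvm; vectors are plain dense lists, and the parameter names mirror the -g, -r and -d options.

```python
import math


def dot(u, v):
    """Inner product u'*v of two dense vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    return dot(u, v)


def polynomial(u, v, gamma, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
default_gamma = 1.0 / len(u)  # mirrors the documented default 1/num_features
print(linear(u, v), polynomial(u, v, default_gamma), rbf(u, v, default_gamma))
```

The RBF kernel of a vector with itself is always 1, which is a quick sanity check when experimenting with different gamma values.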

`svm-train' usage
svm-train trains on the training data set and produces the SVM model.

Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
0 — C-SVC
1 — nu-SVC
2 — one-class SVM
3 — epsilon-SVR
4 — nu-SVR
-t kernel_type : set type of kernel function (default 2)
0 — linear: u’*v
1 — polynomial: (gamma*u’*v + coef0)^degree
2 — radial basis function: exp(-gamma*|u-v|^2)
3 — sigmoid: tanh(gamma*u’*v + coef0)
4 — precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/k)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates: whether to train an SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight: set the parameter C of class i to weight*C in C-SVC (default 1)
-v n: n-fold cross validation mode

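The k in -g is the number of attributes in the input data.
The -v option randomly splits the data into n parts and computes the cross validation accuracy/mean squared error on them.
See the libsvm FAQ for the meaning of the outputs.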

`svm-predict’ Usage

Usage: svm-predict [options] test_file model_file output_file
options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported

model_file is the model file generated by svm-train.
test_file is the test data you want to predict.
svm-predict will produce output in the output_file.
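
As noted in the data format section, labels in the test file are only used to calculate accuracy or errors. The two measures can be sketched as follows; the function names are illustrative and this is not the code svm-predict itself runs.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of instances whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def mean_squared_error(targets, predictions):
    """Average squared difference, the usual error measure for regression."""
    total = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return total / len(targets)


print(accuracy([1, -1, 1, 1], [1, 1, 1, -1]))
print(mean_squared_error([1.0, 2.0], [1.5, 2.0]))
```

If the test labels are placeholders, as the README allows when the true labels are unknown, these numbers carry no meaning and only the predictions in output_file are useful.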

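svm-predict uses the model obtained from training to make predictions on a data set.

Usage: svm-predict [options] test_file model_file output_file
options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported
model_file is the model file generated by svm-train.
test_file is the data you want to predict.
svm-predict writes its results to output_file.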

`svm-scale’ Usage

Usage: svm-scale [options] data_filename
options:
-l lower : x scaling lower limit (default -1)
-u upper : x scaling upper limit (default +1)
-y y_lower y_upper : y scaling limits (default: no y scaling)
-s save_filename : save scaling parameters to save_filename
-r restore_filename : restore scaling parameters from restore_filename

See ‘Examples’ in this file for examples.

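`svm-scale' usage

svm-scale rescales the raw samples to a range of your choosing, usually [0,1] or [-1,1]. Scaling serves two main purposes:

1) It prevents a feature with very large or very small values from having a disproportionate influence on training.

2) It helps computation. Kernel evaluation involves inner products or exp, and unbalanced data can cause numerical difficulties.

Usage: svm-scale [options] data_filename
options:
-l lower : x scaling lower limit (default -1)
-u upper : x scaling upper limit (default +1)
-y y_lower y_upper : y scaling limits (default: no y scaling)
-s save_filename : save scaling parameters to save_filename
-r restore_filename : restore scaling parameters from restore_filename
See `Examples' in this document for examples.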

Tips on Practical Use

Scale your data. For example, scale each attribute to [0,1] or [-1,+1].
For C-SVC, consider using the model selection tool in the tools directory.
nu in nu-SVC/one-class-SVM/nu-SVR approximates the fraction of training errors and support vectors.
If data for classification are unbalanced (e.g. many positive and few negative), try different penalty parameters C by -wi (see examples below).
Specify larger cache size (i.e., larger -m) for huge problems.

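Tips on practical use

Scale your data. For example, scale each attribute to [0,1] or [-1,+1].
For C-SVC, consider using the model selection tool in the tools directory.
nu in nu-SVC/one-class-SVM/nu-SVR approximates the fraction of training errors and support vectors. (I have not yet figured out what this means.)
If the classification data is unbalanced (for example, many positive and few negative examples), use -wi to try different penalty parameters C.
Specify a larger cache size (i.e., a larger -m) for big problems.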

Examples

svm-scale -l -1 -u 1 -s range train > train.scale
svm-scale -r range test > test.scale

Scale each feature of the training data to be in [-1,1]. Scaling
factors are stored in the file range and then used for scaling the test data.
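In the first command:
-l -1 -u 1 scales the training data to the interval [-1,1]
-s range saves the scaling rule to the file range
train > train.scale scales the training data in train and writes the scaled data to train.scale, leaving train itself unchanged
In the second command:
the scaling rule saved in range is reused to scale test, and the result is written to test.scale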
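
The idea behind the two commands, fitting the scaling on the training data and reusing it on the test data, can be sketched in Python. This assumes the usual linear min-max mapping onto [lower, upper]; the helper names are made up for this post, and leaving constant columns unchanged is a simplification of this sketch, not a description of what svm-scale does.

```python
def fit_ranges(rows):
    """Return one (min, max) pair per column, the analogue of the saved range file."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_ranges(rows, ranges, lower=-1.0, upper=1.0):
    """Linearly map each column from its (min, max) onto [lower, upper]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                new_row.append(value)  # simplification: constant column left as-is
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 40.0]]
ranges = fit_ranges(train)          # computed from the training data only
print(apply_ranges(train, ranges))  # every value lands in [-1, 1]
print(apply_ranges(test, ranges))   # test values may fall outside [-1, 1]
```

Because the ranges come from the training data alone, a test value larger than anything seen in training (40.0 here) is mapped beyond the upper limit. That is expected, and it is the reason the test set is scaled with the saved ranges rather than with ranges of its own.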

Running svm-scale with no arguments in cmd prints the usage of every option:

G:\windows>svm-scale
Usage: svm-scale [options] data_filename
options:
-l lower : x scaling lower limit (default -1)
-u upper : x scaling upper limit (default +1)
-y y_lower y_upper : y scaling limits (default: no y scaling)
-s save_filename : save scaling parameters to save_filename
-r restore_filename : restore scaling parameters from restore_filename

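The remaining examples can be read the same way: run the corresponding program name in cmd to see what each option means, so they are not translated here.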

svm-train -s 0 -c 5 -t 2 -g 0.5 -e 0.1 data_file

Train a classifier with RBF kernel exp(-0.5|u-v|^2), C=5, and
stopping tolerance 0.1.

svm-train -s 3 -p 0.1 -t 0 data_file

Solve SVM regression with linear kernel u’v and epsilon=0.1
in the loss function.

svm-train -c 10 -w1 1 -w-2 5 -w4 2 data_file

Train a classifier with penalty 10 = 1 * 10 for class 1, penalty 50 =
5 * 10 for class -2, and penalty 20 = 2 * 10 for class 4.
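
The arithmetic in this example is simply weight*C per class. A tiny sketch, where the function name is hypothetical and classes without a -wi option are assumed to keep the default weight of 1:

```python
def class_penalties(cost, weights, labels):
    """Return the effective penalty weight*C for each class label."""
    return {label: weights.get(label, 1.0) * cost for label in labels}


# Mirrors: svm-train -c 10 -w1 1 -w-2 5 -w4 2 data_file
print(class_penalties(10, {1: 1, -2: 5, 4: 2}, labels=[1, -2, 4]))
```

Raising the weight of a rare class makes errors on that class more expensive, which is why -wi is the suggested remedy for unbalanced data.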

svm-train -s 0 -c 100 -g 0.1 -v 5 data_file

Do five-fold cross validation for the classifier using
the parameters C = 100 and gamma = 0.1
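
What "five-fold" means can be sketched as follows. This is a plain random split written for illustration; the function name and the fixed seed are assumptions, and libsvm's own splitting may differ in detail (for example in how it balances classes across folds).

```python
import random


def cross_validation_folds(num_instances, n_folds, seed=0):
    """Yield (training_indices, held_out_indices) for each of the n folds."""
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds parts.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held_out_set = set(held_out)
        training = [i for i in indices if i not in held_out_set]
        yield training, held_out


for training, held_out in cross_validation_folds(10, 5):
    print(len(training), sorted(held_out))
```

Each instance is held out exactly once, so the reported cross validation accuracy (or mean squared error) is computed from predictions made by a model that never saw that instance during training.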

svm-train -s 0 -b 1 data_file
svm-predict -b 1 test_file data_file.model output_file

Obtain a model with probability information and predict test data with probability estimates

Precomputed Kernels
(I did not fully follow this part.)

Users may precompute kernel values and input them as training and testing files. Then libsvm does not need the original training/testing sets.
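In other words, users can compute the kernel values in advance and supply them as the training and testing files, after which libsvm no longer needs the original training/testing sets.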

Assume there are L training instances x1, …, xL.
Let K(x, y) be the kernel value of two instances x and y.
The input formats are:

New training instance for xi:

label 0:i 1:K(xi,x1) … L:K(xi,xL)

New testing instance for any x:

label 0:? 1:K(x,x1) … L:K(x,xL)

That is, in the training file the first column must be the “ID” of
xi. In testing, ? can be any value.

All kernel values including ZEROs must be explicitly provided. Any permutation or random subsets of the training/testing files are also valid (see examples below).

Note: the format is slightly different from the precomputed kernel package released in libsvmtools earlier.

Examples:

Assume the original training data has three four-feature
instances and testing data has one instance:

15  1:1 2:1 3:1 4:1
45      2:3     4:3
25          3:1

15  1:1     3:1

If the linear kernel is used, we have the following new
training/testing sets:

15  0:1 1:4 2:6  3:1
45  0:2 1:6 2:18 3:0 
25  0:3 1:1 2:0  3:1

15  0:? 1:2 2:0  3:1

? can be any value.

Any subset of the above training file is also valid. For example,

25  0:3 1:1 2:0  3:1
45  0:2 1:6 2:18 3:0 

implies that the kernel matrix is

    [K(2,2) K(2,3)] = [18 0]
    [K(3,2) K(3,3)] = [0  1]
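
The linear-kernel example above can be reproduced with a short script, which may help if the precomputed format is hard to follow from the text alone. The helper names are made up for this post, instances are stored as sparse {index: value} dictionaries, and 0 is used as an arbitrary value for the test instance's 0:? column.

```python
def linear_kernel(u, v):
    """Inner product of two sparse vectors stored as {index: value} dicts."""
    return sum(u.get(i, 0.0) * v.get(i, 0.0) for i in u)


def precomputed_line(label, instance_id, x, training_vectors):
    """Format 'label 0:id 1:K(x,x1) ... L:K(x,xL)' for one instance x."""
    parts = [f"{label:g}", f"0:{instance_id}"]
    for j, x_j in enumerate(training_vectors, start=1):
        parts.append(f"{j}:{linear_kernel(x, x_j):g}")
    return " ".join(parts)


training = [
    (15, {1: 1, 2: 1, 3: 1, 4: 1}),
    (45, {2: 3, 4: 3}),
    (25, {3: 1}),
]
vectors = [x for _, x in training]

# Training file: column 0 holds the instance ID, starting from 1.
for i, (label, x) in enumerate(training, start=1):
    print(precomputed_line(label, i, x, vectors))

# Testing file: column 0 can hold any value; 0 is used here.
print(precomputed_line(15, 0, {1: 1, 3: 1}, vectors))
```

Every kernel value is written out explicitly, including the zeros, which matches the requirement that all kernel values must be provided.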

Library Usage
This part mainly describes how some of the functions in the LibSVM library are implemented, including variable definitions, struct layouts and so on. There is a lot of material, so I may write a separate post to cover it in detail.

These functions and structures are declared in the header file
`svm.h'. You need to #include "svm.h" in your C/C++ source files and
link your program with `svm.cpp'. You can see `svm-train.c' and
`svm-predict.c' for examples showing how to use them.

We define LIBSVM_VERSION and declare `extern int libsvm_version;' in svm.h, so you can check the version number.

Before you classify test data, you need to construct an SVM model
(`svm_model') using training data. A model can also be saved in
a file for later use. Once an SVM model is available, you can use it
to classify new data.

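I don't like making posts too long, so this is to be continued.

---------- End of post ----------

A passage of dialogue in 《追影子的人》, which I have been reading lately, struck me, so I'm sharing it:
"It's all over. The girl I was destined for has fallen in love with someone else."
"Does it hurt to see her with 马格?"
"What do you think?"
"Perhaps 'the girl you're destined for' should mean the person who will make you happy, right?"
"....."
"So maybe she isn't the girl you're destined for."

---------- This really is the end of the post ----------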
