Negative decision_function values
Posted: 2018-03-30 21:59:53

Question: I am using the support vector classifier from sklearn on the Iris dataset. When I call decision_function it returns negative values, yet after classification every sample in the test set is assigned the correct class. I thought decision_function should return a positive value when a sample is an inlier and a negative value when it is an outlier. Where am I wrong?
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X = iris.data[:,:]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3,
random_state=0)
clf = SVC(probability=True)
print(clf.fit(X_train,y_train).decision_function(X_test))
print(clf.predict(X_test))
print(y_test)
Here is the output:
[[-0.76231668 -1.03439531 -1.40331645]
[-1.18273287 -0.64851109 1.50296097]
[ 1.10803774 1.05572833 0.12956269]
[-0.47070432 -1.08920859 -1.4647051 ]
[ 1.18767563 1.12670665 0.21993744]
[-0.48277866 -0.98796232 -1.83186272]
[ 1.25020033 1.13721691 0.15514536]
[-1.07351583 -0.84997114 0.82303659]
[-1.04709616 -0.85739411 0.64601611]
[-1.23148923 -0.69072989 1.67459938]
[-0.77524787 -1.00939817 -1.08441968]
[-1.12212245 -0.82394879 1.11615504]
[-1.14646662 -0.91238712 0.80454974]
[-1.13632316 -0.8812114 0.80171542]
[-1.14881866 -0.95169643 0.61906248]
[ 1.15821271 1.10902205 0.22195304]
[-1.19311709 -0.93149873 0.78649126]
[-1.21653084 -0.90953622 0.78904491]
[ 1.16829526 1.12102515 0.20604678]
[ 1.18446364 1.1080255 0.15199149]
[-0.93911991 -1.08150089 -0.8026332 ]
[-1.15462733 -0.95603159 0.5713605 ]
[ 0.93278883 0.99763184 0.34033663]
[ 1.10999556 1.04596018 0.14791409]
[-1.07285663 -1.01864255 -0.10701465]
[ 1.21200422 1.01284263 0.0416991 ]
[ 0.9462457 1.01076579 0.36620915]
[-1.2108146 -0.79124775 1.43264808]
[-1.02747495 -0.25741977 1.13056021]
...
[ 1.16066886 1.11212424 0.22506538]]
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
2 1 1 2 0 2 0 0]
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
1 1 1 2 0 2 0 0]
Comments:

Answer 1: You need to consider the decision function and the prediction separately. The decision value is the distance from the hyperplane to the sample. This means that by looking at its sign you can tell whether the sample lies on the one side of the hyperplane or the other. So negative values are perfectly fine and indicate the negative class ("the other side of the hyperplane").
With the iris dataset you are dealing with a multi-class problem. Since the SVM is a binary classifier, there is no inherent multi-class classification. Two common approaches are the "one-vs-rest" (OvR) and "one-vs-one" (OvO) schemes, which build a multi-class classifier from binary "units".
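Before going into the multi-class schemes, here is a minimal binary sketch of my own (not from the original post) that makes the sign interpretation concrete; it restricts iris to classes 0 and 1 so that decision_function returns a single signed distance per sample:

from sklearn import datasets
from sklearn.svm import SVC
import numpy as np

iris = datasets.load_iris()
mask = iris.target < 2                      # keep classes 0 and 1 -> one binary problem
X_bin, y_bin = iris.data[mask], iris.target[mask]

clf_bin = SVC().fit(X_bin, y_bin)
dec = clf_bin.decision_function(X_bin)      # one signed distance per sample
pred = clf_bin.predict(X_bin)

# negative distance -> class 0, positive distance -> class 1 (clf_bin.classes_[1])
print(np.all((dec > 0).astype(int) == pred))   # expected: True on this separable subset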
One-vs-One
Once you have understood OvR (see below), OvO is not much harder to grasp. You basically build one classifier for every pair of classes (A, B). In your case: 0 vs 1, 0 vs 2, 1 vs 2.
Note: the values for (A, B) and (B, A) can be obtained from a single binary classifier. You only change which class is considered the positive one, so you have to invert the sign.
Doing this gives you a matrix:
+-------+-------+-------+-------+
| A / B |  #0   |  #1   |  #2   |
+-------+-------+-------+-------+
|  #0   |  --   | -1.18 | -0.64 |
|  #1   |  1.18 |  --   |  1.50 |
|  #2   |  0.64 | -1.50 |  --   |
+-------+-------+-------+-------+
Read it as follows: the decision-function value when class A (row) competes against class B (column).
To extract a result, a vote is carried out. In its basic form you can imagine this as a single vote that each classifier can cast: yes or no. Since that may lead to ties, the full decision-function values are used instead.
+-------+-------+-------+-------+-------+
| A / B |  #0   |  #1   |  #2   |  SUM  |
+-------+-------+-------+-------+-------+
|  #0   |  --   | -1.18 | -0.64 | -1.82 |
|  #1   |  1.18 |  --   |  1.50 |  2.68 |
|  #2   |  0.64 | -1.50 |  --   | -0.86 |
+-------+-------+-------+-------+-------+
The resulting SUM column again gives you a vector, [-1.82, 2.68, -0.86]. Now apply arg max to it and it matches your prediction.
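For the second test sample this vote can be reproduced directly from its three OvO decision values (a sketch of my own; the pair ordering 0 vs 1, 0 vs 2, 1 vs 2 is the lexicographic convention assumed throughout this thread, and the sums match the table above up to rounding):

import numpy as np

dec = np.array([-1.18273287, -0.64851109, 1.50296097])   # 0v1, 0v2, 1v2

votes = np.zeros((3, 3))
votes[0, 1], votes[1, 0] = dec[0], -dec[0]   # 0 vs 1 and its sign-flipped mirror
votes[0, 2], votes[2, 0] = dec[1], -dec[1]   # 0 vs 2
votes[1, 2], votes[2, 1] = dec[2], -dec[2]   # 1 vs 2

print(votes.sum(axis=1))              # approx. [-1.83  2.69 -0.85]
print(np.argmax(votes.sum(axis=1)))   # 1 -> the predicted class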
One-vs-Rest
I keep this section to avoid further confusion. The scikit-learn SVC classifier (libsvm) has a decision_function_shape parameter, which misled me into thinking it was OvR (I use liblinear most of the time).
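One quick way to see why the shape is ambiguous here (a side sketch of my own, using the full iris data): OvR yields one column per class and OvO one column per class pair, and for exactly three classes both come out as three columns.

from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
print(SVC(decision_function_shape='ovr').fit(X, y).decision_function(X).shape)  # (150, 3): n_classes columns
print(SVC(decision_function_shape='ovo').fit(X, y).decision_function(X).shape)  # (150, 3): n_classes*(n_classes-1)/2 columns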
For a true OvR response you would get one value from the decision function of each classifier, e.g.
[-1.18273287 -0.64851109 1.50296097]
To obtain a prediction from this you simply apply arg max, which returns the last index, the one with the value 1.50296097. From here on the decision-function values are no longer needed (for this single prediction). That is why you noticed that your predictions are fine.
However, you also specified probability=True, which takes the decision values (the distances) and passes them through a sigmoid function. The principle is the same as above, but now you also get confidence values between 0 and 1 (I prefer this term over "probability", since it only describes the distance to the hyperplane).
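As a side note, a minimal sketch of my own (mirroring the question's setup) showing what probability=True exposes in practice; the calibration details (sigmoid / Platt scaling on the binary problems) are internal to libsvm:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=0)

clf = SVC(probability=True).fit(X_train, y_train)
proba = clf.predict_proba(X_test)   # shape (n_samples, 3), each row sums to 1
print(clf.classes_)                 # column order of predict_proba
print(proba[:3].round(3))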
EDIT: Oops, Sascha is right. LibSVM uses one-vs-one (despite the shape of the decision function).
Comments:
Thanks for your answer! I understand the explanation of the negative decision-function values. But when we take the arg max of [-1.18273287 -0.64851109 1.50296097] it returns 2, while the true class of that second row is 1. Where am I wrong this time?
Sascha is right, libsvm uses OvO. I was misled by the shape of the decision_function output. I have edited my answer, sorry for the confusion. I kept parts of the original answer, because now you can see how two different multi-class strategies lead to different predictions.

Answer 2: Christopher is right, but he assumes OvR here.
Right now you are using an OvO scheme without noticing it!
Here is an example which shows how prediction with OvO + decision_function works. The OvO prediction scheme is taken from:
ECS289: Scalable Machine Learning (Cho-Jui Hsieh; UC Davis; Oct 27, 2015)

Code:
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import numpy as np

iris = datasets.load_iris()
X = iris.data[:,:]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3,
                                                    random_state=0)

clf = SVC(decision_function_shape='ovo') # EXPLICIT OVO-usage!
clf.fit(X, y)

def predict(dec):
    # OVO prediction-scheme
    # hardcoded for 3 classes!
    # OVO order assumption: 0 vs 1; 0 vs 2; 1 vs 2 (lexicographic!)
    # theory: http://www.stat.ucdavis.edu/~chohsieh/teaching/ECS289G_Fall2015/lecture9.pdf page 18
    # and: http://www.mit.edu/~9.520/spring09/Classes/multiclass.pdf page 8
    class0 = dec[0] + dec[1]
    class1 = -dec[0] + dec[2]
    class2 = -dec[1] - dec[2]
    return np.argmax([class0, class1, class2])

dec_vals = clf.decision_function(X_test)
pred_vals = clf.predict(X_test)
pred_vals_own = np.array([predict(x) for x in dec_vals])

for i in range(len(X_test)):
    print('decision_function vals : ', dec_vals[i])
    print('sklearns prediction : ', pred_vals[i])
    print('own prediction using dec: ', pred_vals_own[i])
Output:
decision_function vals : [-0.76867027 -1.04536032 -1.60216452]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [-1.19939987 -0.64932285 1.6951256 ]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [ 1.11946664 1.05573131 0.06261988]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-0.46107656 -1.09842529 -1.50671611]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [ 1.2094164 1.12827802 0.1415261 ]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-0.47736819 -0.99988924 -2.15027278]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [ 1.25467104 1.13814461 0.07643985]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-1.07557745 -0.87436887 0.93179222]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.05047139 -0.88027404 0.80181305]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.24310627 -0.70058067 1.906847 ]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-0.78440125 -1.00630434 -0.99963088]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [-1.12586024 -0.84193093 1.25542752]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.15639222 -0.91555677 1.07438865]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.14345638 -0.90050709 0.95795276]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.15790163 -0.95844647 0.83046875]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [ 1.17805731 1.11063472 0.1333462 ]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-1.20283096 -0.93961585 0.98410451]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.22782802 -0.90725712 1.05316513]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [ 1.16903803 1.12221984 0.11367107]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [ 1.17145967 1.10832227 0.08212776]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-0.9506135 -1.08467062 -0.79851794]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [-1.16266048 -0.9573001 0.79179457]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [ 0.99991983 0.99976567 0.27258784]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [ 1.14009372 1.04646327 0.05173163]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-1.08080806 -1.03404209 -0.06411027]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [ 1.23515997 1.01235174 -0.03884014]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [ 0.99958361 1.0123953 0.31647776]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-1.21958703 -0.8018796 1.67844367]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.03327108 -0.25946619 1.1567434 ]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [ 1.12368215 1.11169071 0.20956223]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-0.82416303 -1.07792277 -1.1580516 ]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [-1.13071754 -0.96096255 0.65828256]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [ 1.194643 1.12966124 0.15746621]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-1.04070512 -1.04532308 -0.20319486]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [-0.70170723 -1.09340841 -1.9323473 ]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [-1.24655214 -0.74489305 1.15450078]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [ 0.99984598 1.03781258 0.2790073 ]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-0.99993896 -1.06846079 -0.44496083]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [-1.22495071 -0.83041964 1.41965874]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-1.286798 -0.72689128 1.72244026]
sklearns prediction : 1
own prediction using dec: 1
decision_function vals : [-0.75503345 -1.09561165 -1.44344022]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [ 1.24778268 1.11179415 0.05277115]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [-0.79577073 -1.00004599 -0.99974376]
sklearns prediction : 2
own prediction using dec: 2
decision_function vals : [ 1.07018075 1.0831253 0.22181655]
sklearns prediction : 0
own prediction using dec: 0
decision_function vals : [ 1.16705531 1.11326796 0.15604895]
sklearns prediction : 0
own prediction using dec: 0
Comments:
Thanks! This is one of the easiest-to-understand explanations I have read.