Why are fewer features ranked in an SVM than were actually provided?

Posted 2023-03-12

Asked 2015-02-27 19:49:45

I have trained an SVM with 18881 features and want to know how the features rank. I tried the method given in "SVM equations from e1071 R package?" and computed the weight vector:

w = t(model$coefs) %*% model$SV

Inspecting w with str(w) gives the following:

> str(w)
Formal class 'matrix.csr' [package "SparseM"] with 4 slots
  ..@ ra       : num [1:16725] 1198.1 229 107.5 -22.4 408.3 ...
  ..@ ja       : int [1:16725] 381 396 434 447 3262 4802 9187 10398 11856 13896 ...
  ..@ ia       : int [1:2] 1 16726
  ..@ dimension: int [1:2] 1 18881

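My guess is that @ja gives the column IDs of the features and @ra gives the corresponding weights. In that case, why does the number of features (16725) not equal 18881?

Am I correct that @ja holds the column IDs of the features?

As explained in the linked answer, I used a linear kernel. Can I apply the same method with a radial kernel?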

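Comments:

Have you gone through the documentation of the "SparseM" package?

The actual question here is: why are weights not given for all of the features?

Thanks @BondedDust, I got it.

Answer 1: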

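OK, thanks to the help @BondedDust gave in the comments, I found the explanation. SparseM does not store cells whose value is 0, so features with a weight of 0 are not stored here. That accounts for the gap: 18881 - 16725 = 2156 features have a weight of exactly 0. The slots of the matrix.csr class are documented as follows: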

ra: Object of class numeric, a real array of nnz elements containing the **non-zero elements** of A.
ja: Object of class integer, an integer array of nnz elements **containing the column indices of the elements stored in ‘ra’**.
ia: Object of class integer, an integer array of n+1 elements containing pointers to the beginning of each row in the arrays ‘ra’ and ‘ja’ (here n = 1, hence the two entries 1 and 16726).
dimension: Object of class integer, dimension of the matrix
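
To rank all the features, including those with zero weight, the sparse row can be expanded back into a full-length vector. The following is only a minimal sketch in base R: the values of ra, ja and n_features are small made-up stand-ins for w@ra, w@ja and w@dimension[2], not numbers from the model above.

```r
# Toy stand-ins for the slots of a 1-row matrix.csr (illustrative values,
# not taken from the model in the question).
ra <- c(1198.1, 229.0, -22.4)   # non-zero weights, playing the role of w@ra
ja <- c(2L, 5L, 7L)             # their column indices, playing the role of w@ja
n_features <- 8L                # total number of columns, like w@dimension[2]

# Rebuild the full weight vector: positions that are not stored are exactly 0.
w_dense <- numeric(n_features)
w_dense[ja] <- ra

# Rank every feature by absolute weight, largest first.
ranking <- order(abs(w_dense), decreasing = TRUE)

# Number of features that were dropped from the sparse storage (weight 0).
n_zero <- n_features - length(ra)
```

With the real object, substituting w@ra, w@ja and w@dimension[2] for the toy values should give a ranking over all 18881 features, with the zero-weight features tied at the bottom.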
