sklearn: semi-supervised learning - LabelSpreadingModel memory error


【Title】sklearn: semi-supervised learning - LabelSpreadingModel memory error 【Posted】2017-02-27 02:08:21 【Question】:

I am using sklearn's LabelSpreading model as follows:

from sklearn.semi_supervised import LabelSpreading

label_spreading_model = LabelSpreading()
model_s = label_spreading_model.fit(my_inputs, labels)

But I get the following error:

   MemoryError                               Traceback (most recent call last)
    <ipython-input-17-73adbf1fc908> in <module>()
         11 
         12 label_spreading_model = LabelSpreading()
    ---> 13 model_s = label_spreading_model.fit(my_inputs, labels)

    /usr/local/lib/python2.7/dist-packages/sklearn/semi_supervised/label_propagation.pyc in fit(self, X, y)
        224 
        225         # actual graph construction (implementations should override this)
    --> 226         graph_matrix = self._build_graph()
        227 
        228         # label construction

    /usr/local/lib/python2.7/dist-packages/sklearn/semi_supervised/label_propagation.pyc in _build_graph(self)
        455         affinity_matrix = self._get_kernel(self.X_)
        456         laplacian = graph_laplacian(affinity_matrix, normed=True)
    --> 457         laplacian = -laplacian
        458         if sparse.isspmatrix(laplacian):
        459             diag_mask = (laplacian.row == laplacian.col)

    MemoryError: 

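There seems to be a problem with the Laplacian of my input matrix. Is there any parameter I can configure, or any change I can make, to avoid this error? Thanks!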


【Answer 1】:

It is quite clear: your PC is running out of memory.

Since you did not set any parameters, the RBF kernel is used by default (the constructor default is kernel='rbf').

An excerpt from scikit-learn's docs:

The RBF kernel will produce a fully connected graph which is represented in
memory by a dense matrix. This matrix may be very large and combined with the 
cost of performing a full matrix multiplication calculation for each iteration
of the algorithm can lead to prohibitively long running times
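To get a feel for how large that dense matrix can become, here is a rough back-of-the-envelope sketch. It assumes 8-byte floats and counts only one n_samples x n_samples matrix; the helper name dense_affinity_bytes is made up for illustration and is not part of scikit-learn. In practice several such matrices can exist at once. For instance, negating a dense array (the `laplacian = -laplacian` line in the traceback) allocates a new array of the same size.

```python
def dense_affinity_bytes(n_samples, dtype_bytes=8):
    """Bytes needed for one dense n_samples x n_samples matrix.

    Hypothetical helper for illustration; assumes float64 (8 bytes) by default.
    """
    return n_samples * n_samples * dtype_bytes


# Print the approximate size for a few sample counts.
for n in (1000, 10000, 100000):
    gib = dense_affinity_bytes(n) / 2**30
    print("{:>7} samples -> {:.2f} GiB per dense matrix".format(n, gib))
```

With 10,000 samples a single matrix already takes about 0.75 GiB, and the cost grows quadratically from there.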

Perhaps the following (the next sentence in the docs quoted above) will help:

On the other hand, the KNN kernel will produce a much more memory-friendly 
sparse matrix which can drastically reduce running times.
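A minimal sketch of what switching to the KNN kernel might look like. The data here is synthetic (two random blobs standing in for my_inputs, with -1 marking unlabeled points, which is scikit-learn's convention), and n_neighbors=7 is simply the library default written out explicitly. Whether this is enough for the real data set is an assumption that has to be checked against its size.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Synthetic stand-in for my_inputs: two well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) - 3.0, rng.randn(100, 2) + 3.0])

# -1 marks unlabeled samples; only five points per blob carry a label.
labels = np.full(200, -1)
labels[:5] = 0
labels[100:105] = 1

# kernel="knn" builds a sparse k-nearest-neighbour graph instead of the
# dense RBF affinity matrix.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, labels)

# transduction_ holds the label assigned to every training sample.
print(model.transduction_[:10])
```

n_neighbors trades memory against graph connectivity: smaller values keep the graph sparser, but may leave some points unreachable from any labeled sample.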

But I do not know your data, PC configuration, etc., so I can only guess...
