Numpy/Scipy Sparse vs dense multiplication

Posted 2023-03-12

Published: 2013-05-26 05:53:12

Question:

There seem to be some differences between the scipy sparse matrix types and the ordinary numpy matrix type:

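import numpy as np
import scipy.sparse as sp

A = sp.dia_matrix(np.tri(3, 4))
vec = np.array([1, 2, 3, 4])

# sparse matrix times 1-D array: the array is treated as a column vector
print(A * vec)                          # array([ 1.,  3.,  6.])

# sparse matrix times explicit column matrix
print(A * np.asmatrix(vec).T)           # matrix([[ 1.],
                                        #         [ 3.],
                                        #         [ 6.]])

# dense matrix times 1-D array: fails
print(A.todense() * vec)                # ValueError: matrices are not aligned

# dense matrix times explicit column matrix
print(A.todense() * np.asmatrix(vec).T) # matrix([[ 1.],
                                        #         [ 3.],
                                        #         [ 6.]])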

Why can a sparse matrix interpret the array as a column vector when an ordinary matrix cannot?

Comments:

A sparse matrix is not a subclass of the numpy matrix. It is not even an ndarray.

Answer 1:

In the spmatrix class (which you can inspect in scipy/sparse/base.py), __mul__() contains a set of "ifs" that answers your question:

class spmatrix(object):
    ...
    def __mul__(self, other):
        ...
        M,N = self.shape
        if other.__class__ is np.ndarray:
            # Fast path for the most common case
            if other.shape == (N,):
                return self._mul_vector(other)
            elif other.shape == (N, 1):
                return self._mul_vector(other.ravel()).reshape(M, 1)
            elif other.ndim == 2  and other.shape[0] == N:
                return self._mul_multivector(other)
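The three branches above can be exercised directly. The following is a small sketch, assuming a SciPy version in which `csr_matrix` treats `*` as matrix multiplication (the newer sparse array classes behave differently); the variable names are illustrative only.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.tri(3, 4))    # shape (M, N) = (3, 4)
v = np.array([1, 2, 3, 4])

r1 = A * v                  # other.shape == (N,)   -> result has shape (M,)
r2 = A * v.reshape(4, 1)    # other.shape == (N, 1) -> result has shape (M, 1)
r3 = A * np.ones((4, 2))    # other.shape == (N, k) -> result has shape (M, k)

print(r1.shape, r2.shape, r3.shape)   # (3,) (3, 1) (3, 2)
```

In each case the only requirement on the ndarray is that its first dimension equals N, the number of columns of the sparse matrix.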

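For a 1-D array, the call always ends up in _mul_vector(). For the CSR and CSC formats this is defined in compressed.py, inside the class _cs_matrix (other formats, such as dia_matrix, provide their own _mul_vector), with the following code: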

def _mul_vector(self, other):
    M,N = self.shape

    # output array
    result = np.zeros(M, dtype=upcast_char(self.dtype.char,
                                           other.dtype.char))

    # csr_matvec or csc_matvec
    fn = getattr(sparsetools,self.format + '_matvec')
    fn(M, N, self.indptr, self.indices, self.data, other, result)

    return result

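Note that it assumes the output has as many rows as the sparse matrix. In other words, it takes your 1-D input array as having a length equal to the number of columns of the sparse matrix; for a 1-D array there is no notion of transposed or not transposed. For an ndarray with ndim==2, however, it cannot make that assumption, so if you tried: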

vec = np.array([[1,2,3,4],
                [1,2,3,4]])

A * vec.T would be the only option that works.
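A short check of the 2-D case, written as a sketch that reuses the matrix A from the question. The names `vec2`, `failed`, and `result` are introduced here for illustration.

```python
import numpy as np
import scipy.sparse as sp

A = sp.dia_matrix(np.tri(3, 4))          # shape (3, 4)
vec2 = np.array([[1, 2, 3, 4],
                 [1, 2, 3, 4]])          # shape (2, 4)

try:
    A * vec2                             # first dimension is 2, not N = 4
    failed = False
except ValueError:
    failed = True

result = A * vec2.T                      # (3, 4) times (4, 2) -> (3, 2)
print(failed, result.shape)              # True (3, 2)
```

No implicit transpose is attempted for 2-D input: the shapes either line up or the product raises.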

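For a single-row matrix (an np.matrix is always 2-D), the sparse module does not assume that it matches the number of columns either. To check, you can try:

A * np.asmatrix(vec)
#ValueError: dimension mismatch

A * np.asmatrix(vec).T will then be your only choice.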
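The dense side of the question can be illustrated the same way. This is a sketch, not part of the original answer: a 1-D array converted to np.matrix becomes a 1 x N row, so the dense matrix product sees shapes (3, 4) and (1, 4) and fails, whereas a plain ndarray product accepts the 1-D vector.

```python
import numpy as np
import scipy.sparse as sp

A = sp.dia_matrix(np.tri(3, 4))
vec = np.array([1, 2, 3, 4])

row = np.asmatrix(vec)
print(row.shape)                  # (1, 4): a 1-D array becomes a row matrix

try:
    A.todense() * vec             # np.matrix product: (3, 4) times (1, 4)
    dense_failed = False
except ValueError:
    dense_failed = True

out = A.toarray() @ vec           # ndarray product handles the 1-D vector
print(dense_failed, out)          # True [1. 3. 6.]
```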

Comments:

Because of the fast path, A*vec is faster than A*mvec, where mvec = np.asmatrix(vec).T.
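One way to measure this yourself is sketched below. The matrix size, density, and repetition count are arbitrary assumptions, and the timings depend on the machine and on the SciPy version, so no expected numbers are given.

```python
import timeit
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
dense = rng.random((500, 500))
dense[dense < 0.99] = 0.0            # keep roughly 1% of the entries
A = sp.csr_matrix(dense)

vec = rng.random(500)                # 1-D ndarray
mvec = np.asmatrix(vec).T            # 500 x 1 np.matrix

t_vec = timeit.timeit(lambda: A * vec, number=1000)
t_mvec = timeit.timeit(lambda: A * mvec, number=1000)
print(f"A * vec:  {t_vec:.4f} s")
print(f"A * mvec: {t_mvec:.4f} s")
```

Both products compute the same values; only the type and shape of the result differ.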
