Extracting elements from a vector with a sparse matrix in R, without converting to a dense matrix
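Introduction: this article was compiled by the editors of 小常识网 (cha138.com). It covers extracting elements from a vector with a sparse matrix in R without converting to a dense matrix; we hope it is useful to you.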
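I want to extract the elements of the vector x1 at every position i where column i is set in a given row of the sparse matrix. All of the sparse (empty) entries should be dropped, but the results should be kept row by row, in their own object/list/matrix.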
Given:
> x1
[1] 1 2 3 4 5 6 7 8 9 10
> sparse_mat
8 x 10 sparse Matrix of class "ngCMatrix"
[1,] | | | . . . . . . .
[2,] . | | | . . . . . .
[3,] . . | | | . . . . .
[4,] . . . | | | . . . .
[5,] . . . . | | | . . .
[6,] . . . . . | | | . .
[7,] . . . . . . | | | .
[8,] . . . . . . . | | |
Desired result:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 6
[5,] 5 6 7
[6,] 6 7 8
[7,] 7 8 9
[8,] 8 9 10
A more complete example, with comments:
library(Matrix)
library(purrr)
x1 <- 1:10
create_seq_sparse <- function(n, len) {
bandSparse(m = n, n = n - len + 1L, k = seq_len(len) - 1L)
}
sparse_mat <- create_seq_sparse(10, 3)
sparse_mat
#> 8 x 10 sparse Matrix of class "ngCMatrix"
#>
#> [1,] | | | . . . . . . .
#> [2,] . | | | . . . . . .
#> [3,] . . | | | . . . . .
#> [4,] . . . | | | . . . .
#> [5,] . . . . | | | . . .
#> [6,] . . . . . | | | . .
#> [7,] . . . . . . | | | .
#> [8,] . . . . . . . | | |
# Is there a better way to do this? Please advise.
mat_x1_mult_sparse <- t(t(sparse_mat) * x1)
mat_x1_mult_sparse
#> 8 x 10 sparse Matrix of class "dgCMatrix"
#>
#> [1,] 1 2 3 . . . . . . .
#> [2,] . 2 3 4 . . . . . .
#> [3,] . . 3 4 5 . . . . .
#> [4,] . . . 4 5 6 . . . .
#> [5,] . . . . 5 6 7 . . .
#> [6,] . . . . . 6 7 8 . .
#> [7,] . . . . . . 7 8 9 .
#> [8,] . . . . . . . 8 9 10
# This is nice, but it can't be used together with keep()?
# mat_x1_mult_sparse[1, , drop = FALSE]
# Desired result, but with this approach I think I lose the advantages of the sparse matrix?
mat_x1_mult_sparse[1, ] %>% keep(~ .x != 0)
#> [1] 1 2 3
mat_x1_mult_sparse[2, ] %>% keep(~ .x != 0)
#> [1] 2 3 4
# etc...
mat_x1_mult_sparse[8, ] %>% keep(~ .x != 0)
#> [1] 8 9 10
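A minimal sketch of one way to get the row-wise values without multiplying by x1 or densifying anything, assuming only the pattern of sparse_mat matters (true for the ngCMatrix here). It relies on the compressed-column slots of Matrix objects: after t(), each original row becomes a column, so @i of the transpose (0-based) holds the original column numbers row by row, and diff(@p) counts how many belong to each row. The names tr, cols_by_row, row_id, row_vals and res are illustrative only.

```r
library(Matrix)

x1 <- 1:10
# Same 8 x 10 banded pattern matrix as in the question
sparse_mat <- bandSparse(n = 8, m = 10, k = 0:2)

# A CsparseMatrix stores its entries column by column, with row indices
# sorted within each column. After transposing, each original row is a
# column, so (@i + 1) lists the original column indices row by row.
tr <- t(sparse_mat)
cols_by_row <- tr@i + 1L
# diff(@p) is the number of stored entries in each original row
row_id <- rep(seq_len(nrow(sparse_mat)), diff(tr@p))

# Look the values up in x1; no dense matrix or dense row is created
row_vals <- split(x1[cols_by_row], row_id)

# Every row has exactly 3 entries here, so a matrix also works
res <- matrix(x1[cols_by_row], ncol = 3, byrow = TRUE)
```

split() returns a list, which also copes with rows that have different numbers of entries; the matrix() step assumes every row has the same count, as in this example.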
Answer
One option is to use the summary method to get the indices of the non-sparse (stored) elements:
library(Matrix)
i1 <- summary(sparse_mat)
i2 <- as.matrix(i1[order(i1[,1]),]) # order by the row index
# multiply the sparse matrix by the replicated 'x1', extract elements
# with i2 index and convert it to n column matrix
matrix((sparse_mat * x1[col(sparse_mat)])[i2], ncol = 3, byrow = TRUE)
#     [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 2 3 4
#[3,] 3 4 5
#[4,] 4 5 6
#[5,] 5 6 7
#[6,] 6 7 8
#[7,] 7 8 9
#[8,] 8 9 10
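The summary() approach can probably be trimmed further: the j column returned by summary() is already the position in x1, so indexing x1 with it gives the same values without building x1[col(sparse_mat)], which is a dense vector of length nrow * ncol. This is a sketch assuming, as in the example, that every row has exactly 3 stored entries; sorting by both i and j keeps the columns ascending within each row regardless of the order summary() returns them in.

```r
library(Matrix)

x1 <- 1:10
# Same 8 x 10 banded pattern matrix as in the question
sparse_mat <- bandSparse(n = 8, m = 10, k = 0:2)

# summary() lists the stored entries as (i, j) index pairs
i1 <- summary(sparse_mat)
# Sort by row, then by column within each row
i1 <- i1[order(i1$i, i1$j), ]

# j is the column index, i.e. the position in x1, so look x1 up directly
res <- matrix(x1[i1$j], ncol = 3, byrow = TRUE)
```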
Another answer
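I deleted my previous answer once I noticed you don't want to make your matrix dense; still, the idea is to take advantage of the matrix's i slot: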
# convert to a numeric sparse matrix (dgCMatrix), since an ngCMatrix can only be on/off
out = as(sparse_mat, 'dMatrix')
# subset to the "on" elements of sparse_mat,
# and replace with the column number. The column number is
# not stored directly so we have to make it ourselves, basically
# by looking for when the value in @i stays the same or goes down
out[sparse_mat] = c(1L, cumsum(diff(sparse_mat@i) <= 0) + 1L)
out
# 8 x 10 sparse Matrix of class "dgCMatrix"
#
# [1,] 1 2 3 . . . . . . .
# [2,] . 2 3 4 . . . . . .
# [3,] . . 3 4 5 . . . . .
# [4,] . . . 4 5 6 . . . .
# [5,] . . . . 5 6 7 . . .
# [6,] . . . . . 6 7 8 . .
# [7,] . . . . . . 7 8 9 .
# [8,] . . . . . . . 8 9 10
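This should be quite efficient, since the [<- method for dgCMatrix should be smart, and your replacement vector is exactly the required length (no wasted elements).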
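The diff(sparse_mat@i) <= 0 trick works for this banded matrix, but it assumes no column is empty and that each column's first stored row index is no larger than the previous column's last one; it also writes column numbers, which equal x1 only because x1 is 1:10. Below is a sketch of a more general variant, assuming an arbitrary numeric x1 (the values are made up for illustration): it derives each stored entry's column from the @p slot, writes x1 into the @x slot directly instead of going through [<-, and then reads the values back row by row from the transpose. col_of_entry, tr and row_vals are illustrative names.

```r
library(Matrix)

# Any numeric vector, not only 1:10 (illustrative values)
x1 <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
sparse_mat <- bandSparse(n = 8, m = 10, k = 0:2)

# @p marks where each column starts in @i, so diff(@p) is the number of
# stored entries per column; repeating each column number that many times
# gives the column of every stored entry, even if some columns are empty.
col_of_entry <- rep(seq_len(ncol(sparse_mat)), diff(sparse_mat@p))

# Numeric copy with the same structure; @x is in the same order as @i
out <- as(sparse_mat, "dMatrix")
out@x <- as.numeric(x1[col_of_entry])

# Row-wise values, as in the question: transpose so rows become columns
tr <- t(out)
row_vals <- split(tr@x, rep(seq_len(nrow(out)), diff(tr@p)))
```

For this particular matrix col_of_entry matches the cumsum(diff(@i) <= 0) construction from the answer; the @p version simply does not depend on how neighbouring columns overlap.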