Rcpp - extracting rows from list of matrices / dataframes

Posted 2023-02-16

Asked: 2016-03-14 19:23:10

Question:

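As a follow-up to this question, I decided to go the Rcpp route rather than use convoluted syntax in R. I think this will give better readability (and possibly be faster as well).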

Suppose I have a list of data.frames (which I can easily convert to matrices via as). Given the previous answers, this seems like the best approach.

# input data
my_list <- vector("list", length= 10)
set.seed(65L)
for (i in 1:10) {
  my_list[[i]] <- data.frame(matrix(rnorm(10000), ncol = 10))
  # alternatively
  # my_list[[i]] <- matrix(rnorm(10000), ncol = 10)
}

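What is the appropriate way to extract rows from the matrices? The goal is to create a list in which each element contains a list of the nr-th rows of each of the original list's data.frames. I have tried several different syntaxes and keep getting errors: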

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

List foo(const List& my_list, const int& n_geo) {
  int n_list = my_list.size();
  std::vector<std::vector<double> > list2(n_geo);

  // needed code....

  return wrap(list2);
}

Options:

for (int i = 0; i < n_list; i++) {
  for (int nr = 0; nr < n_geo; nr++) {
    list2[nr][i] = my_list[i].row(nr);
    // or list2[nr].push_back(my_list[i].row(nr));
    // or list2[nr].push_back(as<double>(my_list[i].row(nr)));
    // or list2[nr].push_back(as<double>(my_list[i](nr, _)));
  }
}

// or:
NumericMatrix a = my_list[1];
...
NumericMatrix j = my_list[10];

for (int nr = 0; nr < n_geo; nr++) {
  list2[nr][1] = // as above
}

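None of these work for me. What am I doing wrong? Below are the errors I get from the syntax options above.

error: no matching function for call to 'as(Rcpp::Matrix::Row)'

or

error: cannot convert 'Rcpp::Matrix::Row aka Rcpp::MatrixRow' to 'double' in assignment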

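Comments:

Your question is a bit unclear to me. Could you show example R objects for your input (corresponding to my_list) and the desired output? So you are trying to use Rcpp to write the operation that creates l2 in the other question?

Answer 1:

Here is one approach: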

#include <Rcpp.h>

// x[[nx]][ny,]  ->  y[[ny]][[nx]]

// [[Rcpp::export]]
Rcpp::List Transform(Rcpp::List x) {
    // nx: number of matrices; ny: number of rows (taken from the first matrix)
    R_xlen_t nx = x.size(), ny = Rcpp::as<Rcpp::NumericMatrix>(x[0]).nrow();
    Rcpp::List y(ny);

    for (R_xlen_t iy = 0; iy < ny; iy++) {
        Rcpp::List tmp(nx);
        for (R_xlen_t ix = 0; ix < nx; ix++) {
            // List elements are generic, so convert explicitly before calling row()
            Rcpp::NumericMatrix mtmp = Rcpp::as<Rcpp::NumericMatrix>(x[ix]);
            tmp[ix] = mtmp.row(iy);
        }
        y[iy] = tmp;
    }

    return y;
}

/*** R

L1 <- lapply(1:10, function(x) {
    matrix(rnorm(20), ncol = 5)
})

L2 <- lapply(1:nrow(L1[[1]]), function(x) {
    lapply(L1, function(y) unlist(y[x,]))
})

all.equal(L2, Transform(L1))
#[1] TRUE

microbenchmark::microbenchmark(
    "R" = lapply(1:nrow(L1[[1]]), function(x) {
        lapply(L1, function(y) unlist(y[x,]))
    }),
    "Cpp" = Transform(L1),
    times = 200L)

#Unit: microseconds
#expr    min      lq      mean  median       uq      max neval
#  R 254.660 316.627 383.92739 347.547 392.7705 1909.097   200
#Cpp  18.314  26.007  71.58795  30.230  38.8650  945.167   200

*/

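I'm not sure how well this will scale; I think it is just an inherently inefficient transformation. As noted in my comment at the top of the source, you appear to be doing a kind of coordinate swap: the ny-th row of the nx-th element of the input list becomes the nx-th element of the ny-th element of the output list: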

x[[nx]][ny,]  ->  y[[ny]][[nx]]
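
The same coordinate swap can be sketched in plain C++ without Rcpp. This is only an illustration, not the code from this answer: the Matrix struct and the functions extract_row and swap_coordinates are hypothetical names introduced here, and the matrix is assumed to be stored column-major, which is how R lays out its matrices. Under that assumption each row read is strided across the columns, which may be part of why the transformation is not cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an R numeric matrix: values are stored
// column-major, so element (r, c) lives at data[r + c * nrow].
struct Matrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;
};

// Copy row r of m into a fresh vector (a strided read over the columns).
std::vector<double> extract_row(const Matrix& m, std::size_t r) {
    assert(r < m.nrow);
    std::vector<double> out(m.ncol);
    for (std::size_t c = 0; c < m.ncol; ++c) {
        out[c] = m.data[r + c * m.nrow];
    }
    return out;
}

// Row ny of x[nx] becomes y[ny][nx].
// Assumes every matrix in x has the same number of rows as x[0].
std::vector<std::vector<std::vector<double>>>
swap_coordinates(const std::vector<Matrix>& x) {
    std::vector<std::vector<std::vector<double>>> y;
    if (x.empty()) {
        return y;
    }
    const std::size_t ny = x[0].nrow;
    y.resize(ny);
    for (std::size_t iy = 0; iy < ny; ++iy) {
        y[iy].reserve(x.size());
        for (const Matrix& m : x) {
            y[iy].push_back(extract_row(m, iy));
        }
    }
    return y;
}
```

The result needs three levels of nesting (row index, matrix index, values within the row), because each extracted row is itself a sequence of doubles rather than a single double.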

To address the errors you were getting: Rcpp::List is a generic object (technically an Rcpp::Vector<VECSXP>), so when you try to do, for example,

my_list[i].row(nr)

the compiler does not know that my_list[i] is a NumericMatrix. You therefore have to make an explicit conversion with Rcpp::as<>,

Rcpp::NumericMatrix mtmp = Rcpp::as<Rcpp::NumericMatrix>(x[ix]);
tmp[ix] = mtmp.row(iy);

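I only used matrix elements in my example data to simplify things. In practice you are probably better off coercing the data.frames to matrix objects directly in R rather than trying to do it in C++; it will be much simpler, and most likely the coercion just calls underlying C code anyway, so there is nothing really to be gained by doing it otherwise.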

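I should also point out that if you are using an Rcpp::List of homogeneous types, you can squeeze out a little more performance with Rcpp::ListOf<type>. This lets you skip the Rcpp::as<type> conversions done above: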

typedef Rcpp::ListOf<Rcpp::NumericMatrix> MatList;

// [[Rcpp::export]]
Rcpp::List Transform2(MatList x) {
    R_xlen_t nx = x.size(), ny = x[0].nrow();
    Rcpp::List y(ny);

    for (R_xlen_t iy = 0; iy < ny; iy++) {
        Rcpp::List tmp(nx);
        for (R_xlen_t ix = 0; ix < nx; ix++) {
            // ListOf elements are already typed, so no as<> conversion is needed
            tmp[ix] = x[ix].row(iy);
        }
        y[iy] = tmp;
    }

    return y;
}

/*** R

L1 <- lapply(1:10, function(x) {
    matrix(rnorm(20000), ncol = 100)
})

L2 <- lapply(1:nrow(L1[[1]]), function(x) {
    lapply(L1, function(y) unlist(y[x,]))
})

microbenchmark::microbenchmark(
    "R" = lapply(1:nrow(L1[[1]]), function(x) {
        lapply(L1, function(y) unlist(y[x,]))
    }),
    "Transform" = Transform(L1),
    "Transform2" = Transform2(L1),
    times = 200L)

#Unit: microseconds
#      expr      min       lq     mean   median       uq       max neval
#         R 6049.594 6318.822 7604.871 6707.242 8592.510 64005.190   200
# Transform  928.468 1041.936 3130.959 1166.819 1659.745 71552.284   200
#Transform2  850.912  957.918 1694.329 1061.183 2856.724  4502.065   200

*/

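Comments:

Thanks for the continued edits. Compared with my naive R approach, I get a ~11x speedup, which is better than the ~8.5x from the earlier solution by sgibbs... and, as originally noted, readability is significantly improved.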
