Applying a function row-wise in data.table; passing column names as vectors

Consider the following function foo:

foo <- function(a, b, c) {
  out <- (sum(a) + sqrt(prod(c))) / sqrt(pi * b)
  return(out)
}

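I want to apply this function to the data.table DT row by row, grouping on the unique key column ID, with the values in the columns supplied as the arguments.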

DT <- structure(list(ID = c("K1L1", "K1L2", "K1L3", "K2L1", "K2L2", 
"K2L3", "K3L1", "K3L2", "K3L3", "K4L1", "K4L2", "K4L3", "K5L1", 
"K5L2", "K5L3"), K1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L), K2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L), K3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), K4 = c(0L, 0L, 0L, 1L, 0L, 0L, 2L, 
1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), K5 = c(4L, 3L, 5L, 3L, 4L, 3L, 
3L, 3L, 4L, 4L, 3L, 3L, 5L, 4L, 4L), K6 = c(17L, 21L, 21L, 15L, 
18L, 20L, 18L, 14L, 19L, 19L, 19L, 21L, 20L, 18L, 17L), K7 = c(10L, 
11L, 11L, 13L, 11L, 10L, 9L, 12L, 12L, 12L, 10L, 11L, 12L, 13L, 
10L), K8 = c(7L, 7L, 8L, 6L, 7L, 7L, 8L, 6L, 8L, 6L, 8L, 6L, 
8L, 6L, 8L), K9 = c(1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 
1L, 1L, 2L, 1L), K10 = c(0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 0L, 0L, 1L, 1L), keq = c(50, 49, 51, 51, 48, 51, 48, 
47, 49, 51, 52, 48, 50, 50, 48), result = c(3.32285019941341, 
3.75957814378025, 3.85756018427585, 3.51276824014721, 3.55423728741272, 
3.52711899186614, 3.82738634323954, 3.49460484846665, 3.85490005446497, 
3.7497752713846, 3.58557114276955, 3.61968872352116, 3.89594481311228, 
3.78708738710968, 3.56911326431751)), class = "data.frame", row.names = c(NA, 
-15L))

library(data.table)
setDT(DT)

DT
     ID K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 keq
1  K1L1  0  0  0  0  4 17 10  7  1   0  50
2  K1L2  0  0  0  0  3 21 11  7  1   1  49
3  K1L3  0  0  0  0  5 21 11  8  1   0  51
4  K2L1  0  0  0  1  3 15 13  6  2   1  51
5  K2L2  0  0  0  0  4 18 11  7  1   0  48
6  K2L3  0  0  0  0  3 20 10  7  1   1  51
7  K3L1  0  0  0  2  3 18  9  8  2   1  48
8  K3L2  0  0  0  1  3 14 12  6  2   1  47
9  K3L3  0  0  0  0  4 19 12  8  1   1  49
10 K4L1  0  0  0  0  4 19 12  6  2   1  51
11 K4L2  0  0  0  1  3 19 10  8  1   1  52
12 K4L3  0  0  0  0  3 21 11  6  1   0  48
13 K5L1  0  0  0  0  5 20 12  8  1   0  50
14 K5L2  0  0  0  0  4 18 13  6  2   1  50
15 K5L3  0  0  0  0  4 17 10  8  1   1  48

With the usual syntax, writing the column names out explicitly, I get the desired result:

DT[, result := foo(a = c(K1, K2, K3, K4, K5, K6, K7, K8, K9, K10),
                   b = keq, c = c(K8, K9)), by = "ID"]

DT
      ID K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 keq   result
 1: K1L1  0  0  0  0  4 17 10  7  1   0  50 3.322850
 2: K1L2  0  0  0  0  3 21 11  7  1   1  49 3.759578
 3: K1L3  0  0  0  0  5 21 11  8  1   0  51 3.857560
 4: K2L1  0  0  0  1  3 15 13  6  2   1  51 3.512768
 5: K2L2  0  0  0  0  4 18 11  7  1   0  48 3.554237
 6: K2L3  0  0  0  0  3 20 10  7  1   1  51 3.527119
 7: K3L1  0  0  0  2  3 18  9  8  2   1  48 3.827386
 8: K3L2  0  0  0  1  3 14 12  6  2   1  47 3.494605
 9: K3L3  0  0  0  0  4 19 12  8  1   1  49 3.854900
10: K4L1  0  0  0  0  4 19 12  6  2   1  51 3.749775
11: K4L2  0  0  0  1  3 19 10  8  1   1  52 3.585571
12: K4L3  0  0  0  0  3 21 11  6  1   0  48 3.619689
13: K5L1  0  0  0  0  5 20 12  8  1   0  50 3.895945
14: K5L2  0  0  0  0  4 18 13  6  2   1  50 3.787087
15: K5L3  0  0  0  0  4 17 10  8  1   1  48 3.569113

# Check the first row (ID K1L1) by hand
x <- c(0, 0, 0, 0, 4, 17, 10, 7, 1, 0)
y <- 50
z <- c(7, 1)

foo(x, y, z)
[1] 3.32285
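Written out, foo() computes the formula below. Row K1L1 has a = K1 to K10, b = keq and c = (K8, K9). Substituting those values reproduces the value printed above, with figures rounded:

```latex
\mathrm{foo}(a, b, c) = \frac{\sum_{i} a_i + \sqrt{\prod_{j} c_j}}{\sqrt{\pi b}}
\qquad
\mathrm{foo}_{\text{K1L1}} = \frac{39 + \sqrt{7 \cdot 1}}{\sqrt{50\pi}}
\approx \frac{41.6458}{12.5331} \approx 3.3229
```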

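But when I try to pass the arguments as vectors of column names (acol, bcol and ccol, defined at the end of the answer), I don't get the correct result: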

DT[, result := foo(a = get(acol), b = get(bcol), c = get(ccol)), by = "ID"]
DT
      ID K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 keq    result
 1: K1L1  0  0  0  0  4 17 10  7  1   0  50 0.2111004
 2: K1L2  0  0  0  0  3 21 11  7  1   1  49 0.2132436
 3: K1L3  0  0  0  0  5 21 11  8  1   0  51 0.2234524
 4: K2L1  0  0  0  1  3 15 13  6  2   1  51 0.1935154
 5: K2L2  0  0  0  0  4 18 11  7  1   0  48 0.2154535
 6: K2L3  0  0  0  0  3 20 10  7  1   1  51 0.2090206
 7: K3L1  0  0  0  2  3 18  9  8  2   1  48 0.2303294
 8: K3L2  0  0  0  1  3 14 12  6  2   1  47 0.2015820
 9: K3L3  0  0  0  0  4 19 12  8  1   1  49 0.2279670
10: K4L1  0  0  0  0  4 19 12  6  2   1  51 0.1935154
11: K4L2  0  0  0  1  3 19 10  8  1   1  52 0.2212934
12: K4L3  0  0  0  0  3 21 11  6  1   0  48 0.1994711
13: K5L1  0  0  0  0  5 20 12  8  1   0  50 0.2256758
14: K5L2  0  0  0  0  4 18 13  6  2   1  50 0.1954410
15: K5L3  0  0  0  0  4 17 10  8  1   1  48 0.2303294

What am I doing wrong?
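The code below is a quick way to test what went wrong. get() looks up a single object name. The working assumption is therefore that get(acol) returned only K1 and get(ccol) returned only K8. If that is right, the "wrong" column should equal foo(K1, keq, K8).

The hypothetical table dt_check copies three rows of DT (ID, K1, K8 and keq only). The sketch checks this guess against the values 0.2111004, 0.1935154 and 0.2303294 printed above.

```r
library(data.table)

foo <- function(a, b, c) {
  out <- (sum(a) + sqrt(prod(c))) / sqrt(pi * b)
  return(out)
}

# Hypothetical subset of DT: rows K1L1, K2L1 and K3L1, only the columns
# that the "first name only" guess needs.
dt_check <- data.table(ID  = c("K1L1", "K2L1", "K3L1"),
                       K1  = c(0L, 0L, 0L),
                       K8  = c(7L, 6L, 8L),
                       keq = c(50, 51, 48))

# Assumption under test: get(acol) gave only K1 and get(ccol) gave only K8,
# so the bad result should equal foo() applied to K1, keq and K8 alone.
dt_check[, wrong_guess := foo(a = K1, b = keq, c = K8), by = "ID"]
print(dt_check)
```

Getting the same three numbers supports the diagnosis: each vector of column names collapsed to its first name.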

Answer

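get() looks up a single object name, so with a vector of names it uses only the first one. Here get(acol) returns just K1 and get(ccol) just K8, which is why the results above are far too small. mget() looks up every name and returns a list, and unlist() turns that list into a plain vector. Try this: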

DT[, result := foo(a = unlist(mget(acol)), 
                   b = unlist(mget(bcol)), 
                   c = unlist(mget(ccol))), by = "ID"]

Objects used (other than DT):

acol <- paste0("K", 1:10)
bcol <- "keq"
ccol <- c("K8", "K9")
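The following is a self-contained sketch that checks the answer. It assumes a hypothetical three-row copy, DT3, of the question's data (rows K1L1 to K1L3). Inside j, mget() fetches every column named in the vector for the current group, and unlist() flattens that list into the vector foo() expects.

The last step is a swapped-in alternative that is not part of the answer. foo() only uses sum() and prod(), so the same values can be computed column-wise with rowSums() and Reduce(), without grouping by ID.

```r
library(data.table)

foo <- function(a, b, c) {
  out <- (sum(a) + sqrt(prod(c))) / sqrt(pi * b)
  return(out)
}

# Hypothetical copy of the first three rows of DT (length-1 columns recycle)
DT3 <- data.table(ID  = c("K1L1", "K1L2", "K1L3"),
                  K1 = 0L, K2 = 0L, K3 = 0L, K4 = 0L,
                  K5  = c(4L, 3L, 5L),
                  K6  = c(17L, 21L, 21L),
                  K7  = c(10L, 11L, 11L),
                  K8  = c(7L, 7L, 8L),
                  K9  = c(1L, 1L, 1L),
                  K10 = c(0L, 1L, 0L),
                  keq = c(50, 49, 51))

acol <- paste0("K", 1:10)
bcol <- "keq"
ccol <- c("K8", "K9")

# Answer's approach: one group per ID, mget() returns a list of the named
# columns for that row, unlist() turns it into a plain vector for foo().
DT3[, result := foo(a = unlist(mget(acol)),
                    b = unlist(mget(bcol)),
                    c = unlist(mget(ccol))), by = "ID"]

# Alternative sketch (not from the answer): rowSums() replaces sum() and
# Reduce(`*`, ...) replaces prod() across the selected columns.
vec <- (rowSums(DT3[, ..acol]) + sqrt(Reduce(`*`, DT3[, ..ccol]))) /
  sqrt(pi * DT3[[bcol]])
DT3[, result_vec := vec]

print(DT3[, .(ID, result, result_vec)])
```

The by = "ID" version works for any foo(). The column-wise version applies only when the row-wise computation can be rewritten in terms of whole columns, as it can here.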
