How to apply a function to each row of a data.table

Posted 2021-04-14

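I am trying to use library(financial) to compute the net present value (NPV) for each observation of a given set of cash flows stored in a data.table. Here are my cash flows: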

library(data.table)    
dt <- data.table(id=c(1,2,3,4), Year1=c(NA, 30, 40, NA), Year2=c(20, 30, 20 ,70), Year3=c(60, 40, 0, 10))

To compute the NPV and update the data.table:

library(financial)
npv <- apply(dt, 1, function(x) cf(na.omit(x[-1]), i = 20)$tab[, 'NPV'])
dt[, NPV:=npv]

This returns:

   id Year1 Year2 Year3      NPV
1:  1    NA    20    60 70.00000
2:  2    30    30    40 82.77778
3:  3    40    20     0 56.66667
4:  4    NA    70    10 78.33333

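How can I use the function cf to write the result directly into each row of the data.table, without going through apply?

FYI: my real data set has more than 50 columns.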

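Answer

We can try a join-based approach: melt the table to long format, compute one NPV per id, and join the result back onto dt.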

dt[melt(dt, id.var = "id")[, .(NPV = cf(value[!is.na(value)], 
                      i = 20)$tab[, "NPV"]), id], on = 'id']
#   id Year1 Year2 Year3      NPV
#1:  1    NA    20    60 70.00000
#2:  2    30    30    40 82.77778
#3:  3    40    20     0 56.66667
#4:  4    NA    70    10 78.33333
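The join above returns a new table. If the goal is to add the column to dt in place, a possible variant is an update join. The sketch below assumes only data.table is installed; npv_skip_na is a hypothetical stand-in for cf(...)$tab[, "NPV"], written so that it reproduces the values shown above, and it is not part of the financial package.

```r
library(data.table)

dt <- data.table(id = c(1, 2, 3, 4),
                 Year1 = c(NA, 30, 40, NA),
                 Year2 = c(20, 30, 20, 70),
                 Year3 = c(60, 40, 0, 10))

# Hypothetical stand-in for cf(...)$tab[, "NPV"]: drops missing flows, then
# discounts at rate i (in percent), leaving the first remaining flow undiscounted.
npv_skip_na <- function(x, i = 20) {
  x <- x[!is.na(x)]
  sum(x / (1 + i / 100)^(seq_along(x) - 1))
}

# One NPV per id, computed on the long (melted) table.
res <- melt(dt, id.vars = "id")[, .(NPV = npv_skip_na(value)), by = id]

# Update join: add the NPV column to dt by reference, matching rows on id.
dt[res, on = "id", NPV := i.NPV]
print(dt)
```

Because := modifies dt by reference, no reassignment is needed, and the same pattern should work unchanged with 50 or more Year columns, since melt collects every non-id column.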

Another answer

Rewriting the cf function so that it computes only the part you need will speed things up significantly:

dt[, NPV := {x <- na.omit(unlist(.SD)); sum(x * sppv(20,0:(length(x)-1)))}, by=id]

#   id Year1 Year2 Year3      NPV
#1:  1    NA    20    60 70.00000
#2:  2    30    30    40 82.77778
#3:  3    40    20     0 56.66667
#4:  4    NA    70    10 78.33333
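To make the one-liner easier to follow, here is a minimal base R sketch of the arithmetic for a single row. It assumes that sppv(i, n) is the single-payment present-value factor (1 + i/100)^(-n), which is consistent with the output above; pv_factor is a hypothetical name used here for illustration only.

```r
# Hypothetical helper: present-value factor for a payment n periods ahead,
# with the rate i given in percent. Assumed to match what sppv() returns.
pv_factor <- function(i, n) (1 + i / 100)^(-n)

# Row for id 2: cash flows in Year1..Year3, no missing values.
x <- c(30, 30, 40)

# Factors for periods 0, 1, 2: the first flow is not discounted.
factors <- pv_factor(20, 0:(length(x) - 1))

# NPV is the sum of each flow times its factor.
npv_row <- sum(x * factors)
print(factors)
print(npv_row)
```

For id 2 this gives 30 + 30/1.2 + 40/1.44, which matches the 82.77778 shown in the table.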

In fact, this could probably be vectorized now... hmm, let me think!

Another answer

We can try using our own npv function in this example.

dcf <- function(x, r, t0=FALSE){
  # calculates discounted cash flows (DCF) given cash flow and discount rate
  #
  # x - cash flows vector
  # r - vector of discount rates, in decimals. Single values will be recycled
  # t0 - if TRUE, cash flows start in year 0, i.e. the discount rate in the first period is set to zero (only applied when r is a single value); default is FALSE
  if(length(r)==1){
    r <- rep(r, length(x))
    if(t0==TRUE){r[1]<-0}
  }
  x/cumprod(1+r)
}

npv <- function(x, r, t0=FALSE){
  # calculates net present value (NPV) given cash flow and discount rate
  #
  # x - cash flows vector
  # r - discount rate, in decimals
  # t0 - cash flow starts in year 0, default is FALSE
  sum(dcf(x, r, t0))
}

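Now, whenever you find yourself reaching for apply(x, 1, f), melt / gather / nest instead. Unless you intend to completely mislead the users of your data, you should not drop NAs when computing NPV: dropping them shifts the remaining cash flows forward, so different rows end up discounted to different points in time. Replace the NAs with 0 instead. I also see that the package you intend to use discounts cash flows to year 0, which basically means that the first cash flow (in Year1) is not discounted.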

library(data.table)
npv_dt <- melt(dt, id.vars = "id")[is.na(value), value:=0][order(variable), .(NPV=npv(x=value, r=0.2, t0=TRUE)), by="id"]

setkey(dt, id)
setkey(npv_dt, id)

npv_dt[dt]

#>    id      NPV Year1 Year2 Year3
#> 1:  1 58.33333    NA    20    60
#> 2:  2 82.77778    30    30    40
#> 3:  3 56.66667    40    20     0
#> 4:  4 65.27778    NA    70    10
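The difference between the two tables for ids 1 and 4 comes entirely from how NA is handled. A minimal base R sketch for id 1, under the same year-0 convention as above; npv_t0 is a hypothetical helper introduced only for this comparison:

```r
# Hypothetical helper: NPV with the first element treated as year 0
# (undiscounted) and the rate r given in decimals.
npv_t0 <- function(x, r) sum(x / (1 + r)^(seq_along(x) - 1))

flows <- c(NA, 20, 60)  # id 1: Year1 is missing

# Dropping the NA shifts 20 into the first (undiscounted) slot.
dropped <- npv_t0(flows[!is.na(flows)], r = 0.2)

# Replacing the NA with 0 keeps 20 and 60 in their original periods.
zero_filled <- npv_t0(ifelse(is.na(flows), 0, flows), r = 0.2)

print(c(dropped = dropped, zero_filled = zero_filled))
```

The dropped version evaluates 20 + 60/1.2 = 70, while the zero-filled version evaluates 0 + 20/1.2 + 60/1.44, about 58.33, matching the two tables.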
