How to transfer negative value at current row to previous row in a data frame?
Posted: 2019-02-01 02:37:49
Question:
Within each group, I want to move every negative value to the previous row by adding it to that row's value.
Here is a sample of my raw data:
raw_data <- data.frame(GROUP = rep(c('A','B','C'),each = 6),
YEARMO = rep(c(201801:201806),3),
VALUE = c(100,-10,20,70,-50,30,20,60,40,-20,-10,50,0,10,-30,50,100,-100))
> raw_data
GROUP YEARMO VALUE
1 A 201801 100
2 A 201802 -10
3 A 201803 20
4 A 201804 70
5 A 201805 -50
6 A 201806 30
7 B 201801 20
8 B 201802 60
9 B 201803 40
10 B 201804 -20
11 B 201805 -10
12 B 201806 50
13 C 201801 0
14 C 201802 10
15 C 201803 -30
16 C 201804 50
17 C 201805 100
18 C 201806 -100
Here is the output I want:
final_data <- data.frame(GROUP = rep(c('A','B','C'),each = 6),
YEARMO = rep(c(201801:201806),3),
VALUE = c(90,0,20,20,0,30,20,60,10,0,0,50,-20,0,0,50,0,0))
> final_data
GROUP YEARMO VALUE
1 A 201801 90
2 A 201802 0
3 A 201803 20
4 A 201804 20
5 A 201805 0
6 A 201806 30
7 B 201801 20
8 B 201802 60
9 B 201803 10
10 B 201804 0
11 B 201805 0
12 B 201806 50
13 C 201801 -20
14 C 201802 0
15 C 201803 0
16 C 201804 50
17 C 201805 0
18 C 201806 0
The following data frames show how the transfer proceeds within each group, one iteration per column:
Trans_GRP_A <- data.frame(GROUP = rep('A',each = 6),
YEARMO = c(201801:201806),
VALUE = c(100,-10,20,70,-50,30),
ITER_1 = c(100,-10,20,20,0,30),
ITER_2 = c(90,0,20,20,0,30))
> Trans_GRP_A
GROUP YEARMO VALUE ITER_1 ITER_2
1 A 201801 100 100 90
2 A 201802 -10 -10 0
3 A 201803 20 20 20
4 A 201804 70 20 20
5 A 201805 -50 0 0
6 A 201806 30 30 30
Trans_GRP_B <- data.frame(GROUP = rep('B',each = 6),
YEARMO = c(201801:201806),
VALUE = c(20,60,40,-20,-10,50),
ITER_1 = c(20,60,40,-30,0,50),
ITER_2 = c(20,60,10,0,0,50))
> Trans_GRP_B
GROUP YEARMO VALUE ITER_1 ITER_2
1 B 201801 20 20 20
2 B 201802 60 60 60
3 B 201803 40 40 10
4 B 201804 -20 -30 0
5 B 201805 -10 0 0
6 B 201806 50 50 50
Trans_GRP_C <- data.frame(GROUP = rep('C',each = 6),
YEARMO = c(201801:201806),
VALUE = c(0,10,-30,50,100,-100),
ITER_1 = c(0,10,-30,50,0,0),
ITER_2 = c(0,-20,0,50,0,0),
ITER_3 = c(-20,0,0,50,0,0))
> Trans_GRP_C
GROUP YEARMO VALUE ITER_1 ITER_2 ITER_3
1 C 201801 0 0 0 -20
2 C 201802 10 10 -20 0
3 C 201803 -30 -30 0 0
4 C 201804 50 50 50 50
5 C 201805 100 0 0 0
6 C 201806 -100 0 0 0
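The transfer logic is as follows:
- Replace the negative value with 0.
- Add the negative value at the current row to the value at the previous row.
- Keep passing the negative value to the previous row until the value becomes positive or 0.
- If the transfer never produces a positive value, keep transferring until the first row of the group is reached. Here the first row of each group is YEARMO = 201801.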
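The steps above can be sketched in base R as a single backward pass over each group. This is an illustrative sketch, not one of the posted answers: the helper name push_back_negatives is hypothetical, and the code assumes the rows are already sorted by YEARMO within each GROUP.

```r
raw_data <- data.frame(
  GROUP  = rep(c("A", "B", "C"), each = 6),
  YEARMO = rep(201801:201806, 3),
  VALUE  = c(100, -10, 20, 70, -50, 30,
             20, 60, 40, -20, -10, 50,
             0, 10, -30, 50, 100, -100)
)

# Hypothetical helper: walk from the last row to the second row; whenever the
# current value is negative, add it to the previous row and reset it to 0.
# The first row keeps whatever is left, even if that is negative.
push_back_negatives <- function(x) {
  for (i in rev(seq_along(x)[-1])) {
    if (x[i] < 0) {
      x[i - 1] <- x[i - 1] + x[i]
      x[i] <- 0
    }
  }
  x
}

# ave() applies the helper to VALUE separately within each GROUP.
raw_data$OUTPUT <- ave(raw_data$VALUE, raw_data$GROUP, FUN = push_back_negatives)
print(raw_data)
```

Because each row only ever passes its deficit to the row directly before it, one pass from the bottom up is enough. For the sample data, OUTPUT matches the VALUE column of final_data.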
Any solution is welcome. I suspect a vectorized solution would run faster.
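Comments:
I doubt that a purely vectorized solution exists. A loop construct will probably be needed.
Answer 1:
Here is another option: repeatedly add the positive part of the vector to the shifted negative part, until there are no more negative values or the loop has run .N - 1 times (where .N is the number of rows in each group).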
library(data.table)
setDT(raw_data)[, OUTPUT := {
    # split VALUE into its non-negative and non-positive parts
    posVal <- replace(VALUE, VALUE<0, 0)
    negVal <- replace(VALUE, VALUE>0, 0)
    n <- 1L
    while (any(negVal < 0) && n < .N) {
        # move each negative value up one row; the first row keeps its own
        posVal <- replace(posVal, posVal<0, 0) +
            shift(negVal, 1L, type="lead", fill=0) +
            c(negVal[1L], rep(0, .N-1L))
        negVal <- replace(posVal, posVal>0, 0)
        n <- n + 1L
    }
    posVal
}, by=.(GROUP)]
Output:
GROUP YEARMO VALUE OUTPUT
1: A 201801 100 90
2: A 201802 -10 0
3: A 201803 20 20
4: A 201804 70 20
5: A 201805 -50 0
6: A 201806 30 30
7: B 201801 20 20
8: B 201802 60 60
9: B 201803 40 10
10: B 201804 -20 0
11: B 201805 -10 0
12: B 201806 50 50
13: C 201801 0 -20
14: C 201802 10 0
15: C 201803 -30 0
16: C 201804 50 50
17: C 201805 100 0
18: C 201806 -100 0
Answer 2:
This is a tricky one. I tried to find a vectorized solution, but so far the only approach that works is to loop backwards over the rows in each group:
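library(data.table)
DT <- as.data.table(raw_data)
DT$final <- final_data$VALUE   # expected result, kept for comparison
DT[, new := {
  x <- VALUE
  sn <- 0                      # accumulated negative amount carried upwards
  for (i in .N:1) {
    if (i > 1) {
      if (x[i] < 0) {
        sn <- sn + x[i]
        x[i] <- 0
      } else {
        tmp <- pmax(x[i] + sn, 0)
        sn <- sn + x[i] - tmp
        x[i] <- tmp
      }
    } else {
      x[i] <- x[i] + sn        # first row absorbs whatever is left
    }
  }
  x
}, by = GROUP]
DT[]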
GROUP YEARMO VALUE final new
1: A 201801 100 90 90
2: A 201802 -10 0 0
3: A 201803 20 20 20
4: A 201804 70 20 20
5: A 201805 -50 0 0
6: A 201806 30 30 30
7: B 201801 20 20 20
8: B 201802 60 60 60
9: B 201803 40 10 10
10: B 201804 -20 0 0
11: B 201805 -10 0 0
12: B 201806 50 50 50
13: C 201801 0 -20 -20
14: C 201802 10 0 0
15: C 201803 -30 0 0
16: C 201804 50 50 50
17: C 201805 100 0 0
18: C 201806 -100 0 0
sn stores the accumulated negative values, which are then "consumed" by the positive values that follow (in reverse row order).
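The role of sn can also be read as a recurrence: processing rows from last to first, the carry after a row is min(x[i] + carry, 0), and the row itself keeps max(x[i] + carry, 0). The sketch below expresses that with base R's Reduce(). The function name carry_back is hypothetical, and Reduce() still iterates internally, so this restates the loop and is not a truly vectorized solution.

```r
# Hypothetical helper restating the backward loop as a carry recurrence.
carry_back <- function(x) {
  n <- length(x)
  if (n == 0) return(x)
  rx <- rev(x)  # process the last row first
  # carries[k] is the outstanding deficit (<= 0) before rx[k] is processed.
  # carries has n + 1 elements; the last one is not used here.
  carries <- Reduce(function(carry, xi) min(xi + carry, 0),
                    rx, accumulate = TRUE, init = 0)
  total <- rx + carries[seq_len(n)]  # value plus incoming deficit
  out <- rev(pmax(total, 0))         # each row keeps only the non-negative part
  out[1] <- total[n]                 # the first row absorbs any remainder
  out
}

carry_back(c(0, 10, -30, 50, 100, -100))  # group C from the question
```

For group C this returns -20, 0, 0, 50, 0, 0, the same values as the new column above.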