Recode column for whole group based on other column's value of oldest group member
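Asked: 2019-02-27 23:09:51
Question: I want to recode two columns that indicate the status of a whole group (x1 or x2 = 3 or 0), based on the value of another column for the oldest member of each group.
In the example below, x1 (x2) is the sum of key1 (key2) within each group (each person always has three values/imputations). However, I want only one of x1>0 or x2>0 to hold per group. In groups that contain one person with key1=1 and another person with key2=1 (so that x1=3 AND x2=3), the oldest person should decide. If the oldest person has key1=1 and key2=0, as in group A, then x1 should be 3 and x2 should be 0 for the whole group, and so on.
Reproducible example: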
id <- c("A11", "A12", "A13", "A21", "A22", "A23", "B11", "B12", "B13", "C11", "C12", "C13", "C21", "C22", "C23", "D11", "D12", "D13", "D21", "D22", "D23", "E11", "E12", "E13", "E21", "E22", "E23")
group <- c("A","A","A","A","A","A","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D","E","E","E","E","E","E")
imputation <- c(rep(1:3, 9))
age <- c(45,45,45,17,17,17,20,20,20,70,70,70,60,60,60,25,25,25,30,30,30,28,28,28,34,34,34)
key1 <- c(1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0)
key2 <- c(0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0)
x1 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3)
x2 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0,0)
test <- data.frame(id, group, imputation, age, key1, key2, x1, x2)
The subset in which x1 and x2 should be recoded:
> test %>% group_by(group) %>% filter(x1==x2 & x1>0 | x1==x2 & x2>0)
# A tibble: 18 x 8
# Groups: group [3]
id group imputation age key1 key2 x1 x2
<fct> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A11 A 1 45 1 0 3 3
2 A12 A 2 45 1 0 3 3
3 A13 A 3 45 1 0 3 3
4 A21 A 1 17 0 1 3 3
5 A22 A 2 17 0 1 3 3
6 A23 A 3 17 0 1 3 3
7 C11 C 1 70 0 1 3 3
8 C12 C 2 70 0 1 3 3
9 C13 C 3 70 0 1 3 3
10 C21 C 1 60 1 0 3 3
11 C22 C 2 60 1 0 3 3
12 C23 C 3 60 1 0 3 3
13 D11 D 1 25 1 0 3 3
14 D12 D 2 25 1 0 3 3
15 D13 D 3 25 1 0 3 3
16 D21 D 1 30 0 1 3 3
17 D22 D 2 30 0 1 3 3
18 D23 D 3 30 0 1 3 3
The output should be:
id group imputation age key1 key2 x1 x2
1 A11 A 1 45 1 0 3 0
2 A12 A 2 45 1 0 3 0
3 A13 A 3 45 1 0 3 0
4 A21 A 1 17 0 1 3 0
5 A22 A 2 17 0 1 3 0
6 A23 A 3 17 0 1 3 0
7 C11 C 1 70 0 1 0 3
8 C12 C 2 70 0 1 0 3
9 C13 C 3 70 0 1 0 3
10 C21 C 1 60 1 0 0 3
11 C22 C 2 60 1 0 0 3
12 C23 C 3 60 1 0 0 3
13 D11 D 1 25 1 0 0 3
14 D12 D 2 25 1 0 0 3
15 D13 D 3 25 1 0 0 3
16 D21 D 1 30 0 1 0 3
17 D22 D 2 30 0 1 0 3
18 D23 D 3 30 0 1 0 3
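I guess it can be done with a combination of group_by, filter, mutate and ifelse, but I haven't figured it out yet. It is important, though, that the solution includes the filter or something similar, because the observations with x1==x2 & x1>0 | x1==x2 & x2>0 are only a subset of my data frame.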
Answer 1: Within each group, you can compare the unique value of age where key1 is 1 with the unique value of age where key2 is 1, and update x1 and x2 accordingly:
id <- c("A11", "A12", "A13", "A21", "A22", "A23", "B11", "B12", "B13", "C11", "C12", "C13", "C21", "C22", "C23", "D11", "D12", "D13", "D21", "D22", "D23", "E11", "E12", "E13", "E21", "E22", "E23")
group <- c("A","A","A","A","A","A","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D","E","E","E","E","E","E")
imputation <- c(rep(1:3, 9))
age <- c(45,45,45,17,17,17,20,20,20,70,70,70,60,60,60,25,25,25,30,30,30,28,28,28,34,34,34)
key1 <- c(1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0)
key2 <- c(0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0)
x1 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3)
x2 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0,0)
test <- data.frame(id, group, imputation, age, key1, key2, x1, x2)
library(dplyr)
test %>%
  group_by(group) %>%
  filter(x1==x2 & x1>0 | x1==x2 & x2>0) %>%
  mutate(x1 = ifelse(unique(age[key1==1]) > unique(age[key2==1]), 3, 0),
         x2 = ifelse(unique(age[key1==1]) > unique(age[key2==1]), 0, 3)) %>%
  ungroup()
# # A tibble: 18 x 8
# id group imputation age key1 key2 x1 x2
# <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A11 A 1 45 1 0 3 0
# 2 A12 A 2 45 1 0 3 0
# 3 A13 A 3 45 1 0 3 0
# 4 A21 A 1 17 0 1 3 0
# 5 A22 A 2 17 0 1 3 0
# 6 A23 A 3 17 0 1 3 0
# 7 C11 C 1 70 0 1 0 3
# 8 C12 C 2 70 0 1 0 3
# 9 C13 C 3 70 0 1 0 3
#10 C21 C 1 60 1 0 0 3
#11 C22 C 2 60 1 0 0 3
#12 C23 C 3 60 1 0 0 3
#13 D11 D 1 25 1 0 0 3
#14 D12 D 2 25 1 0 0 3
#15 D13 D 3 25 1 0 0 3
#16 D21 D 1 30 0 1 0 3
#17 D22 D 2 30 0 1 0 3
#18 D23 D 3 30 0 1 0 3
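The question notes that the conflicting rows are only a subset of the data frame, while the pipeline above returns just that subset. The following is a minimal sketch of one possible variant, not part of the original answer, that keeps all 27 rows and recodes only the conflicting groups. It looks up the keys of the oldest member directly with which.max(age). The helper columns conflict, oldest_key1 and oldest_key2 and the result name recoded are hypothetical names introduced here, and the data frame is rebuilt compactly without the id and imputation columns. It also assumes that if the oldest member has neither key set, the group is left unchanged.

```r
library(dplyr)

# Same values as in the question, rebuilt compactly (id and imputation omitted).
test <- data.frame(
  group = rep(c("A", "B", "C", "D", "E"), times = c(6, 3, 6, 6, 6)),
  age   = rep(c(45, 17, 20, 70, 60, 25, 30, 28, 34), each = 3),
  key1  = rep(c(1, 0, 0, 0, 1, 1, 0, 1, 0), each = 3),
  key2  = rep(c(0, 1, 0, 1, 0, 0, 1, 0, 0), each = 3),
  x1    = rep(c(3, 0, 3, 3, 3), times = c(6, 3, 6, 6, 6)),
  x2    = rep(c(3, 0, 3, 3, 0), times = c(6, 3, 6, 6, 6))
)

recoded <- test %>%
  group_by(group) %>%
  mutate(
    # Rows where both indicators are set and therefore need a decision.
    conflict    = x1 == x2 & x1 > 0,
    # Keys of the oldest group member (first row with the maximum age).
    oldest_key1 = key1[which.max(age)],
    oldest_key2 = key2[which.max(age)],
    # Oldest member has only key2: drop x1 for the whole group.
    x1 = ifelse(conflict & oldest_key1 == 0 & oldest_key2 == 1, 0, x1),
    # Oldest member has key1: drop x2 for the whole group.
    x2 = ifelse(conflict & oldest_key1 == 1, 0, x2)
  ) %>%
  ungroup() %>%
  select(-conflict, -oldest_key1, -oldest_key2)
```

Because the condition is evaluated inside mutate instead of filter, groups B and E pass through with their original x1 and x2, while A, C and D are recoded as in the expected output.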