Mutate something on complete cases, but keep all
【Title】: Mutate something on complete cases, but keep all 【Posted】: 2022-01-22 19:52:44 【Question】: I want to generate a group ID based on the combination of two variables (country and party). Here is my data:
df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B"))
This gives:
  country year party
1      BE 2010  <NA>
2      BE 2010  <NA>
3      BE 2010  <NA>
4      NL 2010     A
5      NL 2010     B
6      NL 2010     B
What I want is:
  country  year party group
  <chr>   <dbl> <chr> <int>
1 BE       2010 NA       NA
2 BE       2010 NA       NA
3 BE       2010 NA       NA
4 NL       2010 A         1
5 NL       2010 B         2
6 NL       2010 B         2
I tried:
library(dplyr)
df <- df %>%
  group_by(country, party) %>%
  mutate(group = cur_group_id())
But this gives me:
  country  year party group
  <chr>   <dbl> <chr> <int>
1 BE       2010 NA        1
2 BE       2010 NA        1
3 BE       2010 NA        1
4 NL       2010 A         2
5 NL       2010 B         3
6 NL       2010 B         3
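However, I do not want rows with a missing party to form a group of their own. At the same time, I want to keep those rows in the data.
If I try:
df <- df %>%
  group_by(country, party) %>%
  filter(!is.na(party)) %>%
  mutate(group = cur_group_id())
I get: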
  country  year party group
  <chr>   <dbl> <chr> <int>
1 NL       2010 A         1
2 NL       2010 B         2
3 NL       2010 B         2
How can I create this new variable only for the complete rows, while keeping the incomplete rows in the dataset?
Thanks
【Answer 1】: Something like the following?
library(tidyverse)
df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B"))
df %>%
  group_by(country, party) %>%
  # Rows with a missing party get NA; all other rows get their group's ID
  mutate(group = if_else(is.na(party), NA_integer_, cur_group_id()))
#> # A tibble: 6 × 4
#> # Groups:   country, party [3]
#>   country  year party group
#>   <chr>   <dbl> <chr> <int>
#> 1 BE       2010 <NA>     NA
#> 2 BE       2010 <NA>     NA
#> 3 BE       2010 <NA>     NA
#> 4 NL       2010 A         2
#> 5 NL       2010 B         3
#> 6 NL       2010 B         3
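The IDs start at 2 because the (BE, NA) group still counts as group 1. If you want the groups to start at 1 instead, number the complete rows first and then append the incomplete ones (note that this moves the NA rows to the end):
library(tidyverse)
df %>%
  filter(!is.na(party)) %>%
  group_by(country, party) %>%
  mutate(group = cur_group_id()) %>%
  ungroup() %>%
  # The appended rows have no group column, so bind_rows() fills it with NA
  bind_rows(filter(df, is.na(party)))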
#> # A tibble: 6 × 4
#>   country  year party group
#>   <chr>   <dbl> <chr> <int>
#> 1 NL       2010 A         1
#> 2 NL       2010 B         2
#> 3 NL       2010 B         2
#> 4 BE       2010 <NA>     NA
#> 5 BE       2010 <NA>     NA
#> 6 BE       2010 <NA>     NA
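If the original row order matters, one possible variant (a sketch, not part of the answer above) is to keep the first approach and renumber the IDs afterwards with dplyr's dense_rank(), which leaves NA values as NA. The name `result` is only used here for illustration.

```r
library(dplyr)

df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B"))

result <- df %>%
  group_by(country, party) %>%
  # Same idea as the first snippet: NA for incomplete rows, group ID otherwise
  mutate(group = if_else(is.na(party), NA_integer_, cur_group_id())) %>%
  ungroup() %>%
  # Renumber the remaining IDs so they start at 1; NA stays NA
  mutate(group = dense_rank(group))

print(result$group)
```

This keeps the rows where they were, so the BE rows stay at the top with a missing group.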
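【Answer 2】: Use interaction(). It returns NA whenever one of its inputs is NA, and drop = TRUE discards unused level combinations so the IDs are consecutive:
df %>% mutate(group = as.integer(interaction(country, party, drop = TRUE)))
This gives: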
  country year party group
1      BE 2010  <NA>    NA
2      BE 2010  <NA>    NA
3      BE 2010  <NA>    NA
4      NL 2010     A     1
5      NL 2010     B     2
6      NL 2010     B     2
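For comparison, roughly the same result can be sketched in base R without any packages. This is an illustrative alternative rather than something from the answer above: the helper names `complete` and `key` are made up here, and the IDs follow order of first appearance rather than sorted factor levels (the two coincide for this data).

```r
df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B"))

# Rows where the grouping variable is present
complete <- !is.na(df$party)

# One key per country/party combination
key <- paste(df$country, df$party, sep = ".")

# Start with NA everywhere, then number only the complete rows
df$group <- NA_integer_
df$group[complete] <- match(key[complete], unique(key[complete]))

print(df)
```

Because only the complete rows are passed to match(), the incomplete rows never receive an ID and stay in place.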
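【Answer 3】: With data.table:
df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B"))
library(data.table)
# Number groups only among rows with a non-missing party; other rows keep NA.
# Grouping by both country and party matches the question (for this data,
# grouping by party alone gives the same IDs).
setDT(df)[!is.na(party), grp := .GRP, by = .(country, party)][]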
#>    country year party grp
#> 1:      BE 2010  <NA>  NA
#> 2:      BE 2010  <NA>  NA
#> 3:      BE 2010  <NA>  NA
#> 4:      NL 2010     A   1
#> 5:      NL 2010     B   2
#> 6:      NL 2010     B   2
Created on 2021-12-21 by the reprex package (v2.0.1)