R: collapse by year by ID
【Title】: r collapse by year by ID
【Posted】: 2021-11-21 13:04:28
【Question】: I have a dataset with multiple rows per ID, like this:
ID From To State
1 2004 2005 MD
1 2005 2005 MD
1 2005 2012 DC
1 2012 2015 DC
1 2015 2020 DC
1 2012 2013 MD
1 2013 2016 MD
1 2016 2019 MD
1 2019 2020 MD
2 2003 2004 OR
2 2004 2008 OR
2 2008 2013 AZ
2 2013 2015 AZ
My goal is to collapse the multiple From and To rows into one smooth timeline per ID and State, for example:
ID From To State
1 2004 2005 MD
1 2005 2020 DC
1 2012 2020 MD
2 2003 2008 OR
2 2008 2015 AZ
I don't know how to do this. Any help is much appreciated. Thanks.
【Comments】:
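"Collapse rows with overlapping ranges" and several of the questions linked from it may get you going.
【Answer 1】: Group by 'ID', 'State', and the run-length ID of 'State', then take the first value of 'From' and the last value of 'To' in each group.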
library(dplyr)
library(data.table)
df1 %>%
  group_by(ID, State, grp = rleid(State)) %>%
  summarise(From = first(From), To = last(To), .groups = 'drop') %>%
  select(-grp)
-output
# A tibble: 5 × 4
     ID State  From    To
  <int> <chr> <int> <int>
1     1 DC     2005  2020
2     1 MD     2004  2005
3     1 MD     2012  2020
4     2 AZ     2008  2015
5     2 OR     2003  2008
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), From = c(2004L, 2005L, 2005L, 2012L, 2015L, 2012L,
2013L, 2016L, 2019L, 2003L, 2004L, 2008L, 2013L), To = c(2005L,
2005L, 2012L, 2015L, 2020L, 2013L, 2016L, 2019L, 2020L, 2004L,
2008L, 2013L, 2015L), State = c("MD", "MD", "DC", "DC", "DC",
"MD", "MD", "MD", "MD", "OR", "OR", "AZ", "AZ")),
class = "data.frame", row.names = c(NA,
-13L))
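The output above is sorted by State within each ID, while the timeline requested in the question keeps the runs in their order of appearance. A minimal sketch of one way to keep that order is to put the run ID before State in the grouping. This is a variation on the answer, not the answerer's code. It assumes the rows are already ordered chronologically within each run and calls `data.table::rleid()` with its namespace prefix so that `first()` and `last()` remain the dplyr versions. The name `res` is only illustrative.

```r
library(dplyr)

# Same data as in the answer, rebuilt with data.frame() so the block runs alone
df1 <- data.frame(
  ID    = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  From  = c(2004L, 2005L, 2005L, 2012L, 2015L, 2012L, 2013L, 2016L, 2019L,
            2003L, 2004L, 2008L, 2013L),
  To    = c(2005L, 2005L, 2012L, 2015L, 2020L, 2013L, 2016L, 2019L, 2020L,
            2004L, 2008L, 2013L, 2015L),
  State = c("MD", "MD", "DC", "DC", "DC", "MD", "MD", "MD", "MD",
            "OR", "OR", "AZ", "AZ")
)

res <- df1 %>%
  # A new run starts whenever ID or State changes from the previous row
  mutate(grp = data.table::rleid(ID, State)) %>%
  # Grouping by grp before State keeps the runs in order of appearance
  group_by(ID, grp, State) %>%
  summarise(From = first(From), To = last(To), .groups = "drop") %>%
  select(ID, From, To, State)

print(res)
```

If the rows are not guaranteed to be in chronological order within a run, `min(From)` and `max(To)` may be a safer choice than `first()` and `last()`.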