Insert rows based on the values of existing rows in R
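I have longitudinal data, and I want to insert new rows based on the values of several columns in existing rows.
For any individual, whenever there is a gap between one record's release date and the next record's admission date, I want to add a new row that uses the earlier release date as its admission date and the later admission date as its release date, so that there is no "gap". If an individual's final record has a release date, I also want to add a new row with that release date as the admission date and NA as the release date. In both cases the new row's TERM_TYPE should be "Not supervised".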
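The requirement can also be expressed as a check on the finished table: within each ID, every admission date equals the previous release date, and the last period is open-ended. The sketch below does this in base R and assumes the date columns are already Date objects. has_no_gaps() is a hypothetical helper name, not part of any package. The example rows for IDs 4 and 5 come from the desired output in this question.

```r
# Hypothetical helper: TRUE if, for every ID, the periods are back to back
# (each admission date equals the previous release date) and the last period
# has an NA release date. Overlapping terms also count as a failure.
has_no_gaps <- function(df) {
  df <- df[order(df$ID, df$ADMISSION_DATE), ]
  per_id <- vapply(split(df, df$ID), function(d) {
    n <- nrow(d)
    back_to_back <- n < 2 ||
      isTRUE(all(d$ADMISSION_DATE[-1] == d$RELEASE_DATE[-n]))
    back_to_back && is.na(d$RELEASE_DATE[n])
  }, logical(1))
  all(per_id)
}

# Rows for IDs 4 and 5 taken from the desired output in the question
wanted <- data.frame(
  ID = c(4, 4, 5, 5, 5, 5, 5),
  TERM_TYPE = c("Parole", "Not supervised", "Parole", "Not supervised",
                "Prison", "Parole", "Prison"),
  ADMISSION_DATE = as.Date(c("2006-12-15", "2008-06-15", "2006-04-15", "2010-01-15",
                             "2013-01-15", "2013-12-15", "2015-01-15")),
  RELEASE_DATE = as.Date(c("2008-06-15", NA, "2010-01-15", "2013-01-15",
                           "2013-12-15", "2015-01-15", NA)),
  stringsAsFactors = FALSE)

has_no_gaps(wanted)             # TRUE
has_no_gaps(wanted[-c(2, 4), ]) # FALSE: the "Not supervised" rows are missing
```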
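I think this might require data.table or dplyr's add_row(), but I don't know how. The other SO questions I've found either insert rows based on the number of rows in a group or add a new row before/after every existing row. If I could work out how to insert rows in the right places, I think I could use dplyr's lag() and lead() functions to fill in the correct dates.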
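For reference, lag() and lead() applied after group_by() shift a column by one row within each group and return NA where no previous or next row exists. The base R sketch below reproduces that without dplyr, assuming rows are already in date order within each ID. lag_by_group() and lead_by_group() are hypothetical helper names. Comparing each admission date with the lagged release date is enough to flag where a gap row belongs.

```r
# Hypothetical helpers: base R stand-ins for dplyr's lag()/lead() after group_by().
# They assume rows are already in date order within each group.
lag_by_group <- function(x, g) {
  # Index of the previous row in the same group; NA for each group's first row
  prev_idx <- ave(seq_along(x), g, FUN = function(i) c(NA, head(i, -1)))
  x[prev_idx]
}

lead_by_group <- function(x, g) {
  # Index of the next row in the same group; NA for each group's last row
  next_idx <- ave(seq_along(x), g, FUN = function(i) c(tail(i, -1), NA))
  x[next_idx]
}

# IDs 3 and 4 from the sample data
ids <- c(3, 3, 4)
admission <- as.Date(c("2006-01-15", "2006-12-15", "2006-12-15"))
release <- as.Date(c("2006-10-15", NA, "2008-06-15"))

prev_release <- lag_by_group(release, ids)
next_admission <- lead_by_group(admission, ids)
# A gap row is needed before a record whose admission is later than the
# previous release within the same ID
gap_start <- !is.na(prev_release) & admission > prev_release
```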
Here is some sample data:
myData <- data.frame(ID = c(2, 2, 2, 3, 3, 4, 5, 5, 5, 5),
                     TERM_TYPE = c("Parole", "Prison", "Parole",
                                   "Parole", "Prison", "Parole",
                                   "Parole", "Prison", "Parole", "Prison"),
                     ADMISSION_DATE = c("2006-10-15", "2008-09-15", "2009-01-15",
                                        "2006-01-15", "2006-12-15", "2006-12-15",
                                        "2006-04-15", "2013-01-15", "2013-12-15", "2015-01-15"),
                     RELEASE_DATE = c("2008-09-15", "2009-01-15", "2010-12-15",
                                      "2006-10-15", NA, "2008-06-15",
                                      "2010-01-15", "2013-12-15", "2015-01-15", NA),
                     stringsAsFactors = FALSE)
I would like the result to look like this:
ID TERM_TYPE ADMISSION_DATE RELEASE_DATE
1 2 Parole 2006-10-15 2008-09-15
2 2 Prison 2008-09-15 2009-01-15
3 2 Parole 2009-01-15 2010-12-15
4 2 Not supervised 2010-12-15 <NA>
5 3 Parole 2006-01-15 2006-10-15
6 3 Prison 2006-10-15 <NA>
7 4 Parole 2006-12-15 2008-06-15
8 4 Not supervised 2008-06-15 <NA>
9 5 Parole 2006-04-15 2010-01-15
10 5 Not supervised 2010-01-15 2013-01-15
11 5 Prison 2013-01-15 2013-12-15
12 5 Parole 2013-12-15 2015-01-15
13 5 Prison 2015-01-15 <NA>
Answer
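There may be a more concise way to do this, but I think this shows the underlying idea. Basically, I combine three tables:
1) the original data
2) the missing gap periods
3) the periods after each individual's last known release date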
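Tables #2 and #3 are created by pulling the relevant rows out of the original data and modifying them to represent what we want. For example, #2 finds the rows that follow a gap since the previous row, and modifies each of them so that it describes the missing period.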
library(tidyverse)
library(lubridate)

# First, convert the character date columns to Date objects
myData <- myData %>%
  mutate_at(vars(contains("DATE")), ymd)
# Create table #2: one "Not supervised" row for each gap between terms
myData_fill_gaps <- myData %>%
  group_by(ID) %>%
  mutate(gap_days = (ADMISSION_DATE - lag(RELEASE_DATE)) / ddays(1),
         ADM_temp = lag(RELEASE_DATE),
         REL_temp = ADMISSION_DATE) %>%
  ungroup() %>%
  filter(gap_days > 0) %>% # Only keep rows relating to gaps
  mutate(TERM_TYPE = "Not supervised") %>%
  select(ID, TERM_TYPE, ADMISSION_DATE = ADM_temp, RELEASE_DATE = REL_temp)
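# Create table #3: an open-ended "Not supervised" row after each ID's last
# term, if that last term has a release date
myData_add_release_NA <- myData %>%
  group_by(ID) %>%
  slice(n()) %>% # Only keep the last row for each ID
  ungroup() %>%
  filter(!is.na(RELEASE_DATE)) %>% # Only keep IDs whose last row has a release date
  mutate(TERM_TYPE = "Not supervised",
         ADMISSION_DATE = RELEASE_DATE,
         RELEASE_DATE = as.Date(NA)) # NA of class Date, so bind_rows() can combine it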
# Combine the three tables and sort each ID's rows by date
myData_combined <- bind_rows(
  myData,
  myData_fill_gaps,
  myData_add_release_NA
) %>%
  arrange(ID, ADMISSION_DATE)
Output:
> myData_combined
ID TERM_TYPE ADMISSION_DATE RELEASE_DATE
1 2 Parole 2006-10-15 2008-09-15
2 2 Prison 2008-09-15 2009-01-15
3 2 Parole 2009-01-15 2010-12-15
4 2 Not supervised 2010-12-15 <NA>
5 3 Parole 2006-01-15 2006-10-15
6 3 Not supervised 2006-10-15 2006-12-15
7 3 Prison 2006-12-15 <NA>
8 4 Parole 2006-12-15 2008-06-15
9 4 Not supervised 2008-06-15 <NA>
10 5 Parole 2006-04-15 2010-01-15
11 5 Not supervised 2010-01-15 2013-01-15
12 5 Prison 2013-01-15 2013-12-15
13 5 Parole 2013-12-15 2015-01-15
14 5 Prison 2015-01-15 <NA>
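If the tidyverse is not available, the same three-table idea can be sketched in base R. This is an illustrative alternative, not the answer's code. fill_gaps() is a hypothetical helper, and it assumes the data frame has exactly the columns ID, TERM_TYPE, ADMISSION_DATE and RELEASE_DATE, with the dates already converted to Date. The toy input is IDs 3 and 4 from the sample data.

```r
# Hypothetical base R version of the three-table approach
fill_gaps <- function(df) {
  # Sort so each ID's terms are consecutive and in date order
  df <- df[order(df$ID, df$ADMISSION_DATE), ]
  n <- nrow(df)
  first_of_id <- c(TRUE, df$ID[-1] != df$ID[-n])
  last_of_id <- c(df$ID[-1] != df$ID[-n], TRUE)

  # Table 2: previous release date within the same ID, then keep real gaps
  prev_release <- df$RELEASE_DATE[c(NA, seq_len(n - 1))]
  prev_release[first_of_id] <- NA
  gap <- which(!is.na(prev_release) & df$ADMISSION_DATE > prev_release)
  gaps <- data.frame(ID = df$ID[gap],
                     TERM_TYPE = rep("Not supervised", length(gap)),
                     ADMISSION_DATE = prev_release[gap],
                     RELEASE_DATE = df$ADMISSION_DATE[gap],
                     stringsAsFactors = FALSE)

  # Table 3: open-ended period after an ID's last term, if it has a release date
  open <- which(last_of_id & !is.na(df$RELEASE_DATE))
  tails <- data.frame(ID = df$ID[open],
                      TERM_TYPE = rep("Not supervised", length(open)),
                      ADMISSION_DATE = df$RELEASE_DATE[open],
                      RELEASE_DATE = rep(as.Date(NA), length(open)),
                      stringsAsFactors = FALSE)

  # Combine tables 1-3 and restore ID / date order
  out <- rbind(df, gaps, tails)
  out <- out[order(out$ID, out$ADMISSION_DATE), ]
  rownames(out) <- NULL
  out
}

toy <- data.frame(ID = c(3, 3, 4),
                  TERM_TYPE = c("Parole", "Prison", "Parole"),
                  ADMISSION_DATE = as.Date(c("2006-01-15", "2006-12-15", "2006-12-15")),
                  RELEASE_DATE = as.Date(c("2006-10-15", NA, "2008-06-15")),
                  stringsAsFactors = FALSE)
res <- fill_gaps(toy)
```

Because the gap test uses a strict `>` comparison, back-to-back terms (a release date equal to the next admission date) produce no extra row. This matches the `gap_days > 0` filter in the answer.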