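Replace NA in column with value in adjacent column
【Posted】: 2013-03-15 19:47:01
【Question】:
This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I want to scan a column in a data frame and replace each NA with the value in the adjacent cell. In that post, the solution does not replace NA with a value from an adjacent vector (e.g. the adjacent element in a data matrix); it does a conditional replace with a fixed value. Below is a reproducible example of my problem: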
UNIT <- c(NA,NA, 200, 200, 200, 200, 200, 300, 300, 300,300)
STATUS <-c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE','ACTIVE','ACTIVE',
'ACTIVE','INACTIVE')
TERMINATED <- c('1999-07-06' , '2008-12-05' , '2000-08-18' , '2000-08-18' ,'2000-08-18' ,'2008-08-18',
'2008-08-18','2006-09-19','2006-09-19' ,'2006-09-19' ,'1999-03-15')
START <- c('2007-04-23','2008-12-06','2004-06-01','2007-02-01','2008-04-19','2010-11-29','2010-12-30',
'2007-10-29','2008-02-05','2008-06-30','2009-02-07')
STOP <- c('2008-12-05','4712-12-31','2007-01-31','2008-04-18','2010-11-28','2010-12-29','4712-12-31',
'2008-02-04','2008-06-29','2009-02-06','4712-12-31')
#creating dataframe
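# stringsAsFactors = TRUE was the default before R 4.0.0; set explicitly so the output below reproduces
TEST <- data.frame(UNIT, STATUS, TERMINATED, START, STOP, stringsAsFactors = TRUE); TEST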
   UNIT   STATUS TERMINATED      START       STOP
1    NA   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2    NA INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
#using the syntax for a conditional replace and hoping it works :/
TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS; TEST
   UNIT   STATUS TERMINATED      START       STOP
1     1   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2     2 INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
The result should be:
       UNIT   STATUS TERMINATED      START       STOP
1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
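【Comments】:
Maybe try TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)]; TEST
You cannot mix types within a single column of a data frame.
【Answer 1】:
It did not work because STATUS is a factor. When you mix a factor with a numeric, numeric is the least restrictive type, so the factor's integer codes (1 and 2) are what end up in UNIT. By coercing STATUS to character you get the result you are after, and the column is now a character vector: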
TEST$UNIT[is.na(TEST$UNIT)] <- as.character(TEST$STATUS[is.na(TEST$UNIT)])
##        UNIT   STATUS TERMINATED      START       STOP
## 1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
## 2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
## 3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
## 4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
## 5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
## 6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
## 7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
## 8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
## 9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
## 10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
## 11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
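The coercion described in this answer can be seen on a much smaller example. The following is only a sketch on made-up toy vectors (`unit`, `status`, `codes` and `labels` are illustrative names, not part of the question's data): assigning a factor into a numeric vector stores the factor's integer codes, while wrapping it in `as.character()` stores the labels and turns the whole vector into character.

```r
# Toy data, assumed for illustration only
unit   <- c(NA, NA, 200)
status <- factor(c("ACTIVE", "INACTIVE", "ACTIVE"))

# Assigning the factor directly: the numeric vector receives the integer codes
codes <- unit
codes[is.na(codes)] <- status[is.na(codes)]

# Coercing to character first: the labels are stored and the vector becomes character
labels <- unit
labels[is.na(labels)] <- as.character(status[is.na(labels)])

print(codes)   # numeric vector holding the level codes in place of NA
print(labels)  # character vector holding the level labels in place of NA
```

Note that after the second assignment every element is a string, including "200", so any later arithmetic on the column would need an explicit conversion back.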
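【Comments】:
This does not work for me because I also have some rows with NA in the adjacent column. Is there any way to work around it? Both of my columns are int.
【Answer 2】:
You have to do it like this: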
TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS[is.na(TEST$UNIT)]
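This way each NA is replaced by the value next to it. Otherwise there is a mismatch between the number of values to be replaced and the number of values supplied to replace them, and the replacement values are simply taken in row order from the top of the column. It happened to work in this case because the two values being replaced are the first two rows.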
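To make the row-order point concrete, here is a small sketch on hypothetical toy vectors (`unit`, `status`, `wrong` and `right` are made-up names) in which the NAs are not in the first rows. With an unindexed right-hand side the replacement values come from the top of `status`; indexing both sides with the same logical mask keeps the rows aligned.

```r
# Toy data, assumed for illustration: NAs sit in rows 2 and 4
unit   <- c(100, NA, 300, NA)
status <- c("A", "B", "C", "D")

# Unindexed right-hand side: rows 2 and 4 receive the FIRST two values of status.
# R warns that the lengths do not match; the warning is silenced here for brevity.
wrong <- unit
suppressWarnings(wrong[is.na(wrong)] <- status)

# Same mask on both sides: rows 2 and 4 receive the values from rows 2 and 4
right <- unit
right[is.na(right)] <- status[is.na(right)]

print(wrong)
print(right)
```

In the question's data the two NAs are rows 1 and 2, which is why the unindexed form appeared to pick up the right rows there.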
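【Comments】:
I think this is fine as an answer. Sure, the solution is the same as the one given by others, but you have added an explanation of what is going on. In my opinion this should not be a comment.
【Answer 3】:
TEST$UNIT = ifelse(is.na(TEST$UNIT), paste(TEST$STATUS),paste(TEST$UNIT));TEST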
       UNIT   STATUS TERMINATED      START       STOP
1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31
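The `ifelse()` pattern in this answer can be sketched on toy vectors as well. The names below (`unit`, `status`, `a`, `b`, `mixed`, `filled`) are assumptions for illustration, not data from the thread. The first part mirrors the answer: `paste()` converts both branches to character, so the result is a character vector. The second part is a hedged take on the case raised in the comments, where both columns are integer and the adjacent column may itself contain NA: no coercion is needed there, and a row that is NA in both columns simply stays NA.

```r
# Part 1: numeric column filled from a character column (result is character)
unit   <- c(NA, 200)
status <- c("ACTIVE", "INACTIVE")
mixed  <- ifelse(is.na(unit), paste(status), paste(unit))

# Part 2: both columns integer, adjacent column may also hold NA
a <- c(NA, 2L, NA)
b <- c(10L, 20L, NA)
filled <- ifelse(is.na(a), b, a)   # row 3 is NA in both, so it remains NA

print(mixed)
print(filled)
```

In part 1, `paste(unit)` would turn an NA into the string "NA", but that branch is only used for rows where `unit` is not NA, so the string never reaches the result.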