Fill column conditionally if previous column contained value?

【Posted】: 2019-09-15 05:58:00 【Question】:

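I want to fill a column conditionally using dplyr::mutate. One level of the new variable should indicate that a given value is present in the previous column, and the other level is the "otherwise" case.

I have a data frame: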

        group      piece      answer      agreement
        group1     A          noise       good 
        group1     A          silence     good
        group1     A          silence     good
        group1     B          silence     bad
        group1     B          loud_noise  bad
        group1     B          noise       bad
        group1     B          loud_noise  bad
        group1     B          noise       bad
        group2     C          silence     good
        group2     C          silence     good

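I want to create a new variable computed per group: if 'bad' appears anywhere in 'agreement' within a group, the value should be 'inconsistent', but if all values of 'agreement' in that group are 'good', the value should be 'consistent'.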

        group      piece      answer      agreement  new_agreement
        group1     A          noise       good       bad
        group1     A          silence     good       bad
        group1     A          silence     good       bad
        group1     B          silence     bad        bad
        group1     B          loud_noise  bad        bad
        group1     B          noise       bad        bad
        group1     B          loud_noise  bad        bad
        group1     B          noise       bad        bad
        group2     C          silence     good       good
        group2     C          silence     good       good

But case_when doesn't quite do this; it just reproduces the same variable under new labels:

   newdf <- df %>%
     group_by(group) %>%
     mutate(new_agreement = case_when(agreement == 'bad' ~ "inconsistent",
                                      agreement == 'good' ~ "consistent")) %>%
     as.data.frame()
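
To see why the attempt above only relabels the existing column, it may help to look at the comparison on its own. The following is a minimal base R sketch; the `agreement` vector is a made-up stand-in for the values of a single group, not the asker's data frame, and `row_wise` / `group_wise` are illustrative names.

```r
# Hypothetical stand-in for the `agreement` values of one group.
agreement <- c("good", "good", "bad")

# Element-wise comparison: one TRUE/FALSE per row, so each row is judged on its own.
is_bad <- agreement == "bad"

# Labelling from the element-wise result just mirrors the original column.
row_wise <- ifelse(is_bad, "inconsistent", "consistent")

# any() collapses the comparison into a single TRUE/FALSE for the whole group.
group_wise <- if (any(is_bad)) "inconsistent" else "consistent"

print(row_wise)
print(group_wise)
```

Inside `group_by() %>% mutate()`, a length-one result like `group_wise` is recycled to every row of the group, which is the behaviour the question asks for.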

【Comments】:

Your new column doesn't match the description in your question. Could you edit the question? 【Answer 1】:

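Just wrap the first condition in any(), i.e. any(agreement == 'bad'), so it is evaluated once per group rather than once per row: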

df %>%
  group_by(group) %>%
  mutate(new_agreement = case_when(any(agreement == 'bad') ~"inconsistent",
                                   agreement =='good' ~ "consistent"))
    # A tibble: 10 x 5
    # Groups:   group [2]
       group  piece answer     agreement new_agreement
       <fct>  <fct> <fct>      <fct>     <chr>        
     1 group1 A     noise      good      inconsistent 
     2 group1 A     silence    good      inconsistent 
     3 group1 A     silence    good      inconsistent 
     4 group1 B     silence    bad       inconsistent 
     5 group1 B     loud_noise bad       inconsistent 
     6 group1 B     noise      bad       inconsistent 
     7 group1 B     loud_noise bad       inconsistent 
     8 group1 B     noise      bad       inconsistent 
     9 group2 C     silence    good      consistent   
    10 group2 C     silence    good      consistent   

You can even use if_else together with any:

df %>%
  group_by(group) %>%
  mutate(new_agreement = if_else(any(agreement == "bad"), "inconsistent", "consistent"))
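
For comparison, the same per-group logic can probably be written without dplyr. This is only a sketch in base R using `ave()`, not something from the answer itself; the data frame below is a trimmed rebuild of the question's `group` and `agreement` columns (as character vectors rather than factors), and `has_bad` is an illustrative name.

```r
# Trimmed rebuild of the question's data (assumption: only the two columns needed here).
df <- data.frame(
  group = c(rep("group1", 8), rep("group2", 2)),
  agreement = c(rep("good", 3), rep("bad", 5), rep("good", 2)),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group and returns a vector aligned with the
# original rows; any() yields one value per group, which is recycled across it.
has_bad <- ave(df$agreement == "bad", df$group, FUN = any)

df$new_agreement <- ifelse(has_bad, "inconsistent", "consistent")
print(df)
```

The dplyr versions above read more naturally in a pipeline; the base R form is mainly useful when adding a dependency is not an option.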

【Comments】:

Great, I didn't know about "any"; that's a really quick solution. Thanks! 【Answer 2】:

Use case_when together with any, with TRUE as the fallback condition:

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(new_agreement = case_when(
    any(agreement == 'bad') ~ 'inconsistent',
    TRUE ~ 'consistent'))
## A tibble: 10 x 5
## Groups:   group [2]
#   group  piece answer     agreement new_agreement
#   <fct>  <fct> <fct>      <fct>     <chr>        
# 1 group1 A     noise      good      inconsistent 
# 2 group1 A     silence    good      inconsistent 
# 3 group1 A     silence    good      inconsistent 
# 4 group1 B     silence    bad       inconsistent 
# 5 group1 B     loud_noise bad       inconsistent 
# 6 group1 B     noise      bad       inconsistent 
# 7 group1 B     loud_noise bad       inconsistent 
# 8 group1 B     noise      bad       inconsistent 
# 9 group2 C     silence    good      consistent   
#10 group2 C     silence    good      consistent   

Data in dput format.

df <-
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L), .Label = c("group1", "group2"), 
class = "factor"), piece = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), 
class = "factor"), answer = structure(c(2L, 3L, 3L, 
3L, 1L, 2L, 1L, 2L, 3L, 3L), .Label = c("loud_noise", 
"noise", "silence"), class = "factor"), agreement = 
structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), 
.Label = c("bad", "good"), class = "factor")), 
class = "data.frame", row.names = c(NA, -10L))
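
The question's second clause ("all values of 'agreement' are 'good'") can also be expressed directly with `all()`. The sketch below assumes dplyr is installed and rebuilds only the `group` and `agreement` columns as character vectors; `result` is an illustrative name, and this phrasing is a variation on the answers above rather than part of them.

```r
library(dplyr)

# Trimmed rebuild of the question's data (assumption: factors replaced by characters).
df <- data.frame(
  group = c(rep("group1", 8), rep("group2", 2)),
  agreement = c(rep("good", 3), rep("bad", 5), rep("good", 2)),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(group) %>%
  # all() returns a single logical per group; mutate() recycles the label to every row.
  mutate(new_agreement = if_else(all(agreement == "good"), "consistent", "inconsistent")) %>%
  ungroup()

print(result)
```

For this data the result matches the `any(agreement == 'bad')` versions. The two phrasings would only diverge if `agreement` contained some third value besides 'good' and 'bad', so which one fits depends on how such rows should be treated.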
