tidyverse and dplyr: Conditional replacement of values in a column based on other column [duplicate]

Posted: 2020-11-17 11:37:21

Question:

I want to create column A4 from A3, but with the value of A3 decreased by 1 where Total == 63. What am I doing wrong here?

tb1 %>% 
  mutate(A4 = replace(A3, Total == 63, A3-1))

The full data and code are here:

library(tidyverse)

tb1 <-
structure(
  list(
    A1 = c(16, 11, 16, 18, 20, 19, 16, 18, 20, 15, 
          17, 19, 19, 19, 16, 19, 16, 15, 19, 19, 16, 18, 18, 19, 19, 18, 
          20, 18, 19, 19, 19, 19, 17, 19, 17, 16, 18, 19, 16, 18, 17, 19, 
          19, 20, 17, 16, 18, 16, 15, 19, 19, 17, 20, 18, 16, 19, 19, 15, 
          17, 17, 19, 19, 16, 17, 18, 19, 17, 19, 17, 15, 19, 16, 17
          )
        , A2 = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
              8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
              8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
              8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8
              )
      , A3 = c(33, 34, 38, 36, 36, 34, 41, 36, 40, 38, 38, 41, 38, 34, 33, 36, 
            41, 40, 41, 38, 41, 33, 40, 38, 40, 38, 41, 41, 40, 41, 40, 
            38, 34, 40, 36, 41, 40, 40, 33, 38, 36, 41, 40, 40, 28, 41, 
            40, 41, 33, 41, 36, 36, 40, 34, 41, 41, 38, 38, 41, 38, 41, 
            41, 36, 40, 38, 38, 40, 41, 38, 22, 36, 34, 38
            )
        , Total = c(57, 53, 62, 62, 64, 61, 65, 62, 68, 61, 63, 68, 65, 61, 57, 63, 
        65, 63, 68, 65, 65, 59, 66, 65, 67, 64, 69, 67, 67, 68, 67, 
        65, 59, 67, 61, 65, 66, 67, 57, 64, 61, 68, 67, 68, 53, 65, 
        66, 65, 56, 68, 63, 61, 68, 60, 65, 68, 65, 61, 66, 63, 68, 
        68, 60, 65, 64, 65, 65, 68, 63, 45, 63, 58, 63
        )
    )
  , class = "data.frame"
  , row.names = c(NA, -73L)
  )

tb1 %>% 
  filter(Total == 63)
#>   A1 A2 A3 Total
#> 1 17  8 38    63
#> 2 19  8 36    63
#> 3 15  8 40    63
#> 4 19  8 36    63
#> 5 17  8 38    63
#> 6 17  8 38    63
#> 7 19  8 36    63
#> 8 17  8 38    63

tb2 <- 
  tb1 %>% 
  mutate(A4 = replace(A3, Total == 63, A3-1)) %>% 
  mutate(Total = A1 + A2 + A3)
#> Warning: Problem with `mutate()` input `A4`.
#> x number of items to replace is not a multiple of replacement length
#> ℹ Input `A4` is `replace(A3, Total == 63, A3 - 1)`.

tb2 %>% 
  filter(Total == 62)
#>   A1 A2 A3 Total
#> 1 16  8 38    62
#> 2 18  8 36    62
#> 3 18  8 36    62


Answer 1:

You are better off using ifelse here:

library(dplyr)
tb1 %>% mutate(A4 = ifelse(Total == 63, A3 -1, A3))

As for why replace does not work, look at the source code of replace:

replace
function (x, list, values) 
{
    x[list] <- values
    x
}

It assigns values to the elements of x selected by list.

When you use:

tb1 %>% mutate(A4 = replace(A3, Total == 63, A3-1))

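Here values has length length(tb1$A3), while list selects only sum(tb1$Total == 63) elements of x. The two lengths do not match, so you get the warning number of items to replace is not a multiple of replacement length. R fills the selected positions with the first values of A3 - 1 rather than with the values from the matching rows.

If you want replace to work, subset the replacement values with the same condition: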
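The mismatch can be reproduced in base R on a tiny example. This is only a sketch: a3 and total are made-up toy vectors standing in for tb1$A3 and tb1$Total.

```r
# Toy stand-ins for tb1$A3 and tb1$Total (hypothetical values)
a3    <- c(10, 20, 30, 40, 50)
total <- c(63, 1, 63, 2, 63)

sel <- total == 63
length(a3 - 1)  # 5: one candidate replacement value per row
sum(sel)        # 3: only three positions are selected

# replace() warns and uses the first three values of a3 - 1 (9, 19, 29),
# not the values at the selected positions (9, 29, 49)
wrong <- suppressWarnings(replace(a3, sel, a3 - 1))
wrong  # 9 20 19 40 29

# Subsetting the replacement with the same condition aligns the lengths
right <- replace(a3, sel, a3[sel] - 1)
right  # 9 20 29 40 49
```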

tb1 %>%  mutate(A4 = replace(A3, Total == 63, A3[Total == 63] -1))

But as I mentioned, using ifelse is easier here.

Comments:
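A quick way to convince yourself that the two approaches agree is to run both on a small data frame. The sketch below assumes dplyr is installed; toy and the column names A4_ifelse and A4_replace are hypothetical and not part of the question's data.

```r
library(dplyr)

# Hypothetical toy data, not the tb1 from the question
toy <- data.frame(A3 = c(38, 36, 40, 33), Total = c(63, 62, 63, 57))

res <- toy %>%
  mutate(
    # vectorised if/else: one result per row
    A4_ifelse  = ifelse(Total == 63, A3 - 1, A3),
    # replace() with the replacement subset by the same condition
    A4_replace = replace(A3, Total == 63, A3[Total == 63] - 1)
  )

# Both columns hold 37, 36, 39, 33
identical(res$A4_ifelse, res$A4_replace)
```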

You can also avoid referencing A3 multiple times:

tb1 %>% mutate(A4 = A3 - if_else(Total == 63, 1, 0))
tb1 %>% mutate(A4 = A3 - as.integer(Total == 63))
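The second variant works because a logical vector coerces to 1 for TRUE and 0 for FALSE. A minimal base-R sketch with made-up vectors (a3 and total are hypothetical):

```r
# Hypothetical toy vectors
a3    <- c(38, 36, 40)
total <- c(63, 62, 63)

flag <- as.integer(total == 63)  # 1 0 1
a4   <- a3 - flag                # subtract 1 only where total == 63
a4                               # 37 36 39
```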
