tidyverse and dplyr: Conditional replacement of values in a column based on other column
[Title]: tidyverse and dplyr: Conditional replacement of values in a column based on other column [duplicate]
[Posted]: 2020-11-17 11:37:21
[Question]: I want to create column A4 from A3, but with the value of A3 decreased by 1 where Total == 63. What am I doing wrong here?
tb1 %>%
mutate(A4 = replace(A3, Total == 63, A3-1))
The complete code, including the data, is here:
library(tidyverse)
tb1 <-
structure(
list(
A1 = c(16, 11, 16, 18, 20, 19, 16, 18, 20, 15,
17, 19, 19, 19, 16, 19, 16, 15, 19, 19, 16, 18, 18, 19, 19, 18,
20, 18, 19, 19, 19, 19, 17, 19, 17, 16, 18, 19, 16, 18, 17, 19,
19, 20, 17, 16, 18, 16, 15, 19, 19, 17, 20, 18, 16, 19, 19, 15,
17, 17, 19, 19, 16, 17, 18, 19, 17, 19, 17, 15, 19, 16, 17
)
, A2 = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8
)
, A3 = c(33, 34, 38, 36, 36, 34, 41, 36, 40, 38, 38, 41, 38, 34, 33, 36,
41, 40, 41, 38, 41, 33, 40, 38, 40, 38, 41, 41, 40, 41, 40,
38, 34, 40, 36, 41, 40, 40, 33, 38, 36, 41, 40, 40, 28, 41,
40, 41, 33, 41, 36, 36, 40, 34, 41, 41, 38, 38, 41, 38, 41,
41, 36, 40, 38, 38, 40, 41, 38, 22, 36, 34, 38
)
, Total = c(57, 53, 62, 62, 64, 61, 65, 62, 68, 61, 63, 68, 65, 61, 57, 63,
65, 63, 68, 65, 65, 59, 66, 65, 67, 64, 69, 67, 67, 68, 67,
65, 59, 67, 61, 65, 66, 67, 57, 64, 61, 68, 67, 68, 53, 65,
66, 65, 56, 68, 63, 61, 68, 60, 65, 68, 65, 61, 66, 63, 68,
68, 60, 65, 64, 65, 65, 68, 63, 45, 63, 58, 63
)
)
, class = "data.frame"
, row.names = c(NA, -73L)
)
tb1 %>%
filter(Total == 63)
#> A1 A2 A3 Total
#> 1 17 8 38 63
#> 2 19 8 36 63
#> 3 15 8 40 63
#> 4 19 8 36 63
#> 5 17 8 38 63
#> 6 17 8 38 63
#> 7 19 8 36 63
#> 8 17 8 38 63
tb2 <-
tb1 %>%
mutate(A4 = replace(A3, Total == 63, A3-1)) %>%
mutate(Total = A1 + A2 + A3)
#> Warning: Problem with `mutate()` input `A4`.
#> x number of items to replace is not a multiple of replacement length
#> ℹ Input `A4` is `replace(A3, Total == 63, A3 - 1)`.
tb2 %>%
filter(Total == 62)
#> A1 A2 A3 Total
#> 1 16 8 38 62
#> 2 18 8 36 62
#> 3 18 8 36 62
[Answer 1]: You are better off using ifelse here:
library(dplyr)
tb1 %>% mutate(A4 = ifelse(Total == 63, A3 -1, A3))
As for why replace did not work, look at the source code of replace:
replace
function (x, list, values)
{
    x[list] <- values
    x
}
It assigns values to the elements of x selected by list, then returns the modified x.
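To see that assignment in isolation, here is a minimal sketch on a toy vector. The names x, keep, y and z are made up for illustration and do not appear in the question.

```r
# Toy vector and logical index (hypothetical names, not from the question)
x <- c(10, 20, 30, 40)
keep <- c(FALSE, TRUE, FALSE, TRUE)

# replace() returns a modified copy; x itself is left unchanged
y <- replace(x, keep, c(-1, -2))

# Manual version of the same assignment that replace() performs internally
z <- x
z[keep] <- c(-1, -2)

print(y)
print(z)
```

Two positions are selected and two replacement values are supplied, so the lengths line up and no warning is raised.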
When you use:
tb1 %>% mutate(A4 = replace(A3, Total == 63, A3-1))
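your values has length length(tb1$A3) (73 here), but list selects only sum(tb1$Total == 63) positions (8 here). The two do not match, so you get the warning number of items to replace is not a multiple of replacement length: the replacement values cannot be matched one-to-one with the selected positions. R then uses only the first 8 elements of values, which are not the ones belonging to the selected rows.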
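The mismatch is easier to see on a few rows. The sketch below uses small made-up vectors (a3 and total are hypothetical stand-ins for tb1$A3 and tb1$Total) and suppresses the warning so that both results can be compared.

```r
# Toy data (hypothetical, much smaller than tb1)
a3    <- c(33, 34, 38, 36, 41)
total <- c(57, 63, 62, 63, 65)

# Two positions are selected, but five replacement values are supplied.
# R warns and uses only the FIRST two values of a3 - 1 (32 and 33),
# not the values that sit at the selected positions.
wrong <- suppressWarnings(replace(a3, total == 63, a3 - 1))

# Subsetting the replacement with the same condition lines the values up
right <- replace(a3, total == 63, a3[total == 63] - 1)

print(wrong)
print(right)
```

Here wrong ends up with 32 and 33 in positions 2 and 4, while right holds the intended 33 and 35.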
If you want replace to work, you can try:
tb1 %>% mutate(A4 = replace(A3, Total == 63, A3[Total == 63] -1))
But as I mentioned, using ifelse is easier here.
[Comments]:
You could also avoid referring to A3 more than once: tb1 %>% mutate(A4 = A3 - if_else(Total == 63, 1, 0))
or tb1 %>% mutate(A4 = A3 - as.integer(Total == 63))
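As a quick check that the variants discussed in this thread agree, the following sketch runs all four on a small made-up data frame. The data frame toy and the via_* column names are assumptions for illustration; only the expressions themselves come from the answer and the comment.

```r
library(dplyr)

# Small hypothetical data frame standing in for tb1
toy <- data.frame(A3 = c(33, 34, 38, 36), Total = c(57, 63, 62, 63))

res <- toy %>%
  mutate(
    via_ifelse  = ifelse(Total == 63, A3 - 1, A3),
    via_replace = replace(A3, Total == 63, A3[Total == 63] - 1),
    via_if_else = A3 - if_else(Total == 63, 1, 0),
    via_integer = A3 - as.integer(Total == 63)
  )

print(res)
```

All four columns should hold the same values. The arithmetic forms work because the adjustment is a fixed subtraction; for a replacement that is not a simple offset, ifelse or the subsetted replace call would be the more general choice.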