For each row, find the cell that matches a specific string and return last character of column name
【Posted】: 2021-07-13 22:29:57
【Question】: Below is some sample data. Each row is a different participant. Each participant completes five trials. In each trial, they pick one fruit from a set of 10 fruits (without replacement).
ID | trial_1 | trial_2 | trial_3 | trial_4 | trial_5 |
---|---|---|---|---|---|
01 | apple | orange | banana | peach | grapes |
02 | grapes | watermelon | mango | peach | apricot |
03 | pear | grapes | mango | orange | banana |
04 | watermelon | apple | peach | grapes | pear |
05 | banana | peach | apple | grapes | mango |
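I want to create 10 new columns, one per fruit, each containing the number of the trial in which that fruit was picked (or NA if it was never picked):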
ID | trial_1 | trial_2 | trial_3 | trial_4 | trial_5 | apple | apricot | banana | grapes | mango | orange | peach | pear | strawberries | watermelon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01 | apple | orange | banana | peach | grapes | 1 | NA | 3 | 5 | NA | 2 | 4 | NA | NA | NA |
02 | grapes | watermelon | mango | peach | apricot | NA | 5 | NA | 1 | 3 | NA | 4 | NA | NA | 2 |
03 | pear | grapes | mango | orange | banana | NA | NA | 5 | 2 | 3 | 4 | NA | 1 | NA | NA |
04 | watermelon | apple | peach | grapes | pear | 2 | NA | NA | 4 | NA | NA | 3 | 5 | NA | 1 |
05 | banana | peach | apple | grapes | mango | 3 | NA | 1 | 4 | 5 | NA | 2 | NA | NA | NA |
I could do this for each fruit column like so, but it seems clunky:
mutate(apple = ifelse(trial_1 == "apple", 1,
               ifelse(trial_2 == "apple", 2,
               ifelse(trial_3 == "apple", 3,
               ifelse(trial_4 == "apple", 4,
               ifelse(trial_5 == "apple", 5, NA))))))
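I suspect there is a simpler, more concise solution, perhaps using rowwise() to match the fruit name and then return only the last character of the column name (i.e. the number). But I can't work it out. Can you help?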
【Answer 1】:
library(tidyverse)
df %>%
pivot_longer(-ID) %>%
mutate(name = parse_number(name)) %>%
pivot_wider(names_from = value, values_from = name)
This gives the columns on the right. To append them to the original data:
left_join(df,
# the code above
)
Result:
Joining, by = "ID"
# A tibble: 5 x 15
ID trial_1 trial_2 trial_3 trial_4 trial_5 apple orange banana peach grapes watermelon mango apricot pear
<chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01 apple orange banana peach grapes 1 2 3 4 5 NA NA NA NA
2 02 grapes watermelon mango peach apricot NA NA NA 4 1 2 3 5 NA
3 03 pear grapes mango orange banana NA 4 5 NA 2 NA 3 NA 1
4 04 watermelon apple peach grapes pear 2 NA NA 3 4 1 NA NA 5
5 05 banana peach apple grapes mango 3 NA 1 2 4 NA 5 NA NA
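For reference, the two snippets above can be combined into one runnable sketch. The intermediate name fruit_cols, the explicit by = "ID", and names_sort = TRUE (which orders the new columns alphabetically and assumes a reasonably recent tidyr) are additions for illustration, not part of the original answer. A fruit that nobody picked, such as strawberries, still gets no column with this approach.

```r
library(tidyverse)

df <- tibble::tribble(
  ~ID, ~trial_1, ~trial_2, ~trial_3, ~trial_4, ~trial_5,
  "01", "apple", "orange", "banana", "peach", "grapes",
  "02", "grapes", "watermelon", "mango", "peach", "apricot",
  "03", "pear", "grapes", "mango", "orange", "banana",
  "04", "watermelon", "apple", "peach", "grapes", "pear",
  "05", "banana", "peach", "apple", "grapes", "mango"
)

# Long format (one row per participant and trial), then one column per fruit
fruit_cols <- df %>%
  pivot_longer(-ID) %>%
  mutate(name = parse_number(name)) %>%  # "trial_3" becomes 3
  pivot_wider(names_from = value, values_from = name, names_sort = TRUE)

# Attach the fruit columns to the original trial columns
result <- left_join(df, fruit_cols, by = "ID")
result
```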
Source data:
tibble::tribble(
~ID, ~trial_1, ~trial_2, ~trial_3, ~trial_4, ~trial_5,
"01", "apple", "orange", "banana", "peach", "grapes",
"02", "grapes", "watermelon", "mango", "peach", "apricot",
"03", "pear", "grapes", "mango", "orange", "banana",
"04", "watermelon", "apple", "peach", "grapes", "pear",
"05", "banana", "peach", "apple", "grapes", "mango"
) -> df
【Answer 2】:In base R, first create a vector of the fruit names in the order we want:
nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange",
"peach", "pear", "strawberries", "watermelon")
Then loop over the rows of the data, use match to get the indices, and assign them as new columns:
df1[nm1] <- t(apply(df1[-1], 1, function(x) match(nm1, x)))
Output:
df1
ID trial_1 trial_2 trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
1 1 apple orange banana peach grapes 1 NA 3 5 NA 2 4 NA NA NA
2 2 grapes watermelon mango peach apricot NA 5 NA 1 3 NA 4 NA NA 2
3 3 pear grapes mango orange banana NA NA 5 2 3 4 NA 1 NA NA
4 4 watermelon apple peach grapes pear 2 NA NA 4 NA NA 3 5 NA 1
5 5 banana peach apple grapes mango 3 NA 1 4 5 NA 2 NA NA NA
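To see what the apply() call does for each row, here is match() on the first participant alone. match(nm1, x) returns, for each fruit in nm1, its position in x (or NA). That position equals the trial number only because the columns are stored in trial order. The last step, which reads the number from the column name, is an illustrative addition (the name trial_no is hypothetical) that matches the wording of the question.

```r
nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange",
         "peach", "pear", "strawberries", "watermelon")

# First participant's picks, named by their trial column
x <- c(trial_1 = "apple", trial_2 = "orange", trial_3 = "banana",
       trial_4 = "peach", trial_5 = "grapes")

# Position of each fruit within the row (NA if it was never picked)
pos <- match(nm1, x)

# Take the trial number from the column name rather than from the position
trial_no <- as.integer(sub(".*_", "", names(x)))[pos]
setNames(trial_no, nm1)
```

apply() returns one such vector per row as a column of a 10 x 5 matrix, which is why the answer wraps the call in t().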
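Another base R option is the following (note that xtabs returns a contingency table and fills combinations that never occur with 0 rather than NA):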
xtabs(ind ~ ID + values, transform(stack(df1[-1]),
ind = as.integer(sub(".*_", "", ind)), ID = df1$ID))
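If the xtabs result needs to match the layout asked for in the question, one possible follow-up is sketched below. The names long, tab, wide and out are hypothetical, and replacing 0 with NA is safe here only because trial numbers start at 1. Like the tidyverse answer, this produces no column for a fruit that was never picked.

```r
df1 <- data.frame(
  ID = 1:5,
  trial_1 = c("apple", "grapes", "pear", "watermelon", "banana"),
  trial_2 = c("orange", "watermelon", "grapes", "apple", "peach"),
  trial_3 = c("banana", "mango", "mango", "peach", "apple"),
  trial_4 = c("peach", "peach", "orange", "grapes", "grapes"),
  trial_5 = c("grapes", "apricot", "banana", "pear", "mango"),
  stringsAsFactors = FALSE
)

# Long format: fruit in `values`, trial number in `ind`, participant in `ID`
long <- transform(stack(df1[-1]),
                  ind = as.integer(sub(".*_", "", ind)), ID = df1$ID)

# Sum of trial numbers per ID/fruit pair; absent pairs come out as 0
tab <- xtabs(ind ~ ID + values, long)

# Convert the table to a data frame and mark the empty cells as NA
wide <- as.data.frame.matrix(tab)
wide[wide == 0] <- NA

out <- cbind(df1, wide)
out
```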
Data:
df1 <- structure(list(ID = 1:5, trial_1 = c("apple", "grapes", "pear",
"watermelon", "banana"), trial_2 = c("orange", "watermelon",
"grapes", "apple", "peach"), trial_3 = c("banana", "mango", "mango",
"peach", "apple"), trial_4 = c("peach", "peach", "orange", "grapes",
"grapes"), trial_5 = c("grapes", "apricot", "banana", "pear",
"mango")), class = "data.frame", row.names = c(NA, -5L))
【Answer 3】:Another tidyverse solution to this problem:
library(dplyr)
library(purrr)
nm <- unique(unlist(df1[-1]))
df1 %>%
bind_cols(nm %>%
map_dfc(function(a) pmap_dbl(df1[, -1], ~ match(a, c(...)))) %>%
set_names(nm))
ID trial_1 trial_2 trial_3 trial_4 trial_5 apple grapes pear watermelon banana orange
1 1 apple orange banana peach grapes 1 5 NA NA 3 2
2 2 grapes watermelon mango peach apricot NA 1 NA 2 NA NA
3 3 pear grapes mango orange banana NA 2 1 NA 5 4
4 4 watermelon apple peach grapes pear 2 4 5 1 NA NA
5 5 banana peach apple grapes mango 3 4 NA NA 1 NA
peach mango apricot
1 4 NA NA
2 4 3 5
3 NA 3 NA
4 3 NA NA
5 2 5 NA