For each row, find the cell that matches a specific string and return last character of column name

Posted: 2021-07-13 22:29:57

Question:

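Below is some sample data. Each row is a different participant. Each participant completes five trials. On each trial, they pick one fruit from a set of 10 fruits (without replacement).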

ID trial_1 trial_2 trial_3 trial_4 trial_5
01 apple orange banana peach grapes
02 grapes watermelon mango peach apricot
03 pear grapes mango orange banana
04 watermelon apple peach grapes pear
05 banana peach apple grapes mango

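I would like to create 10 new columns, one for each fruit, containing the number of the trial in which that fruit was picked (or "NA" if it was never picked):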

ID trial_1 trial_2 trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
01 apple orange banana peach grapes 1 NA 3 5 NA 2 4 NA NA NA
02 grapes watermelon mango peach apricot NA 5 NA 1 3 NA 4 NA NA 2
03 pear grapes mango orange banana NA NA 5 2 3 4 NA 1 NA NA
04 watermelon apple peach grapes pear 2 NA NA 4 NA NA 3 5 NA 1
05 banana peach apple grapes mango 3 NA 1 4 5 NA 2 NA NA NA

I could do this for each fruit column like so, but it seems clunky:

df %>%
  mutate(apple = ifelse(trial_1 == "apple", 1,
                 ifelse(trial_2 == "apple", 2,
                 ifelse(trial_3 == "apple", 3,
                 ifelse(trial_4 == "apple", 4,
                 ifelse(trial_5 == "apple", 5, NA))))))  # NA, not "NA", keeps the column numeric

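I think there is a simpler, cleaner solution, possibly using rowwise() to match the fruit name and then return just the last character of the column name (i.e. the number). But I just can't get it to work. Can you help?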

Answer 1:

library(tidyverse)
df %>%
  pivot_longer(-ID) %>%
  mutate(name = parse_number(name)) %>%
  pivot_wider(names_from = value, values_from = name)

This gives the fruit columns on the right. To append them to the original data,

left_join(df, 
    # the code above
)

Result

Joining, by = "ID"
# A tibble: 5 x 15
  ID    trial_1    trial_2    trial_3 trial_4 trial_5 apple orange banana peach grapes watermelon mango apricot  pear
  <chr> <chr>      <chr>      <chr>   <chr>   <chr>   <dbl>  <dbl>  <dbl> <dbl>  <dbl>      <dbl> <dbl>   <dbl> <dbl>
1 01    apple      orange     banana  peach   grapes      1      2      3     4      5         NA    NA      NA    NA
2 02    grapes     watermelon mango   peach   apricot    NA     NA     NA     4      1          2     3       5    NA
3 03    pear       grapes     mango   orange  banana     NA      4      5    NA      2         NA     3      NA     1
4 04    watermelon apple      peach   grapes  pear        2     NA     NA     3      4          1    NA      NA     5
5 05    banana     peach      apple   grapes  mango       3     NA      1     2      4         NA     5      NA    NA
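
For reference, the two steps above can be combined into one pipeline. This is a minimal sketch assuming the same `df` tibble as in the source data for this answer (repeated here so the block runs on its own); the names `trial_cols` and `result` are only illustrative.

```r
library(dplyr)
library(tidyr)
library(readr)

# Same data as in the source data for this answer, repeated for self-containment
df <- tibble::tribble(
   ~ID,     ~trial_1,     ~trial_2, ~trial_3, ~trial_4,  ~trial_5,
  "01",      "apple",     "orange", "banana",  "peach",  "grapes",
  "02",     "grapes", "watermelon",  "mango",  "peach", "apricot",
  "03",       "pear",     "grapes",  "mango", "orange",  "banana",
  "04", "watermelon",      "apple",  "peach", "grapes",    "pear",
  "05",     "banana",      "peach",  "apple", "grapes",   "mango"
)

# One column per fruit, holding the trial number in which it was picked
trial_cols <- df %>%
  pivot_longer(-ID) %>%                    # long format: ID, name ("trial_1"), value (fruit)
  mutate(name = parse_number(name)) %>%    # "trial_1" -> 1
  pivot_wider(names_from = value, values_from = name)

# Attach the fruit columns to the original trial columns
result <- left_join(df, trial_cols, by = "ID")
```

Passing `by = "ID"` explicitly avoids the "Joining, by" message. Note that a fruit nobody picked (strawberries in this data) gets no column with this approach.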

Source data:

tibble::tribble(
   ~ID,     ~trial_1,     ~trial_2, ~trial_3, ~trial_4,  ~trial_5,
  "01",      "apple",     "orange", "banana",  "peach",  "grapes",
  "02",     "grapes", "watermelon",  "mango",  "peach", "apricot",
  "03",       "pear",     "grapes",  "mango", "orange",  "banana",
  "04", "watermelon",      "apple",  "peach", "grapes",    "pear",
  "05",     "banana",      "peach",  "apple", "grapes",   "mango"
  ) -> df

Answer 2:

Consider creating a vector of fruits in the order we want (base R):

nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange", 
         "peach", "pear", "strawberries", "watermelon")

Then loop over the rows of the data, use match to get the index of each fruit, and assign the results as new columns

df1[nm1] <- t(apply(df1[-1], 1, function(x) match(nm1, x)))

Output

df1
  ID    trial_1    trial_2 trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
1  1      apple     orange  banana   peach  grapes     1      NA      3      5    NA      2     4   NA           NA         NA
2  2     grapes watermelon   mango   peach apricot    NA       5     NA      1     3     NA     4   NA           NA          2
3  3       pear     grapes   mango  orange  banana    NA      NA      5      2     3      4    NA    1           NA         NA
4  4 watermelon      apple   peach  grapes    pear     2      NA     NA      4    NA     NA     3    5           NA          1
5  5     banana      peach   apple  grapes   mango     3      NA      1      4     5     NA     2   NA           NA         NA
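
To see what `match` does inside the `apply` call above, here is a small sketch for a single participant (the names `x` and `pos` are only illustrative). For each fruit in `nm1`, `match(nm1, x)` returns its position in `x`; because the trial columns are in order, that position is the trial number, and fruits that were not picked give `NA`.

```r
nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange",
         "peach", "pear", "strawberries", "watermelon")

# Fruits picked by participant 1 in trials 1 to 5
x <- c("apple", "orange", "banana", "peach", "grapes")

# Position of each fruit of nm1 within x (NA if the fruit was not picked)
pos <- match(nm1, x)

# Label the positions with the fruit names for readability
setNames(pos, nm1)
#>  apple  apricot  banana  grapes  mango  orange  peach  pear  strawberries  watermelon
#>      1       NA       3       5     NA       2      4    NA            NA          NA
```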

Or another base R option is

xtabs(ind ~ ID + values, transform(stack(df1[-1]), 
        ind = as.integer(sub(".*_", "", ind)), ID = df1$ID))
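
Note that `xtabs` sums `ind` within each cell, so a fruit a participant never picked shows up as 0 rather than NA, and a fruit nobody picked (strawberries) gets no column. The following is a hedged sketch of converting the table to the layout asked for in the question, assuming the same `df1` as in the data for this answer (repeated here so the block runs on its own); `tab`, `wide` and `result` are illustrative names.

```r
df1 <- data.frame(
  ID = 1:5,
  trial_1 = c("apple", "grapes", "pear", "watermelon", "banana"),
  trial_2 = c("orange", "watermelon", "grapes", "apple", "peach"),
  trial_3 = c("banana", "mango", "mango", "peach", "apple"),
  trial_4 = c("peach", "peach", "orange", "grapes", "grapes"),
  trial_5 = c("grapes", "apricot", "banana", "pear", "mango"),
  stringsAsFactors = FALSE
)

# Cross-tabulate trial numbers by participant and fruit (0 where not picked)
tab <- xtabs(ind ~ ID + values, transform(stack(df1[-1]),
        ind = as.integer(sub(".*_", "", ind)), ID = df1$ID))

# Convert the table to a plain data frame; trial numbers start at 1,
# so a 0 can only mean "never picked" and is safe to turn into NA
wide <- as.data.frame.matrix(tab)
wide[wide == 0] <- NA

# Rows of tab are ordered by ID, matching the row order of df1
result <- cbind(df1, wide)
```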

Data

df1 <- structure(list(ID = 1:5, trial_1 = c("apple", "grapes", "pear", 
"watermelon", "banana"), trial_2 = c("orange", "watermelon", 
"grapes", "apple", "peach"), trial_3 = c("banana", "mango", "mango", 
"peach", "apple"), trial_4 = c("peach", "peach", "orange", "grapes", 
"grapes"), trial_5 = c("grapes", "apricot", "banana", "pear", 
"mango")), class = "data.frame", row.names = c(NA, -5L))

Answer 3:

Another tidyverse solution to this problem:

library(dplyr)
library(purrr)

nm <- unique(unlist(df1[-1]))

df1 %>%
  bind_cols(nm %>%
              map_dfc(function(a) pmap_dbl(df1[, -1], ~ match(a, c(...)))) %>%
              set_names(nm))


  ID    trial_1    trial_2 trial_3 trial_4 trial_5 apple grapes pear watermelon banana orange
1  1      apple     orange  banana   peach  grapes     1      5   NA         NA      3      2
2  2     grapes watermelon   mango   peach apricot    NA      1   NA          2     NA     NA
3  3       pear     grapes   mango  orange  banana    NA      2    1         NA      5      4
4  4 watermelon      apple   peach  grapes    pear     2      4    5          1     NA     NA
5  5     banana      peach   apple  grapes   mango     3      4   NA         NA      1     NA
  peach mango apricot
1     4    NA      NA
2     4     3       5
3    NA     3      NA
4     3    NA      NA
5     2     5      NA
