Conditionally rename columns in list, based on separate character vector stored in data.frame

Posted 2023-03-11

Question posted: 2021-09-13 21:27:46

I have a list of tibbles called lst:

> lst
[[1]]
# A tibble: 2 x 4
  temp1    temp2 temp3    id
  <chr>    <dbl> <dbl> <dbl>
1 Metric 1   150  1234   201
2 Metric 2   190  3456   201

[[2]]
# A tibble: 2 x 4
  temp1    temp2 temp3    id
  <chr>    <dbl> <dbl> <dbl>
1 Metric 1   190  1231   202
2 Metric 2   120  3356   202

I also have a separate tibble, called df, with a column containing the character vectors used to rename the columns in lst:

# A tibble: 2 x 2
  colnames                                                      id
  <chr>                                                      <dbl>
1 c(' ','Ranking 1 for School A', 'Ranking 2 for School A')    201
2 c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')   202

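I'm looking for a way, preferably using some form of map from purrr, to remove the id column and rename the remaining columns based on the names stored in df.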

Any suggestions would be much appreciated. Thanks in advance.

Desired output:

[[1]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School A` `Ranking 2 for School A`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      150                     1234
2 Metric 2                      190                     3456

[[2]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School B` `Ranking 2 for School B`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      190                     1231
2 Metric 2                      120                     3356

Data:

lst <- list(structure(list(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 
190), temp3 = c(1234, 3456), id = c(201, 201)), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame")), structure(list(
    temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120), temp3 = c(1231, 
    3356), id = c(202, 202)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame")))

df <- structure(list(colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')", 
"c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"), 
    id = c(201, 202)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))
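The entries of df$colnames are single strings whose text is R code, not character vectors, which is why several answers below rely on parse() and eval(). A minimal sketch of that conversion, rebuilding df with tibble() for brevity; names_a and names_vec are hypothetical names introduced here. Keep in mind that eval(parse()) runs whatever code the string contains, so it is only appropriate for trusted data.

```r
library(tibble)
library(purrr)

df <- tibble(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202)
)

# Each cell is ONE string that happens to contain R code
length(df$colnames[[1]])                          # 1

# parse() turns the string into an unevaluated expression;
# eval() runs it and returns the real character vector
names_a <- eval(parse(text = df$colnames[[1]]))
length(names_a)                                   # 3

# Convert every row once and keep the results as a list-column
# (hypothetical column name: names_vec)
df$names_vec <- map(df$colnames, ~ eval(parse(text = .x)))
```

After this, df$names_vec[[i]] can be passed straight to set_names() without parsing inside the loop.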


Answer 1:

Taking the question at face value, this is what I would do:

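library(tidyverse)

seq_along(lst) %>% map(
        .f = function(x) {
                
                # Store the list element, dropping the id column
                tmp <- lst[[x]] %>% 
                        select(-id)
                
                # Rename columns: the stored string "c(' ', ...)" is R code,
                # so parse it into an expression and evaluate it to a character vector
                colnames(tmp) <- df$colnames[x] %>%
                                    parse(text = .) %>% 
                                    eval()
                
                # Return the modified data 
                tmp
        }
)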

Note: This obviously assumes that lst and colnames are stored in the same order, so index 1 in the list uses index 1 in df[, "colnames"].
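If the order of lst and df might differ, here is a sketch that looks up each tibble's names by its id with match() instead of by position. The helper rename_by_id() and its lookup argument are hypothetical, and the sketch assumes every tibble carries a single id value repeated on each row.

```r
library(tibble)
library(dplyr)
library(purrr)

lst <- list(
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 190),
         temp3 = c(1234, 3456), id = 201),
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120),
         temp3 = c(1231, 3356), id = 202)
)
df <- tibble(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202)
)

# Hypothetical helper: find the names row by id, not by list position
rename_by_id <- function(tbl, lookup) {
  row <- match(tbl$id[1], lookup$id)   # assumes one id value per tibble
  if (is.na(row)) stop("no names stored for id ", tbl$id[1])
  new_names <- eval(parse(text = lookup$colnames[row]))
  tbl %>% select(-id) %>% set_names(new_names)
}

out <- map(lst, rename_by_id, lookup = df)

# Same result even when df's rows are in a different order
out_shuffled <- map(lst, rename_by_id, lookup = df[c(2, 1), ])
```

The explicit stop() makes a missing id fail loudly instead of silently producing NA names.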

Comments:

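This is great, I definitely need to understand parse and eval better. Thanks!

You're welcome! I'll revise my answer, though: I noticed I forgot to remove the ID column as you wanted!

Answer 2: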

You could also use the following solution. First, we separate the colnames variable in the second data frame:

library(dplyr)
library(purrr)
library(tidyr)  # separate() comes from tidyr

df %>%
  # strip only the leading "c(", the trailing ")" and the single quotes;
  # a bare "[c()]" class would also delete every letter "c" ("School" -> "Shool")
  mutate(colnames = gsub("^c\\(|\\)$|'", "", colnames)) %>%
  separate(colnames, into = paste("col", 1:3, sep = "_"), sep = ",\\s?") -> DF

DF
# A tibble: 2 x 4
  col_1 col_2                  col_3                     id
  <chr> <chr>                  <chr>                  <dbl>
1 " "   Ranking 1 for School A Ranking 2 for School A   201
2 " "   Ranking 1 for School B Ranking 2 for School B   202

Then we use it to change the old column names in the list elements:

lst %>%
  map(~ .x %>% 
        # every row of a tibble carries the same id, so compare against the first one
        set_names(DF %>% filter(id == .x$id[1]) %>% unlist()) %>%
        select(-length(.)))

[[1]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School A` `Ranking 2 for School A`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      150                     1234
2 Metric 2                      190                     3456

[[2]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School B` `Ranking 2 for School B`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      190                     1231
2 Metric 2                      120                     3356
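A bracket expression such as `[c()]` matches any single "c", "(" or ")" character, not the literal prefix `c(`, which matters whenever a name itself contains a lowercase c. A quick check in base R; the variable names broken and fixed are only for illustration:

```r
# "[c()]" is a bracket expression: it deletes every "c", "(" and ")"
broken <- gsub("[c()]", "", "c('Ranking 1 for School A')")
broken
#> [1] "'Ranking 1 for Shool A'"

# Anchored alternatives remove only the literal "c(" prefix and ")" suffix
fixed <- gsub("^c\\(|\\)$", "", "c('Ranking 1 for School A')")
fixed
#> [1] "'Ranking 1 for School A'"
```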


Answer 3:

library(tidyverse)

map2(lst, pmap(df, ~.), ~ set_names(.x[-4], eval(parse(text = .y))))
#> [[1]]
#> # A tibble: 2 x 3
#>   ` `      `Ranking 1 for School A` `Ranking 2 for School A`
#>   <chr>                       <dbl>                    <dbl>
#> 1 Metric 1                      150                     1234
#> 2 Metric 2                      190                     3456
#> 
#> [[2]]
#> # A tibble: 2 x 3
#>   ` `      `Ranking 1 for School B` `Ranking 2 for School B`
#>   <chr>                       <dbl>                    <dbl>
#> 1 Metric 1                      190                     1231
#> 2 Metric 2                      120                     3356

Created on 2021-07-01 by the reprex package (v2.0.0)
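In the one-liner above, pmap(df, ~.) walks the rows of df and returns the first column (the colnames string) for each row, so it yields the same list as as.list(df$colnames). A more explicit sketch of the same positional pairing, which drops id by name rather than by position (.x[-4]); the data are rebuilt with tibble() and out is a hypothetical name:

```r
library(tibble)
library(dplyr)
library(purrr)

lst <- list(
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 190),
         temp3 = c(1234, 3456), id = 201),
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120),
         temp3 = c(1231, 3356), id = 202)
)
df <- tibble(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202)
)

# Pair each tibble with the string in the same row of df (positional, like the one-liner)
out <- map2(lst, df$colnames, function(tbl, code) {
  tbl %>%
    select(-id) %>%                      # drop id by name instead of .x[-4]
    set_names(eval(parse(text = code)))  # "c(...)" string -> character vector
})
```

Like Answer 1, this still relies on lst and df being in the same order.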

Comments:

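Thanks for the proposed solution, but this doesn't work with the provided data. Your solution works because the values in nms are actual character vectors, not character vectors stored in a data.frame or tibble.

Answer 4: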

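I see you prefer a tidyverse answer, and there is already at least one good one. So I thought I'd share a non-tidyverse approach in case anyone who comes along later is interested...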

library(qdapRegex)
for (i in seq_along(lst)) {
  # extract field names based on 'id' (every row of a tibble carries the same id)
  new_names <- qdapRegex::rm_between(df[df$id == lst[[i]]$id[1], "colnames"], "'", "'", extract = TRUE)
  # rename fields
  names(lst[[i]]) <- new_names[[1]]
  # drop NA field
  lst[[i]] <- lst[[i]][!is.na(names(lst[[i]]))]
}
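For comparison, a base-R sketch that needs neither qdapRegex nor eval(): it pulls every single-quoted piece out of the stored string with regmatches() and gregexpr(). The helper extract_quoted() is hypothetical, the data are rebuilt as plain data.frames, and the pattern assumes the names themselves contain no single quotes.

```r
# Base R data, mirroring lst and df from the question
lst <- list(
  data.frame(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 190),
             temp3 = c(1234, 3456), id = 201, stringsAsFactors = FALSE),
  data.frame(temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120),
             temp3 = c(1231, 3356), id = 202, stringsAsFactors = FALSE)
)
df <- data.frame(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every '...'-quoted piece without evaluating code
# (assumes the names contain no single quotes themselves)
extract_quoted <- function(s) {
  pieces <- regmatches(s, gregexpr("'[^']*'", s))[[1]]
  gsub("^'|'$", "", pieces)   # strip the surrounding single quotes
}

for (i in seq_along(lst)) {
  key <- lst[[i]]$id[1]                                   # one id per data frame
  new_names <- extract_quoted(df$colnames[df$id == key])
  tbl <- lst[[i]][names(lst[[i]]) != "id"]                # drop id before renaming
  names(tbl) <- new_names
  lst[[i]] <- tbl
}
```

Dropping id before assigning names keeps the name vector and the column count the same length, so no NA-named column has to be cleaned up afterwards.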
