Convert rows into columns in R

Posted 2023-02-14

【Title】Convert rows into columns in R 【Posted】2022-01-12 18:04:20 【Question】:

I have this example dataset, which I would like to convert into the format shown further below:

Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5,1,2,3,1,2,2.5)

df_before <- data.frame(Type, Level, Estimate)


     Type       Level Estimate
1     AGE       18-25      1.5
2     AGE       26-70      1.0
3  REGION      London      2.0
4  REGION Southampton      3.0
5  REGION   Newcastle      1.0
6 DRIVERS           1      2.0
7 DRIVERS           2      2.5

Basically, I want to convert the dataset into the following format. I have tried the function dcast(), but it does not seem to work.

    AGE Estimate_AGE      REGION Estimate_REGION DRIVERS Estimate_DRIVERS
1 18-25          1.5      London               2       1              2.0
2 26-70          1.0 Southampton               3       2              2.5
3  <NA>           NA   Newcastle               1    <NA>               NA

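【Comments】:

Does this answer your question? How to reshape data from long to wide format

No, my dataset is in a different format.

You may want to restructure your data, since mixing strings and numeric values in the same column is not good practice.

【Answer 1】: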

A solution based on tidyr::pivot_wider and purrr::map_dfc:

library(tidyverse)

Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5,1,2,3,1,2,2.5)
df_before <- data.frame(Type, Level, Estimate)

df_before %>% 
  pivot_wider(names_from=Type, values_from=c(Level, Estimate), values_fn=list) %>% 
  map_dfc(~ c(unlist(.x), rep(NA, max(table(df_before$Type))-length(unlist(.x)))))

#> # A tibble: 3 × 6
#>   Level_AGE Level_REGION Level_DRIVERS Estimate_AGE Estimate_REGION
#>   <chr>     <chr>        <chr>                <dbl>           <dbl>
#> 1 18-25     London       1                      1.5               2
#> 2 26-70     Southampton  2                      1                 3
#> 3 <NA>      Newcastle    <NA>                  NA                 1
#> # … with 1 more variable: Estimate_DRIVERS <dbl>

Another solution, based on dplyr::group_split and purrr::map_dfc:

library(tidyverse)

df_before %>% 
  mutate(maxn = max(table(.$Type))) %>% 
  group_by(Type) %>% group_split() %>% 
  map_dfc(
    ~ data.frame(c(.x$Level, rep(NA, .x$maxn[1] - nrow(.x))),
      c(.x$Estimate, rep(NA, .x$maxn[1] - nrow(.x)))) %>%
      set_names(c(.x$Type[1], paste0("Estimate_", .x$Type[1])))) %>% 
  type.convert(as.is = TRUE)

#>     AGE Estimate_AGE DRIVERS Estimate_DRIVERS      REGION Estimate_REGION
#> 1 18-25          1.5       1              2.0      London               2
#> 2 26-70          1.0       2              2.5 Southampton               3
#> 3  <NA>           NA      NA               NA   Newcastle               1

【Answer 2】:

We can try this:

library(tidyverse)

Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5, 1, 2, 3, 1, 2, 2.5)
df_before <- data.frame(Type, Level, Estimate)

data <-
  df_before %>% group_split(Type)

data <-
  map2(
    data, map(data, ~ unique(.$Type)),
    ~ mutate(., "{.y}" := Level, "Estimate_{.y}" := Estimate) %>%
      select(-c("Type", "Level", "Estimate"))
  )

#get the longest number of rows to be able to join the columns
max_rows <- map_dbl(data, nrow) %>%
  max()

#add rows if needed
map_if(
  data, ~ nrow(.) < max_rows,
  ~ rbind(., NA)
) %>%
  bind_cols()
#> # A tibble: 3 × 6
#>   AGE   Estimate_AGE DRIVERS Estimate_DRIVERS REGION      Estimate_REGION
#>   <chr>        <dbl> <chr>              <dbl> <chr>                 <dbl>
#> 1 18-25          1.5 1                    2   London                    2
#> 2 26-70          1   2                    2.5 Southampton               3
#> 3 <NA>          NA   <NA>                NA   Newcastle                 1

Created on 2021-12-07 by the reprex package (v2.0.1)

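【Answer 3】:

library(dplyr)
library(tidyr)

df_before %>%
  group_by(Type) %>%
  # number rows within each Type; make Estimate character so it can share a column with Level
  mutate(id = row_number(), Estimate = as.character(Estimate)) %>%
  pivot_longer(-c(Type, id)) %>%
  pivot_wider(id_cols = id, names_from = c(Type, name)) %>%
  type.convert(as.is = TRUE)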

# A tibble: 3 x 7
     id AGE_Level AGE_Estimate REGION_Level REGION_Estimate DRIVERS_Level DRIVERS_Estimate
  <int> <chr>            <dbl> <chr>                  <int>         <int>            <dbl>
1     1 18-25              1.5 London                     2             1              2  
2     2 26-70              1   Southampton                3             2              2.5
3     3 NA                NA   Newcastle                  1            NA             NA

With data.table:

library(data.table)
setDT(df_before)

dcast(melt(df_before, 'Type'), rowid(Type, variable)~Type + variable)

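Note that you will get warnings because of the type mismatch: Level is character while Estimate is numeric, and both are melted into a single value column. You can use reshape2::melt to avoid this.

In any case, your data frame is not in a standard format.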
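As an alternative to reshape2::melt (not part of the original answer), the type-mismatch warning can also be sidestepped by converting Estimate to character before melting, so that both measure columns already share one type. This is only a sketch: it assumes that holding Estimate as text in the intermediate result is acceptable, and the names `dt`, `long` and `wide` are introduced here for illustration.

```r
library(data.table)

Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5, 1, 2, 3, 1, 2, 2.5)
df_before <- data.frame(Type, Level, Estimate)

# Work on a copy so df_before itself is left untouched
dt <- as.data.table(df_before)

# Make both measure columns character before melting
dt[, Estimate := as.character(Estimate)]

# Long format: one row per (Type, variable, value)
long <- melt(dt, id.vars = "Type")

# Wide format: one row per within-group position
wide <- dcast(long, rowid(Type, variable) ~ Type + variable, value.var = "value")
```

All value columns in `wide` are character at this point; type.convert(wide, as.is = TRUE) could be applied afterwards if numeric columns are wanted back.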

In base R >= 4.1 (the native pipe |> was introduced in R 4.1.0):

transform(df_before, id = ave(Estimate, Type, FUN = seq_along)) |>
  reshape(v.names = c('Level', 'Estimate'), dir = 'wide', timevar = 'Type', sep = "_")

 id Level_AGE Estimate_AGE Level_REGION Estimate_REGION Level_DRIVERS Estimate_DRIVERS
1  1     18-25          1.5       London               2             1              2.0
2  2     26-70          1.0  Southampton               3             2              2.5
5  3      <NA>           NA    Newcastle               1          <NA>               NA

In base R, without the pipe:

reshape(transform(df_before, id = ave(Estimate, Type, FUN = seq_along)),
       v.names = c('Level', 'Estimate'), dir = 'wide', timevar = 'Type', sep = "_")

【Answer 4】:

Update:

To get exactly the desired output:

df_before %>% 
  group_by(Type) %>% 
  mutate(id = row_number()) %>% 
  pivot_wider(
    names_from = Type,
    values_from = c(Level, Estimate)
  ) %>% 
  select(AGE = Level_AGE, Estimate_AGE, REGION = Level_REGION, 
         Estimate_REGION, DRIVERS = Level_DRIVERS, Estimate_DRIVERS) %>% 
  type.convert(as.is=TRUE)

  AGE   Estimate_AGE REGION      Estimate_REGION DRIVERS Estimate_DRIVERS
  <chr>        <dbl> <chr>                 <int>   <int>            <dbl>
1 18-25          1.5 London                    2       1              2  
2 26-70          1   Southampton               3       2              2.5
3 NA            NA   Newcastle                 1      NA             NA

First answer:

The key step is to group by Type, as Onyambu's solution already does. After that we can use a single pivot_wider:

library(dplyr)
library(tidyr)

df_before %>% 
  group_by(Type) %>% 
  mutate(id = row_number()) %>% 
  pivot_wider(
    names_from = Type,
    values_from = c(Level, Estimate)
  )

     id Level_AGE Level_REGION Level_DRIVERS Estimate_AGE Estimate_REGION Estimate_DRIVERS
  <int> <chr>     <chr>        <chr>                <dbl>           <dbl>            <dbl>
1     1 18-25     London       1                      1.5               2              2  
2     2 26-70     Southampton  2                      1                 3              2.5
3     3 NA        Newcastle    NA                    NA                 1             NA
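To make the role of the id column concrete, here is a small base R sketch of the within-group row index that row_number() produces after group_by(Type). It is an illustration only; the variable `id` mirrors the column name used in the answer. Without such an index, several values would compete for the same output cell, which is why the first answer resorted to values_fn = list.

```r
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")

# Position of each row within its Type group
id <- ave(seq_along(Type), Type, FUN = seq_along)
id
# 1 2 1 2 3 1 2

# Each (Type, id) pair is unique, so every wide cell receives at most one value
stopifnot(!any(duplicated(data.frame(Type, id))))
```

Rows that share the same id end up on the same row of the wide result, and groups with fewer rows than the largest one are filled with NA.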
