Convert rows into columns in R
【Posted】: 2022-01-12 18:04:20
【Question】: I have this sample dataset, which I want to convert into the format shown further below:
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5,1,2,3,1,2,2.5)
df_before <- data.frame(Type, Level, Estimate)
Type Level Estimate
1 AGE 18-25 1.5
2 AGE 26-70 1.0
3 REGION London 2.0
4 REGION Southampton 3.0
5 REGION Newcastle 1.0
6 DRIVERS 1 2.0
7 DRIVERS 2 2.5
Essentially, I want to convert the dataset into the following format. I have tried the dcast() function, but it does not seem to work.
AGE Estimate_AGE REGION Estimate_REGION DRIVERS Estimate_DRIVERS
1 18-25 1.5 London 2 1 2.0
2 26-70 1.0 Southampton 3 2 2.5
3 <NA> NA Newcastle 1 <NA> NA
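【Comments】:
Does this answer your question? How to reshape data from long to wide format
No, my dataset has a different format.
You may want to restructure your data, since mixing strings and numeric values in the same column is not a good idea.
【Answer 1】: A solution based on tidyr::pivot_wider and purrr::map_dfc: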
library(tidyverse)
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5,1,2,3,1,2,2.5)
df_before <- data.frame(Type, Level, Estimate)
df_before %>%
pivot_wider(names_from=Type, values_from=c(Level, Estimate), values_fn=list) %>%
map_dfc(~ c(unlist(.x), rep(NA, max(table(df_before$Type))-length(unlist(.x)))))
#> # A tibble: 3 × 6
#> Level_AGE Level_REGION Level_DRIVERS Estimate_AGE Estimate_REGION
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 18-25 London 1 1.5 2
#> 2 26-70 Southampton 2 1 3
#> 3 <NA> Newcastle <NA> NA 1
#> # … with 1 more variable: Estimate_DRIVERS <dbl>
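The padding step in the pipeline above (each group is filled with NA up to the size of the largest group) can also be sketched in base R. This is only an illustration under that reading of the code, not the answer author's method; pad_to is a hypothetical helper name, and the columns come out in alphabetical order of Type because split() sorts the groups.

```r
df_before <- data.frame(
  Type = c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS"),
  Level = c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2"),
  Estimate = c(1.5, 1, 2, 3, 1, 2, 2.5),
  stringsAsFactors = FALSE
)

# Size of the largest group (3 here, for REGION)
max_n <- max(table(df_before$Type))

# Hypothetical helper: pad a vector with NA up to length n
pad_to <- function(x, n) c(x, rep(NA, n - length(x)))

# One two-column data frame per Type, padded to max_n rows
pieces <- lapply(split(df_before, df_before$Type), function(d) {
  out <- data.frame(pad_to(d$Level, max_n), pad_to(d$Estimate, max_n),
                    stringsAsFactors = FALSE)
  names(out) <- c(d$Type[1], paste0("Estimate_", d$Type[1]))
  out
})

# Bind the pieces side by side; unname() avoids list names being used as prefixes
df_after <- do.call(cbind, unname(pieces))
print(df_after)
```

Unlike the tidyverse version, this keeps every Level column as character; type.convert(df_after, as.is = TRUE) could be applied afterwards if numeric levels are wanted.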
Another solution, based on dplyr::group_split and purrr::map_dfc:
library(tidyverse)
df_before %>%
mutate(maxn = max(table(.$Type))) %>%
group_by(Type) %>% group_split() %>%
map_dfc(
~ data.frame(c(.x$Level, rep(NA, .x$maxn[1] - nrow(.x))),
c(.x$Estimate, rep(NA, .x$maxn[1] - nrow(.x)))) %>%
set_names(c(.x$Type[1], paste0("Estimate_", .x$Type[1])))) %>%
type.convert(as.is = TRUE)
#> AGE Estimate_AGE DRIVERS Estimate_DRIVERS REGION Estimate_REGION
#> 1 18-25 1.5 1 2.0 London 2
#> 2 26-70 1.0 2 2.5 Southampton 3
#> 3 <NA> NA NA NA Newcastle 1
【Comments】:
【Answer 2】: We can try this:
library(tidyverse)
Type <- c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS")
Level <- c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2")
Estimate <- c(1.5, 1, 2, 3, 1, 2, 2.5)
df_before <- data.frame(Type, Level, Estimate)
data <-
df_before %>% group_split(Type)
data <-
map2(
data, map(data, ~ unique(.$Type)),
~ mutate(., "{.y}" := Level, "Estimate_{.y}" := Estimate) %>%
select(-c("Type", "Level", "Estimate"))
)
#get the longest number of rows to be able to join the columns
max_rows <- map_dbl(data, nrow) %>%
max()
#add rows if needed
map_if(
data, ~ nrow(.) < max_rows,
~ rbind(., NA)
) %>%
bind_cols()
#> # A tibble: 3 × 6
#> AGE Estimate_AGE DRIVERS Estimate_DRIVERS REGION Estimate_REGION
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 18-25 1.5 1 2 London 2
#> 2 26-70 1 2 2.5 Southampton 3
#> 3 <NA> NA <NA> NA Newcastle 1
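Note that rbind(., NA) in the step above appends exactly one row, which is enough here because no group is more than one row shorter than the largest one. A minimal sketch for the general case, assuming plain base data frames (pad_rows is a hypothetical helper, not part of the answer): indexing a data.frame with row numbers past its end yields rows of NA.

```r
# Hypothetical helper: extend a data frame to n rows by adding NA rows
pad_rows <- function(d, n) {
  d <- as.data.frame(d)               # also accepts tibbles from group_split()
  out <- d[seq_len(n), , drop = FALSE] # out-of-range row indices become NA rows
  rownames(out) <- NULL
  out
}

# A group with a single row, padded to four rows
small <- data.frame(AGE = "18-25", Estimate_AGE = 1.5, stringsAsFactors = FALSE)
padded <- pad_rows(small, 4)
print(padded)
```

Applying pad_rows(., max_rows) to every element of the list would replace the map_if() step without relying on the groups differing by at most one row.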
Created on 2021-12-07 by the reprex package (v2.0.1)
【Comments】:
【Answer 3】:
df_before %>%
group_by(Type) %>%
mutate(id = row_number(), Estimate = as.character(Estimate))%>%
pivot_longer(-c(Type, id)) %>%
pivot_wider(id_cols = id, names_from = c(Type, name))%>%
type.convert(as.is = TRUE)
# A tibble: 3 x 7
id AGE_Level AGE_Estimate REGION_Level REGION_Estimate DRIVERS_Level DRIVERS_Estimate
<int> <chr> <dbl> <chr> <int> <int> <dbl>
1 1 18-25 1.5 London 2 1 2
2 2 26-70 1 Southampton 3 2 2.5
3 3 NA NA Newcastle 1 NA NA
With data.table:
library(data.table)
setDT(df_before)
dcast(melt(df_before, 'Type'), rowid(Type, variable)~Type + variable)
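Note that you will get a lot of warnings because of the type mismatch between Level and Estimate. You can use reshape2::melt to avoid them.
In any case, your data frame is not in a standard format.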
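The remark about the non-standard format comes down to Level mixing age bands, city names, and driver counts in one character column. As a hedged illustration (one possible restructuring, not something the answer prescribes), the data can be split into one table per Type so that each Level column gets its own type:

```r
df_before <- data.frame(
  Type = c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS"),
  Level = c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2"),
  Estimate = c(1.5, 1, 2, 3, 1, 2, 2.5),
  stringsAsFactors = FALSE
)

# One data frame per Type; type.convert() re-infers each column's type
by_type <- lapply(
  split(df_before[c("Level", "Estimate")], df_before$Type),
  type.convert, as.is = TRUE
)

str(by_type$DRIVERS)  # Level is now integer
str(by_type$AGE)      # Level stays character
```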
In base R >= 4.1 (the native pipe |> was introduced in R 4.1.0):
transform(df_before, id = ave(Estimate, Type, FUN = seq_along)) |>
reshape(v.names = c('Level', 'Estimate'), dir = 'wide', timevar = 'Type', sep = "_")
id Level_AGE Estimate_AGE Level_REGION Estimate_REGION Level_DRIVERS Estimate_DRIVERS
1 1 18-25 1.5 London 2 1 2.0
2 2 26-70 1.0 Southampton 3 2 2.5
5 3 <NA> NA Newcastle 1 <NA> NA
In base R without the native pipe:
reshape(transform(df_before, id = ave(Estimate, Type, FUN = seq_along)),
v.names = c('Level', 'Estimate'), dir = 'wide', timevar = 'Type', sep = "_")
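Both base R variants rely on ave(..., FUN = seq_along) to number the rows within each Type; reshape() then uses that number as the row identifier. A short sketch of what that call returns on the question's data (the variable names id_dbl and id_int are only for illustration):

```r
df_before <- data.frame(
  Type = c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS"),
  Level = c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2"),
  Estimate = c(1.5, 1, 2, 3, 1, 2, 2.5),
  stringsAsFactors = FALSE
)

# As in the answer: the counter is written into a copy of Estimate,
# so the result is a double vector
id_dbl <- ave(df_before$Estimate, df_before$Type, FUN = seq_along)

# Passing an integer vector instead keeps the counter as integer
id_int <- ave(seq_along(df_before$Type), df_before$Type, FUN = seq_along)

print(id_dbl)
print(id_int)
```

The values of the first argument are irrelevant here; only its length and type matter, because seq_along() ignores the values it receives.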
【Comments】:
【Answer 4】: Update:
To get exactly the desired output:
df_before %>%
group_by(Type) %>%
mutate(id = row_number()) %>%
pivot_wider(
names_from = Type,
values_from = c(Level, Estimate)
) %>%
select(AGE = Level_AGE, Estimate_AGE, REGION = Level_REGION,
Estimate_REGION, DRIVERS = Level_DRIVERS, Estimate_DRIVERS) %>%
type.convert(as.is=TRUE)
AGE Estimate_AGE REGION Estimate_REGION DRIVERS Estimate_DRIVERS
<chr> <dbl> <chr> <int> <int> <dbl>
1 18-25 1.5 London 2 1 2
2 26-70 1 Southampton 3 2 2.5
3 NA NA Newcastle 1 NA NA
First answer:
The key step is grouping by Type, as in the solution already provided by Onyambu. After that we can use pivot_wider:
library(dplyr)
library(tidyr)
df_before %>%
group_by(Type) %>%
mutate(id = row_number()) %>%
pivot_wider(
names_from = Type,
values_from = c(Level, Estimate)
)
id Level_AGE Level_REGION Level_DRIVERS Estimate_AGE Estimate_REGION Estimate_DRIVERS
<int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1 18-25 London 1 1.5 2 2
2 2 26-70 Southampton 2 1 3 2.5
3 3 NA Newcastle NA NA 1 NA
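The reason the id column matters is that Type alone does not identify a row, so a reshape keyed only on Type has several values competing for each output cell. A base R check of this, offered as a sketch in which ave() stands in for group_by(Type) plus row_number():

```r
df_before <- data.frame(
  Type = c("AGE", "AGE", "REGION", "REGION", "REGION", "DRIVERS", "DRIVERS"),
  Level = c("18-25", "26-70", "London", "Southampton", "Newcastle", "1", "2"),
  Estimate = c(1.5, 1, 2, 3, 1, 2, 2.5),
  stringsAsFactors = FALSE
)

# Row number within each Type
id <- ave(seq_along(df_before$Type), df_before$Type, FUN = seq_along)

# Type on its own has repeated values, so it cannot serve as a key
type_is_key <- anyDuplicated(df_before$Type) == 0

# The pair (Type, id) is unique, so each (row, column) cell gets one value
pair_is_key <- anyDuplicated(data.frame(Type = df_before$Type, id = id)) == 0

print(c(type_is_key = type_is_key, pair_is_key = pair_is_key))
```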
【Comments】: