R data stored as lists in one data frame column, want to create individual columns

Posted: 2020-11-09 01:59:34

Question:

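My data, read into R from a JSON file, has a 'rounds' column in which each element is stored as a list (a small data frame with the columns title and strokes).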

> head(leaderboard[,20:22])
  round                         rounds strokes
1    -5 r1, r2, r3, r4, 67, 68, 67, 65     267
2    -7 r1, r2, r3, r4, 70, 70, 66, 63     269
3    -5 r1, r2, r3, r4, 72, 66, 66, 65     269
4    -7 r1, r2, r3, r4, 68, 69, 71, 63     271
5    -5 r1, r2, r3, r4, 72, 70, 65, 65     272
6    -1 r1, r2, r3, r4, 68, 69, 66, 69     272
> leaderboard$rounds[[1]]
  title strokes
1    r1      67
2    r2      68
3    r3      67
4    r4      65

I want to reshape the data above into this:

round r1 r2 r3 r4 strokes
-5    67 68 67 65 267
-7    70 70 66 63 269

Output of dput:

> dput(head(leaderboard[,20:22]))
structure(list(round = c("-5", "-7", "-5", "-7", "-5", "-1"), 
    rounds = list(structure(list(title = c("r1", "r2", "r3", 
    "r4"), strokes = c("67", "68", "67", "65")), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(title = c("r1", "r2", "r3", "r4"), strokes = c("70", 
    "70", "66", "63")), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(title = c("r1", "r2", "r3", "r4"), strokes = c("72", 
    "66", "66", "65")), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(title = c("r1", "r2", "r3", "r4"), strokes = c("68", 
    "69", "71", "63")), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(title = c("r1", "r2", "r3", "r4"), strokes = c("72", 
    "70", "65", "65")), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(title = c("r1", "r2", "r3", "r4"), strokes = c("68", 
    "69", "66", "69")), class = "data.frame", row.names = c(NA, 
    4L))), strokes = c("267", "269", "269", "271", "272", "272"
    )), row.names = c(NA, 6L), class = "data.frame")

Comments:

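It would be easier to help if we could recreate the data structure: could you please copy/paste the output of dput(head(leaderboard[, 20:22]))?

Yes, I have edited the original post.

Answer 1: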

You can use:

library(dplyr)
library(tidyr)

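leaderboard %>%
  # rename the outer total first: unnesting `rounds` brings in its own `strokes` column
  rename(new_strokes = strokes) %>%
  # one row per player and round, adding the `title` and `strokes` columns
  unnest(rounds) %>%
  # spread the per-round scores into the columns r1..r4
  pivot_wider(names_from = title, values_from = strokes)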

#  round new_strokes r1    r2    r3    r4   
#  <chr> <chr>       <chr> <chr> <chr> <chr>
#1 -5    267         67    68    67    65   
#2 -7    269         70    70    66    63   
#3 -5    269         72    66    66    65   
#4 -7    271         68    69    71    63   
#5 -5    272         72    70    65    65   
#6 -1    272         68    69    66    69   
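Every column in the result is still character, because the JSON values were read in as strings (the dput output shows "67", "267" and so on). A minimal base R sketch for converting them to numbers afterwards, assuming the reshaped table is stored in a hypothetical variable wide and every column holds numeric text; it uses type.convert(), which accepts a whole data frame in current R versions:

```r
# Two-row stand-in for the reshaped result above; values are copied from the
# question's data, and every column is character, as in the output shown
wide <- data.frame(round = c("-5", "-7"), new_strokes = c("267", "269"),
                   r1 = c("67", "70"), r2 = c("68", "70"),
                   r3 = c("67", "66"), r4 = c("65", "63"))

# type.convert() picks a suitable type for each column; as.is = TRUE keeps
# any non-numeric text as character instead of turning it into factors
wide <- type.convert(wide, as.is = TRUE)
str(wide)

# With numeric columns, the round scores can be checked against the total
stopifnot(with(wide, r1 + r2 + r3 + r4 == new_strokes))
```

The final check works because each total in the data is the sum of its four rounds (for example 67 + 68 + 67 + 65 = 267).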


Answer 2:

Here is a tidyverse approach:

library(dplyr)
library(tidyr)

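leaderboard %>%
  # widen each nested data frame into a one-row data frame with columns r1..r4
  mutate(rounds = lapply(rounds, pivot_wider, names_from = "title", values_from = "strokes")) %>%
  # then unnest those one-row data frames back into the main table
  unnest(rounds)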

Output:

# A tibble: 6 x 6
  round r1    r2    r3    r4    strokes
  <chr> <chr> <chr> <chr> <chr> <chr>  
1 -5    67    68    67    65    267    
2 -7    70    70    66    63    269    
3 -5    72    66    66    65    269    
4 -7    68    69    71    63    271    
5 -5    72    70    65    65    272    
6 -1    68    69    66    69    272    
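For comparison, a base R sketch that avoids the tidyverse. It assumes each nested data frame has title and strokes columns, as in the dput output, and that the round labels are exactly r1 to r4; titles, scores and result are illustrative names, not part of the original answers:

```r
# Rebuild the first two rows of the question's data (values copied from the dput output)
leaderboard <- data.frame(round = c("-5", "-7"), strokes = c("267", "269"))
leaderboard$rounds <- list(
  data.frame(title = c("r1", "r2", "r3", "r4"), strokes = c("67", "68", "67", "65")),
  data.frame(title = c("r1", "r2", "r3", "r4"), strokes = c("70", "70", "66", "63"))
)

# Round labels to turn into columns (assumed fixed at r1..r4)
titles <- c("r1", "r2", "r3", "r4")

# For each nested data frame, look up the strokes for each label with match(),
# so scores are aligned by title (a missing round becomes NA); sapply() returns
# one column per player, so t() flips it to one row per player
scores <- t(sapply(leaderboard$rounds, function(d) d$strokes[match(titles, d$title)]))
colnames(scores) <- titles

result <- data.frame(round = leaderboard$round, scores, strokes = leaderboard$strokes)
result
```

Looking the scores up by title with match(), rather than relying on row order, keeps the columns correct even if a nested data frame lists its rounds in a different order.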
