Wide to long with multiple columns

【Title】: Wide to long with multiple columns 【Posted】: 2022-01-22 06:34:55 【Question】:

I am trying to convert my dataset from wide to long format, but it is not working as expected. My dataset has the columns rowid, arrest1, arrest2, ..., arrest10, lien1, lien2, ..., lien10, and looks like this:

rowid   arrest1   arrest2   ...   lien1     lien2   ...
1       1/1/2008  NA              2/2/2009  NA

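I am trying to get a long dataset with a time variable that takes the values 1-10 and separate variables arrest and lien that contain the dates. I tried the following code, but my time variable takes the values 0-9, and in addition to the arrest and lien variables I also get arrest1 and lien2. Something must be wrong with the names_pattern argument.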

df_long <- df_wide %>%
  select(rowid, lien1:lien10, arrest1:arrest10) %>%
  pivot_longer(-rowid,
               names_to = c(".value", "time"),
               names_pattern =  "(\\w+).*?(\\d{1,2})")

Here is some sample data:

structure(list(rowid = c(9317L, 31447L, 37939L, 40198L, 19346L
), arrest1 = structure(c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), class = "Date"), arrest2 = structure(c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), class = "Date"), arrest3 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest4 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest5 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest6 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest7 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest8 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest9 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest10 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien1 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien2 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien3 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien4 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien5 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien6 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien7 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien8 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien9 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien10 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

【Question comments】:

【Answer 1】:

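Either use names_sep with a regex lookaround (names_sep = "(?<=\\D)(?=\\d)"), or capture the two parts as groups in names_pattern (names_pattern = "(\\D+)(\\d+)"). In the pattern, one or more non-digits (\\D+) are captured as the first group, followed by one or more digits (\\d+) captured as the second group. The groups correspond, in order, to the vector passed to names_to: ".value" means the prefixes "arrest" and "lien" become the names of the value columns, and "grp" is a new column created from the numeric suffix of the column names.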

library(tidyr)
pivot_longer(df_wide, cols = -rowid, names_to = c(".value", "grp"), 
     names_pattern = "(\\D+)(\\d+)")

-output

# A tibble: 50 × 4
   rowid grp   arrest lien  
   <int> <chr> <date> <date>
 1  9317 1     NA     NA    
 2  9317 2     NA     NA    
 3  9317 3     NA     NA    
 4  9317 4     NA     NA    
 5  9317 5     NA     NA    
 6  9317 6     NA     NA    
 7  9317 7     NA     NA    
 8  9317 8     NA     NA    
 9  9317 9     NA     NA    
10  9317 10    NA     NA    
# … with 40 more rows
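
The names_sep variant mentioned above is not shown in the answer's code, so here is a minimal sketch of it. The data frame df_small is a made-up stand-in for df_wide with only two time points; the conversion of grp to integer at the end is an optional extra and not part of the original answer.

```r
library(tidyr)

# Hypothetical two-row stand-in for df_wide (not the asker's data).
df_small <- data.frame(
  rowid   = c(1L, 2L),
  arrest1 = as.Date(c("2008-01-01", NA)),
  arrest2 = as.Date(c(NA, "2010-05-05")),
  lien1   = as.Date(c("2009-02-02", NA)),
  lien2   = as.Date(c(NA_character_, NA_character_))
)

# Split each column name at the zero-width position that has a
# non-digit before it and a digit after it, e.g. "arrest" | "10".
long_sep <- pivot_longer(
  df_small,
  cols      = -rowid,
  names_to  = c(".value", "grp"),
  names_sep = "(?<=\\D)(?=\\d)"
)

# Optional: the suffix comes back as character; convert it if a
# numeric time variable is wanted.
long_sep$grp <- as.integer(long_sep$grp)

long_sep
```

Because the lookbehind requires a non-digit, a name such as arrest10 is split only once, between "t" and "1", so the suffix "10" stays intact.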

【Comments】:

This works perfectly! Thank you so much. What does the regex \D+ do? I will accept your answer as soon as I can.

@user122514 I added some explanation.
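
As a rough illustration of the \D+ question above, the sketch below applies both patterns to a few column names with base R. It uses base R's regexec with perl = TRUE as a stand-in for the matching that pivot_longer does internally, which is an assumption; the names cols, m_question, m_answer and res are made up for this example.

```r
cols <- c("arrest1", "arrest10", "lien2", "lien10")

# Pattern from the question: \w also matches digits, so the greedy
# first group swallows the first digit of a two-digit suffix.
m_question <- regmatches(cols, regexec("(\\w+).*?(\\d{1,2})", cols, perl = TRUE))

# Pattern from the answer: \D+ matches non-digits only, so the first
# group stops exactly where the numeric suffix begins.
m_answer <- regmatches(cols, regexec("(\\D+)(\\d+)", cols, perl = TRUE))

# Element 1 of each match is the full match; 2 and 3 are the groups.
res <- data.frame(
  column         = cols,
  question_value = vapply(m_question, `[`, character(1), 2),
  question_time  = vapply(m_question, `[`, character(1), 3),
  answer_value   = vapply(m_answer, `[`, character(1), 2),
  answer_time    = vapply(m_answer, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

res
```

With the question's pattern, arrest10 is split into "arrest1" and "0", which would explain both the extra arrest1 column and the time value 0; with \D+ it is split into "arrest" and "10".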
