Wide to long with multiple columns
Posted: 2022-01-22 06:34:55
Question: I am trying to reshape my dataset from wide to long format, but it is not working as expected. My dataset has the columns rowid, arrest1, arrest2, ..., arrest10, lien1, lien2, ..., lien10, and looks like this:
rowid arrest1 arrest2 ... lien1 lien2 ...
1 1/1/2008 NA 2/2/2009 NA
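I want a long dataset with a time variable taking the values 1-10 and separate arrest and lien variables holding the dates. I tried the following code, but my time variable takes the values 0-9, and in addition to the arrest and lien variables I also get arrest1 and lien2. Something must be wrong with the names_pattern argument.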
library(dplyr)
library(tidyr)

# Attempt from the question: this is the version that misbehaves
df_long <- df_wide %>%
  select(rowid, lien1:lien10, arrest1:arrest10) %>%
  pivot_longer(-rowid,
               names_to = c(".value", "time"),
               names_pattern = "(\\w+).*?(\\d{1,2})")
Here is some sample data:
structure(list(rowid = c(9317L, 31447L, 37939L, 40198L, 19346L
), arrest1 = structure(c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), class = "Date"), arrest2 = structure(c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), class = "Date"), arrest3 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest4 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest5 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest6 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest7 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest8 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest9 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), arrest10 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien1 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien2 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien3 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien4 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien5 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien6 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien7 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien8 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien9 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), lien10 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
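Answer 1: Either use names_sep with a regex lookaround (names_sep = "(?<=\\D)(?=\\d)"), or capture the two parts as groups in names_pattern (names_pattern = "(\\D+)(\\d+)"). In the second form, one or more non-digits (\\D+) are captured as the first group, followed by one or more digits (\\d+) as the second group. The groups map, in order, onto the vector passed to names_to: ".value" receives the prefixes "arrest" and "lien", which become the names of the new value columns, and "grp" becomes a new column holding the numeric suffix taken from each column name.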
library(tidyr)
pivot_longer(df_wide, cols = -rowid, names_to = c(".value", "grp"),
names_pattern = "(\\D+)(\\d+)")
-output
# A tibble: 50 × 4
rowid grp arrest lien
<int> <chr> <date> <date>
1 9317 1 NA NA
2 9317 2 NA NA
3 9317 3 NA NA
4 9317 4 NA NA
5 9317 5 NA NA
6 9317 6 NA NA
7 9317 7 NA NA
8 9317 8 NA NA
9 9317 9 NA NA
10 9317 10 NA NA
# … with 40 more rows
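The names_sep variant mentioned at the start of the answer is not shown in code, so here is a minimal sketch of it. The data frame df_demo and the column name time are made up for illustration (they are not the asker's data), and names_transform is an optional extra, assuming a tidyr version that supports it (1.1.0 or later), to store the suffix as an integer instead of a character.

```r
library(tidyr)

# Hypothetical two-row example that follows the same naming scheme as the question
df_demo <- data.frame(
  rowid   = c(1L, 2L),
  arrest1 = as.Date(c("2008-01-01", NA)),
  arrest2 = as.Date(c(NA, "2010-05-05")),
  lien1   = as.Date(c("2009-02-02", NA)),
  lien2   = as.Date(c(NA, NA))
)

long_demo <- pivot_longer(
  df_demo,
  cols = -rowid,
  names_to = c(".value", "time"),
  # zero-width split point: a non-digit on the left, a digit on the right
  names_sep = "(?<=\\D)(?=\\d)",
  # optional: convert the suffix from character to integer
  names_transform = list(time = as.integer)
)

long_demo
```

Because the lookaround only matches between the last letter and the first digit, a name such as arrest10 is still split into exactly two pieces, "arrest" and "10".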
Comments:
This works perfectly! Thank you very much. What does the regex \D+ do? I will accept your answer as soon as I can.
@user122514 I added some explanation.
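On the \D+ question: \D matches any single character that is not a digit, so \D+ consumes the alphabetic prefix and stops at the first digit. The sketch below uses base R's sub() on three example column names to compare what each pattern captures. It only illustrates the regex behaviour; it does not show how tidyr applies the pattern internally, and the variable names are made up for the example.

```r
cols <- c("arrest1", "arrest10", "lien2")

# Pattern from the question: \w also matches digits, so the greedy first
# group swallows as much as it can and leaves a single digit for group 2.
q_pat   <- "(\\w+).*?(\\d{1,2})"
q_name  <- sub(q_pat, "\\1", cols, perl = TRUE)  # "arrest" "arrest1" "lien"
q_time  <- sub(q_pat, "\\2", cols, perl = TRUE)  # "1"      "0"       "2"

# Pattern from the answer: \D+ cannot cross into the digits,
# so the whole numeric suffix lands in group 2.
a_pat   <- "(\\D+)(\\d+)"
a_name  <- sub(a_pat, "\\1", cols, perl = TRUE)  # "arrest" "arrest"  "lien"
a_time  <- sub(a_pat, "\\2", cols, perl = TRUE)  # "1"      "10"      "2"
```

This is consistent with the symptoms in the question: arrest10 is read as variable arrest1 at time 0, which would explain both the 0-9 range and the stray arrest1 column.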