Splitting data frame strings into rows rather than columns [duplicate]
I have a data frame in this format:
df <- data.frame(names= c('perform data cleansing','information categorisation'))
                       names
1     perform data cleansing
2 information categorisation
I would like to get it into this format, with one row per word:
                       names         tokens
1     perform data cleansing        perform
1     perform data cleansing           data
1     perform data cleansing      cleansing
2 information categorisation    information
2 information categorisation categorisation
Answer
I like tidyr::unnest:
library(dplyr)
library(tidyr)
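# strsplit() returns a list column; unnest() expands it to one row per element.
# Recent tidyr versions warn unless the column is named via `cols`.
df %>%
  mutate(tokens = strsplit(as.character(names), split = " ")) %>%
  unnest(cols = tokens)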
#                        names         tokens
# 1     perform data cleansing        perform
# 2     perform data cleansing           data
# 3     perform data cleansing      cleansing
# 4 information categorisation    information
# 5 information categorisation categorisation
But you can also do it all in base R:
tokens = strsplit(as.character(df$names), split = " ")
result = data.frame(names = rep(df$names, lengths(tokens)),
                    tokens = unlist(tokens),
                    stringsAsFactors = FALSE)
#                        names         tokens
# 1     perform data cleansing        perform
# 2     perform data cleansing           data
# 3     perform data cleansing      cleansing
# 4 information categorisation    information
# 5 information categorisation categorisation
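If you need the base approach more than once, it can be wrapped in a small function. This is only a sketch: the name split_to_rows, its arguments, and the extra "other" column in the example are made up for illustration. It assumes tokens are separated by runs of whitespace, so doubled spaces do not produce empty tokens, and it carries every other column along.

```r
# Hypothetical helper (not from any package): one output row per token.
split_to_rows <- function(df, col, pattern = "[[:space:]]+") {
  # Trim first so leading whitespace does not create an empty first token
  text <- trimws(as.character(df[[col]]))
  tokens <- strsplit(text, pattern)
  n <- lengths(tokens)
  # Repeat each original row once per token, keeping all other columns
  out <- df[rep(seq_len(nrow(df)), n), , drop = FALSE]
  out$tokens <- as.character(unlist(tokens))
  rownames(out) <- NULL  # reset row names to 1..n
  out
}

# Example data: note the doubled space in the first string
df <- data.frame(names = c("perform data  cleansing", "information categorisation"),
                 other = c("a", "b"),
                 stringsAsFactors = FALSE)
result <- split_to_rows(df, "names")
print(result)
```

Rows whose string is empty yield zero tokens and are dropped from the result, which may or may not be what you want.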
A version with extra features for text analysis is tidytext::unnest_tokens:
df$names = as.character(df$names)
tidytext::unnest_tokens(df, output = tokens, input = names, drop = FALSE)
#                          names         tokens
# 1       perform data cleansing        perform
# 1.1     perform data cleansing           data
# 1.2     perform data cleansing      cleansing
# 2   information categorisation    information
# 2.1 information categorisation categorisation
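One thing to watch for with unnest_tokens: by default it lowercases the tokens, and its default word tokenizer also drops punctuation, so the result is not always identical to a plain strsplit on spaces. A minimal sketch of the difference, assuming tidytext is installed; the df2 input here is invented for illustration:

```r
library(tidytext)  # assumes the tidytext package is installed

# Hypothetical input with capitals and punctuation
df2 <- data.frame(names = "Perform Data, Cleansing!", stringsAsFactors = FALSE)

# Default behaviour: tokens are converted to lowercase
lowered <- unnest_tokens(df2, output = tokens, input = names, drop = FALSE)

# to_lower = FALSE keeps the original capitalisation
as_is <- unnest_tokens(df2, output = tokens, input = names,
                       drop = FALSE, to_lower = FALSE)

print(lowered$tokens)
print(as_is$tokens)
```

If you need the tokens exactly as they appear between spaces, the strsplit-based versions are the safer choice.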