Removing strings that contain a colon in R
Here is a sample excerpt of my dataset. It looks like this:
Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id_234;2018/03/02
I want to remove the words that contain a colon. In this case those are wa119:d, ax21:3 and bC230:13, so that my new dataset looks like this:
Description;ID;Date
Here comes the first row;id_112;2018/03/02
Here comes the second row;id_115;2018/03/02
Here comes the third row;id_234;2018/03/02
Unfortunately, I could not find a regular expression or any other solution using gsub. Can anyone help?
Answer
Here is one approach:
## reading in your data
dat <- read.table(text ='
Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02
', sep = ';', header = TRUE, stringsAsFactors = FALSE)
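## \w+ = one or more word characters, \s+ = one or more whitespace characters
## (each backslash must be doubled inside an R string literal)
gsub('\\w+:\\w+\\s+', '', dat$Description)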
## [1] "Here comes the first row"
## [2] "Here comes the second row"
## [3] "Here comes the third row"
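The call above only prints the cleaned vector. To end up with the dataset the question asks for, the result could be assigned back to the column. This is a minimal sketch that reuses the question's sample rows and the same pattern; writing the table to the console with `write.table` is just one assumed way of inspecting the result.

```r
# Sample rows copied from the question
dat <- read.table(text = "Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id_234;2018/03/02",
                  sep = ";", header = TRUE, stringsAsFactors = FALSE)

# Overwrite the column with the cleaned strings
dat$Description <- gsub("\\w+:\\w+\\s+", "", dat$Description)

# Print the data frame in the original semicolon-separated layout
write.table(dat, sep = ";", quote = FALSE, row.names = FALSE)
```

The ID and Date columns are left untouched because only `dat$Description` is passed to `gsub`.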
For more on \w: it is a shorthand character class equivalent to [A-Za-z0-9_]. See https://www.regular-expressions.info/shorthand.html
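As a quick check of that equivalence, the sketch below runs the shorthand pattern and a spelled-out version side by side on the question's sample strings. It assumes plain ASCII input; with non-ASCII text the two patterns may differ depending on locale. The vector name `x` is only illustrative.

```r
# Sample values taken from the question's Description column
x <- c("wa119:d Here comes the first row",
       "ax21:3 Here comes the second row",
       "bC230:13 Here comes the third row")

# Shorthand classes; backslashes are doubled inside the R string literal
shorthand <- gsub("\\w+:\\w+\\s+", "", x)

# The same pattern with the word class written out explicitly
explicit <- gsub("[A-Za-z0-9_]+:[A-Za-z0-9_]+[[:space:]]+", "", x)

print(shorthand)
print(identical(shorthand, explicit))
```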
Another answer
Assuming the column you want to modify is dat:
dat <- c("wa119:d Here comes the first row",
"ax21:3 Here comes the second row",
"bC230:13 Here comes the third row")
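Then you can take each element, split it into words, drop the words that contain a colon, and paste what is left back together, which produces what you want: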
dat_colon_words_removed <- unlist(lapply(dat, function(string){
  words <- strsplit(string, split=" ")[[1]]
  words <- words[!grepl(":", words)]
  paste(words, collapse=" ")
}))
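One property of this split-and-filter approach is that it drops colon words wherever they occur in the string, not only at the start. The sketch below illustrates that; the helper name `remove_colon_words` and the extra test string are hypothetical, and `vapply` is swapped in for `unlist(lapply(...))` only as a type-checked alternative.

```r
dat <- c("wa119:d Here comes the first row",
         "ax21:3 Here comes the second row",
         "bC230:13 Here comes the third row")

# Hypothetical helper: split on single spaces, drop words containing ":"
remove_colon_words <- function(string) {
  words <- strsplit(string, split = " ", fixed = TRUE)[[1]]
  paste(words[!grepl(":", words, fixed = TRUE)], collapse = " ")
}

# vapply guarantees exactly one character value per input element
result <- vapply(dat, remove_colon_words, character(1), USE.NAMES = FALSE)
print(result)

# A colon word in the middle of a string is removed as well
middle <- remove_colon_words("keep this x:1 and that")
print(middle)
```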
Another answer
Another solution that exactly matches the OP's expected result could be:
#data
df <- read.table(text = "Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02", stringsAsFactors = FALSE, sep = "\n")
# each line is read as one string into column V1 (the header line included)
gsub("[a-zA-Z0-9]+:[a-zA-Z0-9]+\\s", "", df$V1)
#[1] "Description;ID;Date"
#[2] "Here comes the first row;id_112;2018/03/02"
#[3] "Here comes the second row;id_115;2018/03/02"
#[4] "Here comes the third row;id:234;2018/03/02"