In R, how do I extract all text up to the opening parenthesis?
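In the "Winner" column of my data frame, I want to remove all text starting from the opening parenthesis.
Searching stackoverflow.com, I found this response and applied its stricter solution in my code, but it doesn't work: my code leaves my input unchanged.
I'd appreciate any help with this.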
Input:
Year Lg Winner                       Team
1956 NL Don Newcombe (1 | MVP)       Brooklyn (1)
1957 NL Warren Spahn (1 | HOF | ASG) Milwaukee (1)
1958 AL Bob Turley (1 | ASG)         New York (1)
This is what I want the output to look like:
Year Lg Winner       Team
1956 NL Don Newcombe Brooklyn (1)
1957 NL Warren Spahn Milwaukee (1)
1958 AL Bob Turley   New York (1)
dput(dfx):
structure(list(Year = 1956:1958, Lg = structure(c(2L, 2L, 1L), .Label = c("AL",
"NL"), class = "factor"), Winner = structure(c(2L, 3L, 1L), .Label = c("Bob Turley (1 | ASG)",
"Don Newcombe (1 | MVP)", "Warren Spahn (1 | HOF | ASG)"
), class = "factor"), Team = structure(1:3, .Label = c("Brooklyn (1)",
"Milwaukee (1)", "New York (1)"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
Code:
library(stringr)
dfnoparens <- dfx
str_replace(dfnoparens$Winner, " \\(.*\\)", "")
head(dfnoparens)
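The code leaves the input unchanged because str_replace() returns a new vector instead of modifying dfnoparens in place: the cleaned strings are returned and then discarded, so head(dfnoparens) shows the untouched copy. A minimal sketch of the fix, assigning the result back to the column; the data frame is rebuilt with data.frame() from the same values as the dput() output so the snippet runs on its own:

```r
library(stringr)

# Rebuild the question's data (same values as the dput() output above)
dfx <- data.frame(
  Year   = 1956:1958,
  Lg     = factor(c("NL", "NL", "AL")),
  Winner = factor(c("Don Newcombe (1 | MVP)",
                    "Warren Spahn (1 | HOF | ASG)",
                    "Bob Turley (1 | ASG)")),
  Team   = factor(c("Brooklyn (1)", "Milwaukee (1)", "New York (1)"))
)

dfnoparens <- dfx
# Store the result back in the column; as.character() turns the factor into plain strings first
dfnoparens$Winner <- str_replace(as.character(dfnoparens$Winner), " \\(.*\\)", "")
head(dfnoparens)
```

The pattern itself was fine; only the assignment was missing. The Team column is not touched, so its "(1)" suffixes stay.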
Answer
With the test data from the question (relevant column only):
x <- c('Don Newcombe (1 | MVP)', 'Warren Spahn (1 | HOF | ASG)', 'Bob Turley (1 | ASG)')
Using regexpr/regmatches:
m <- regexpr('^[^\\(]*', x)
y <- regmatches(x, m)
y
#[1] "Don Newcombe " "Warren Spahn " "Bob Turley "
The output strings still end with the space that preceded the opening parenthesis; remove it if needed:
trimws(y)
#[1] "Don Newcombe" "Warren Spahn" "Bob Turley"
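The anchored pattern ^[^\\(]* matches the longest run of characters from the start of the string that contains no opening parenthesis, so it stops at the first "(" and keeps the whole string when there is none. A small check, where "Hypothetical Player" is a made-up value without a parenthetical suffix:

```r
# Second value is hypothetical: a name with no parenthetical suffix
x <- c("Don Newcombe (1 | MVP)", "Hypothetical Player")

# ^[^\\(]* matches from the start of the string up to the first "(" (or to the end)
m <- regexpr("^[^\\(]*", x)
y <- trimws(regmatches(x, m))
y
#[1] "Don Newcombe"        "Hypothetical Player"
```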
Another answer
df <- structure(list(Year = 1956:1958,
Lg = structure(c(2L, 2L, 1L), .Label = c("AL", "NL"), class = "factor"),
Winner = structure(c(2L, 3L, 1L),
.Label = c("Bob Turley (1 | ASG)", "Don Newcombe (1 | MVP)",
"Warren Spahn (1 | HOF | ASG)"), class = "factor"),
Team = structure(1:3, .Label = c("Brooklyn (1)", "Milwaukee (1)", "New York (1)"),
class = "factor")), class = "data.frame", row.names = c(NA,-3L))
Here is a strsplit solution:
df$Winner <- unlist(lapply(strsplit(as.character(df$Winner)," (",fixed=TRUE), `[[`, 1))
df
  Year Lg       Winner          Team
1 1956 NL Don Newcombe  Brooklyn (1)
2 1957 NL Warren Spahn Milwaukee (1)
3 1958 AL   Bob Turley  New York (1)
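With fixed = TRUE, strsplit() splits on the literal two-character string " (", so no regex escaping is needed, and taking element 1 keeps the text before the first split. The same idea can be written with vapply(), which guarantees a character vector of the same length as the input; this is a stylistic variant, not part of the original answer, and it uses only the Winner column:

```r
df <- data.frame(
  Winner = factor(c("Don Newcombe (1 | MVP)",
                    "Warren Spahn (1 | HOF | ASG)",
                    "Bob Turley (1 | ASG)"))
)

# Split each name on the literal " (" and keep the first piece
parts <- strsplit(as.character(df$Winner), " (", fixed = TRUE)
df$Winner <- vapply(parts, `[`, character(1), 1)
df$Winner
#[1] "Don Newcombe" "Warren Spahn" "Bob Turley"
```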
Another answer
We can use trimws with its whitespace argument (available since R 3.6.0):
trimws(x, whitespace = "\\s*\\(.*")
#[1] "Don Newcombe" "Warren Spahn" "Bob Turley"
Data:
x <- c('Don Newcombe (1 | MVP)', 'Warren Spahn (1 | HOF | ASG)', 'Bob Turley (1 | ASG)')
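Setting whitespace to "\\s*\\(.*" makes trimws() delete a match of that pattern at the end of each string, which here is everything from the first opening parenthesis onward, together with the space before it. Roughly the same effect with base sub(), shown only to make the mechanism explicit (it is not how the answer is written):

```r
x <- c('Don Newcombe (1 | MVP)', 'Warren Spahn (1 | HOF | ASG)', 'Bob Turley (1 | ASG)')

# Remove optional whitespace, the first "(", and everything after it
y <- sub("\\s*\\(.*$", "", x)
y
#[1] "Don Newcombe" "Warren Spahn" "Bob Turley"
```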
Another answer
Using str_extract from the stringr package:
library(stringr)
df$Winner <- str_extract(df$Winner, ".*(?=\\s\\(\\d)")
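This solution uses a positive lookahead, (?=...). The pattern reads as: match anything (".*") that appears before a whitespace character ("\\s") followed by an opening parenthesis ("\\(") and a digit ("\\d"). The lookahead itself is not part of the extracted text.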
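Two properties of this pattern are worth knowing: str_extract() returns NA when the lookahead never succeeds (a name with no "(digit" suffix), and because ".*" is greedy it extracts up to the last such suffix when there are several. The inputs below are hypothetical, chosen only to show both cases; a lazy ".*?" stops at the first suffix instead:

```r
library(stringr)

# Hypothetical inputs: one with no suffix, one with two "(digit" groups
x <- c("Hypothetical Player", "Player A (1) Player B (2)")

greedy <- str_extract(x, ".*(?=\\s\\(\\d)")   # NA, "Player A (1) Player B"
lazy   <- str_extract(x, ".*?(?=\\s\\(\\d)")  # NA, "Player A"
```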
Result:
df
  Year Lg       Winner          Team
1 1956 NL Don Newcombe  Brooklyn (1)
2 1957 NL Warren Spahn Milwaukee (1)
3 1958 AL   Bob Turley  New York (1)