In R, how do I wrap text around all words in a string, but a specific one (going from left to right)? Iteration and string manipulation

Posted 2023-02-22

【Asked】: 2018-03-25 07:33:23

【Question】:

I know my question is a bit vague, so here is an example of what I am trying to do.

input <- c('I go to school')

#Output
'"I " * phantom("go to school")' 
'phantom("I ") * "go" * phantom("to school")'
'phantom("I go ") * "to" * phantom("school")'
'phantom("I go to ") * "school"'

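I wrote a function that does produce the output above, but I am having a lot of trouble making it work for strings with a different number of words, and I don't know how to use iteration to cut down on the copy-pasted code.

Right now my function only works for strings of exactly 4 words, and it uses no iteration.

My main questions are: how do I add iteration to my function, and how do I make it work for any number of words?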

library(stringr)

add_phantom <- function(stuff) {
  strings <- c()
  # stuff[[1]] is the character vector of words
  stuff <- str_split(stuff, ' ')
  # word 1 visible, words 2-4 hidden
  strings[1] <- str_c('"', stuff[[1]][[1]], ' "', ' * ', 
                 'phantom("', str_c(stuff[[1]][[2]], stuff[[1]][[3]], stuff[[1]][[4]], sep = ' '), '")')

  # word 2 visible
  strings[2] <- str_c('phantom("', stuff[[1]][[1]], ' ")',
                      ' * "', stuff[[1]][[2]], '" * ', 
                      'phantom("', str_c(stuff[[1]][[3]], stuff[[1]][[4]], sep = ' '), '")')

  # word 3 visible
  strings[3] <- str_c('phantom("', str_c(stuff[[1]][[1]], stuff[[1]][[2]], sep = ' '), ' ")',
                      ' * "', stuff[[1]][[3]], '" * ',
                      'phantom("', stuff[[1]][[4]], '")')

  # word 4 visible
  strings[4] <- str_c('phantom("', str_c(stuff[[1]][[1]], stuff[[1]][[2]], stuff[[1]][[3]], sep = ' '), ' ")', 
                      ' * "', stuff[[1]][[4]], '"')
  return(strings)
}


【Answer 1】:

This is a bit of butchery, but it gives the expected output :)

input <- c('I go to school')
library(purrr)
# leading NULL element so that inp[1:.x] is never empty
inp <- c(list(NULL), strsplit(input, " ")[[1]])
phantomize <- function(x, leftside = TRUE) {
  # only the anchor element left: nothing to hide on this side
  if (length(x) == 1) return("")
  if (leftside) {
    ph <- paste0('phantom("', paste(x[-1], collapse = " "), ' ") * ')
  } else {
    ph <- paste0(' * phantom("', paste(x[-1], collapse = " "), '")')
  }
  ph
}

map(1:(length(inp)-1),
    ~paste0(phantomize(inp[1:.x]),
            inp[[.x+1]],
            phantomize(inp[(.x+1):length(inp)], FALSE)))

# [[1]]
# [1] "I * phantom(\"go to school\")"
# 
# [[2]]
# [1] "phantom(\"I \") * go * phantom(\"to school\")"
# 
# [[3]]
# [1] "phantom(\"I go \") * to * phantom(\"school\")"
# 
# [[4]]
# [1] "phantom(\"I go to \") * school"
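
The output above leaves the visible word unquoted, whereas the question's example quotes it. As a minimal sketch only, assuming base R and single-space separators, the following variant loops over the word positions and reproduces the quoted format for any number of words. The name `add_phantom_any` is hypothetical and does not come from the thread.

```r
# Hypothetical helper (not from the original thread): one string per word,
# with that word quoted and everything else wrapped in phantom().
add_phantom_any <- function(sentence) {
  words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  n <- length(words)
  vapply(seq_len(n), function(i) {
    # words to the left, with a trailing space inside the quotes; NULL if none
    before <- if (i > 1) {
      paste0('phantom("', paste(words[seq_len(i - 1)], collapse = " "), ' ")')
    }
    # words to the right; NULL if none
    after <- if (i < n) {
      paste0('phantom("', paste(words[(i + 1):n], collapse = " "), '")')
    }
    # the question's example keeps a trailing space after a visible first word
    visible <- if (i == 1 && n > 1) {
      paste0('"', words[i], ' "')
    } else {
      paste0('"', words[i], '"')
    }
    # NULL pieces are dropped by c(), so edge positions get no empty phantom()
    paste(c(before, visible, after), collapse = " * ")
  }, character(1))
}

add_phantom_any("I go to school")
```

Because `vapply` returns a plain character vector, the result can be indexed like the `strings` vector in the question's function.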


【Answer 2】:

This is a bit of a hack, but I think it does what you want:

library(corpus)
input <- 'I go to school'
types <- text_types(input, collapse = TRUE) # all word types
(loc <- text_locate(input, types)) # locate all word types, get context
##   text             before              instance              after              
## 1 1                                       I      go to school                   
## 2 1                                 I     go     to school                      
## 3 1                              I go     to     school                         
## 4 1                           I go to   school

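The return value is a data frame whose columns are of type corpus_text. This approach may look crazy, but it does not actually allocate new strings for the before and after contexts (both are of type corpus_text).

Here is the output you want: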

paste0("phantom(", loc$before, ") *", loc$instance, "* phantom(", loc$after, ")")
## [1] "phantom() *I* phantom( go to school)"
## [2] "phantom(I ) *go* phantom( to school)"
## [3] "phantom(I go ) *to* phantom( school)"
## [4] "phantom(I go to ) *school* phantom()"
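
The strings above contain empty `phantom()` calls and unquoted text, so they differ from the format in the question. As a rough sketch, assuming the three pieces are available as plain character vectors (for `corpus_text` columns, converting with `as.character()` first is an assumption, not something shown in this answer), a hypothetical helper `combine_phantom` could skip the empty pieces and add the quotes:

```r
# Hypothetical helper (not part of the corpus package): join the pieces with
# " * ", quoting each one and leaving out empty contexts entirely.
combine_phantom <- function(before, instance, after) {
  # the right context starts with a space; drop it to match the question
  after <- sub("^\\s+", "", after)
  # NA marks an empty context so it can be filtered out below
  wrap <- function(x) ifelse(nzchar(x), paste0('phantom("', x, '")'), NA_character_)
  pieces <- cbind(wrap(before), paste0('"', instance, '"'), wrap(after))
  apply(pieces, 1, function(row) paste(row[!is.na(row)], collapse = " * "))
}

# Plain character vectors mirroring the before / instance / after columns
before   <- c("", "I ", "I go ", "I go to ")
instance <- c("I", "go", "to", "school")
after    <- c(" go to school", " to school", " school", "")

combine_phantom(before, instance, after)
```

Unlike the question's example, the first element here comes out as `"I"` without a trailing space inside the quotes, because the instance column holds the bare word.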

If you really want to go crazy and ignore punctuation:

phantomize <- function(input, ...) {
    # extra arguments (e.g. drop_punct) are passed on to both corpus calls
    types <- text_types(input, collapse = TRUE, ...)
    loc <- text_locate(input, types, ...)
    paste0("phantom(", loc$before, ") *", loc$instance, "* phantom(",
           loc$after, ")")
}

phantomize("I! go to school (?)...don't you?", drop_punct = TRUE)
## [1] "phantom() *I* phantom(! go to school (?)...don't you?)"
## [2] "phantom(I! ) *go* phantom( to school (?)...don't you?)"
## [3] "phantom(I! go ) *to* phantom( school (?)...don't you?)"
## [4] "phantom(I! go to ) *school* phantom( (?)...don't you?)"
## [5] "phantom(I! go to school (?)...) *don't* phantom( you?)"
## [6] "phantom(I! go to school (?)...don't ) *you* phantom(?)"

【Answer 3】:

I would suggest something like this:

library(tidyverse)
library(glue)

test_string <- "i go to school"

str_split(test_string, " ") %>% 
  map(~str_split(test_string, .x, simplify = T)) %>% 
  flatten() %>%
  map(str_trim) %>%
  keep(~.x != "") %>% 
  map(~glue("phantom(string)", string = .x))

This code snippet can easily be wrapped in a function, and it returns the following output.

[[1]]
phantom(i)

[[2]]
phantom(i go)

[[3]]
phantom(i go to)

[[4]]
phantom(go to school)

[[5]]
phantom(to school)

[[6]]
phantom(school)

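I may have misunderstood your question: I am not quite sure whether you really want the output in exactly the same format as your example output.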
