How to find matching words in a DF from a list of words and return the matched words in a new column [duplicate]

【Posted】: 2018-10-13 12:25:44

【Question】:

I have a DF with 2 columns, and I have a list of words.

list_of_words <- c("tiger","elephant","rabbit", "hen", "dog", "Lion", "camel", "horse")

df <- tibble::tibble(page=c(12,6,9,18,2,15,81,65),
               text=c("I have two pets: a dog and a hen",
                      "lion and Tiger are dangerous animals",
                      "I have tried to ride a horse",
                      "Why elephants are so big in size",
                      "dogs are very loyal pets",
                      "I saw a tiger in the zoo",
                      "the lion was eating a buffalo",
                      "parrot and crow are very clever birds"))

animals <- c("dog,hen", "lion,tiger", "horse", FALSE, "dog", "tiger", "lion", FALSE)

cbind(df, animals)
#>   page                                  text    animals
#> 1   12      I have two pets: a dog and a hen    dog,hen
#> 2    6  lion and Tiger are dangerous animals lion,tiger
#> 3    9          I have tried to ride a horse      horse
#> 4   18      Why elephants are so big in size      FALSE
#> 5    2              dogs are very loyal pets        dog
#> 6   15              I saw a tiger in the zoo      tiger
#> 7   81         the lion was eating a buffalo       lion
#> 8   65 parrot and crow are very clever birds      FALSE

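I need to find out whether any word from the list is present in a column of the DF. If so, the matching word or words should be returned in a new column of the DF. This is the list of words -> (tiger, elephant, rabbit, hen, dog, Lion, camel, horse). The code above shows what my DF looks like and the result I want.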

【Comments】:

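Please add your example data as code, not as an image.

Yes, partially correct. But I want to find out which matching words from the list are present in the DF, and return those words in a new column of the same DF.

These 4 steps will work: first use strsplit on your column df$text with " " as the split argument, like this: test <- strsplit(df$text, " "). Then use grepl and tolower to get the words that match your vector: test2 <- lapply(test, function(x) x[grepl(tolower(paste(words, collapse = "|")), tolower(x))]). Now paste them together per row and unlist them with df$animals <- unlist(lapply(test2, paste, collapse = ", ")), then set all empty strings to FALSE with df$animals[nchar(df$animals) == 0] <- FALSE

@LAP It does not work.

【Answer 1】: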
library(dplyr)

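df %>% 
  rowwise() %>%   # evaluate the mutate() below once per row
  mutate(animals = paste(list_of_words[unlist(
    # one TRUE/FALSE per word: does it occur (ignoring case) in this row's text?
    lapply(list_of_words, function(x) grepl(x, text, ignore.case = TRUE)))], collapse=",")) %>%
  data.frame()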

Output:

  page                                  text    animals
1   12                       pets: dog & hen    hen,dog
2    6 Lions and tigers are dangerous animal tiger,Lion
3    9          I have tried to ride a horse      horse
4   65   parrot & crow are very clever birds           

Sample data:

df <- structure(list(page = c(12, 6, 9, 65), text = structure(c(4L, 
2L, 1L, 3L), .Label = c("I have tried to ride a horse", "Lions and tigers are dangerous animal", 
"parrot & crow are very clever birds", "pets: dog & hen"), class = "factor")), .Names = c("page", 
"text"), row.names = c(NA, -4L), class = "data.frame")

list_of_words <- c("tiger", "elephant", "rabbit", "hen", "dog", "Lion", "camel", "horse")
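The same substring matching can also be sketched in base R, without rowwise(). This is only an illustrative variant, not part of the original answer: it rebuilds the sample data with text as a character column (an assumption, the data above stores it as a factor) and uses a helper matrix named hits, which is a made-up name.

```r
# Sample inputs mirroring the data above; text is kept as character here (assumption).
df <- data.frame(
  page = c(12, 6, 9, 65),
  text = c("pets: dog & hen",
           "Lions and tigers are dangerous animal",
           "I have tried to ride a horse",
           "parrot & crow are very clever birds"),
  stringsAsFactors = FALSE
)
list_of_words <- c("tiger", "elephant", "rabbit", "hen", "dog", "Lion", "camel", "horse")

# For each word, test all rows at once; the result is a rows x words logical matrix.
hits <- vapply(list_of_words,
               function(w) grepl(w, df$text, ignore.case = TRUE),
               logical(nrow(df)))

# Collapse the matched words of each row into one comma-separated string.
df$animals <- apply(hits, 1, function(row) paste(list_of_words[row], collapse = ","))
```

Because grepl() is vectorised over the text column, the loop runs once per word rather than once per row, which may matter when the DF is much longer than the word list.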
**Another approach:**
library(data.table)
# by = 1:nrow(df) puts every row in its own group, so grepl() sees one text at a time
setDT(df)[, animals := paste(list_of_words[unlist(lapply(list_of_words, function(x) grepl(x, text, ignore.case = TRUE)))], collapse = ","), by = 1:nrow(df)]

#> df
#   page                                  text    animals
#1:   12                       pets: dog & hen    hen,dog
#2:    6 Lions and tigers are dangerous animal tiger,Lion
#3:    9          I have tried to ride a horse      horse
#4:   65   parrot & crow are very clever birds           
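Both approaches above match substrings, so "elephant" would also be found inside "elephants", and "hen" inside "then". If whole-word matches are wanted instead, a minimal sketch could wrap each word in \\b word boundaries. The helper name match_whole_words and the third sample sentence are made up for illustration, and recoding unmatched rows to NA is just one possible choice.

```r
# Hypothetical helper: returns the words that occur as whole words in one string.
match_whole_words <- function(text, words) {
  # \\b marks a word boundary, so "hen" does not match inside "then".
  found <- vapply(words,
                  function(w) grepl(paste0("\\b", w, "\\b"), text,
                                    ignore.case = TRUE, perl = TRUE),
                  logical(1))
  paste(words[found], collapse = ",")
}

list_of_words <- c("tiger", "elephant", "rabbit", "hen", "dog", "Lion", "camel", "horse")
texts <- c("I have two pets: a dog and a hen",
           "Why elephants are so big in size",
           "and then it rained")   # made-up sentence to show the "then" case

animals <- vapply(texts, match_whole_words, character(1),
                  words = list_of_words, USE.NAMES = FALSE)

# Rows without a match come back as ""; recode them if a marker is preferred.
animals[animals == ""] <- NA_character_
```

With this variant plural forms such as "dogs" no longer match "dog", so whether it fits depends on which behaviour is actually wanted.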
