Expand a data.table so that there is one row per pattern match for each ID

Question

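I have a lot of text data in a data.table, and several text patterns I am interested in. I have managed to subset the table so that it shows only the text matching at least two of the patterns (related question here).

I would now like one row per match, with an additional column identifying which pattern matched, so that rows with multiple matches are repeated and differ only in that column.

It feels like this shouldn't be too hard, but I'm struggling! My vague idea is to count the number of pattern matches and then replicate each row that many times... but I'm not sure how to get a label for each distinct pattern (nor whether that would be very efficient).

Thanks for your help!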

Sample data

library(data.table)
library(stringr)
text_table <- data.table(ID = (1:5), 
                         text = c("lucy, sarah and paul live on the same street",
                                  "lucy has only moved here recently",
                                  "lucy and sarah are cousins",
                                  "john is also new to the area",
                                  "paul and john have known each other a long time"))


text_patterns <- as.character(c("lucy", "sarah", "paul|john"))

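# Keep only the rows whose text matches at least two of the patterns.
# The condition goes in `i` (before the comma) so that rows are subset;
# placed in `j` it would only return a logical vector.
text_table_multiples <- text_table[Reduce(`+`, lapply(text_patterns,
                                    function(x) str_detect(text, x))) > 1]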

Desired output

required_table <- data.table(ID = c(1, 1, 1, 2, 3, 3, 4, 5),
                             text = c("lucy, sarah and paul live on the same street",
                                      "lucy, sarah and paul live on the same street",
                                      "lucy, sarah and paul live on the same street",
                                      "lucy has only moved here recently",
                                      "lucy and sarah are cousins",
                                      "lucy and sarah are cousins",
                                      "john is also new to the area",
                                      "paul and john have known each other a long time"), 
                             person = c("lucy", "sarah", "paul or john", "lucy", "lucy", "sarah", "paul or john", "paul or john"))

Answer 1
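
One possible approach, offered as a minimal sketch rather than the definitive answer: loop over the patterns, keep the rows each pattern matches, tag them with a label, and stack the pieces with `rbindlist`. The `pattern_labels` vector is a hypothetical helper introduced here for illustration; it simply turns `"paul|john"` into the `"paul or john"` label used in the desired output. The sample data is repeated so the block runs on its own.

```r
library(data.table)
library(stringr)

text_table <- data.table(
  ID = 1:5,
  text = c("lucy, sarah and paul live on the same street",
           "lucy has only moved here recently",
           "lucy and sarah are cousins",
           "john is also new to the area",
           "paul and john have known each other a long time"))

text_patterns <- c("lucy", "sarah", "paul|john")

# Hypothetical helper (not in the question): a readable label per pattern,
# e.g. "paul|john" becomes "paul or john".
pattern_labels <- str_replace_all(text_patterns, fixed("|"), " or ")

# For each pattern, take the matching rows and record which pattern matched.
matches <- lapply(seq_along(text_patterns), function(k) {
  hit <- str_detect(text_table$text, text_patterns[k])
  out <- text_table[hit]                           # subsetting returns a copy
  out[, person := rep(pattern_labels[k], .N)]      # label every matching row
  out
})

# Stack the per-pattern pieces and restore ID order.
result <- rbindlist(matches)
setorder(result, ID)
result
```

This avoids counting matches and replicating rows by hand: a row that matches several patterns simply appears once in each per-pattern piece. To restrict the result to IDs with at least two matches, one option would be `result[, if (.N > 1) .SD, by = ID]`.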
