R: Paste and combine multiple outputs from ifelse after grepl of patterns in list
[Posted]: 2022-01-16 06:39:51
[Question]: I searched for posts similar to my problem but could not find any.
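My goal is to combine the cells of the name column in df1 (separated by "_" when there are several) into a new column pasteHere, by matching the strings in df1$order against df2$ref (I used grepl).
It is a large data frame, so I included a for loop to go through each row.
I am not sure whether the error comes from the loop, from grepl, or whether combining multiple items is simply not possible in this case.
First, the dummy data: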
## dummy data
df1 <- data.frame(ggplot2::msleep[c(1:10),c(1:5)])
df2 <- data.frame(ref = unique(df1$order), pasteHere = NA)
## how the dfs look like:
> df1
name genus vore order conservation
1 Cheetah Acinonyx carni Carnivora lc
2 Owl monkey Aotus omni Primates <NA>
3 Mountain beaver Aplodontia herbi Rodentia nt
4 Greater short-tailed shrew Blarina omni Soricomorpha lc
5 Cow Bos herbi Artiodactyla domesticated
6 Three-toed sloth Bradypus herbi Pilosa <NA>
7 Northern fur seal Callorhinus carni Carnivora vu
8 Vesper mouse Calomys <NA> Rodentia <NA>
9 Dog Canis carni Carnivora domesticated
10 Roe deer Capreolus herbi Artiodactyla lc
> df2
ref pasteHere
1 Carnivora NA
2 Primates NA
3 Rodentia NA
4 Soricomorpha NA
5 Artiodactyla NA
6 Pilosa NA
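You can see that Carnivora, Rodentia and Artiodactyla appear 3, 2 and 2 times, respectively, in df1$order.
Now, by matching df1$order to df2$ref, I want to paste df1$name into df2$pasteHere, joining multiple matches with "_". I am still inexperienced with for loops in R.
Here is my failed attempt: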
## my failed attempt:
for(i in 1:length(df2$ref))
for(j in df2$ref)
df2$pasteHere[i] <- ifelse(grepl(df2$ref==j, df1$order), paste(df1$name, collapse="_"), "NA")
This gives the following warnings from grepl:
> warnings()[1:5]
Warning messages:
1: In grepl(df2$ref == j, df1$order) :
argument 'pattern' has length > 1 and only the first element will be used
2: In df2$pasteHere[i] <- ifelse(grepl(df2$ref == j, df1$order), ... :
number of items to replace is not a multiple of replacement length
3: In grepl(df2$ref == j, df1$order) :
argument 'pattern' has length > 1 and only the first element will be used
4: In df2$pasteHere[i] <- ifelse(grepl(df2$ref == j, df1$order), ... :
number of items to replace is not a multiple of replacement length
5: In grepl(df2$ref == j, df1$order) :
argument 'pattern' has length > 1 and only the first element will be used
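The two warnings appear to have separate causes. `df2$ref == j` is a logical vector of length 6, so grepl() receives a multi-element pattern and uses only its first element. ifelse() then returns one value per row of df1 (10 values), which cannot be assigned to the single cell `df2$pasteHere[i]`. A minimal sketch of a loop that avoids both is shown here; the data frame is a hand-typed stand-in for the printed df1 (an assumption, so that ggplot2 is not required), and `fixed = TRUE` is my own choice to treat each pattern as a literal string.

```r
# Hand-typed stand-in for the first 10 rows of ggplot2::msleep (name and order only)
df1 <- data.frame(
  name = c("Cheetah", "Owl monkey", "Mountain beaver",
           "Greater short-tailed shrew", "Cow", "Three-toed sloth",
           "Northern fur seal", "Vesper mouse", "Dog", "Roe deer"),
  order = c("Carnivora", "Primates", "Rodentia", "Soricomorpha",
            "Artiodactyla", "Pilosa", "Carnivora", "Rodentia",
            "Carnivora", "Artiodactyla"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(ref = unique(df1$order), pasteHere = NA_character_,
                  stringsAsFactors = FALSE)

for (i in seq_along(df2$ref)) {
  # One pattern at a time: a single string, not a logical vector
  hit <- grepl(df2$ref[i], df1$order, fixed = TRUE)
  # Subset the names first, then collapse, so exactly one string is assigned
  df2$pasteHere[i] <- paste(df1$name[hit], collapse = "_")
}
df2
```

A single loop over df2$ref is enough here; the inner loop over `j` in the attempt above repeats the same assignment for every value of `j`.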
What I would like my final data frame to look like:
> final_df
ref pasteHere
1 Carnivora Cheetah_Northern fur seal_Dog
2 Primates Owl monkey
3 Rodentia Mountain beaver_Vesper mouse
4 Soricomorpha Greater short-tailed shrew
5 Artiodactyla Cow_Roe deer
6 Pilosa Three-toed sloth
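I am not sure whether the problem comes from pasting multiple items. Please advise. Other solutions are welcome too! :)
--------------- Update: ---------------
Reason for the update:
The dummy data above was oversimplified for my actual problem; below is new dummy data that better matches my current situation: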
df1 <- data.frame(ggplot2::msleep[c(1:10),c(1,4)])
order_longString <- list(c("eeny", "Carnivora", "meeny"),
c("Primates", "miny", "moe"),
c("catch","a","tiger","Rodentia"),
c("by","the","toe","Soricomorpha","If"),
c("he","Artiodactyla","hollers"),
c("let","Pilosa"),
c("him","go","Carnivora"),
c("eenie","Rodentia","minie","money","more"),
c("Carnivora","catch"),
c("a","piggy","Artiodactyla","by","the","snout"))
df1$order_longString <- order_longString
df2 = data.frame(ref = unique(df1$order), pasteHere = NA)
## Updated df looks like this:
> df1
name order order_longString
1 Cheetah Carnivora eeny, Carnivora, meeny
2 Owl monkey Primates Primates, miny, moe
3 Mountain beaver Rodentia catch, a, tiger, Rodentia
4 Greater short-tailed shrew Soricomorpha by, the, toe, Soricomorpha, If
5 Cow Artiodactyla he, Artiodactyla, hollers
6 Three-toed sloth Pilosa let, Pilosa
7 Northern fur seal Carnivora him, go, Carnivora
8 Vesper mouse Rodentia eenie, Rodentia, minie, money, more
9 Dog Carnivora Carnivora, catch
10 Roe deer Artiodactyla a, piggy, Artiodactyla, by, the, snout
> df2 # remain the same
ref pasteHere
1 Carnivora NA
2 Primates NA
3 Rodentia NA
4 Soricomorpha NA
5 Artiodactyla NA
6 Pilosa NA
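Now, look at df1$order_longString. Each cell holds a varying number of strings, separated by ",". I need to match the patterns in df2$ref against the strings in df1$order_longString. That is why I used grepl.
Then, as above, once a pattern matches, paste that row's df1$name into df2$pasteHere, joining multiple occurrences with "_".
Hope I made myself clear!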
[Answer 1]: Do you even need the second df in this case?
I would suggest using data.table:
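library(data.table)
library(stringr)  # provides str_c()
df1 = data.table(ggplot2::msleep[c(1:10),c(1:5)])
# collapse the names within each order into one "_"-separated string
df_final = df1[, .(pasteHere = str_c(name, collapse = "_")), by=order]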
Output:
> df_final
order pasteHere
1: Carnivora Cheetah_Northern fur seal_Dog
2: Primates Owl monkey
3: Rodentia Mountain beaver_Vesper mouse
4: Soricomorpha Greater short-tailed shrew
5: Artiodactyla Cow_Roe deer
6: Pilosa Three-toed sloth
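If adding data.table and stringr is not an option, the same grouping can be sketched in base R with aggregate(). This is an alternative I am adding, not part of the answer above; the data frame is again a hand-typed stand-in for the printed df1, and the column names `ref` and `pasteHere` are taken from the question. Note that aggregate() returns the groups sorted rather than in order of first appearance.

```r
# Hand-typed stand-in for the name and order columns shown in the question
df1 <- data.frame(
  name = c("Cheetah", "Owl monkey", "Mountain beaver",
           "Greater short-tailed shrew", "Cow", "Three-toed sloth",
           "Northern fur seal", "Vesper mouse", "Dog", "Roe deer"),
  order = c("Carnivora", "Primates", "Rodentia", "Soricomorpha",
            "Artiodactyla", "Pilosa", "Carnivora", "Rodentia",
            "Carnivora", "Artiodactyla"),
  stringsAsFactors = FALSE
)

# Group name by order; collapse = "_" is passed through to paste()
df_final <- aggregate(name ~ order, data = df1, FUN = paste, collapse = "_")
names(df_final) <- c("ref", "pasteHere")
df_final
```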
If you need the merged df, you can do this:
library(data.table)
library(stringr)  # provides str_c()
df1 = data.table(ggplot2::msleep[c(1:10),c(1:5)])
df2 = data.table(ref = unique(df1$order), pasteHere = NA)
# collapse the names within each order, then join onto the ref column of df2
df1 = df1[, .(pasteHere = str_c(name, collapse = "_")), by=order]
df_final = merge(df2[, c("ref")], df1, by.x="ref", by.y="order")
Output:
ref pasteHere
1: Artiodactyla Cow_Roe deer
2: Carnivora Cheetah_Northern fur seal_Dog
3: Pilosa Three-toed sloth
4: Primates Owl monkey
5: Rodentia Mountain beaver_Vesper mouse
6: Soricomorpha Greater short-tailed shrew
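For the updated data in the question, where each row's order is buried in the list column order_longString, a base-R sketch could look like the following. It is my own addition and rests on two assumptions: each ref has to equal one of the strings in a cell exactly (hence `%in%` rather than grepl), and the refs are typed by hand instead of derived from df1$order. If partial matches are needed, `any(grepl(r, words, fixed = TRUE))` could replace the `%in%` test.

```r
# Hand-typed stand-in for the updated dummy data in the question
df1 <- data.frame(
  name = c("Cheetah", "Owl monkey", "Mountain beaver",
           "Greater short-tailed shrew", "Cow", "Three-toed sloth",
           "Northern fur seal", "Vesper mouse", "Dog", "Roe deer"),
  stringsAsFactors = FALSE
)
df1$order_longString <- list(
  c("eeny", "Carnivora", "meeny"),
  c("Primates", "miny", "moe"),
  c("catch", "a", "tiger", "Rodentia"),
  c("by", "the", "toe", "Soricomorpha", "If"),
  c("he", "Artiodactyla", "hollers"),
  c("let", "Pilosa"),
  c("him", "go", "Carnivora"),
  c("eenie", "Rodentia", "minie", "money", "more"),
  c("Carnivora", "catch"),
  c("a", "piggy", "Artiodactyla", "by", "the", "snout")
)
df2 <- data.frame(
  ref = c("Carnivora", "Primates", "Rodentia",
          "Soricomorpha", "Artiodactyla", "Pilosa"),
  stringsAsFactors = FALSE
)

# For each ref, flag the rows whose list cell contains it, then collapse the names
df2$pasteHere <- vapply(df2$ref, function(r) {
  hit <- vapply(df1$order_longString, function(words) r %in% words, logical(1))
  paste(df1$name[hit], collapse = "_")
}, character(1), USE.NAMES = FALSE)
df2
```

A ref that matches no row would get an empty string here, because paste() on a zero-length vector with `collapse` returns "".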
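[Comments]:
Thanks for the quick reply! I realize I made a big mistake by using oversimplified dummy data. It is not quite what I intended, and I need to make the dummy data closer to my actual situation. Would you suggest I repost my question using the "Answer Your Question" button below, or edit the post directly? Sorry, I am very new to this site.
Just create an "Update:" or "Edit:" block in your initial post :-)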