How to find elements common in at least 2 vectors?

Posted 2023-02-22


[Title] How to find elements common in at least 2 vectors? [Posted] 2014-11-28 07:29:29 [Question]

Suppose I have 5 vectors:

a <- c(1,2,3)
b <- c(2,3,4)
c <- c(1,2,5,8)
d <- c(2,3,4,6)
e <- c(2,7,8,9)

I know I can compute the intersection of all of them using Reduce() and intersect(), like this:

Reduce(intersect, list(a, b, c, d, e))
[1] 2

But how can I find the elements that are common to at least 2 of the vectors? That is:

[1] 1 2 3 4 8


[Answer 1]

It's much simpler than many people think, and it should be very efficient.

Put everything into one vector:

x <- unlist(list(a, b, c, d, e))

Find the duplicates:

unique(x[duplicated(x)])
# [1] 2 3 1 4 8

Add sort if you need the result ordered.

Note: if a single vector may itself contain duplicates (your example doesn't seem to suggest so), replace x with x <- unlist(lapply(list(a, b, c, d, e), unique))
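A small illustration of why that deduplication step matters. As an assumption for this sketch only, the last vector gets a repeated 9 (`c(2, 7, 8, 9, 9)`), and the vectors are held in a list named `vecs` (a name introduced here, not from the question):

```r
# Hypothetical input: 9 is repeated inside the last vector only
vecs <- list(c(1, 2, 3), c(2, 3, 4), c(1, 2, 5, 8), c(2, 3, 4, 6), c(2, 7, 8, 9, 9))

x_raw   <- unlist(vecs)                    # no per-vector deduplication
x_dedup <- unlist(lapply(vecs, unique))    # each value counted once per vector

sort(unique(x_raw[duplicated(x_raw)]))     # [1] 1 2 3 4 8 9  (9 wrongly included)
sort(unique(x_dedup[duplicated(x_dedup)])) # [1] 1 2 3 4 8
```

Without the per-vector `unique()`, a value repeated within one vector looks "common" even though it appears in only one vector.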

Edit: since the OP expressed interest in a more general solution for n >= 2, I would do:

which(tabulate(x) >= n)

if the data consist only of positive integers (1, 2, etc.), as in the example. Otherwise:

f <- table(x)
names(f)[f >= n]

This is not far from James's solution, but it avoids the expensive sort, and it is miles faster than computing all possible combinations.
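The table-based idea for a general n can be wrapped into a small helper. This is a sketch, not the answerer's code: `common_in_at_least` and `vecs` are hypothetical names, and the helper assumes numeric input because `table()` returns its names as character strings. It deduplicates within each vector first, as the earlier note recommends:

```r
# Hypothetical helper: values present in at least n of the vectors.
# Assumes numeric vectors (table() names are converted back with as.numeric).
common_in_at_least <- function(vectors, n) {
  # Count each value once per vector, then build a frequency table
  x <- unlist(lapply(vectors, unique))
  f <- table(x)
  as.numeric(names(f)[f >= n])
}

vecs <- list(c(1, 2, 3), c(2, 3, 4), c(1, 2, 5, 8), c(2, 3, 4, 6), c(2, 7, 8, 9))

common_in_at_least(vecs, 2)  # [1] 1 2 3 4 8
common_in_at_least(vecs, 3)  # [1] 2 3

# For positive integers only, tabulate() gives the same counts indexed by value
which(tabulate(unlist(lapply(vecs, unique))) >= 3)  # [1] 2 3
```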

[Comments]

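Nice one. Can this be generalized to n > 2? That is, how would I find elements common to at least n vectors?
No, that requires a frequency table via table or tabulate; see my edit.

[Answer 2]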

You could try all possible pairwise combinations, for example:

## create a list
l <- list(a, b, c, d, e)

## get combinations
cbn <- combn(1:length(l), 2)

## Intersect them 
unique(unlist(apply(cbn, 2, function(x) intersect(l[[x[1]]], l[[x[2]]]))))
## [1] 2 3 1 4 8
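The same pairwise idea can be written with combn()'s own FUN argument, which also shows what the first argument does: passing a single number k makes combn() enumerate combinations of 1:k. This is a sketch with the vectors rebuilt in a named list `l`; `simplify = FALSE` is used because the pairwise intersections have different lengths:

```r
l <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(1, 2, 5, 8),
          d = c(2, 3, 4, 6), e = c(2, 7, 8, 9))

# combn(5, 2) enumerates index pairs column by column:
# (1,2), (1,3), ..., (4,5), i.e. choose(5, 2) = 10 columns
pairs <- combn(length(l), 2)

# Apply FUN to each index pair and keep the results as a list
common <- combn(length(l), 2,
                FUN = function(idx) intersect(l[[idx[1]]], l[[idx[2]]]),
                simplify = FALSE)

sort(unique(unlist(common)))  # [1] 1 2 3 4 8
```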

[Comments]

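Could you explain what the first argument of combn() (1:4) is?
I changed it to the more general 1:length(l). combn() creates all possible combinations of n elements taken k at a time.

[Answer 3]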

Here is another option:

# For each vector, get a vector of values without duplicates
deduplicated_vectors <- lapply(list(a,b,c,d,e), unique)

# Flatten the lists, then sort and use rle to determine how many
# lists each value appears in
rl <- rle(sort(unlist(deduplicated_vectors)))

# Get the values that appear in two or more lists
rl$values[rl$lengths >= 2]
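To see why this works: after sorting, each run in the flattened vector holds one distinct value, and because each vector was deduplicated first, the run length equals the number of vectors containing that value. A sketch with the vectors in a list named `vecs` (a name introduced here), showing the counts and a general threshold `n`:

```r
vecs <- list(c(1, 2, 3), c(2, 3, 4), c(1, 2, 5, 8), c(2, 3, 4, 6), c(2, 7, 8, 9))
rl <- rle(sort(unlist(lapply(vecs, unique))))

# One run per distinct value; its length is the number of vectors containing it
counts <- setNames(rl$lengths, rl$values)
counts
# 1 2 3 4 5 6 7 8 9
# 2 5 3 2 1 1 1 2 1

# Any threshold works the same way
n <- 3
rl$values[rl$lengths >= n]  # [1] 2 3
```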


[Answer 4]

Here is a way to count the number of vectors in which each unique value appears:

unique_vals <- unique(c(a, b, c, d, e))

setNames(rowSums(!!(sapply(list(a, b, c, d, e), match, x = unique_vals)),
                 na.rm = TRUE), unique_vals)
# 1 2 3 4 5 8 6 7 9 
# 2 5 3 2 1 2 1 1 1
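The counts still need one filtering step to answer the question. A sketch of a variant that swaps `match()` plus `!!` for `%in%`, which yields TRUE/FALSE directly, so no `na.rm` is needed; `vecs` and `presence` are names introduced for this example:

```r
vecs <- list(c(1, 2, 3), c(2, 3, 4), c(1, 2, 5, 8), c(2, 3, 4, 6), c(2, 7, 8, 9))
unique_vals <- unique(unlist(vecs))

# One logical column per vector: is each unique value present in it?
presence <- sapply(vecs, function(v) unique_vals %in% v)
counts <- setNames(rowSums(presence), unique_vals)

# Keep the values present in at least 2 vectors
unique_vals[counts >= 2]  # [1] 1 2 3 4 8
```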


[Answer 5]

A variant of @rengis's approach is:

unique(unlist(Map(`intersect`, cbn[1,], cbn[2,])))
#[1] 2 3 1 4 8

where

l <- mget(letters[1:5])
cbn <- combn(l,2)
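The combination approach also generalizes to "at least n vectors": a value lies in at least n vectors exactly when it is in the intersection of some n-subset of them. A sketch (`common_via_subsets` is a hypothetical helper, not from the thread); note that it enumerates choose(k, n) subsets for k vectors, so it grows quickly, which is consistent with Answer 1's remark about combinations being slower:

```r
l <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(1, 2, 5, 8),
          d = c(2, 3, 4, 6), e = c(2, 7, 8, 9))

# Hypothetical helper: union over all n-subsets of their common intersection
common_via_subsets <- function(l, n) {
  subsets <- combn(l, n, simplify = FALSE)   # list of n-element sublists
  sort(unique(unlist(lapply(subsets, function(s) Reduce(intersect, s)))))
}

common_via_subsets(l, 2)  # [1] 1 2 3 4 8
common_via_subsets(l, 3)  # [1] 2 3
```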


[Answer 6]

Another approach, using outer to apply a vectorized function:

L <- list(a, b, c, d, e)
f <- function(x, y) intersect(x, y)
fv <- Vectorize(f, list("x","y"))
o <- outer(L, L, fv)
table(unlist(o[upper.tri(o)]))

#  1  2  3  4  8 
#  1 10  3  1  1

The output above gives, for each of the repeated elements 1, 2, 3, 4 and 8, the number of vector pairs that share it.
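Pair counts can be turned back into "number of vectors" counts: a value present in k vectors is shared by choose(k, 2) = k(k - 1)/2 pairs, so solving for k gives k = (1 + sqrt(1 + 8p)) / 2 for a pair count p. A sketch that recomputes the table with the same outer() idea and inverts it (`pair_counts` and `k` are names introduced here):

```r
L <- list(c(1, 2, 3), c(2, 3, 4), c(1, 2, 5, 8), c(2, 3, 4, 6), c(2, 7, 8, 9))
o <- outer(L, L, Vectorize(function(x, y) intersect(x, y)))
pair_counts <- table(unlist(o[upper.tri(o)]))

# Invert p = k(k - 1)/2 to get the number of vectors k containing each value
k <- (1 + sqrt(1 + 8 * as.vector(pair_counts))) / 2
names(k) <- names(pair_counts)
k
# 1 2 3 4 8
# 2 5 3 2 2

# Only values in at least 2 vectors appear here, so e.g. filter for n = 3
names(k)[k >= 3]  # [1] "2" "3"
```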


[Answer 7]

When the vectors are very large, solutions such as duplicated or tabulate may overwhelm your system. In that case dplyr can come in handy with the following code:

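library(dplyr)
combination_of_vectors <- c(a, b, c, d, e)
# Counts occurrences, so this assumes no vector contains internal duplicates
# In at least 2 vectors (more than 1 occurrence)
tibble(x = combination_of_vectors) %>% group_by(x) %>% filter(n() > 1) %>% distinct(x)
# In at least 3 vectors
tibble(x = combination_of_vectors) %>% group_by(x) %>% filter(n() > 2) %>% distinct(x)
# In at least 4 vectors
tibble(x = combination_of_vectors) %>% group_by(x) %>% filter(n() > 3) %>% distinct(x)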

Hope this helps someone.

