Data Manipulation in R Project: compare rows

Posted 2023-04-13


【Title】: Data Manipulation in R Project: compare rows 【Posted】: 2014-02-03 02:29:33 【Question】:

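I want to compare values across the rows of a dataset.

Each row starts with a unique ID, followed by several binary variables. The data looks like this: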

row.name v1 v2 v3 ... 
1         0  0  0
2         1  1  1
3         1  0  1

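For every unique pair of rows, I want to know which values are the same (assign 1 if equal) and which differ (assign 0 if not equal). For example, in column v1, row 1 is 0 and row 2 is 1, so the result for that pair should be 0.

So the output should look like this: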

id1 id2 v1 v2 v3 ...
  1   2  0  0  0 ...
  1   3  0  1  0 ...
  2   3  1  0  1 ...

I am looking for an efficient way to do this for more than 1000 rows...

【Comments on the question】:

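【Answer 1】:

There is no way to do this without expanding every combination of rows, so with 1000 rows (choose(1000, 2) = 499,500 pairs) it will take some time. But here is a solution: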

dat <- read.table(header = TRUE, text = "row.name v1 v2 v3
1         0  0  0
2         1  1  1
3         1  0  1")

Create the index rows (one row per unique pair of IDs):

indices <- t(combn(dat$row.name, 2))
colnames(indices) <- c('id1', 'id2')
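As a small illustration of what this step produces, the sketch below runs combn() on a stand-alone ID vector and checks the number of pairs against choose(). The names ids and pairs are made up for this example and are not part of the answer.

```r
# Hypothetical stand-alone IDs, matching the three rows of the sample data.
ids <- c(1, 2, 3)

# combn() returns one column per combination; transpose to get one row per pair.
pairs <- t(combn(ids, 2))
colnames(pairs) <- c("id1", "id2")
print(pairs)

# The number of unique pairs for n rows is n * (n - 1) / 2.
n_pairs_small <- choose(length(ids), 2)
n_pairs_large <- choose(1000, 2)
print(n_pairs_small)
print(n_pairs_large)
```

For 1000 rows this comes to 499,500 pairs, which is why the comparison step dominates the run time.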

Loop over the index rows and collect the comparisons:

# Note: dat[x[1], -1] selects by row position, so this assumes row.name equals the row number.
res1 <- t(apply(indices, 1, function(x) as.numeric(dat[x[1], -1] == dat[x[2], -1])))
colnames(res1) <- names(dat[-1])

Put it all together:

result <- cbind(indices, res1)

result
##      id1 id2 v1 v2 v3
## [1,]   1   2  0  0  0
## [2,]   1   3  0  1  0
## [3,]   2   3  1  0  1
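Since the question asks about efficiency, here is one possible vectorized variant. It is a sketch, not the method from the answer: it assumes all variable columns are numeric, converts them to a matrix, and compares all pairs in a single matrix operation instead of calling apply() once per pair. The names m, pos, eq and fast are introduced only for this illustration.

```r
dat <- read.table(header = TRUE, text = "row.name v1 v2 v3
1 0 0 0
2 1 1 1
3 1 0 1")

# Assumption: every column after the ID is numeric, so a matrix is safe.
m <- as.matrix(dat[, -1, drop = FALSE])

# Row positions (not IDs) of every unique pair: a 2 x n_pairs matrix.
pos <- combn(seq_len(nrow(dat)), 2)

# Compare the two stacked sets of rows in one step; TRUE/FALSE becomes 1/0.
eq <- (m[pos[1, ], , drop = FALSE] == m[pos[2, ], , drop = FALSE]) * 1L
rownames(eq) <- NULL

# Map the positions back to the IDs stored in row.name.
fast <- cbind(id1 = dat$row.name[pos[1, ]],
              id2 = dat$row.name[pos[2, ]],
              eq)
print(fast)
```

Because the pairs are built from row positions and only mapped back to row.name at the end, this variant does not rely on the IDs being 1, 2, 3, .... The number of output rows is unchanged, so memory use still grows with the number of pairs.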

【Comments】:

This works like a charm (though it took some CPU time). Thanks
