R: Sort or order with a custom compare function

Posted 2023-03-27


【Title】: R Sort or order with custom compare function 【Published】: 2022-01-05 07:23:32 【Question】:

Can I pass a custom compare function to order that, given two items, indicates which of them ranks higher?

In my specific case, I have the following list.

scores <- list(
    'a' = c(1, 1, 2, 3, 4, 4),
    'b' = c(1, 2, 2, 2, 3, 4),
    'c' = c(1, 1, 2, 2, 3, 4),
    'd' = c(1, 2, 3, 3, 3, 4)
)

If we take two vectors a and b, the first index i at which a[i] > b[i] or a[i] < b[i] should determine which vector comes first. In this example, scores[['d']] > scores[['a']] because scores[['d']][2] > scores[['a']][2] (note that scores[['d']][5] < scores[['a']][5] is irrelevant).

Comparing two of these vectors could look like this.

compare <- function(a, b) {
    # get first element index at which vectors differ
    # (which.max returns 1 when the vectors are equal, which falls through to 0)
    i <- which.max(a != b)
    if (a[i] > b[i])
        1
    else if (a[i] < b[i])
        -1
    else
        0
}

Sorted with this compare function, the keys of scores should come out as d, b, a, c.

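The other solutions I found either mess with the data before ordering or introduce S3 classes and apply comparison attributes. For the former, I can't see what to do with my data (maybe convert it to strings? But what about numbers above 9?), and for the latter, I feel uncomfortable introducing a new class into my R package just to compare vectors. And order does not seem to have the comparator parameter I would like to pass.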

【Comments】:

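@akrun I'm not quite sure what you mean by using combinations. What I'm aiming for is something you frequently find in Python, cpp, Java or Go, where you can provide a comparison function that simply compares two items.

【Answer 1】: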

Here is an attempt. I have explained every step in the comments.

compare <- function(a, b) {
  # subtract vector b from vector a
  comparison <- a - b
  # get the first non-zero difference (NA if the vectors are equal)
  result <- comparison[comparison != 0][1]
  # return 1 if a ranks higher, 2 if b ranks higher (0 if equal)
  if (is.na(result)) return(0) else if (result > 0) return(1) else return(2)
}

compare_list <- function(list_) {
  # get all pairwise combinations of list indices
  comparisons <- combn(length(list_), 2)
  # compare all pairs
  results <- apply(comparisons, 2, function(x) {
    # get the "winner"
    x[compare(list_[[x[1]]], list_[[x[2]]])]
  })
  # get frequency table (how often a vector "won" -> this is the result you want)
  fr_tab <- table(results)
  # vector that is last in comparison (it never won)
  last_vector <- which(!(seq_along(list_) %in% as.numeric(names(fr_tab))))
  # return the sorted results and append the index of the last vector
  c(as.numeric(names(sort(fr_tab, decreasing = TRUE))), last_vector)
}

If you run this function on your example, the result is

> compare_list(scores)
[1] 4 2 1 3

I have not handled the case where two vectors are identical, since you have not explained how it should be handled.
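As an alternative to counting wins over all pairs, the comparator from the question can drive an ordinary merge sort, which needs far fewer comparisons on long lists. The following is only a sketch: order_with is a hypothetical helper name, not a base R function, and compare is the question's function with braces added. Ties (a comparator result of 0) keep their original relative order.

```r
scores <- list(
  'a' = c(1, 1, 2, 3, 4, 4),
  'b' = c(1, 2, 2, 2, 3, 4),
  'c' = c(1, 1, 2, 2, 3, 4),
  'd' = c(1, 2, 3, 3, 3, 4)
)

# Comparator from the question: 1 if a ranks higher, -1 if lower, 0 if equal
compare <- function(a, b) {
  i <- which.max(a != b)
  if (a[i] > b[i]) 1 else if (a[i] < b[i]) -1 else 0
}

# Hypothetical helper (not part of base R): returns the indices of x,
# ordered so that higher-ranked items come first according to cmp
order_with <- function(x, cmp) {
  merge_sort <- function(idx) {
    n <- length(idx)
    if (n <= 1) return(idx)
    mid <- n %/% 2
    left <- merge_sort(idx[seq_len(mid)])
    right <- merge_sort(idx[(mid + 1):n])
    out <- integer(n)
    i <- 1
    j <- 1
    for (k in seq_len(n)) {
      # take from the left half if the right half is exhausted, or if the
      # left candidate ranks at least as high as the right candidate
      take_left <- j > length(right) ||
        (i <= length(left) && cmp(x[[left[i]]], x[[right[j]]]) >= 0)
      if (take_left) {
        out[k] <- left[i]
        i <- i + 1
      } else {
        out[k] <- right[j]
        j <- j + 1
      }
    }
    out
  }
  merge_sort(seq_along(x))
}

ord <- order_with(scores, compare)
names(scores)[ord]
#> [1] "d" "b" "a" "c"
```

Because the helper only ever calls cmp on two list entries, any other two-argument comparator with the same 1 / 0 / -1 convention could be swapped in.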

【Discussion】:

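I find it interesting that combn is used this way! I definitely learned something new there. My only concern is that the complexity seems to be somewhere in the O(n^2) region, although you could implement some form of Monte Carlo tree search to eliminate some comparisons in advance. E.g. compare d with b, then c with a, then b with a.

【Answer 2】: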

The native R way to do this is to introduce an S3 class.

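There are two things you can do with the class. You can define a method for xtfrm that converts the list entries to numbers. That can be vectorized, and could conceivably be very fast.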
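As a rough illustration of the xtfrm route (a sketch under stated assumptions, not necessarily what the answer had in mind): the method below assumes a non-empty list of equal-length numeric vectors without NA. It stacks the vectors into a matrix, lets order() sort the rows lexicographically by passing the columns as successive keys, and returns one integer key per entry, with identical vectors sharing a key.

```r
scores <- list(
  'a' = c(1, 1, 2, 3, 4, 4),
  'b' = c(1, 2, 2, 2, 3, 4),
  'c' = c(1, 1, 2, 2, 3, 4),
  'd' = c(1, 2, 3, 3, 3, 4)
)
scores_lex <- structure(scores, class = "lexico")

# Sketch of an xtfrm method: one integer key per list entry.
# Assumes a non-empty list of equal-length numeric vectors without NA.
xtfrm.lexico <- function(x) {
  m <- do.call(rbind, unname(unclass(x)))               # one row per vector
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])  # columns as sort keys
  ord <- do.call(order, cols)                           # increasing lexicographic order
  sorted <- m[ord, , drop = FALSE]
  # a sorted row starts a new key when it differs from the row before it
  is_new <- c(TRUE, rowSums(sorted[-1, , drop = FALSE] !=
                              sorted[-nrow(sorted), , drop = FALSE]) > 0)
  keys <- integer(nrow(m))
  keys[ord] <- cumsum(is_new)
  keys
}

keys <- xtfrm(scores_lex)
keys
#> [1] 2 3 1 4
names(scores)[order(keys, decreasing = TRUE)]
#> [1] "d" "b" "a" "c"
```

All the comparison work happens inside a single order() call, so no R-level function is invoked per pair of entries.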

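But you asked for a user-defined comparison function. This will be slow, because R function calls are slow, and a bit clumsy, because nobody does it this way. But following the instructions on the xtfrm help page, here is how to do it: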

scores <- list(
  'a' = c(1, 1, 2, 3, 4, 4),
  'b' = c(1, 2, 2, 2, 3, 4),
  'c' = c(1, 1, 2, 2, 3, 4),
  'd' = c(1, 2, 3, 3, 3, 4)
)

# Add a class to the list

scores <- structure(scores, class = "lexico")

# Need to keep the class when subsetting

`[.lexico` <- function(x, i, ...) structure(unclass(x)[i], class = "lexico")

# Careful here:  identical() might be too strict

`==.lexico` <- function(a, b) identical(a, b)

`>.lexico` <- function(a, b) {
  a <- a[[1]]
  b <- b[[1]]
  i <- which(a != b)
  length(i) > 0 && a[i[1]] > b[i[1]]
}


is.na.lexico <- function(a) FALSE

sort(scores)
#> $c
#> [1] 1 1 2 2 3 4
#> 
#> $a
#> [1] 1 1 2 3 4 4
#> 
#> $b
#> [1] 1 2 2 2 3 4
#> 
#> $d
#> [1] 1 2 3 3 3 4
#> 
#> attr(,"class")
#> [1] "lexico"

Created on 2021-11-27 by the reprex package (v2.0.1)

This is the reverse of the order you asked for, because sort() sorts in increasing order by default. If you really want d, b, a, c, use sort(scores, decreasing = TRUE).

【Discussion】:

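Thanks for the detailed reply! Do you have any opinion on how this affects package development for CRAN, i.e. should I be wary of simply adding a new class to a list? Should I #' @export those lexico functions, etc.?

【Answer 3】: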

Here is another very simple solution:

sort(sapply(scores, function(x) as.numeric(paste(x, collapse = ""))), decreasing = TRUE)

What it does is take all the vectors, "compress" each of them into a single number, and then sort those numbers in decreasing order.

【Discussion】:

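Oh, I hadn't thought of the as.numeric part when I was considering compressing the vectors. Neat! Do you also know how to account for negative numbers? (e.g., c(-3, -1, 0, 4) is greater than c(-3, 0, 2, 5))

Since the vectors are monotonically increasing, just add the absolute value of the smallest number to all vectors (in your example, add 3 to all vectors). This ensures that none of the numbers in the vectors are negative.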
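The offset idea can be combined with fixed-width zero padding so that multi-digit entries also compare correctly. The following is a minimal sketch, assuming integer-valued entries and vectors of equal length; padded_key and the third example vector z are made up for illustration. Keeping the keys as strings rather than calling as.numeric also avoids the precision limit of doubles (roughly 15 to 16 significant digits) on longer vectors.

```r
# Example data: x and y come from the comment above, z is a made-up
# vector with multi-digit entries
vecs <- list(
  x = c(-3, -1, 0, 4),
  y = c(-3, 0, 2, 5),
  z = c(10, 12, 15, 20)
)

# Hypothetical helper: build one fixed-width string key per vector.
# Assumes integer-valued entries and vectors of equal length.
padded_key <- function(vecs) {
  offset <- -min(unlist(vecs))                 # shift so the smallest value becomes 0
  shifted <- lapply(vecs, function(v) as.integer(v + offset))
  width <- nchar(as.character(max(unlist(shifted))))
  vapply(shifted, function(v) {
    # pad every entry with leading zeros to the same width, then concatenate
    paste(formatC(v, width = width, flag = "0"), collapse = "")
  }, character(1))
}

keys <- padded_key(vecs)
keys
#>          x          y          z
#> "00020307" "00030508" "13151823"

# method = "radix" sorts strings in the C locale, i.e. digit by digit
names(vecs)[order(keys, decreasing = TRUE, method = "radix")]
#> [1] "z" "y" "x"
```

Under the first-difference rule from the question, y ranks above x here because -1 < 0 at the second position.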
