Using the indices from sorting a list in Perl to sort and index another

Posted 2023-03-24

【Title】: Using the indices from sorting a list in Perl to sort and index another
【Posted】: 2012-09-07 18:53:35
【Question】:

Suppose I have a list of words and a second list holding the confidence associated with each of those words:

my @list = ("word1", "word2", "word3", "word4");
my @confidences = (0.1, 0.9, 0.3, 0.6);

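I want to obtain a second pair of lists: the elements of @list whose confidence is above 0.4, in sorted order, together with their corresponding confidences. How can I do this in Perl? (That is, how do I use the list of indices that sorts one list to sort another?)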

For the example above, the output would be:

my @sorted_and_thresholded_list = ("word2", "word4");
my @sorted_and_thresholded_confidences = (0.9, 0.6);

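The entries in @list may not be unique (so the sort should be stable), and the result should be sorted by confidence in descending order.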

【Comments】:

Are the entries in @list unique?

【Answer 1】:

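When working with parallel arrays, you have to work with indexes: filter and sort the index list, then use it to slice both arrays.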

my @sorted_and_thresholded_indexes =
    sort { $confidences[$b] <=> $confidences[$a] }
     grep $confidences[$_] > 0.4,
      0..$#confidences;

my @sorted_and_thresholded_list =
   @list[ @sorted_and_thresholded_indexes ];
my @sorted_and_thresholded_confidences =
   @confidences[ @sorted_and_thresholded_indexes ];
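
Since the question allows duplicate words and asks for a stable sort, one way to avoid depending on the stability of Perl's `sort` is to break ties on the index itself. The following is a minimal sketch under that assumption; the sample data (a repeated word and two equal confidences) and the variable names `@idx`, `@words`, and `@confs` are made up for illustration and are not from the original answer.

```perl
use strict;
use warnings;

# Hypothetical data: "word1" appears twice and two confidences are tied.
my @list        = ("word1", "word2", "word1", "word4");
my @confidences = (0.6, 0.9, 0.3, 0.6);

# Keep indexes above the threshold, sort by confidence (descending),
# and fall back to the original position when confidences are equal.
my @idx =
    sort { $confidences[$b] <=> $confidences[$a] or $a <=> $b }
    grep { $confidences[$_] > 0.4 }
    0 .. $#confidences;

# Slice both parallel arrays with the same index list.
my @words = @list[@idx];
my @confs = @confidences[@idx];

print "@words\n";   # word2 word1 word4
print "@confs\n";   # 0.9 0.6 0.6
```

The explicit `or $a <=> $b` tie-break makes the ordering of equal confidences deterministic regardless of the sort implementation.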

【Comments】:

【Answer 2】:

Using pairwise and part from List::MoreUtils:

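use List::MoreUtils qw(pairwise part);
use Data::Dumper;
my @list = ("word1", "word2", "word3", "word4");
my @confidences = (0.1, 0.9, 0.3, 0.6);

# Note: this filters by threshold but does not sort; the surviving
# sample pairs just happen to be in descending order already.
my $i = 0;
my @ret = part { $i++ % 2 }                      # even slots: words, odd slots: confidences
          grep { defined }                       # drop the pairs that failed the threshold
          pairwise { $b > .4 ? ($a, $b) : undef } @list, @confidences;

print Dumper @ret;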

Output:

$VAR1 = [
          'word2',
          'word4'
        ];
$VAR2 = [
          '0.9',
          '0.6'
        ];

【Comments】:

【Answer 3】:

If you are sure there will be no duplicate words, I think a hash may be easier for this task, for example:

my %hash = ( "word1" => 0.1,
             "word2" => 0.9,
             "word3" => 0.3,
             "word4" => 0.6
           );

You can then iterate over the keys of the hash and pick out only those that meet your condition:

foreach my $key (keys %hash) {
    if ($hash{$key} > 0.4) {      # keep only keys above the threshold
        print "$key\n";
    }
}
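
The loop above visits the keys in no particular order, so it filters but does not sort. A minimal sketch of how the hash approach could also produce the two sorted lists follows; it still assumes the words are unique, and the names `@keys` and `@vals` as well as the alphabetical tie-break are illustrative choices, not part of the original answer.

```perl
use strict;
use warnings;

my %hash = (word1 => 0.1, word2 => 0.9, word3 => 0.3, word4 => 0.6);

# Filter the keys by confidence, then sort by confidence (descending);
# ties are broken alphabetically so the result is deterministic.
my @keys = sort { $hash{$b} <=> $hash{$a} or $a cmp $b }
           grep { $hash{$_} > 0.4 }
           keys %hash;

# A hash slice returns the confidences in the same order as @keys.
my @vals = @hash{@keys};

print "@keys\n";   # word2 word4
print "@vals\n";   # 0.9 0.6
```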

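【Comments】:

What if there are duplicate entries in @list? Then this will not work - I edited my answer based on your comment.

【Answer 4】:

Although ikegami has already shown my preferred solution, which is to use indexes, another option is to combine the arrays into a two-dimensional array (*). The advantage is that all the data is collected in a single data structure, which makes it easy to manipulate.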

use strict;
use warnings;
use Data::Dumper;

my @list = ("word1", "word2", "word3", "word4");
my @conf = (0.1, 0.9, 0.3, 0.6);
my @comb;

for (0 .. $#list) {                       # create two-dimensional array
    push @comb, [ $list[$_], $conf[$_] ];
}

my @all = sort { $b->[1] <=> $a->[1] }    # sort according to conf
          grep { $_->[1] > 0.4 } @comb;   # conf limit

my @list_done = map $_->[0], @all;        # break the lists apart again
my @conf_done = map $_->[1], @all;

print Dumper \@all, \@list_done, \@conf_done;

Output:

$VAR1 = [
          [
            'word2',
            '0.9'
          ],
          [
            'word4',
            '0.6'
          ]
        ];
$VAR2 = [
          'word2',
          'word4'
        ];
$VAR3 = [
          '0.9',
          '0.6'
        ];

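(*) = Using a hash is also an option, assuming 1) the original order is not important and 2) all words are unique. However, unless fast lookup is a concern, there is no drawback to using arrays.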

【Comments】:

【Answer 5】:

my @list = ("word1", "word2", "word3", "word4");
my @confidences = (0.1, 0.9, 0.3, 0.6);

my @result = map  { $list[$_] }                                   # words only
              sort { $confidences[$b] <=> $confidences[$a] }      # descending confidence
                 grep { $confidences[$_] > 0.4 } (0..$#confidences);

【Comments】:
