Sorting a list by another list with duplicates

Posted 2023-03-27


【Originally posted】2022-01-04 14:30:51 【Question】:

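I have two lists, [1, 2, 3, 1, 2, 1] and [a, b, c, d, e, f]. I want to reorder the elements of the second list according to the permutations that sort the first list. Sorting the first list gives [1, 1, 1, 2, 2, 3], but many permutations of the second list are consistent with that sort, e.g. [a, d, f, b, e, c], [d, f, a, e, b, c], and so on.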

How can I efficiently generate all of these permutations in Python?

If I only wanted one permutation, I could get one like this:

sorted_numbers, sorted_letters = list(zip(*[(x, y) for x, y in sorted(zip(numbers, letters))]))

【Comments】:

Your single-permutation code does unnecessary work; it could simply be sorted_numbers, sorted_letters = zip(*sorted(zip(numbers, letters))). 【Answer 1】:

If the lists are not too large, you can filter all permutations with a list comprehension and a helper function:

from itertools import permutations


def is_valid_ordering(perm: tuple, ch_to_order: dict) -> bool:
    if not perm or len(perm) <= 1:
        return True
    for ch1, ch2 in zip(perm[:-1], perm[1:]):
        if ch_to_order[ch1] > ch_to_order[ch2]:
            return False
    return True


lst_1 = [1, 2, 3, 1, 2, 1]
lst_2 = ['a', 'b', 'c', 'd', 'e', 'f']
ch_to_order = {ch: o for ch, o in zip(lst_2, lst_1)}
valid_permutations = [
    list(p) for p in permutations(lst_2)
    if is_valid_ordering(p, ch_to_order)
]
for valid_perm in valid_permutations:
    print(valid_perm)

Output:

['a', 'd', 'f', 'b', 'e', 'c']
['a', 'd', 'f', 'e', 'b', 'c']
['a', 'f', 'd', 'b', 'e', 'c']
['a', 'f', 'd', 'e', 'b', 'c']
['d', 'a', 'f', 'b', 'e', 'c']
['d', 'a', 'f', 'e', 'b', 'c']
['d', 'f', 'a', 'b', 'e', 'c']
['d', 'f', 'a', 'e', 'b', 'c']
['f', 'a', 'd', 'b', 'e', 'c']
['f', 'a', 'd', 'e', 'b', 'c']
['f', 'd', 'a', 'b', 'e', 'c']
['f', 'd', 'a', 'e', 'b', 'c']

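Alternatively, if the lists are large and efficiency matters, you can build only the valid orderings (see Stef's answer for a better approach than the one below):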

from collections import defaultdict
from itertools import permutations, product
from iteration_utilities import flatten

lst_1 = [1, 2, 3, 1, 2, 1]
lst_2 = ['a', 'b', 'c', 'd', 'e', 'f']
equivalent_chars = defaultdict(list)
for o, ch in zip(lst_1, lst_2):
    equivalent_chars[o].append(ch)
equivalent_char_groups = [g for o, g in sorted(equivalent_chars.items())]
all_group_permutations = [[list(p) for p in permutations(group)]
                          for group in equivalent_char_groups]
valid_permutations = [
    list(flatten(p)) for p in product(*all_group_permutations)
]
for valid_perm in valid_permutations:
    print(valid_perm)

【Comments】:

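If the list has a large length n and only a few duplicates, your suggested approach has to check all n! permutations of the list instead of only the small number of solutions.

Valid point, but I should say that in all cases n will be …

Regarding "see Stef's answer for a better approach than the one below": I wouldn't call my answer better. The only two differences are: 1) I group with groupby, whereas you group with defaultdict; 2) you split the code into several simple lines, whereas I went crazy with an illegible one-liner. Since the list has to be sorted anyway, the groupby and dict grouping approaches are equally efficient.

【Answer 2】: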

Use itertools to build the Cartesian product of the permutations of each group of duplicate keys:

Code

from itertools import chain, permutations, groupby, product
from operator import itemgetter

def all_sorts(numbers, letters):
    return [list(map(itemgetter(1), chain.from_iterable(p))) for p in product(*(permutations(g) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))))]

print( all_sorts([1,2,3,1,2,1], 'abcdef') )
# [['a', 'd', 'f', 'b', 'e', 'c'], ['a', 'd', 'f', 'e', 'b', 'c'], ['a', 'f', 'd', 'b', 'e', 'c'], ['a', 'f', 'd', 'e', 'b', 'c'], ['d', 'a', 'f', 'b', 'e', 'c'], ['d', 'a', 'f', 'e', 'b', 'c'], ['d', 'f', 'a', 'b', 'e', 'c'], ['d', 'f', 'a', 'e', 'b', 'c'], ['f', 'a', 'd', 'b', 'e', 'c'], ['f', 'a', 'd', 'e', 'b', 'c'], ['f', 'd', 'a', 'b', 'e', 'c'], ['f', 'd', 'a', 'e', 'b', 'c']]

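This approach is optimal because it generates the solutions directly instead of filtering them out of a huge list of candidates. For the example lists of size 6, it generates only the 12 solutions, rather than filtering all 720 permutations of a list of size 6.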
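The figure of 12 is the product of the factorials of the group sizes: 3! · 2! · 1! = 12. As a rough sketch, the count can be computed without generating anything. The helper name count_sorts is hypothetical (it does not appear in the answers), and the result matches the number of distinct outputs only if the items of the second list are themselves distinct:

```python
from collections import Counter
from math import factorial, prod  # math.prod needs Python 3.8+


def count_sorts(numbers):
    """Number of orderings of a parallel list consistent with sorting `numbers`.

    count_sorts is a hypothetical helper name used for illustration only.
    """
    # Each group of equal keys can be arranged independently in k! ways,
    # so the total is the product of k! over all groups.
    return prod(factorial(k) for k in Counter(numbers).values())


print(count_sorts([1, 2, 3, 1, 2, 1]))  # 3! * 2! * 1! = 12
```

Checking this number first is a cheap way to decide whether enumerating every ordering is feasible at all.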

How it works:

First, we sort and group by key using sorted and itertools.groupby. Note that operator.itemgetter(0) is equivalent to lambda t: t[0].

>>> [list(g) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))]
[[(1, 'a'), (1, 'd'), (1, 'f')],
 [(2, 'b'), (2, 'e')],
 [(3, 'c')]]
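The sort in this step is not optional: itertools.groupby only merges consecutive items with equal keys. A small sketch comparing the grouping with and without sorting, reusing the example data (the variable names unsorted_groups and sorted_groups are only illustrative):

```python
from itertools import groupby
from operator import itemgetter

numbers = [1, 2, 3, 1, 2, 1]
letters = "abcdef"
pairs = list(zip(numbers, letters))

# Without sorting, equal keys are never adjacent here, so every pair
# ends up in a group of its own.
unsorted_groups = [list(g) for _, g in groupby(pairs, key=itemgetter(0))]

# After sorting, equal keys are adjacent and each key forms one group.
sorted_groups = [list(g) for _, g in groupby(sorted(pairs), key=itemgetter(0))]

print(len(unsorted_groups))  # 6
print(len(sorted_groups))    # 3
```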

Then we generate the possible permutations for each key, using itertools.permutations on each group.

>>> [list(permutations(g)) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))]
[[((1, 'a'), (1, 'd'), (1, 'f')), ((1, 'a'), (1, 'f'), (1, 'd')), ((1, 'd'), (1, 'a'), (1, 'f')), ((1, 'd'), (1, 'f'), (1, 'a')), ((1, 'f'), (1, 'a'), (1, 'd')), ((1, 'f'), (1, 'd'), (1, 'a'))],
 [((2, 'b'), (2, 'e')), ((2, 'e'), (2, 'b'))],
 [((3, 'c'),)]]

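Then we use itertools.product to build the Cartesian product of these lists of permutations; from each tuple in the Cartesian product we rebuild a single list, concatenating with itertools.chain. Finally we "undecorate", discarding the keys and keeping only the letters. I did that with map(itemgetter(1), ...), but the list comprehension [t[1] for t in ...] would do the same thing.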

>>> [list(map(itemgetter(1), chain.from_iterable(p))) for p in product(*(permutations(g) for _,g in groupby(sorted(zip(numbers, letters)), key=itemgetter(0))))]
[['a', 'd', 'f', 'b', 'e', 'c'], ['a', 'd', 'f', 'e', 'b', 'c'], ['a', 'f', 'd', 'b', 'e', 'c'], ['a', 'f', 'd', 'e', 'b', 'c'], ['d', 'a', 'f', 'b', 'e', 'c'], ['d', 'a', 'f', 'e', 'b', 'c'], ['d', 'f', 'a', 'b', 'e', 'c'], ['d', 'f', 'a', 'e', 'b', 'c'], ['f', 'a', 'd', 'b', 'e', 'c'], ['f', 'a', 'd', 'e', 'b', 'c'], ['f', 'd', 'a', 'b', 'e', 'c'], ['f', 'd', 'a', 'e', 'b', 'c']]
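For readers who find the one-liner hard to follow, the same steps can be spelled out as a generator. This is only a sketch: the name iter_sorts is hypothetical, and unlike all_sorts it sorts on the key alone, an assumption that keeps ties in their original order and avoids comparing the items themselves.

```python
from itertools import chain, groupby, permutations, product
from operator import itemgetter


def iter_sorts(numbers, letters):
    """Lazily yield every ordering of `letters` consistent with sorting `numbers`.

    iter_sorts is a hypothetical name; the steps mirror all_sorts.
    """
    # Sort by key only. Python's sort is stable, so ties keep their
    # original order and the items never need to be comparable.
    pairs = sorted(zip(numbers, letters), key=itemgetter(0))
    # One list of items per distinct key, in increasing key order.
    groups = [[item for _, item in group]
              for _, group in groupby(pairs, key=itemgetter(0))]
    # Pick one permutation per group and concatenate the picks.
    for choice in product(*map(permutations, groups)):
        yield list(chain.from_iterable(choice))


gen = iter_sorts([1, 2, 3, 1, 2, 1], "abcdef")
print(next(gen))  # ['a', 'd', 'f', 'b', 'e', 'c']
```

Results are yielded one at a time, so the full list of orderings is never held in memory. Note that itertools.product still materializes the permutations of each individual group before it starts.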

【Comments】:

【Answer 3】:

Another implementation without filtering:

from itertools import product, permutations, chain

numbers = [1, 2, 3, 1, 2, 1]
letters = ['a', 'b', 'c', 'd', 'e', 'f']

grouper = {}
for number, letter in zip(numbers, letters):
    grouper.setdefault(number, []).append(letter)
groups = [grouper[number] for number in sorted(grouper)]
for prod in product(*map(permutations, groups)):
    print(list(chain.from_iterable(prod)))

Output:

['a', 'd', 'f', 'b', 'e', 'c']
['a', 'd', 'f', 'e', 'b', 'c']
['a', 'f', 'd', 'b', 'e', 'c']
['a', 'f', 'd', 'e', 'b', 'c']
['d', 'a', 'f', 'b', 'e', 'c']
['d', 'a', 'f', 'e', 'b', 'c']
['d', 'f', 'a', 'b', 'e', 'c']
['d', 'f', 'a', 'e', 'b', 'c']
['f', 'a', 'd', 'b', 'e', 'c']
['f', 'a', 'd', 'e', 'b', 'c']
['f', 'd', 'a', 'b', 'e', 'c']
['f', 'd', 'a', 'e', 'b', 'c']

It first groups the letters by number using a dict:

grouper = {1: ['a', 'd', 'f'], 2: ['b', 'e'], 3: ['c']}

Then it sorts the numbers and extracts their letter groups:

groups = [['a', 'd', 'f'], ['b', 'e'], ['c']]

Then it simply permutes each group, builds the product, and chains each result together.
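The question is phrased in terms of the permutations that sort the first list, so a variant of the same grouping idea can yield index permutations instead of letters. The sketch below is an assumption about what might be convenient, not something taken from the answers, and sorting_permutations is a hypothetical name:

```python
from itertools import chain, permutations, product


def sorting_permutations(keys):
    """Yield every index permutation p such that [keys[i] for i in p] is sorted.

    sorting_permutations is a hypothetical helper, not from the answers.
    """
    # Group the positions (not the values) by key.
    positions = {}
    for index, key in enumerate(keys):
        positions.setdefault(key, []).append(index)
    groups = [positions[key] for key in sorted(positions)]
    # Permute the positions inside each group, then concatenate.
    for choice in product(*map(permutations, groups)):
        yield list(chain.from_iterable(choice))


numbers = [1, 2, 3, 1, 2, 1]
letters = ['a', 'b', 'c', 'd', 'e', 'f']
first = next(sorting_permutations(numbers))
print(first)                        # [0, 3, 5, 1, 4, 2]
print([letters[i] for i in first])  # ['a', 'd', 'f', 'b', 'e', 'c']
```

Working with indices lets one permutation be applied to any number of parallel lists, and it does not require the items of those lists to be hashable or distinct.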

【Comments】:
