Finding and grouping anagrams in Python

Posted 2023-02-23


【Title】: Finding and grouping anagrams by Python 【Posted】: 2012-01-01 03:55:24 【Question】:

input: ['abc', 'cab', 'cafe', 'face', 'goo']
output: [['abc', 'cab'], ['cafe', 'face'], ['goo']]

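The problem is simple: group the words by anagram. Order doesn't matter.

Of course, I could do this in C++ (that's my native language). However, I'd like to know whether it can be done in one line of Python. Edited: if that's not possible, perhaps in 2 or 3 lines of code. I'm new to Python.

To check whether two strings are anagrams, I used sorting.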

>>> input = ['abc', 'cab', 'cafe', 'face', 'goo']
>>> input2 = [''.join(sorted(x)) for x in input]
>>> input2
['abc', 'abc', 'acef', 'acef', 'goo']
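
The sorting check above can be wrapped in a small helper. This is only a minimal sketch; the name `is_anagram` is hypothetical and does not appear in the original question:

```python
def is_anagram(a, b):
    """Return True if a and b consist of exactly the same letters."""
    # sorted() turns each string into an ordered list of characters,
    # so two anagrams produce identical lists.
    return sorted(a) == sorted(b)

print(is_anagram('cafe', 'face'))  # True
print(is_anagram('abc', 'goo'))    # False
```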

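I think combining this with map or something similar might work. However, I need to use a dict as a hash table, and I don't yet know whether that is feasible in one line. Any hints would be appreciated!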

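【Comments】:

Why do you want to do this in a single line?

It's just a kind of brain teaser. I've edited the question. I just want to minimize the number of lines of code.

In Ruby: xs.group_by { |x| x.chars.sort.join }.values. I wonder why Python doesn't have (or does it?) a group_by function somewhere in the standard library (itertools.groupby() only groups consecutive elements). Anyone?

【Answer 1】:

A readable one-line solution: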

output = [list(group) for key,group in groupby(sorted(words,key=sorted),sorted)]

For example:

>>> words = ['abc', 'cab', 'cafe', 'goo', 'face']
>>> from itertools import groupby
>>> [list(group) for key,group in groupby(sorted(words,key=sorted),sorted)]
[['abc', 'cab'], ['cafe', 'face'], ['goo']]

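The key here is to use itertools.groupby from the itertools module, which groups together consecutive items of a list that share the same key.

The list we give to groupby must be pre-sorted, so we pass it sorted(words,key=sorted). The trick is that sorted accepts a key function and sorts according to that function's output, so we pass sorted again as the key function, which orders the words by their letters in sorted order. There is no need to define our own function or create a lambda.

groupby also takes a key function, which it uses to decide whether items should be grouped together, and again we can just pass it the built-in sorted function.

The last thing to note is that the output consists of pairs of a key and a group object, so we take only the group objects and convert each of them to a list with the list function.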
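
To see what groupby actually yields, the pairs can be collected before the keys are discarded. The following is a small sketch using the same `words` list as the example above; the names `ordered` and `pairs` are introduced here purely for illustration. It shows that each key is the list returned by `sorted`, not a string:

```python
from itertools import groupby

words = ['abc', 'cab', 'cafe', 'goo', 'face']

# Sort the words by their sorted letters so that anagrams end up adjacent.
ordered = sorted(words, key=sorted)

# Each item produced by groupby is a (key, group) pair; the key is whatever
# the key function returned, here a list of characters.
pairs = [(key, list(group)) for key, group in groupby(ordered, key=sorted)]

for key, group in pairs:
    print(key, group)
```

The first pair printed is `['a', 'b', 'c'] ['abc', 'cab']`: the key is the shared sorted-letter list, and the group holds the original words.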

(By the way - I wouldn't call your variable input, because that shadows the built-in input function, although it's probably not one you should be using anyway.)

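【Comments】:

@wutz - You're right, it needs to handle length in the initial sort. Will take a look.

@wutz - Now fixed by changing sorted(words) to sorted(words,key=sorted).

@wutz - Thanks, I appreciate your help testing it. :-)

Thanks for the great explanation. I'm trying to understand how key and the key func work for groupby(). I found that in your example, if you don't specify a keyfunc for groupby(), the keys in the result are 'abc', 'cab'... the same as the list elements. However, with sorted as the keyfunc, the key becomes ['a', 'b', 'c']... basically spelled out from each group object. Would you mind explaining why it does that? Thanks.

【Answer 2】:

An unreadable one-line solution: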

>>> import itertools
>>> input = ['abc', 'face', 'goo', 'cab', 'cafe']
>>> [list(group) for key,group in itertools.groupby(sorted(input, key=sorted), sorted)]
[['abc', 'cab'], ['cafe', 'face'], ['goo']]

(OK, it's really 2 lines if you count the import...)

【Comments】:

Fails if the anagrams are not adjacent in the input.

【Answer 3】:

Not a one-liner, but a solution...

d = {}
for item in input:
  # Anagrams share the same sorted-letter string, which serves as the dict key.
  s = "".join(sorted(item))
  if s not in d:
    d[s] = []
  d[s].append(item)
input2 = list(d.values())

【Comments】:

【Answer 4】:

Readable version:

from itertools import groupby
from operator import itemgetter

def norm(w):
  return "".join(sorted(w))

words = ['abc', 'cba', 'gaff', 'ffag', 'aaaa']

words_aug = sorted((norm(word), word) for word in words)

grouped = groupby(words_aug, itemgetter(0))

for _, group in grouped:
  print(list(map(itemgetter(1), group)))

One-liner:

print(list(list(anagrams for _, anagrams in group) for _, group in groupby(sorted(("".join(sorted(word)), word) for word in words), itemgetter(0))))

Prints:

[['aaaa'], ['abc', 'cba'], ['ffag', 'gaff']]

【Comments】:

+1, I would prefer [[anagrams... over list(list(anagrams for readability.

【Answer 5】:

Dave's answer is concise, but the sort that groupby requires is an O(n log(n)) operation. A faster solution is:

from collections import defaultdict

def group_anagrams(strings):
    m = defaultdict(list)

    for s in strings:
        m[tuple(sorted(s))].append(s)

    return list(m.values())
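
A quick usage sketch for the function above, repeated here so the snippet runs on its own. The key is wrapped in `tuple` because lists are not hashable and so cannot be dictionary keys. Since dictionaries preserve insertion order in Python 3.7+, the groups come out in order of first appearance:

```python
from collections import defaultdict

def group_anagrams(strings):
    # Map each sorted-letter tuple to the list of words that share it.
    m = defaultdict(list)
    for s in strings:
        m[tuple(sorted(s))].append(s)
    return list(m.values())

result = group_anagrams(['abc', 'cab', 'cafe', 'face', 'goo'])
print(result)  # [['abc', 'cab'], ['cafe', 'face'], ['goo']]
```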

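【Comments】:

The groupby approach could use collections.Counter as the key instead of sorted. In that case, it's linear. But sorted is actually fast, and I suspect that unless the words are very long, using sorted will be faster.

@Stef The time complexity refers to the length of the list of strings passed to group_anagrams, not the length of the strings passed to the key function. Changing the key function to Counter doesn't help Dave's answer; the time complexity is still O(n log(n)) because of the sorted call before groupby. But you're right that if n is not large (a few million), the actual runtime performance is probably about the same.

Oh, you're right. I was confused. Dave's answer uses sorted both on the list of strings and as the key called on each string, and when I read "sort" in your answer, for some reason I thought of the key first.

【Answer 6】: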

from itertools import groupby

words = ['oog', 'abc', 'cab', 'cafe', 'face', 'goo', 'foo']

print([list(g) for k, g in groupby(sorted(words, key=sorted), sorted)])

Result:

[['abc', 'cab'], ['cafe', 'face'], ['foo'], ['oog', 'goo']]

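You can't use the groupby function on its own, because it only groups together consecutive elements for which your key function produces the same result.

The simple solution is to first sort the words using the same function that you use for grouping.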
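
The difference is easy to see by running groupby on an unsorted list. This is a minimal sketch with a shortened word list chosen for illustration; the names `unsorted_groups` and `sorted_groups` are not from the original answer:

```python
from itertools import groupby

words = ['oog', 'abc', 'goo', 'cab']

# Without pre-sorting, only neighbouring items with equal keys are merged,
# so each anagram pair is split into separate groups.
unsorted_groups = [list(g) for k, g in groupby(words, key=sorted)]

# Sorting by the same key first makes anagrams adjacent.
sorted_groups = [list(g) for k, g in groupby(sorted(words, key=sorted), key=sorted)]

print(unsorted_groups)  # [['oog'], ['abc'], ['goo'], ['cab']]
print(sorted_groups)    # [['abc', 'cab'], ['oog', 'goo']]
```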

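【Comments】:

Yes, I overlooked that, plus the fact that the words have to be adjacent. Fixed.

【Answer 7】:

Although the comments are 100% right, if you're trying to solve the problem without imports and their built-in functions (idk, as a brain teaser), then here you go: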

def sort_anagrams(li):
    new_li = []
    for i in li:
        tree = False
        for j in new_li:
            if sorted(i) == sorted(j[0]):
                j.append(i)
                tree = True
        if not tree:
            new_li.append([i])
    return new_li

In use:

list_of = ['abc', 'face', 'goo', 'cab', 'cafe']
print(sort_anagrams(list_of))

Output:

[['abc', 'cab'], ['face', 'cafe'], ['goo']]

【Comments】:
