Find semordnilaps (reverse anagrams) of words in a string

Posted 2023-03-30


Originally posted: 2019-05-21 14:18:33

Question:

I'm trying to take a string, such as a sentence, and find all the words in it whose reverse also appears in the sentence. So far I have this:

s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"

def semordnilap(s):
    s = s.lower()
    # Strip a few punctuation characters, then split on spaces
    b = "!@#$,"
    for char in b:
        s = s.replace(char, "")
    s = s.split(' ')

    pairs = {}
    index = 0
    for i in range(0, len(s)):
        originalfirst = s[index]
        # Compare this word's sorted letters with those of every later word
        sortedfirst = ''.join(sorted(str(s[index])))
        for j in range(index + 1, len(s)):
            nxt = ''.join(sorted(str(s[j])))
            if sortedfirst == nxt:
                pairs.update({originalfirst: s[j]})
        index += 1

    print(pairs)

semordnilap(s)

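This mostly works, but if you run it you'll see that it also pairs "he" with "he" as an anagram, which is not what I want. Any suggestions on how to fix that? And is it possible to speed up the running time if I feed it a large text file?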

Comments:

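Are you only looking for reversed strings? In that case, it's as simple as building a list of the reversed words for each sentence and then doing a lookup.

Isn't that a very slow process?

Not necessarily. Much of the standard library is implemented in C, at least in CPython, so it is usually faster than pure Python code. You can measure it with timeit.timeit, though. For more advanced use cases, you should probably use nltk.

Answer 1: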

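You can split the string into a list of words, then compare the lowercase versions of every combination of word pairs, keeping the pairs where one word is the other reversed. The following example uses re.findall() to split the string into a list of words and itertools.combinations() to generate the pairs to compare: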

import itertools
import re

s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"

words = re.findall(r'\w+', s)
pairs = [(a, b) for a, b in itertools.combinations(words, 2) if a.lower() == b.lower()[::-1]]

print(pairs)
# OUTPUT
# [('was', 'saw'), ('stressed', 'desserts'), ('stop', 'pots')]
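Because this tests for a reversal rather than an anagram, it no longer pairs "he" with "he" ("he" reversed is "eh"). It would still pair a palindrome with a repeated copy of itself, for example two occurrences of "level". A minimal sketch, assuming such palindrome self-pairs should also be excluded, adds an inequality check; the function name semordnilap_pairs is hypothetical:

```python
import itertools
import re


def semordnilap_pairs(text):
    """Return (word, reversed_word) pairs from text, skipping palindromes.

    Assumption: a word equal to its own reverse (e.g. 'level') is not a
    semordnilap, even if it appears more than once.
    """
    words = re.findall(r'\w+', text)
    return [
        (a, b)
        for a, b in itertools.combinations(words, 2)
        # b reversed must equal a, and a must not simply be b again
        if a.lower() == b.lower()[::-1] and a.lower() != b.lower()
    ]


s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"
print(semordnilap_pairs(s))
print(semordnilap_pairs("level with level"))  # → []
```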

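Edit: I still prefer the solution above, but given your comment about doing this without importing any packages, see below. Note, however, that using str.translate() this way can have unintended consequences depending on the nature of your text (for example, removing the @ from email addresses); in other words, you may need to handle punctuation more carefully than this. Also, I would normally import string and use string.punctuation instead of the literal string of punctuation characters passed to str.translate(), but to meet your requirement of not importing anything, I've avoided that below.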

s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"

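# Python 3: str.translate() takes a table from str.maketrans(); the
# translate(None, deletechars) form only works on Python 2 str objects
words = s.translate(str.maketrans('', '', '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')).split()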
length = len(words)
pairs = []
for i in range(length - 1):
    for j in range(i + 1, length):
        if words[i].lower() == words[j].lower()[::-1]:
            pairs.append((words[i], words[j]))

print(pairs)
# OUTPUT
# [('was', 'saw'), ('stressed', 'desserts'), ('stop', 'pots')]
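For comparison, here is a sketch of the variant the answer says it would normally use, with import string and string.punctuation instead of a hand-typed list. It assumes Python 3, where str.maketrans('', '', chars) builds a table that deletes every character in chars:

```python
import string

s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"

# Map every ASCII punctuation character to None, i.e. delete it.
table = str.maketrans('', '', string.punctuation)
words = s.translate(table).split()

pairs = []
for i in range(len(words) - 1):
    for j in range(i + 1, len(words)):
        # A pair qualifies when one word is the other spelled backwards.
        if words[i].lower() == words[j].lower()[::-1]:
            pairs.append((words[i], words[j]))

print(pairs)
```

The same caveat applies: string.punctuation contains @ as well, so it would also strip it from email addresses.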

Comments:

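Thanks for posting this! It works, but I'm trying to do it inside a function without using libraries, for efficiency. Is there a way to do this without re and itertools?

@codingtherapy itertools.combinations() is very fast. (I haven't checked the performance of re.findall(), so I can't say for certain, but matching a regex pattern certainly adds some overhead.) Doing it without these libraries is definitely possible, but I'm not sure it would perform better, since you still need a way to handle punctuation and so on; see the edit to the answer.

Thank you both for explaining this to me. Good to know that itertools is fast. I'll try both approaches on a large text file and see what the running times are :)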
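On the running-time question, the first comment under the question suggests precomputing reversed words and doing a lookup instead of comparing every pair. Below is a minimal sketch of that idea, using a set so that each lookup is, on average, a constant-time hash probe; one pass over the words then replaces the quadratic number of pairwise comparisons. The function names are hypothetical, and unlike the answer's code this version lowercases the words and reports each distinct pair only once. The timing part uses timeit.timeit, as the comments suggest, on a repeated sentence standing in for a real text file:

```python
import itertools
import re
import timeit


def semordnilaps_fast(text):
    """Return each distinct (earlier_word, later_word) semordnilap pair once.

    Words are lowercased. A word matches when its reverse has already been
    seen, so a single pass over the text is enough.
    """
    earlier = set()  # lowercased words seen so far
    found = set()    # reported pairs, stored order-insensitively
    pairs = []
    for word in (w.lower() for w in re.findall(r'\w+', text)):
        rev = word[::-1]
        key = frozenset((word, rev))
        # Skip palindromes and pairs that have already been reported.
        if rev != word and rev in earlier and key not in found:
            pairs.append((rev, word))
            found.add(key)
        earlier.add(word)
    return pairs


def semordnilaps_pairwise(text):
    """Baseline: compare every pair of words, as in the first answer."""
    words = re.findall(r'\w+', text)
    return [(a, b) for a, b in itertools.combinations(words, 2)
            if a.lower() == b.lower()[::-1]]


s = "Although he was stressed when he saw his desserts burnt, he managed to stop the pots from getting ruined"
print(semordnilaps_fast(s))  # → [('was', 'saw'), ('stressed', 'desserts'), ('stop', 'pots')]

# Stand-in for a large text file: the sentence repeated 50 times.
big = " ".join([s] * 50)
print("pairwise:  ", timeit.timeit(lambda: semordnilaps_pairwise(big), number=1))
print("set lookup:", timeit.timeit(lambda: semordnilaps_fast(big), number=1))
```

Actual timings depend on the machine and the text, so it is worth running this on your own file rather than relying on the repeated sentence.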
