Returning Dictionary-length of words in string [duplicate]

Posted 2023-02-23


Asked: 2016-06-29 10:01:01

Question:

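I need to build a function that takes a string as input and returns a dictionary. The keys are numbers, and each value is a list of the unique words whose number of letters equals the key. For example, if the function is called like this: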

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

the function should return:

{2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}

The code I wrote is:

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        sample_dictionary[words]=word
    print(sample_dictionary)
    return sample_dictionary

The function returns this dictionary:

{2: 'is', 3: 'you', 4: 'they', 5: 'treat', 6: 'become'}

Instead of containing all the words with the same number of letters, the dictionary holds only the last such word in the string.
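A minimal illustration of the problem: assigning to a key that already exists replaces its value rather than adding to it, so each word overwrites the previous word of the same length. The three-word list here is just an example.

```python
# Assigning to an existing key replaces its value; it does not accumulate.
lengths = {}
for word in ["way", "you", "see"]:
    # All three words have length 3, so each assignment overwrites the previous one
    lengths[len(word)] = word

print(lengths)  # {3: 'see'}
```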


Answer 1:

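Since you only want to store unique values in the lists, a set actually makes more sense. Your code is almost correct: you just need to create a new set when words is not yet a key in your dictionary, and add to the existing set when words is already a key. For example: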

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        if words in sample_dictionary:
            sample_dictionary[words].add(word)
        else:
            sample_dictionary[words] = {word}  # start a new set for this length
    print(sample_dictionary)
    return sample_dictionary

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

Output:

{2: set(['is']), 3: set(['and', 'the', 'see', 'you', 'way']),
 4: set(['them', 'what', 'they']), 5: set(['treat']), 6: set(['become', 'people'])}
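The question asks for alphabetically sorted lists rather than sets. A minimal sketch of one way to produce exactly that format, assuming sorted keys are also wanted; it swaps the if/else for dict.setdefault and converts each set with sorted() at the end:

```python
def n_letter_dictionary(my_string):
    """Group unique lowercase words by length; values are sorted lists."""
    groups = {}
    for word in my_string.lower().split():
        # setdefault creates an empty set the first time a length is seen
        groups.setdefault(len(word), set()).add(word)
    # Turn each set into an alphabetically sorted list, with keys in ascending order
    return {length: sorted(words) for length, words in sorted(groups.items())}


print(n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become"))
# {2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}
```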

Comments:

Oh, this is better; our other solutions raise a KeyError... How do I sort the list ['the', 'way', 'you', 'see', 'the', 'way', 'you', 'and', 'the', 'Way', 'you']?

If you want alphabetical order, some_list.sort().

Answer 2:

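The problem with your code is that you only put the most recent word into the dictionary. Instead, you have to add the word to a collection of words with the same length. In your example that collection is a list, but assuming order doesn't matter, a set seems more appropriate.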

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        if len(word) not in sample_dictionary:
            sample_dictionary[len(word)] = set()
        sample_dictionary[len(word)].add(word)
    return sample_dictionary

You can shorten this a bit with collections.defaultdict(set):

import collections

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary=collections.defaultdict(set)
    for word in my_string:
        sample_dictionary[len(word)].add(word)
    return dict(sample_dictionary)

Or use itertools.groupby, but for that you have to sort by length first:

import itertools
def n_letter_dictionary(my_string):
    words_sorted = sorted(my_string.lower().split(), key=len)
    return {k: set(g) for k, g in itertools.groupby(words_sorted, key=len)}

Example (the result is the same for all three implementations):

>>> n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
{2: {'is'}, 3: {'way', 'the', 'you', 'see', 'and'}, 4: {'what', 'them', 'they'}, 5: {'treat'}, 6: {'become', 'people'}}
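To check the claim that the three versions agree, a small sketch that defines all three side by side under hypothetical names (with_dict, with_defaultdict, with_groupby) and compares their results on the question's sentence. Since the values are sets, the comparison does not depend on word order.

```python
import collections
import itertools


def with_dict(my_string):
    # Plain dict: create the set on first sight of a length
    result = {}
    for word in my_string.lower().split():
        if len(word) not in result:
            result[len(word)] = set()
        result[len(word)].add(word)
    return result


def with_defaultdict(my_string):
    # defaultdict(set) creates the empty set automatically
    result = collections.defaultdict(set)
    for word in my_string.lower().split():
        result[len(word)].add(word)
    return dict(result)


def with_groupby(my_string):
    # groupby needs equal lengths to be adjacent, hence the sort
    words_sorted = sorted(my_string.lower().split(), key=len)
    return {k: set(g) for k, g in itertools.groupby(words_sorted, key=len)}


sentence = "The way you see people is the way you treat them and the Way you treat them is what they become"
results = [f(sentence) for f in (with_dict, with_defaultdict, with_groupby)]
print(all(r == results[0] for r in results))  # True
```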

Comments:

Exactly right; of course removing duplicates makes more sense!

Answer 3:

With sample_dictionary[words]=word, you overwrite whatever you have put there so far. You need a list instead, which you can append to.

What you need is:

if words in sample_dictionary.keys():
    sample_dictionary[words].append(word)
else:
    sample_dictionary[words]=[word]

So if there is already a value for this key, I append to it; otherwise I create a new list.
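Wrapped into the question's function, the snippet might look like this sketch (lowercasing kept from the question's code). Because nothing checks whether a word is already in the list, repeated words are kept:

```python
def n_letter_dictionary(my_string):
    sample_dictionary = {}
    for word in my_string.lower().split():
        words = len(word)  # the key: number of letters
        if words in sample_dictionary:
            sample_dictionary[words].append(word)  # key exists: extend its list
        else:
            sample_dictionary[words] = [word]  # first word of this length
    return sample_dictionary


print(n_letter_dictionary("is the is way"))  # {2: ['is', 'is'], 3: ['the', 'way']}
```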

Comments:

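Yes, you don't actually need .keys().

Hi, thank you very much for your help. However, I still get duplicate values for keys that already exist in the dictionary. Do you know a way to prevent duplicate words without using set()?

Why don't you want to use set()?

Well, of course there is a way. Replace else: with elif word not in sample_dictionary[words]: and it will check this condition.

Answer 4: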

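You can use defaultdict from the collections library. It lets you set a default type for the dictionary's values, a list in this case, and then append to it based on the word's length.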

from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        my_dict[len(word)].append(word)

    return my_dict

You can still do this without defaultdict, but it is a bit longer.

def n_letter_dictionary(my_string):
    my_dict = {}
    for word in my_string.split():
        word_length = len(word)
        if word_length in my_dict:
            my_dict[word_length].append(word)
        else:
            my_dict[word_length] = [word]

    return my_dict

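To make sure there are no duplicates in the value lists without using set(), check whether the word is already in the list before appending it. Note, however, that if your value lists are large and your input data is fairly unique, you will run into performance problems: checking whether a value is already in a list scans the list element by element and only exits early when it finds the value.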

from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        if word not in my_dict[len(word)]:
            my_dict[len(word)].append(word)

    return my_dict

# without defaultdict
def n_letter_dictionary(my_string):
    my_dict = {}                                  # Init an empty dict
    for word in my_string.split():                # Split the string and iterate over it
        word_length = len(word)                   # Get the length, which is also the key
        if word_length in my_dict:                # Check if the length is already a key
            if word not in my_dict[word_length]:  # The key exists, but the word isn't in its value list yet
                my_dict[word_length].append(word) # Add the word
        else:
            my_dict[word_length] = [word]         # New length/key, so add it without further checks
    return my_dict

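So this approach is acceptable if you have a high rate of repetition and the word lists to scan are short. If, for example, you have a randomly generated word list made up of permutations of alphabetic characters, the value lists grow large and scanning them becomes expensive.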
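If the restriction is only on set() but the linear list scan is a concern, one alternative (not taken from the answer above) is to use a plain dict per length as an insertion-ordered set: membership becomes a hash lookup, and list(...) recovers the words in first-seen order. This sketch assumes Python 3.7+, where dicts preserve insertion order.

```python
def n_letter_dictionary(my_string):
    # Each inner dict acts as an ordered "set": keys are the words, values are unused.
    # Checking/adding a key is a hash lookup rather than a scan of a list.
    seen = {}
    for word in my_string.split():
        seen.setdefault(len(word), {})[word] = None  # re-adding an existing word changes nothing
    # Convert the inner dicts' keys to lists, keeping first-seen order (Python 3.7+)
    return {length: list(words) for length, words in seen.items()}


print(n_letter_dictionary("a bb a cc bb"))  # {1: ['a'], 2: ['bb', 'cc']}
```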

Comments:

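Thank you very much. I still get duplicate values for keys that already exist in the dictionary. Is there a way to remove duplicate words without using set()?

I added a section about ensuring there are no duplicates without using set().

I'm trying to use your first method without defaultdict by adding an "if word not in my_dict" after "for word in my_string.split():", but I still get the same output with duplicate words. Can you help me with the approach that doesn't use defaultdict?

I added an example that doesn't use defaultdict and gives unique results in the lists without using set(). If you have if word not in my_dict, that will always return True, because word is in the values, and your statement only checks the keys of my_dict.

Answer 5: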

The shortest solution I came up with uses defaultdict:

from collections import defaultdict

sentence = ("The way you see people is the way you treat them"
            " and the Way you treat them is what they become")

Now the algorithm:

wordsOfLength = defaultdict(list)
for word in sentence.split():
    wordsOfLength[len(word)].append(word)

Now wordsOfLength holds the desired dictionary.


Answer 6:

itertools.groupby is the perfect tool for this.

from itertools import groupby
def n_letter_dictionary(string):
    result = {}
    for key, group in groupby(sorted(string.split(), key = lambda x: len(x)), lambda x: len(x)):
        result[key] = list(group)
    return result

print(n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become"))

# {2: ['is', 'is'], 3: ['The', 'way', 'you', 'see', 'the', 'way', 'you', 'and', 'the', 'Way', 'you'], 4: ['them', 'them', 'what', 'they'], 5: ['treat', 'treat'], 6: ['people', 'become']}
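The reason this answer sorts first: groupby only groups consecutive items with equal keys. A small illustration, using a hypothetical three-word list, of what happens without sorting:

```python
from itertools import groupby

words = ["is", "the", "to"]

# Without sorting, groupby starts a new group every time the key (length) changes
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=len)]
print(unsorted_groups)  # [(2, ['is']), (3, ['the']), (2, ['to'])]

# Building a dict from these groups silently overwrites key 2, losing 'is'
print(dict(unsorted_groups))  # {2: ['to'], 3: ['the']}

# Sorting by the same key first makes equal lengths adjacent
sorted_groups = {k: list(g) for k, g in groupby(sorted(words, key=len), key=len)}
print(sorted_groups)  # {2: ['is', 'to'], 3: ['the']}
```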

Comments:

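Indeed, let me correct that as soon as possible. Also, key = lambda x: len(x) is the same as key=len ;-)

Yes, noticed, thanks!

Sorting things just to please groupby is unnecessary. Reconsider this aspect.

Answer 7: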

my_string="a aa bb ccc a bb".lower().split()
sample_dictionary={}
for word in my_string:
    words=len(word)
    if words not in sample_dictionary:
        sample_dictionary[words] = []
    sample_dictionary[words].append(word)
print(sample_dictionary)

Comments:

Reconsider the name of the variable words: it holds a length, so wordLength or something similar would be clearer.
