How can I count identical words in a sentence?

Posted 2021-04-30


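I would like to ask how to count identical words in a sentence (in Python).

For example, from the sentence: "What a wonderful day. Birds are singing, children are laughing."

What I want to get is: {'what': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}

Here is what I have so far: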

sent = "What a wonderful day. Birds are singing, children are laughing."
a = sent.split()  # assumed: the snippet as posted never defines `a`
b = set([word.lower() for word in a])
c = list(b)

If this code is not suited to the job, please let me know. Thanks.

Answer

You can use Counter together with re:

import re
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
# Keep only runs of ASCII letters, which drops the punctuation.
words = re.findall("[A-Za-z]+", sent)
print(dict(Counter(words)))
# {'What': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'Birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}
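The output above keeps "What" and "Birds" capitalized, while the question asks for lowercase keys. A minimal sketch of a case-insensitive variant, assuming it is acceptable to lowercase the whole sentence before matching (the name `word_counts` is only illustrative):

```python
import re
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."

# Lowercase first, then extract runs of ASCII letters as words.
word_counts = Counter(re.findall(r"[a-z]+", sent.lower()))
print(dict(word_counts))
# {'what': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}
```

Note that the pattern matches ASCII letters only, so a word such as "don't" would be split into "don" and "t".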

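Another answer

collections.Counter can count the occurrences of anything in a list, so it is a good starting point. That means, however, that we first need to turn the sentence into a list of words and remove the punctuation.

To get a list of words, there is a method called .split() that splits the sentence on whitespace. To remove the punctuation, .strip() is a good choice: it trims the given characters from both ends of each word.

As you have already hinted, we should also normalize the case. For this it is better to use .casefold() rather than .lower(). For some non-ASCII characters the two give different results.

All of which leads to code that looks like: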

import string
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
words = [word.strip(string.punctuation).casefold() for word in sent.split()]
freq = Counter(words)
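To make the two points above concrete, here is a small sketch (the sample words "don't" and "Straße" are chosen for illustration and are not part of the original answer). It shows that .strip() only trims the ends of a word, and that .casefold() and .lower() can disagree:

```python
import string

# strip() only trims characters from the two ends of a word,
# so an apostrophe inside the word survives.
print("laughing.".strip(string.punctuation))  # laughing
print("don't".strip(string.punctuation))      # don't

# lower() leaves the German sharp s alone; casefold() expands it to "ss".
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```

For plain English text such as the sentence in the question, both methods give the same result.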

Another answer

Use collections.Counter plus str.strip to remove the punctuation:

from collections import Counter
import string

sent = "What a wonderful day. Birds are singing, children are laughing."

c = Counter([x.strip(string.punctuation) for x in sent.split()])
print(c)

# Counter({'are': 2, 'What': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'Birds': 1, 'singing': 1, 'children': 1, 'laughing': 1})

If the count should be case-insensitive, convert to lowercase before counting, like this:

s = sent.lower().translate(str.maketrans('', '', string.punctuation))
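The line above only produces a cleaned string; it still has to be split and counted. A minimal sketch of the complete case-insensitive version, assuming the same `sent` as before:

```python
import string
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."

# Lowercase, delete every ASCII punctuation character, then split on whitespace.
s = sent.lower().translate(str.maketrans('', '', string.punctuation))
c = Counter(s.split())
print(c["are"])   # 2
print(c["what"])  # 1
```

Unlike .strip(), translate() also deletes punctuation inside a word, so "don't" would become "dont".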
