Natural Language 19.1_Lemmatizing with NLTK


https://www.pythonprogramming.net/lemmatizing-nltk-tutorial/?completed=/named-entity-recognition-nltk-tutorial/

 

Lemmatizing with NLTK




Lemmatizing is an operation very similar to stemming. The major difference, as you saw earlier, is that stemming can often create non-existent words, whereas lemmas are actual words.

So the stem you end up with is not necessarily something you can look up in a dictionary, but a lemma is.

Sometimes you will wind up with a very similar word, and sometimes with a completely different one. Let's see some examples.

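# Requires the WordNet corpus; if it is missing, run nltk.download("wordnet") once.
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# With no pos argument, each word is treated as a noun.
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("python"))

# pos="a" treats the word as an adjective.
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))

# The same word looked up as a noun (default) and as a verb.
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run", "v"))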

Here we've got a bunch of examples of the lemma for each word we pass in. The main thing to note is that lemmatize takes a part-of-speech parameter, pos. If it is not supplied, the default is "n" (noun). This means the word is looked up as a noun, and if no noun lemma is found it is returned unchanged, which can give you the wrong result for verbs, adjectives, and adverbs. Keep this in mind if you use lemmatizing!
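Since the noun default is easy to trip over, one common workaround is to tag each word first and pass the matching part of speech to lemmatize. The sketch below only covers the tag translation step. The helper name penn_to_wordnet_pos is made up for illustration and is not part of NLTK; it assumes the tags are Penn Treebank tags, such as those produced by nltk.pos_tag, and falls back to noun for anything else.

```python
# Hypothetical helper (not part of NLTK): map a Penn Treebank tag to the
# single-letter part-of-speech code that WordNetLemmatizer.lemmatize accepts.

def penn_to_wordnet_pos(tag, default="n"):
    """Return 'a', 'v', 'n' or 'r' for a Penn Treebank tag."""
    tag = tag.upper()
    if tag.startswith("JJ"):   # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("NN"):   # NN, NNS, NNP, NNPS: nouns
        return "n"
    if tag.startswith("RB"):   # RB, RBR, RBS: adverbs
        return "r"
    # Anything else (determiners, prepositions, ...) falls back to the
    # same default that lemmatize itself uses.
    return default


if __name__ == "__main__":
    for tag in ["NNS", "VBD", "JJR", "RB", "DT"]:
        print(tag, "->", penn_to_wordnet_pos(tag))
```

With a helper like this, a tagged word could be lemmatized with something along the lines of lemmatizer.lemmatize(word, pos=penn_to_wordnet_pos(tag)), so that a verb such as "running" is looked up as a verb rather than as a noun.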

In the next tutorial, we're going to dive into the NLTK corpus that came with the module, looking at all of the awesome documents waiting for us there.
