Lemmatizing a pandas column (Python)
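I'm a beginner with pandas and I'm trying to figure out how to lemmatize a single column of a DataFrame. Take the following example (this is some text after (un)common word removal) that I want to lemmatize: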
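0    good needs changes virgil natural microbrew...
1    new favorite surprised find...
2    favorite red sauce enjoy strong tannins good pull...
3    quality outstanding 1800s 21st century try drink...
4    red first try love 100 excellent blend...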
This is the code I'm using to lemmatize (taken from here):
from textblob import Word  # assumption: Word is TextBlob's Word class

df['words'] = df['words'].apply(lambda x: " ".join([Word(word).lemmatize() for word in x]))
df['words'].head()
But after running this code, the output does not change:
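0    good needs changes virgil natural microbrew...
1    new favorite surprised find...
2    favorite red sauce enjoy strong tannins good pull...
3    quality outstanding 1800s 21st century try drink...
4    red first try love 100 excellent blend...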
Any help would be greatly appreciated :)
P.S.: words is a list of tokenized words.
Answer
You may no longer need a solution, but if you want to lemmatize across several parts of speech (POS), you can try the following code:
import nltk
from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer

# Requires the NLTK tokenizer, POS tagger and WordNet data (see nltk.download).
lemmatizer = WordNetLemmatizer()

def nltk_tag_to_wordnet_tag(nltk_tag):
    # map a Penn Treebank tag to the matching WordNet POS constant
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:
        return None

def lemmatize_sentence(sentence):
    # tokenize the sentence and find the POS tag for each token
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    lemmatized_sentence = []
    for word, nltk_tag in nltk_tagged:
        tag = nltk_tag_to_wordnet_tag(nltk_tag)
        if tag is None:
            # if there is no available tag, append the token as is
            lemmatized_sentence.append(word)
        else:
            # else use the tag to lemmatize the token
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return " ".join(lemmatized_sentence)

# Lemmatizing (the 'word' column holds one sentence string per row)
df['Lemmatize'] = df['word'].apply(lemmatize_sentence)
print(df.head())
Resulting df:
   word                              | Lemmatize
0  Best scores, good cats, it rocks  | Best score , good cat , it rock
1  You received best scores          | You receive best score
2  Good news                         | Good news
3  Bad news                          | Bad news
4  I am loving it                    | I be love it
5  it rocks a lot                    | it rock a lot
6  it is still good to do better     | it be still good to do good
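
To see why a row such as "I am loving it" becomes "I be love it", it can help to look at the tag-mapping step on its own. The following is only a minimal sketch, not part of the answer's code: penn_to_wordnet and WORDNET_POS are hypothetical names, the WordNet POS constants are written out as their single-letter string values so the snippet runs without NLTK, and the tags in tagged are hand-written for illustration rather than real nltk.pos_tag output.

```python
# WordNet's POS constants are single letters; they are written out here as
# plain strings so the sketch runs without NLTK installed.
WORDNET_POS = {"J": "a", "V": "v", "N": "n", "R": "r"}  # adj, verb, noun, adverb


def penn_to_wordnet(penn_tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unmapped."""
    return WORDNET_POS.get(penn_tag[:1])


# Hand-written tags for illustration only; in the answer's code these pairs
# come from nltk.pos_tag, whose output may differ.
tagged = [("I", "PRP"), ("am", "VBP"), ("loving", "VBG"), ("it", "PRP")]

# Pair each token with the POS the lemmatizer would receive (None = untouched).
plan = [(token, penn_to_wordnet(tag)) for token, tag in tagged]
for token, pos in plan:
    action = f"lemmatize as '{pos}'" if pos else "keep as is"
    print(f"{token:8} -> {action}")
```

Under these assumed tags, only "am" and "loving" are handed to the lemmatizer, both as verbs, while the pronouns pass through unchanged. That is consistent with the row 4 result in the table above.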