NLTK: sentiment analysis: result one value

【Posted】: 2015-04-27 21:56:42

【Question】:

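Sorry for posting this, since the answer may already be here: NLTK sentiment analysis is only returning one value

or in this thread: Python NLTK not sentiment calculate correct

But I can't figure out how to apply it to my code.

I'm new to Python and NLTK, and I hate having to bother you with such a large block of code; sorry again.

With the code below, I always get 'pos'. I also tried classifying with the positive features left out of the training set; then the result is always 'neutral'.

Can anyone tell me what I'm doing wrong? Thank you very much! And never mind the random test sentence I use; it's just something that came up while I was trying to find the problem.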

import re, math, collections, itertools
import nltk
import nltk.classify.util, nltk.metrics
from nltk.classify import NaiveBayesClassifier
from nltk.metrics import BigramAssocMeasures
from nltk.probability import FreqDist, ConditionalFreqDist  
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.porter import *
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english", ignore_stopwords = True)

pos_tweets = ['I love bananas','I like pears','I eat oranges']
neg_tweets = ['I hate lettuce','I do not like tomatoes','I hate apples']
neutral_tweets = ['I buy chicken','I am boiling eggs','I am chopping vegetables']

def uni(doc):
    x = []
    y = []
    for tweet in doc:
        x.append(word_tokenize(tweet))
    for element in x:
        for word in element:
            if len(word)>2:
                word = word.lower()
                word = stemmer.stem(word)
                y.append(word)
    return y

def word_feats_uni(doc):
     return dict([(word, True) for word in uni(doc)])

def tokenizer_ngrams(document):
    all_tokens = []
    filtered_tokens = []
    for (sentence) in document:
        all_tokens.append(word_tokenize(sentence))
    return all_tokens

def get_bi (document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        c.extend([bigram for bigram in nltk.bigrams(sentence)])
    return c

def get_tri(document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        # use trigrams here; the original repeated the bigram extraction
        c.extend([trigram for trigram in nltk.trigrams(sentence)])
    return c

def word_feats_bi(doc): 
    return dict([(word, True) for word in get_bi(doc)])

def word_feats_tri(doc):
    return dict([(word, True) for word in get_tri(doc)])

def word_feats_test(doc):
    feats_test = {}
    feats_test.update(word_feats_uni(doc))
    feats_test.update(word_feats_bi(doc))
    feats_test.update(word_feats_tri(doc))
    return feats_test

pos_feats = [(word_feats_uni(pos_tweets),'pos')] + [(word_feats_bi(pos_tweets),'pos')] + [(word_feats_tri(pos_tweets),'pos')]

neg_feats = [(word_feats_uni(neg_tweets),'neg')] + [(word_feats_bi(neg_tweets),'neg')] + [(word_feats_tri(neg_tweets),'neg')]

neutral_feats = [(word_feats_uni(neutral_tweets),'neutral')] + [(word_feats_bi(neutral_tweets),'neutral')] + [(word_feats_tri(neutral_tweets),'neutral')]

trainfeats = pos_feats + neg_feats + neutral_feats

classifier = NaiveBayesClassifier.train(trainfeats)

print (classifier.classify(word_feats_test('I am chopping vegetables and boiling eggs')))

【Comments】:

【Answer 1】:

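The fix is very simple. Your word_feats_test returns an empty dictionary for the sentence 'I am chopping vegetables and boiling eggs', because the feature functions expect a list of sentences and iterate over a bare string one character at a time. With no features to go on, the classifier falls back to pos.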
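To make the empty dictionary concrete, here is a minimal sketch that mimics the length filter in uni(). It is only an illustration: unigram_feats is a hypothetical name, and a plain whitespace split is assumed as a stand-in for word_tokenize so the snippet runs without NLTK data.

```python
def unigram_feats(docs):
    """Mimic uni(): keep lowercased tokens longer than two characters.

    Assumption: str.split() stands in for nltk's word_tokenize.
    """
    feats = {}
    for sentence in docs:              # a bare string yields one character at a time
        for word in sentence.split():  # so each "sentence" has at most one 1-char token
            if len(word) > 2:
                feats[word.lower()] = True
    return feats


sentence = 'I am chopping vegetables and boiling eggs'

first_items = list(sentence)[:4]        # what the outer loop actually iterates over
string_feats = unigram_feats(sentence)  # no token ever passes the length filter

print(first_items)
print(string_feats)
```

Every item the loop sees is a single character, so nothing survives the len(word) > 2 check and the result has no features at all.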

I put your sentence in a list:

print(classifier.classify(word_feats_test(
      ['I am chopping vegetables and boiling eggs'])))

and it prints neutral.

You should use exactly the same function to compute the features in all three places: training set, test set, and classification.
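One possible way to keep all three on the same code path is to normalize the input before extracting features, so a single sentence and a list of sentences are handled alike. This is a sketch under assumptions, not part of the original code: as_documents and extract_feats are hypothetical helpers, and a whitespace split again stands in for word_tokenize.

```python
def as_documents(doc):
    """Return a list of sentences, wrapping a bare string in a list."""
    if isinstance(doc, str):
        return [doc]
    return list(doc)


def extract_feats(doc):
    """Single feature function intended for both training and classification.

    Assumption: str.split() stands in for nltk's word_tokenize.
    """
    feats = {}
    for sentence in as_documents(doc):
        for word in sentence.split():
            if len(word) > 2:
                feats[word.lower()] = True
    return feats


sentence = 'I am chopping vegetables and boiling eggs'
from_string = extract_feats(sentence)    # bare string is wrapped first
from_list = extract_feats([sentence])    # list is used as given

print(from_string == from_list)
print(sorted(from_list))
```

With the guard in place, passing a string by mistake no longer silently produces an empty feature set.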

【Comments】:

Thank you so much! Works perfectly!
