What algorithm does this sentiment analysis code in Python use?

Posted: 2018-09-07 02:04:30

Question:

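I have a question about sentiment analysis. I have a dataset of tweets (about cryptocurrency), and I plan to run sentiment analysis on it to get a positive or negative result for each tweet.

I found some good sentiment analysis code, but since I am new to this field I can't tell which classification algorithm it uses. The code is below: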

# Importing libraries
import re
import warnings

import chardet
import numpy as np
import pandas as pd
from pandas import DataFrame, read_csv

# Visualisation (%matplotlib inline is a Jupyter/IPython magic, not plain Python)
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt; plt.rcdefaults()
from matplotlib import rc
import seaborn as sns
from IPython.display import display
from mpl_toolkits.basemap import Basemap
from wordcloud import WordCloud, STOPWORDS

# NLTK and scikit-learn
from nltk.stem import WordNetLemmatizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.sentiment.util import *
from nltk import tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.snowball import SnowballStemmer
from nltk.corpus import stopwords
stop = stopwords.words('english')

plt.style.use('ggplot')
pd.options.mode.chained_assignment = None
warnings.filterwarnings("ignore")



######### Sentiment analysis code ########

# `tweets` is a pandas DataFrame with a 'text' column, presumably loaded elsewhere.
# Each element of tweets['text'] is a single string, so the inner loop runs one
# character at a time; the net effect appears to be replacing non-letters with spaces.
tweets['text_lem'] = [''.join([WordNetLemmatizer().lemmatize(re.sub('[^A-Za-z]', ' ', line)) for line in lists]).strip() for lists in tweets['text']]
# TF-IDF matrix of the cleaned text; X is not used by the sentiment scoring below.
vectorizer = TfidfVectorizer(max_df=0.5,max_features=10000,min_df=10,stop_words='english',use_idf=True)
X = vectorizer.fit_transform(tweets['text_lem'].str.upper())
# The sentiment scores come from NLTK's VADER analyzer.
sid = SentimentIntensityAnalyzer()
tweets['sentiment_compound_polarity']=tweets.text_lem.apply(lambda x:sid.polarity_scores(x)['compound'])
tweets['sentiment_neutral']=tweets.text_lem.apply(lambda x:sid.polarity_scores(x)['neu'])
tweets['sentiment_negative']=tweets.text_lem.apply(lambda x:sid.polarity_scores(x)['neg'])
tweets['sentiment_pos']=tweets.text_lem.apply(lambda x:sid.polarity_scores(x)['pos'])
# Label each tweet by the sign of its compound score.
tweets['sentiment_type']=''
tweets.loc[tweets.sentiment_compound_polarity>0,'sentiment_type']='POSITIVE'
tweets.loc[tweets.sentiment_compound_polarity==0,'sentiment_type']='NEUTRAL'
tweets.loc[tweets.sentiment_compound_polarity<0,'sentiment_type']='NEGATIVE'

Can anyone tell me more about this sentiment analysis code? Which algorithm does it use?

Answer 1:

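The classifier in this code is SentimentIntensityAnalyzer(). The documentation suggests it may be a NaiveBayesClassifier.

If you look at the original paper here, the authors also mention NaiveBayesClassifier.

However, in the github project, the author states:

The Python code for the rule-based sentiment analysis engine. Implements the grammatical and syntactical rules described in the paper, incorporating empirically derived quantifications for the impact of each rule on the perceived intensity of sentiment in sentence-level text.

So the algorithm in your code is a rule-based algorithm, not a machine learning algorithm. The code is here.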

Testing the library

Using the code from the paper:

hate_comments = ['I second that emotion! I can\'t understand how any decent human being could support them  considering their ongoing loathsome record. #ToriesOut2018 #NHSCrisis #CambridgeAnalytica',
             'Think we’d just share the ladder, Mikey pal. Nationalise all of the ladders and have a big old ladder party.',
             'The Tories, young and old, do not understand that where child poverty, homelessness and the destruction of the NHS are concerned, there is absolutely nothing to smile about. Well done Lara.',
             'I don\'t even like them!',
             'Boom! Get in......',
             'Me too',
             'That\'s fine, but do it with a smile.',
             'Yesss girl',
             'Me too!',
             'Ditto..',
             'one day she will be all grown up .. ah bless',
             'Who doesn\'t.',
             'I hate them too Lara'
              ]

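# sid is the SentimentIntensityAnalyzer instance created in the question's code
for sentence in hate_comments:
    print(sentence)
    ss = sid.polarity_scores(sentence)
    for k in ss:
        # print one "key: value, " pair per line, as in the output below
        print('{0}: {1}, '.format(k, ss[k]))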

[out]:

    I second that emotion! I can't understand how any decent human being could support them  considering their ongoing loathsome record. #ToriesOut2018 #NHSCrisis #CambridgeAnalytica
neg: 0.0, 
neu: 0.87, 
pos: 0.13, 
compound: 0.4574, 
Think we’d just share the ladder, Mikey pal. Nationalise all of the ladders and have a big old ladder party.
neg: 0.0, 
neu: 0.776, 
pos: 0.224, 
compound: 0.5994, 
The Tories, young and old, do not understand that where child poverty, homelessness and the destruction of the NHS are concerned, there is absolutely nothing to smile about. Well done Lara.
neg: 0.244, 
neu: 0.702, 
pos: 0.055, 
compound: -0.806, 
I don't even like them!
neg: 0.445, 
neu: 0.555, 
pos: 0.0, 
compound: -0.3404, 
Boom! Get in......
neg: 0.0, 
neu: 1.0, 
pos: 0.0, 
compound: 0.0, 
Me too
neg: 0.0, 
neu: 1.0, 
pos: 0.0, 
compound: 0.0, 
That's fine, but do it with a smile.
neg: 0.0, 
neu: 0.518, 
pos: 0.482, 
compound: 0.5647, 
Yesss girl
neg: 0.0, 
neu: 1.0, 
pos: 0.0, 
compound: 0.0, 
Me too!
neg: 0.0, 
neu: 1.0, 
pos: 0.0, 
compound: 0.0, 
Ditto..
neg: 0.0, 
neu: 1.0, 
pos: 0.0, 
compound: 0.0, 
one day she will be all grown up .. ah bless
neg: 0.0, 
neu: 0.781, 
pos: 0.219, 
compound: 0.4215, 
Who doesn't.
neg: 0.0, 
neu: 1.0, 
pos: 0.0, 
compound: 0.0, 
I hate them too Lara
neg: 0.552, 
neu: 0.448, 
pos: 0.0, 
compound: -0.5719, 

You can see that messages that escape the rules are not scored correctly, for example "Yesss girl" and "Me too!", which should both be positive.
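To make it clearer why such messages fall through, here is a minimal sketch of how a lexicon-plus-rules scorer behaves. It is a toy, not the library's implementation: `TOY_LEXICON`, `NEGATIONS` and `toy_compound` are hypothetical names, the valence for "hate" is chosen so that the first example reproduces the compound value in the output above, and the other valences are made up. The normalization and the negation factor are my understanding of what the reference implementation does, so treat them as assumptions.

```python
import math

# Hypothetical toy lexicon (word -> valence). Only illustrative values;
# the real analyzer ships a much larger, human-rated lexicon.
TOY_LEXICON = {"hate": -2.7, "smile": 1.5, "fine": 0.8, "loathsome": -2.5}
NEGATIONS = {"not", "no", "never"}
NEGATION_FACTOR = -0.74  # assumed factor applied to a word right after a negation
ALPHA = 15               # assumed normalization constant


def normalize(score, alpha=ALPHA):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(text):
    """Sum lexicon valences, flipping a word that directly follows a negation."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token, 0.0)  # unknown words contribute nothing
        if i > 0 and tokens[i - 1] in NEGATIONS:
            valence *= NEGATION_FACTOR
        total += valence
    return round(normalize(total), 4)


print(toy_compound("I hate them too Lara"))  # one lexicon hit: "hate"
print(toy_compound("Yesss girl"))            # no lexicon hit, so the score is 0.0
print(toy_compound("not fine"))              # negation rule flips the sign
```

The point of the sketch is that no model is fitted to data: a message only gets a non-zero score if one of its tokens is in the lexicon, which is why "Yesss girl" and "Me too!" come out as fully neutral.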

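If you can afford to label a large amount of text for sentiment prediction, a machine learning classifier is usually better suited to these cases.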
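As a rough illustration of the machine learning route, the sketch below trains a tiny multinomial Naive Bayes classifier in plain Python. Naive Bayes is just one possible choice, picked here because it was mentioned above; `TinyNaiveBayes`, `tokenize` and the eight labelled messages are hypothetical and only for illustration. With real data you would want far more labelled examples and most likely an established library rather than hand-written code.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and drop surrounding punctuation."""
    return [t.strip(".,!?") for t in text.lower().split() if t.strip(".,!?")]


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled training data, for illustration only.
train_texts = ["Yesss girl", "Me too!", "Well done Lara", "Boom! Get in",
               "I hate them", "I don't even like them", "loathsome record",
               "nothing to smile about"]
train_labels = ["POSITIVE"] * 4 + ["NEGATIVE"] * 4

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("yesss, me too"))       # POSITIVE
print(model.predict("I hate that record"))  # NEGATIVE
```

Unlike the rule-based scorer, this model learns from the labels that "yesss" and "me too" signal agreement, but only because those examples were in its training data.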
