Stanford NLP for Python

Posted 2023-02-16


Posted: 2015-12-29 00:47:01

Question:

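All I want to do is find the sentiment (positive/negative/neutral) of any given string. While researching, I came across Stanford NLP, but unfortunately it is written in Java. Any ideas on how I can make it work with Python?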

Comments on the question:

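It looks like dasmith on GitHub has written a nice little wrapper for this: github.com/dasmith/stanford-corenlp-python. NLTK includes wrappers for Stanford NLP, but I am not sure whether they cover sentiment analysis. Calling an external utility from Python, whether it is written in Java or anything else, is not hard.

Answer 1: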

Use `py-corenlp`

Download Stanford CoreNLP

The latest version at the time of writing (2020-05-25) is 4.0.0:

wget https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar

If you do not have wget, you probably have curl:

curl https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip -O https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar -O

If all else fails, use a browser ;-)

Install the package

unzip stanford-corenlp-4.0.0.zip
mv stanford-corenlp-4.0.0-models-english.jar stanford-corenlp-4.0.0
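If you would rather script the download and install steps from Python than use the shell, here is a minimal sketch that uses only the standard library. The helper names `corenlp_urls` and `download_and_install` are hypothetical. The sketch assumes the URL pattern shown above for version 4.0.0, and it assumes the zip unpacks into a folder named `stanford-corenlp-<version>`, which is what the `mv` command above relies on.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import Tuple

BASE_URL = "https://nlp.stanford.edu/software"


def corenlp_urls(version: str = "4.0.0") -> Tuple[str, str]:
    """Return the (zip, English models jar) download URLs for a CoreNLP version."""
    zip_url = f"{BASE_URL}/stanford-corenlp-{version}.zip"
    jar_url = f"{BASE_URL}/stanford-corenlp-{version}-models-english.jar"
    return zip_url, jar_url


def download_and_install(version: str = "4.0.0", dest: str = ".") -> Path:
    """Download CoreNLP, unzip it and move the English models jar into its folder."""
    dest_dir = Path(dest)
    zip_url, jar_url = corenlp_urls(version)
    zip_path = dest_dir / zip_url.rsplit("/", 1)[-1]
    jar_path = dest_dir / jar_url.rsplit("/", 1)[-1]
    # Equivalent of the wget/curl step.
    urllib.request.urlretrieve(zip_url, zip_path)
    urllib.request.urlretrieve(jar_url, jar_path)
    # Equivalent of `unzip stanford-corenlp-<version>.zip`.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Equivalent of the `mv` step; assumes the zip unpacks into a folder
    # named stanford-corenlp-<version>.
    corenlp_dir = dest_dir / f"stanford-corenlp-{version}"
    shutil.move(str(jar_path), str(corenlp_dir / jar_path.name))
    return corenlp_dir
```

Importing this code does nothing by itself. The actual downloads happen only when you call `download_and_install()`.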

Start the server

cd stanford-corenlp-4.0.0
java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000

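Notes:

1. The -timeout value is in milliseconds, so 10000 above means 10 seconds. Increase it if you send large texts to the server.
2. There are more options; run the server with --help to list them.
3. -mx5g sets the maximum JVM heap to 5 GB. Adjust it to suit your machine.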
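You can also start the server from Python. The sketch below rebuilds the same java command line as a list and launches it with `subprocess.Popen`. The helpers `build_server_command` and `start_server` are hypothetical, and the sketch assumes `java` is on your PATH. Its defaults mirror the command above: 5 GB heap, 10000 ms timeout, and the default port 9000.

```python
import subprocess
from typing import List


def build_server_command(memory: str = "5g", timeout_ms: int = 10000,
                         port: int = 9000) -> List[str]:
    """Build the same command line as `java -mx5g -cp "*" ... -timeout 10000`."""
    return [
        "java", f"-mx{memory}",
        # No shell is involved, so "*" reaches java unexpanded and java itself
        # expands it to every jar in the working directory.
        "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


def start_server(corenlp_dir: str, **options) -> subprocess.Popen:
    """Start the server inside corenlp_dir (the `cd stanford-corenlp-4.0.0` step)."""
    return subprocess.Popen(build_server_command(**options), cwd=corenlp_dir)
```

For example, `proc = start_server("stanford-corenlp-4.0.0")` starts the server, and `proc.terminate()` stops it again when you are done.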

Install the Python package

The standard package:

pip install pycorenlp

does not work with Python 3.9, so you need to install it from GitHub instead:

pip install git+https://github.com/sam-s/py-corenlp.git

(See also the official list.)

Use it

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

You will get:

0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb': 1 Negative
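pycorenlp is essentially a small HTTP client. The sketch below makes the same kind of request with only the standard library, based on the CoreNLP server's interface: you POST the raw text as the request body and pass the annotation properties as a JSON `properties` URL parameter. The helper names `build_url` and `annotate` are hypothetical, and the plain-text Content-Type header is a choice made for this sketch.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def build_url(server: str, properties: Dict[str, Any]) -> str:
    """Encode the annotation properties as the JSON `properties` URL parameter."""
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{server}/?{query}"


def annotate(text: str,
             server: str = "http://localhost:9000",
             properties: Optional[Dict[str, Any]] = None,
             timeout_s: float = 30.0) -> Dict[str, Any]:
    """POST raw text to a running CoreNLP server and return the parsed JSON."""
    props = {"annotators": "sentiment", "outputFormat": "json"}
    props.update(properties or {})
    request = urllib.request.Request(
        build_url(server, props),
        data=text.encode("utf-8"),
        # Send the text as plain UTF-8 rather than as a form-encoded body.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `res = annotate("I love you. I hate him.")` should return a dict with a "sentences" list that the loop above can consume.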

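Notes:

1. You pass the whole text to the server, which splits it into sentences (and the sentences into tokens).
2. Sentiment is assigned to each sentence, not to the whole text.
3. sentimentValue ranges from 0 (VeryNegative) to 4 (VeryPositive): 1 is Negative, 2 is Neutral and 3 is Positive.
4. You can stop the server by pressing Ctrl-C in the terminal where you started it, or with the shell command:

kill $(lsof -ti tcp:9000)

5. 9000 is the default port; you can change it with the -port option when starting the server.
6. If you get timeout errors, increase timeout (in milliseconds) on the server or in the client.
7. sentiment is just one annotator. You can request several by separating them with commas, e.g.

'annotators': 'sentiment,lemma'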
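Sentiment is reported per sentence, so you have to derive a whole-text score yourself. One simple approach, which is an assumption and not something CoreNLP provides, is to average the sentimentValue of all sentences and map the result back onto the 0 to 4 scale. The helper names are hypothetical. `int()` is used because the JSON output may carry sentimentValue as a string such as "3".

```python
from statistics import mean
from typing import Any, Dict

# CoreNLP's 0-4 scale, written the way the "sentiment" field spells the labels.
LABELS = ["VeryNegative", "Negative", "Neutral", "Positive", "VeryPositive"]


def text_sentiment(res: Dict[str, Any]) -> float:
    """Average the per-sentence sentimentValue over every sentence in a response."""
    values = [int(s["sentimentValue"]) for s in res["sentences"]]
    if not values:
        raise ValueError("the response contains no sentences")
    return mean(values)


def label_for(score: float) -> str:
    """Round an averaged score to the nearest label on the 0-4 scale."""
    return LABELS[round(score)]
```

For the four example sentences above (3, 1, 3, 1), the average is 2, which `label_for` maps to Neutral.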

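P.S. I can't believe I am adding a 9th answer, but I guess I had to, because none of the existing answers helped me (some of the 8 previous answers have since been deleted, and others have been converted to comments).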

Comments:

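Thanks for your answer! I think this is the only promising one. But I wonder whether there is another way to pass in the sentences. Suppose I have a large .txt file with more than 10,000 lines, one sentence per line. What is the right way to use this in my case? Thanks!

If you find that you cannot pass all 10k lines in one blob, you can split it arbitrarily (note that your phrase "one sentence per line" is unclear).

@user5779223: another option is to increase the timeout; see the edit.

Inside for s in res["sentences"], is there a way to pretty-print the result the way nlp.stanford.edu:8080/sentiment/rntnDemo.html does?

Hi, as of 2020, Stanford NLP provides a Stanford CoreNLP client in Stanza. It is called the Stanford CoreNLP Client, and its documentation is here: stanfordnlp.github.io/stanza/corenlp_client.html

Answer 2: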

Now they have Stanza.

https://stanfordnlp.github.io/stanza/

Release history: note that before version 1.0.0, the Stanza library was named "StanfordNLP". To install historical versions prior to v1.0.0, you need to run pip install stanfordnlp.

This confirms that Stanza is the full Python version of Stanford NLP.
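For the original question (sentence sentiment), here is a minimal sketch using Stanza's sentiment processor. It assumes Stanza 1.1 or later, where each sentence gets a `sentiment` score of 0, 1 or 2 (negative, neutral, positive). The helpers `label` and `stanza_sentiment` are hypothetical. `stanza.download("en")` fetches the English models on first use.

```python
from typing import List, Tuple

# Stanza's sentiment processor scores each sentence as 0, 1 or 2.
STANZA_LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def label(score: int) -> str:
    """Translate a Stanza sentence sentiment score into a label."""
    return STANZA_LABELS[score]


def stanza_sentiment(text: str) -> List[Tuple[str, str]]:
    """Return (sentence text, label) pairs using Stanza's English pipeline."""
    import stanza  # imported here so label() works without Stanza installed

    stanza.download("en")  # downloads the English models; only needed once
    nlp = stanza.Pipeline(lang="en", processors="tokenize,sentiment")
    doc = nlp(text)
    return [(sentence.text, label(sentence.sentiment)) for sentence in doc.sentences]
```

Unlike the CoreNLP server approach, this runs entirely in Python with no Java process. Note that the scale is coarser: three classes instead of five.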

Comments:

As of 2020, this is the best answer to the question: Stanza is native Python, so there is no need to run the Java package. It is available via pip or conda.

Answer 3:

Native Python implementation of the Stanford NLP tools

Stanford recently released a new Python package that implements neural network (NN) based algorithms for the most important NLP tasks:

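- Tokenization
- Multi-word token (MWT) expansion
- Lemmatization
- Part-of-speech (POS) and morphological feature tagging
- Dependency parsing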

It is implemented in Python and uses PyTorch as the NN library. The package contains accurate models for more than 50 languages.

To install it, you can use pip:

pip install stanfordnlp

To perform basic tasks, you can use the native Python interface to many NLP algorithms:

import stanfordnlp

stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
doc.sentences[0].print_dependencies()

Edit:

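So far, the library does not support sentiment analysis. I have not deleted the answer, because it directly answers the "Stanford NLP for Python" part of the question.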

Comments:

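Thanks for the post. I am trying to do something similar (analyze the sentiment of statements). From your post I learned that stanfordnlp for Python does not support sentiment yet.

Answer 4: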

There is a very recent development on this issue:

You can now use the stanfordnlp package in Python.

From the README:

>>> import stanfordnlp
>>> stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

Answer 5:

I suggest using the TextBlob library. A sample implementation looks like this:

from textblob import TextBlob

def sentiment(message):
    # create a TextBlob object from the passed tweet text
    analysis = TextBlob(message)
    # return the polarity score, a float in [-1.0, 1.0]
    return analysis.sentiment.polarity
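The question asks for positive/negative/neutral, while TextBlob returns a continuous polarity in [-1.0, 1.0]. Here is a sketch that turns polarity into one of the three labels. The `neutral_band` cut-off of 0.05 is a hypothetical threshold, not a TextBlob default, so tune it on your own data. `polarity_label` and `classify` are illustrative names.

```python
def polarity_label(polarity: float, neutral_band: float = 0.05) -> str:
    """Map a TextBlob polarity score in [-1.0, 1.0] to a coarse label.

    neutral_band is a hypothetical cut-off: scores within +/- neutral_band
    count as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def classify(message: str) -> str:
    """Score a message with TextBlob and return positive/negative/neutral."""
    from textblob import TextBlob  # imported here so polarity_label() works without TextBlob

    return polarity_label(TextBlob(message).sentiment.polarity)
```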

Answer 6:

Use the stanfordcore-nlp Python library

stanford-corenlp is a very good wrapper on top of Stanford CoreNLP that lets you use it from Python. First download CoreNLP:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip

Usage

# Simple usage
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('/Users/name/stanford-corenlp-full-2018-10-05')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))

nlp.close() # Do not forget to close! The backend server consumes a lot of memory.

More info
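The usage above does not cover sentiment. To the best of my knowledge, the wrapper also has a general `annotate(text, properties=...)` call that returns the server's raw response text. Here is a sketch under that assumption. It requests the sentiment annotator with JSON output and extracts (sentimentValue, sentiment) per sentence. The helper names `parse_sentiments` and `sentence_sentiments` are hypothetical.

```python
import json
from typing import List, Tuple


def parse_sentiments(raw_json: str) -> List[Tuple[int, str]]:
    """Extract (sentimentValue, sentiment) for each sentence of a JSON response."""
    data = json.loads(raw_json)
    return [(int(s["sentimentValue"]), s["sentiment"]) for s in data["sentences"]]


def sentence_sentiments(nlp, text: str) -> List[Tuple[int, str]]:
    """Run the sentiment annotator through an open stanfordcorenlp.StanfordCoreNLP.

    Assumes nlp.annotate() returns the server's raw response text.
    """
    props = {"annotators": "sentiment", "pipelineLanguage": "en", "outputFormat": "json"}
    return parse_sentiments(nlp.annotate(text, properties=props))
```

Call `sentence_sentiments(nlp, sentence)` before `nlp.close()`.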

Comments:

Could you explain how to use stanfordcorenlp to analyze the sentiment of a statement?

Answer 7:

I am facing the same problem. As @roopalgarg pointed out, stanford_corenlp_py, which uses Py4j, may be a solution.

stanford_corenlp_py

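This repo provides a Python interface for calling the "sentiment" and "entitymentions" annotators of Stanford's CoreNLP Java package, current as of version 3.5.1. It uses py4j to interact with the JVM. Because of this, to run a script like scripts/runGateway.py, you must first compile and run the Java classes that create the JVM gateway.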

Answer 8:

I was in a similar situation. Most of my project is written in Python, and the sentiment part is in Java. Fortunately, it is quite easy to learn how to use the Stanford CoreNLP jar.

Here is one of my scripts. You can download the jars and run it.

import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class Simple_NLP {
    static StanfordCoreNLP pipeline;

    public static void init() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public static String findSentiment(String tweet) {
        String SentiReturn = "";
        String[] SentiClass = {"very negative", "negative", "neutral", "positive", "very positive"};

        // Sentiment is an integer, ranging from 0 to 4.
        // 0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.
        int sentiment = 2;

        if (tweet != null && tweet.length() > 0) {
            Annotation annotation = pipeline.process(tweet);

            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            if (sentences != null && sentences.size() > 0) {
                // Only the first sentence is scored.
                CoreMap sentence = sentences.get(0);
                Tree tree = sentence.get(SentimentAnnotatedTree.class);
                sentiment = RNNCoreAnnotations.getPredictedClass(tree);
                SentiReturn = SentiClass[sentiment];
            }
        }
        return SentiReturn;
    }

    public static void main(String[] args) {
        init();
        System.out.println(findSentiment("I love you."));
    }
}

Answer 9:

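TextBlob is a great sentiment analysis package written in Python. You can find the docs here. It performs sentiment analysis on any given sentence by inspecting its words and their corresponding emotional scores (sentiment). You can get started with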

$ pip install -U textblob
$ python -m textblob.download_corpora

Because passing -U upgrades the pip package to its latest available version, the first pip install command installs the latest version of textblob on your (virtualenv) system. The second command downloads all the data it needs (the corpora).

Comments:

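I actually tried TextBlob, but its sentiment scores were quite poor, so I plan to use Stanford NLP instead.

Have you tried the wrapper mentioned in the other answer?

"Sentiment analysis" (-: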
