How to convert a Google Cloud Natural Language entity sentiment response to JSON/dict in Python?

Posted 2023-03-24


Originally posted: 2020-10-21 19:31:26

Question:

I am trying to use the Google Cloud Natural Language API to analyze entity sentiment.

from google.cloud import language_v1
import os 
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/json'

client = language_v1.LanguageServiceClient()
text_content = 'Grapes are good. Bananas are bad.'

# Available types: PLAIN_TEXT, HTML
type_ = language_v1.Document.Type.PLAIN_TEXT

# Optional: set `language` on the Document. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
document = language_v1.Document(content=text_content, type_=type_)

# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = language_v1.EncodingType.UTF8
response = client.analyze_entity_sentiment(request={'document': document, 'encoding_type': encoding_type})

Then I print out the entities and their attributes from the response.

for entity in response.entities:
    print('=' * 20)
    print(type(entity))
    print(entity)

====================
<class 'google.cloud.language_v1.types.language_service.Entity'>
name: "Grapes"
type_: OTHER
salience: 0.8335162997245789
mentions {
  text {
    content: "Grapes"
  }
  type_: COMMON
  sentiment {
    magnitude: 0.8999999761581421
    score: 0.8999999761581421
  }
}
sentiment {
  magnitude: 0.8999999761581421
  score: 0.8999999761581421
}

====================
<class 'google.cloud.language_v1.types.language_service.Entity'>
name: "Bananas"
type_: OTHER
salience: 0.16648370027542114
mentions {
  text {
    content: "Bananas"
    begin_offset: 17
  }
  type_: COMMON
  sentiment {
    magnitude: 0.8999999761581421
    score: -0.8999999761581421
  }
}
sentiment {
  magnitude: 0.8999999761581421
  score: -0.8999999761581421
}

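Now I want to store the whole response as JSON or a dict, so that I can save it to a database table or process it further. I tried following "converting Google Cloud NLP API entity sentiment output to JSON" and "How can I JSON serialize an object from google's natural language API? (No __dict__ attribute)", but neither worked.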

If I use

from google.protobuf.json_format import MessageToDict, MessageToJson 
result_dict = MessageToDict(response)
result_json = MessageToJson(response)

I get the following error:

>>> result_dict = MessageToDict(response)
Traceback (most recent call last):
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/proto/message.py", line 555, in __getattr__
    pb_type = self._meta.fields[key].pb_type
KeyError: 'DESCRIPTOR'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/google/protobuf/json_format.py", line 175, in MessageToDict
    return printer._MessageToJsonObject(message)
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/google/protobuf/json_format.py", line 209, in _MessageToJsonObject
    message_descriptor = message.DESCRIPTOR
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/proto/message.py", line 560, in __getattr__
    raise AttributeError(str(ex))
AttributeError: 'DESCRIPTOR'

How can I parse this response and correctly convert it to JSON or a dict?

Comments:

Which versions of protobuf and google-cloud-language do you have? Check with: pip freeze | grep -e google-cloud-language -e protobuf

google-cloud-language==2.0.0 and protobuf==3.13.0

Answer 1:

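As part of the google-cloud-language 2.0.0 migration, response messages are provided by proto-plus, which wraps the raw protobuf messages. ParseDict and MessageToDict are provided by protobuf, and because proto-plus wraps the protobuf message instead of being one, these protobuf functions can no longer be called on the response directly (hence the AttributeError: 'DESCRIPTOR').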

Replace

from google.protobuf.json_format import MessageToDict, MessageToJson 
result_dict = MessageToDict(response)
result_json = MessageToJson(response)

with

import json
result_json = response.__class__.to_json(response)
result_dict = json.loads(result_json)
result_dict
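
To cover the storage goal from the question, here is a minimal sketch that keeps the converted JSON in a TEXT column in SQLite and restores it with json.loads. It assumes a relational store is acceptable; the table name entity_sentiment and its columns are hypothetical, and the sample JSON is a hand-written, abbreviated stand-in for result_json (values rounded from the printed output above) so the snippet runs without API credentials.

```python
import json
import sqlite3

# Hand-written, abbreviated stand-in for `result_json`; the real string
# from to_json() contains more fields (mentions, type, and so on).
result_json = json.dumps({
    "entities": [
        {"name": "Grapes", "salience": 0.8335163,
         "sentiment": {"magnitude": 0.9, "score": 0.9}},
        {"name": "Bananas", "salience": 0.1664837,
         "sentiment": {"magnitude": 0.9, "score": -0.9}},
    ]
})

conn = sqlite3.connect(":memory:")  # in-memory DB, for illustration only
# Hypothetical table: one row per analyzed text, raw JSON kept as TEXT.
conn.execute(
    "CREATE TABLE entity_sentiment "
    "(id INTEGER PRIMARY KEY, input_text TEXT, response TEXT)"
)
conn.execute(
    "INSERT INTO entity_sentiment (input_text, response) VALUES (?, ?)",
    ("Grapes are good. Bananas are bad.", result_json),
)
conn.commit()

# Reading it back: json.loads turns the stored string into a plain dict.
stored = conn.execute("SELECT response FROM entity_sentiment").fetchone()[0]
restored = json.loads(stored)
print([e["name"] for e in restored["entities"]])  # ['Grapes', 'Bananas']
```

Storing the whole JSON string keeps every field; if you need to query individual scores, flattening into columns is the usual alternative.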

Comments:

This didn't really work and produced an error, but this worked for me: response.query_result.__dict__

Answer 2:

tl;dr The accepted solution is not an apples-to-apples replacement. To restore the original behavior, you need to do the following:

from google.protobuf.json_format import MessageToDict
result_dict = MessageToDict(response.__class__.pb(response))
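
The call above can be packaged as a small helper that also tolerates objects that are already raw protobuf messages. This is a sketch: the name unwrap_pb is hypothetical, and it relies only on the class-level pb(instance) method that proto-plus message classes provide (the same one used in response.__class__.pb(response)).

```python
def unwrap_pb(message):
    """Return the raw protobuf message behind a proto-plus wrapper.

    proto-plus message classes expose pb(instance), which returns the
    underlying protobuf object. Anything whose class has no callable
    `pb` (for example a message that is already a raw protobuf) is
    returned unchanged.
    """
    pb = getattr(type(message), "pb", None)
    return pb(message) if callable(pb) else message
```

Usage, with MessageToDict imported as in the question: result_dict = MessageToDict(unwrap_pb(response)).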

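Having run into this myself, I want to point out that to_json and MessageToDict differ in important ways. The parameters including_default_value_fields and use_integers_for_enums default to False in MessageToDict, whereas to_json now uses True for both.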

Read more here: https://github.com/googleapis/proto-plus-python/blob/5c14cbaf21e3864a247e0183480903e7640e5460/proto/message.py#L372

For reference, here is the official implementation of to_json:

def to_json(cls, instance, *, use_integers_for_enums=True) -> str:
    """Given a message instance, serialize it to json

    Args:
        instance: An instance of this message type, or something
            compatible (accepted by the type's constructor).
        use_integers_for_enums (Optional(bool)): An option that determines whether enum
            values should be represented by strings (False) or integers (True).
            Default is True.

    Returns:
        str: The json string representation of the protocol buffer.
    """
    return MessageToJson(
        cls.pb(instance),
        use_integers_for_enums=use_integers_for_enums,
        including_default_value_fields=True,
    )
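
Because MessageToDict with default options omits fields that hold their default value, code reading result_dict should not assume every key is present. In the printed output above, the Grapes mention has no begin_offset because it is 0, and a neutral score of 0 would disappear the same way. The sketch below flattens the dict into one row per mention using .get() with fallbacks. flatten_mentions is a hypothetical helper; the sample dict is hand-written (rounded values) and assumes MessageToDict's default lowerCamelCase keys such as beginOffset (with preserving_proto_field_name=True the key would be begin_offset instead).

```python
def flatten_mentions(result_dict):
    """Turn an entity-sentiment dict into one tuple per mention.

    Fields holding their default value may be absent (e.g. a mention at
    offset 0 has no "beginOffset"), so every lookup uses .get() with a
    fallback instead of indexing.
    """
    rows = []
    for entity in result_dict.get("entities", []):
        for mention in entity.get("mentions", []):
            text = mention.get("text", {})
            sentiment = mention.get("sentiment", {})
            rows.append((
                entity.get("name", ""),
                text.get("content", ""),
                text.get("beginOffset", 0),
                sentiment.get("score", 0.0),
                sentiment.get("magnitude", 0.0),
            ))
    return rows


# Hand-written sample shaped like MessageToDict output for the question's
# response (values rounded; the real output has more fields).
sample = {
    "entities": [
        {"name": "Grapes",
         "mentions": [{"text": {"content": "Grapes"},
                       "sentiment": {"magnitude": 0.9, "score": 0.9}}]},
        {"name": "Bananas",
         "mentions": [{"text": {"content": "Bananas", "beginOffset": 17},
                       "sentiment": {"magnitude": 0.9, "score": -0.9}}]},
    ]
}

for row in flatten_mentions(sample):
    print(row)
# ('Grapes', 'Grapes', 0, 0.9, 0.9)
# ('Bananas', 'Bananas', 17, -0.9, 0.9)
```

With to_json, which sets including_default_value_fields=True, the keys are present anyway, so the same function works on either output as long as the key naming matches.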

