Speaker labels/diarization are not returned by the Watson Speech to Text API

【Original title】: Speaker label/ diarization does not return in Watson Speech to Text API 【Posted】: 2020-04-08 01:36:23 【Question】:

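I am trying to get speaker labels through the IBM Watson Speech to Text API. In my final output, I want to see the transcript, the confidence, and the speaker labels for the whole audio file. My code is below: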

import json
from os.path import join, dirname
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource
import threading
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import pandas as pd
authenticator = IAMAuthenticator('rXXXYYZZ')
service = SpeechToTextV1(authenticator=authenticator)
service.set_service_url('https://api.us-east.speech-to-text.watson.cloud.ibm.com')

models = service.list_models().get_result()
#print(json.dumps(models, indent=2))

model = service.get_model('en-US_BroadbandModel').get_result()
#print(json.dumps(model, indent=2))

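with open(join(dirname('__file__'), 'testvoicejen.wav'),
          'rb') as audio_file:
#    print(json.dumps(
    # Note: the trailing comma after get_result() (left over from the
    # commented-out json.dumps call) makes `output` a one-element tuple
    # that holds the response dict.
    output = service.recognize(
        audio=audio_file,
        speaker_labels=True,
        content_type='audio/wav',
        #timestamps=True,
        #word_confidence=True,
        model='en-US_NarrowbandModel',
        continuous=True).get_result(),
    indent=2  # also left over from json.dumps; a stray assignment with no effect

# One row per alternative of each result in the response
df = pd.DataFrame([i for elts in output for alts in elts['results'] for i in alts['alternatives']])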

However, the output of df is:

df
Out[22]: 
                                          timestamps  ...                                         transcript
0  [[thank, 3.88, 4.04], [you, 4.04, 4.13], [for,...  ...  thank you for calling my name is Britney and h...
1  [[thank, 30.21, 30.56], [you, 30.56, 30.74], [...  ...  thank you %HESITATION and then %HESITATION you..

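As you can see, I do get the transcript. However, instead of the speaker diarization or labels, I get timestamps. Speaker labels should look like this: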

      "from": 0.68,
      "to": 1.19,
      "speaker": 2

How do I get this?

【Answer 1】:

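When you turn on speaker_labels, you automatically get timestamps as well. If you look at the sample output in the service documentation - https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-output#speaker_labels

you will see that the speaker_labels section is separate from the results/alternatives section. Your code only parses the results/alternatives part. To get the speaker labels, you need something like -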

df = pd.DataFrame([i for elts in output for i in elts['speaker_labels']])
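Since the question asks for the transcript together with the speaker, it may help to join the two sections. The following is only a sketch under assumptions: `sample_response` is a hand-written, hypothetical response in the documented shape (not real service output), `words_with_speakers` is a helper name invented here, and it assumes each speaker label's from/to pair exactly matches one word timestamp, as in the documentation's sample output.

```python
# Hypothetical, hand-written response in the documented shape for
# speaker_labels=True. The values are illustrative, not real service output.
sample_response = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "thank you ",
                    "confidence": 0.91,
                    "timestamps": [["thank", 3.88, 4.04], ["you", 4.04, 4.13]],
                }
            ],
            "final": True,
        },
        {
            "alternatives": [
                {
                    "transcript": "hello there ",
                    "confidence": 0.87,
                    "timestamps": [["hello", 5.10, 5.42], ["there", 5.42, 5.80]],
                }
            ],
            "final": True,
        },
    ],
    "result_index": 0,
    "speaker_labels": [
        {"from": 3.88, "to": 4.04, "speaker": 0, "confidence": 0.6, "final": False},
        {"from": 4.04, "to": 4.13, "speaker": 0, "confidence": 0.6, "final": False},
        {"from": 5.10, "to": 5.42, "speaker": 1, "confidence": 0.7, "final": False},
        {"from": 5.42, "to": 5.80, "speaker": 1, "confidence": 0.7, "final": True},
    ],
}


def words_with_speakers(response):
    """Return one dict per word: word, from, to, speaker (None if unlabeled)."""
    # Index the speaker labels by their (from, to) time span.
    # Assumption: each label's span equals exactly one word timestamp.
    speaker_by_span = {
        (label["from"], label["to"]): label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    rows = []
    for result in response["results"]:
        # Use only the first (top-ranked) alternative of each result.
        best = result["alternatives"][0]
        for word, start, end in best.get("timestamps", []):
            rows.append({
                "word": word,
                "from": start,
                "to": end,
                "speaker": speaker_by_span.get((start, end)),
            })
    return rows


rows = words_with_speakers(sample_response)
for row in rows:
    print(row)
```

In the question's code, `output` is a one-element tuple because of the trailing comma after `get_result()`, so the response dict would be `output[0]`. The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` if a DataFrame is wanted.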
