Azure Cognitive Services - Custom Speech with python and websockets

Posted 2023-02-16

【Title】Azure Cognitive Services - Custom Speech with python and websockets 【Posted】2017-10-25 16:06:11 【Question】:

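I am using the Microsoft Custom Speech Service from Python. So far I have only got it working with the HTTP endpoint. According to the documentation, websockets are supported as well.

Does anyone have an example of sending data over websockets? So far I have managed to open the websocket connection using my token. But as soon as I start sending data, the connection is closed with error 104.

Details:
- Python 3
- websocket-client
- WAV file with a RIFF header (works over HTTP)

Thanks!

Code sample: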

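# pip install websocket-client
import datetime
import json
import os
import uuid

import websocket


# auth_cris(key) is not shown in the question; it presumably exchanges the
# subscription key for a bearer token.
def main_websocket_cris():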
    path_root = os.path.abspath(os.path.dirname(__file__))
    filename = os.path.join(path_root, 'example_011.wav')
    chunk_size = 8192

    key = '<mykey>'
    url = 'wss://<mydeployment>.api.cris.ai/ws/cris/speech/recognize'
    token = auth_cris(key)

    header = ['Authorization: Bearer %s' % token]
    ws = websocket.create_connection(url, header=header)
    try:
        print('--- send ping')
        ws.ping()
        print('> ping done')

        print('--- send pong')
        ws.pong(b'')
        print('> pong done')

        print('--- status and headers')
        print('> status:',  ws.getstatus())
        print('> headers:', ws.getheaders())
        print('> status done')

        headers = ['Path: audio',
                   'X-RequestId: %s' % str(uuid.uuid4()).replace('-', ''),
                   'X-Timestamp: %s' % datetime.datetime.now().isoformat(),
                   'Content-Type: audio/x-wav']
        headers = {
            'Path':         'audio',
            'X-RequestId':  str(uuid.uuid4()).replace('-', ''),
            'X-Timestamp':  str(datetime.datetime.now().isoformat()),
            'Content-Type': 'audio/x-wav'
        }
        print(headers)
        #ws.send(json.dumps(headers))

        print('--- send binary data')
        print('> read file in chunks of %s bytes' % chunk_size)
        with open(filename, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                ws.send(json.dumps(headers))
                ws.send_binary(chunk)
        print("> Sent")
        print('--- now receive answer')
        print("> Receiving...")
        result = ws.recv()
        print("> Received '%s'" % result)
    finally:
        print('--- close')
        ws.close()
        print('> closed')

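【Comments】:

Hi, could you post more details about your code and the error log so that I can help you?

Code sample added...

Did you ever get this working? I am trying the same thing and seem to be running into a very similar problem. I would appreciate a hand.

【Answer 1】: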

I recommend using a speech protocol endpoint (e.g. "wss://YOUR_DEPLOYMENT.api.cris.ai/speech/recognition/interactive/cognitiveservices/v1").
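
As a rough sketch of what connecting to such an endpoint could involve, the helpers below build the URL and the handshake headers. The function names are hypothetical. Only the "interactive" mode appears in this answer; the other two modes and the X-ConnectionId header are assumptions based on my reading of the general Speech WebSocket protocol, so check them against the documentation for your deployment.

```python
import uuid

# Assumption: only 'interactive' is confirmed by the answer above; the other
# two modes come from the general Speech protocol and may not apply here.
RECOGNITION_MODES = ('interactive', 'conversation', 'dictation')


def speech_endpoint(deployment, mode='interactive'):
    """Build the speech protocol URL for a Custom Speech deployment."""
    if mode not in RECOGNITION_MODES:
        raise ValueError('unknown recognition mode: %r' % mode)
    return ('wss://%s.api.cris.ai/speech/recognition/%s/cognitiveservices/v1'
            % (deployment, mode))


def handshake_headers(token):
    """Headers for the WebSocket upgrade request, as 'Name: value' strings.

    The bearer token is what the question already sends. X-ConnectionId
    (a UUID without dashes) is an assumption taken from the protocol docs.
    """
    return [
        'Authorization: Bearer %s' % token,
        'X-ConnectionId: %s' % uuid.uuid4().hex,
    ]
```

With websocket-client this would be used as websocket.create_connection(speech_endpoint('YOUR_DEPLOYMENT'), header=handshake_headers(token)).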

The protocol used by these endpoints is documented here: Microsoft Speech WebSocket Protocol. You can find a JavaScript implementation of the protocol here.
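
The main difference from the code in the question is the message framing: as I understand the protocol, headers are not sent as a separate JSON message. Text messages carry "Name: value" header lines, a blank line, then the body, while binary audio messages start with a 2-byte big-endian header length, followed by the ASCII headers and the raw audio bytes. The sketch below illustrates that framing; the function names are hypothetical, and the details should be verified against the protocol documentation rather than taken as authoritative.

```python
import datetime
import struct


def utc_timestamp():
    """Current time as an ISO 8601 UTC string for the X-Timestamp header."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return now.strftime('%Y-%m-%dT%H:%M:%S.%fZ')


def format_headers(headers):
    """Render a dict as 'Name: value' lines, each terminated by CRLF."""
    return ''.join('%s: %s\r\n' % (name, value)
                   for name, value in headers.items())


def build_text_message(path, request_id, body,
                       content_type='application/json; charset=utf-8'):
    """Frame a text message: header lines, an empty line, then the body."""
    headers = {
        'Path': path,
        'X-RequestId': request_id,
        'X-Timestamp': utc_timestamp(),
        'Content-Type': content_type,
    }
    return format_headers(headers) + '\r\n' + body


def build_audio_message(request_id, chunk):
    """Frame one audio chunk: 2-byte header length, headers, audio bytes."""
    header_bytes = format_headers({
        'Path': 'audio',
        'X-RequestId': request_id,
        'X-Timestamp': utc_timestamp(),
        'Content-Type': 'audio/x-wav',
    }).encode('ascii')
    # '>H' packs the header size as an unsigned 16-bit big-endian integer.
    return struct.pack('>H', len(header_bytes)) + header_bytes + chunk


def iter_audio_messages(stream, request_id, chunk_size=8192):
    """Read a binary stream in chunks and yield one framed message each."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield build_audio_message(request_id, chunk)
```

With websocket-client, each framed chunk would go out via ws.send_binary(message), using the same X-RequestId (for example uuid.uuid4().hex) for every message of one recognition request; text messages such as a speech.config message would go out via ws.send(text).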
