Real-time transcription with the Google Cloud Speech API over gRPC from Electron

Posted 2023-03-22


【Title】Real-time transcription Google Cloud Speech API with gRPC from Electron 【Posted】2018-04-13 22:58:30 【Question】:

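What I want to achieve is the same real-time transcription process as the Web Speech API, but using the Google Cloud Speech API.

The main goal is to transcribe live recordings from an Electron app through the Speech API using the gRPC protocol.

Here is a simplified version of my implementation: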

const { desktopCapturer } = window.require('electron');
const speech = require('@google-cloud/speech');

const client = speech.v1({
  projectId: 'my_project_id',
  credentials: {
    client_email: 'my_client_email',
    private_key: 'my_private_key',
  },
});

desktopCapturer.getSources({ types: ['window', 'screen'] }, (error, sources) => {
  navigator.mediaDevices
    .getUserMedia({
      audio: true,
    })
    .then((stream) => {
      let fileReader = new FileReader();
      let arrayBuffer;
      fileReader.onloadend = () => {
        arrayBuffer = fileReader.result;
        let speechStreaming = client
          .streamingRecognize({
            config: {
              encoding: speech.v1.types.RecognitionConfig.AudioEncoding.LINEAR16,
              languageCode: 'en-US',
              sampleRateHertz: 44100,
            },
            singleUtterance: true,
          })
          .on('data', (response) => response);

        speechStreaming.write(arrayBuffer);
      };

      fileReader.readAsArrayBuffer(stream);
    });
});

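The error response from the Speech API says that the audio stream is too slow and that we are not sending it in real time.

I suspect the cause is that I am passing the stream without any formatting or object initialization, so streaming recognition cannot be performed.

【Comments】: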

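Did you manage to solve this with Electron? I have the same task.

I'm also looking for real-time transcription in Electron. I don't have an answer to your specific question about using Google Cloud Speech in Electron, but I thought I'd mention an alternative: create an iframe for "otter.ai" (a transcription service with 600 free minutes per month), have the user sign in (it uses open-auth login, so it is very quick), then inject custom code into the iframe (webview preload) that lets you start transcription when needed and retrieve the transcribed text. It is an unusual approach, but Otter's transcription is very good, and 600 free minutes per user per month is not bad either.

【Answer 1】: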

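This official sample project on GitHub seems to match what you need: https://github.com/googleapis/nodejs-speech/blob/master/samples/infiniteStreaming.js

This application demonstrates how to perform infinite streaming using the streamingRecognize operation with the Google Cloud Speech API.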
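As a rough illustration of the idea behind infinite streaming (this is not the code from the linked sample): a single streaming request cannot stay open indefinitely, so the client periodically ends the current stream and opens a fresh one. In the minimal sketch below, `createRestartingWriter`, `createStream` and `limitMs` are hypothetical names introduced here for illustration; `createStream` is assumed to return a new writable stream each time it is called, for example the result of `client.streamingRecognize(...)`.

```javascript
'use strict'

// createStream is a hypothetical factory supplied by the caller. It is
// assumed to return a fresh writable stream on every call, e.g.
//   () => client.streamingRecognize(request).on('data', onData)
function createRestartingWriter (createStream, limitMs) {
  let current = null
  let timer = null

  // Open a new stream and schedule its replacement.
  function start () {
    current = createStream()
    timer = setTimeout(restart, limitMs)
  }

  // End the current stream and immediately open the next one.
  function restart () {
    clearTimeout(timer)
    current.end()
    start()
  }

  start()

  return {
    // Always writes to whichever stream is currently open.
    write (chunk) { return current.write(chunk) },
    restart,
    // Cancel the pending restart and close the current stream.
    stop () {
      clearTimeout(timer)
      current.end()
    }
  }
}
```

The value passed as `limitMs` should sit below the service's streaming time limit; the exact figure is an assumption you would need to check against the current quota. This sketch does not replay audio that was in flight at the moment of a restart, so a word spoken across the boundary may be lost.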

See also my comment above for an alternative in Electron that uses OtterAI's transcription service. (That is the approach I am about to try.)

【Comments】:

【Answer 2】:

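You can use the node-record-lpcm16 module to record audio and pipe it directly to a speech recognition system such as Google's.

The repository includes an example that uses wit.ai.

For Google speech recognition, you could use something like this: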

'use strict'

const { SpeechClient } = require('@google-cloud/speech')
const recorder = require('node-record-lpcm16')

const RECORD_CONFIG = {
  sampleRate: 44100,
  recorder: 'arecord'
}

const RECOGNITION_CONFIG = {
  config: {
    sampleRateHertz: 44100,
    languageCode: 'en-US', // the API field is languageCode, not language
    encoding: 'LINEAR16'
  },
  interimResults: true
}

const client = new SpeechClient(/* YOUR CREDENTIALS */)

// Open a streaming recognition request and return its stream
// so that the recorder output can be piped into it.
const recognize = () =>
  client
    .streamingRecognize(RECOGNITION_CONFIG)
    .on('error', err => {
      console.error('Error during recognition: ', err)
    })
    .once('writing', data => {
      console.log('Recognition started!')
    })
    .on('data', data => {
      console.log('Received recognition data: ', data)
    })

const recording = recorder.record(RECORD_CONFIG)
recording
  .stream()
  .on('error', err => {
    console.error('Error during recording: ', err)
  })
  .pipe(recognize()) // pipe into the stream, not into the function itself
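If the audio comes from `getUserMedia` in the Electron renderer rather than from `arecord`, the samples arrive from the Web Audio API as 32-bit floats, while the `LINEAR16` encoding expects 16-bit signed little-endian PCM. A minimal sketch of that conversion follows; `floatTo16BitPCM` is a hypothetical helper name, and the code assumes Node's `Buffer` is available in the renderer (which is the case when Node integration is enabled, as the question's `window.require` suggests).

```javascript
'use strict'

// Convert Web Audio samples (Float32, nominal range -1..1) into 16-bit
// signed little-endian PCM, the layout the LINEAR16 encoding expects.
function floatTo16BitPCM (samples) {
  const out = Buffer.alloc(samples.length * 2)
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]))
    // Negative values scale by 32768, positive values by 32767.
    out.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2)
  }
  return out
}
```

One way to use it would be to call `speechStreaming.write(floatTo16BitPCM(channelData))` for each block of samples obtained from an audio processing callback (for example via `inputBuffer.getChannelData(0)`), and to set `sampleRateHertz` to the `AudioContext`'s actual `sampleRate` rather than a hard-coded value.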

【Comments】:
