How to pass metadata to the Google Speech-to-Text API - Swift iOS

Posted: 2020-04-13 19:35:52

Question:

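Could anyone please help me find the official documentation for the pod used in this sample: https://github.com/GoogleCloudPlatform/ios-docs-samples/tree/master/speech/Swift/Speech-gRPC-Streaming

I am also working on an iOS app in which we use Google Speech-to-Text with the streaming approach. The sample does not demonstrate how to pass metadata, so the official documentation might help with passing metadata at initialization. Here is the full configuration I want to provide: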


    {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "maxAlternatives": 30,
        "metadata": {
            "interactionType": "VOICE_SEARCH",
            "recordingDeviceType": "SMARTPHONE",
            "microphoneDistance": "NEARFIELD",
            "originalMediaType": "AUDIO",
            "recordingDeviceName": "iPhone",
            "audioTopic": "Quran surah and ayah search"
        },

        "speechContexts": [
            {
                "phrases": ["mumtahinah"],
                "boost": 2
            },
            {
                "phrases": ["Hujrat"],
                "boost": 2
            },
            {
                "phrases": ["taubah"],
                "boost": 2
            },
            {
                "phrases": ["fajar"],
                "boost": 2
            }
        ]
    }

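To make the shape of that configuration concrete, here is a minimal sketch that models it with plain `Codable` structs and encodes it to JSON. The type and function names (`RecognitionConfigSketch`, `makeConfigSketch`, and so on) are hypothetical and are not part of the googleapis pod; they only mirror the JSON field names shown above, so treat this as a way to check the structure rather than as the pod's API.

```swift
import Foundation

// Hypothetical mirrors of the JSON configuration above.
// These are NOT types from the googleapis pod.
struct RecognitionMetadataSketch: Codable, Equatable {
    var interactionType: String
    var recordingDeviceType: String
    var microphoneDistance: String
    var originalMediaType: String
    var recordingDeviceName: String
    var audioTopic: String
}

struct SpeechContextSketch: Codable, Equatable {
    var phrases: [String]
    var boost: Double
}

struct RecognitionConfigSketch: Codable, Equatable {
    var encoding: String
    var sampleRateHertz: Int
    var languageCode: String
    var maxAlternatives: Int
    var metadata: RecognitionMetadataSketch
    var speechContexts: [SpeechContextSketch]
}

// Builds the configuration from the question: one speech context per word, each with boost 2.
func makeConfigSketch() -> RecognitionConfigSketch {
    let words = ["mumtahinah", "Hujrat", "taubah", "fajar"]
    return RecognitionConfigSketch(
        encoding: "LINEAR16",
        sampleRateHertz: 16000,
        languageCode: "en-US",
        maxAlternatives: 30,
        metadata: RecognitionMetadataSketch(
            interactionType: "VOICE_SEARCH",
            recordingDeviceType: "SMARTPHONE",
            microphoneDistance: "NEARFIELD",
            originalMediaType: "AUDIO",
            recordingDeviceName: "iPhone",
            audioTopic: "Quran surah and ayah search"
        ),
        speechContexts: words.map { SpeechContextSketch(phrases: [$0], boost: 2) }
    )
}

// Encodes the configuration as pretty-printed JSON with stable key order.
func encodeConfigSketch(_ config: RecognitionConfigSketch) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    let data = try encoder.encode(config)
    return String(decoding: data, as: UTF8.self)
}
```

Calling `try encodeConfigSketch(makeConfigSketch())` yields a JSON document with the same fields as the configuration above, which can help when comparing it against what the gRPC request is expected to carry.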
Here is my current code:

import Foundation
import googleapis

let API_KEY : String = "YOUR_API_KEY"
let HOST = "speech.googleapis.com"

typealias SpeechRecognitionCompletionHandler = (StreamingRecognizeResponse?, NSError?) -> (Void)

class SpeechRecognitionService {
  var sampleRate: Int = 16000
  private var streaming = false

  private var client : Speech!
  private var writer : GRXBufferedPipe!
  private var call : GRPCProtoCall!

  static let sharedInstance = SpeechRecognitionService()

  func streamAudioData(_ audioData: NSData, completion: @escaping SpeechRecognitionCompletionHandler) {
    if (!streaming) {
      // if we aren't already streaming, set up a gRPC connection
      client = Speech(host:HOST)
      writer = GRXBufferedPipe()
      call = client.rpcToStreamingRecognize(withRequestsWriter: writer,
                                            eventHandler:
        { (done, response, error) in
                                              completion(response, error as? NSError)
      })
      // authenticate using an API key obtained from the Google Cloud Console
      call.requestHeaders.setObject(NSString(string:API_KEY),
                                    forKey:NSString(string:"X-Goog-Api-Key"))
      // if the API key has a bundle ID restriction, specify the bundle ID like this
      call.requestHeaders.setObject(NSString(string:Bundle.main.bundleIdentifier!),
                                    forKey:NSString(string:"X-Ios-Bundle-Identifier"))

      print("HEADERS:\(call.requestHeaders)")

      call.start()
      streaming = true

      // send an initial request message to configure the service
      let recognitionConfig = RecognitionConfig()
      recognitionConfig.encoding =  .linear16
      recognitionConfig.sampleRateHertz = Int32(sampleRate)
      recognitionConfig.languageCode = "en-US"
      recognitionConfig.maxAlternatives = 30
      recognitionConfig.enableWordTimeOffsets = true

      let streamingRecognitionConfig = StreamingRecognitionConfig()
      streamingRecognitionConfig.config = recognitionConfig
      streamingRecognitionConfig.singleUtterance = false
      streamingRecognitionConfig.interimResults = true

      let streamingRecognizeRequest = StreamingRecognizeRequest()
      streamingRecognizeRequest.streamingConfig = streamingRecognitionConfig

      writer.writeValue(streamingRecognizeRequest)
    }

    // send a request message containing the audio data
    let streamingRecognizeRequest = StreamingRecognizeRequest()
    streamingRecognizeRequest.audioContent = audioData as Data
    writer.writeValue(streamingRecognizeRequest)
  }

  func stopStreaming() {
    if (!streaming) {
      return
    }
    writer.finishWithError(nil)
    streaming = false
  }

  func isStreaming() -> Bool {
    return streaming
  }
}

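The code above sends one configuration message when streaming begins and only audio messages afterwards, so any metadata or speech contexts would have to travel in that first message. The sketch below isolates that ordering with stand-in types so it runs without the pod. `StreamingRequestSketch` and `RequestSequencerSketch` are hypothetical names used only for illustration; they are not the generated `StreamingRecognizeRequest` class.

```swift
import Foundation

// Hypothetical stand-in for a streaming request: either the config or an audio chunk.
enum StreamingRequestSketch: Equatable {
    case config(languageCode: String, sampleRateHertz: Int)
    case audio(Data)
}

// Emits the config request once, before the first audio chunk of each stream.
struct RequestSequencerSketch {
    let languageCode: String
    let sampleRateHertz: Int
    private(set) var isStreaming = false

    init(languageCode: String, sampleRateHertz: Int) {
        self.languageCode = languageCode
        self.sampleRateHertz = sampleRateHertz
    }

    // Returns the requests to write for this chunk, in order.
    mutating func requests(for chunk: Data) -> [StreamingRequestSketch] {
        var out: [StreamingRequestSketch] = []
        if !isStreaming {
            // First chunk of a stream: the configuration goes out before any audio.
            out.append(.config(languageCode: languageCode, sampleRateHertz: sampleRateHertz))
            isStreaming = true
        }
        out.append(.audio(chunk))
        return out
    }

    // Ends the stream; the next chunk starts a new one and sends the config again.
    mutating func stop() {
        isStreaming = false
    }
}
```

This mirrors the `streaming` flag in `SpeechRecognitionService`: the first call produces a config request followed by audio, later calls produce audio only, and calling `stop()` resets the sequence.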

Answer 1:

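Google Speech-to-Text is not trivial to integrate. When using CocoaPods, a little tweaking is needed because the Google dependencies have to be added; you can download them from the GitHub link. For a complete tutorial, read this article, in which:

Part 1 shows the integration (STT-iOS integration part 1), and Part 2 explains how to use metadata to train Google Cloud to recognize words better (STT cloud training).

For your question: collect all the metadata and add it to a dictionary. Let's call it mySpeechContext.

You need to pass this context to the speechContextsArray property of RecognitionConfig.

Add this line to your code: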

 recognitionConfig.speechContextsArray = NSMutableArray(array: [mySpeechContext])

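This sends your entire speech context to the Google speech cloud along with your key. Whenever you use the speech-to-text service, it looks at that data to train and recognize better, which gives the supplied metadata/words a higher boost and confidence.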

