Swift SFSpeechRecognizer appending existing UITextView content

Posted: 2018-11-21 15:37:46

【Question】

I am using SFSpeechRecognizer in my app, and it works well: a dedicated button (start speech recognition) lets the end user dictate a comment into a UITextView.

However, if the user first types some text manually and then starts speech recognition, the previously typed text is deleted. The same thing happens when the user runs speech recognition twice on the same UITextView (dictating the first part of the text, stopping the recording, then starting a new recording): the earlier text is erased.

So I would like to know how I can append the text recognized by SFSpeechRecognizer to the existing text.

Here is my code:

func recordAndRecognizeSpeech() {

    // Cancel any recognition task that is still running
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    // Configure the audio session for recording
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    let recognitionRequest = self.recognitionRequest
    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        // Each (partial) result replaces the content of the text view
        self.decaration.text = (result?.bestTranscription.formattedString)!

        isFinal = (result?.isFinal)!
        let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
        self.decaration.scrollRangeToVisible(bottom)

        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionTask = nil
            self.recognitionRequest.endAudio()
            self.oBtSpeech.isEnabled = true
        }
    })

    // Feed microphone buffers into the recognition request
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest.append(buffer)
    }
    audioEngine.prepare()

    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
}

I tried replacing

self.decaration.text = (result?.bestTranscription.formattedString)!

with

self.decaration.text += (result?.bestTranscription.formattedString)!

but that produces a duplicate of every recognized sentence.

Any idea how I can do this?

【Comments】

【Answer 1】

Try saving the existing text before starting the recognition session.

func recordAndRecognizeSpeech() {
    // one change here: remember the text that is already in the text view
    let defaultText = self.decaration.text

    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    let recognitionRequest = self.recognitionRequest
    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        // one change here: prepend the saved text to the latest full transcription
        self.decaration.text = defaultText + " " + (result?.bestTranscription.formattedString)!

        isFinal = (result?.isFinal)!
        let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
        self.decaration.scrollRangeToVisible(bottom)

        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionTask = nil
            self.recognitionRequest.endAudio()
            self.oBtSpeech.isEnabled = true
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest.append(buffer)
    }
    audioEngine.prepare()

    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
}
result?.bestTranscription.formattedString returns the entire phrase recognized so far, which is why self.decaration.text should be reset (rather than appended to) every time a response comes back from SFSpeechRecognizer.
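
To see why += duplicates the words, here is a minimal sketch in plain Swift (no Speech framework involved). The defaultText string and the partialResults array are hypothetical stand-ins for what result?.bestTranscription.formattedString might deliver across successive callbacks:

// Hypothetical partial transcriptions for the spoken phrase "hello world".
// With shouldReportPartialResults = true, every callback delivers the
// entire transcription so far, not just the newly recognized words.
let defaultText = "Typed by hand."                // text already in the UITextView
let partialResults = ["hello", "hello world"]     // cumulative partial results

var appended = defaultText                        // the += approach from the question
var overwritten = defaultText                     // the approach from this answer
for partial in partialResults {
    appended += " " + partial                     // piles up every intermediate phrase
    overwritten = defaultText + " " + partial     // keeps exactly one copy of the phrase
}
print(appended)     // "Typed by hand. hello hello world"  <- words duplicated
print(overwritten)  // "Typed by hand. hello world"        <- expected result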

【Discussion】
