Text-To-Speech playback has very low volume after trying Speech-To-Text in a Xamarin Forms app

Posted 2023-04-13

Asked: 2021-10-26 01:29:49

Question:

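Disclaimer: I am new to C# and Xamarin.Forms, so apologies if I have missed anything obvious.

I am trying to create an app that takes user input as voice commands (using Speech-To-Text) and outputs audio notifications from the app (using Text-To-Speech).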

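The problem is that when you start recording audio for the Speech-To-Text service, the device's audio is put into a recording mode (I am not sure what the technical term is) and playback audio is set to a very low volume (as described in this SO question, here, and here).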

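Ideally, I am looking for a way to revert this, so that once the appropriate voice command (i.e. the "secret command") has been recognized via Speech-To-Text, the user hears the secret phrase at full, normal volume via Text-To-Speech in the Xamarin Forms app.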

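I tried to produce a working example by modifying the sample code for the Azure Cognitive Speech Service. I cloned the code and slightly adjusted the XAML and C# of the main page, as shown below, so that the speech recognition service stops once a certain voice command is triggered and a phrase is then spoken via the Text-To-Speech service. My example demonstrates the problem: if the user first taps the "Transcribe" button and speaks the appropriate voice command, they should hear the secret phrase, but when testing on a physical iOS device the playback volume is so low that I can barely hear it.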

XAML

<ContentPage xmlns="http://xamarin.com/schemas/2014/forms"
             xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
             xmlns:d="http://xamarin.com/schemas/2014/forms/design"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
             mc:Ignorable="d"
             x:Class="CognitiveSpeechService.MyPage"
             Title="Speech Services Transcription"
             Padding="10,35,10,10">

    <StackLayout>
        <Frame BorderColor="DarkGray"
               CornerRadius="10"
               HeightRequest="300"
               WidthRequest="280"
               HorizontalOptions="Center"
               VerticalOptions="Start"
               BackgroundColor="LightGray">
            <ScrollView x:Name="scroll">
                <Label x:Name="transcribedText"
                       Margin="10,10,10,10" />
            </ScrollView>
        </Frame>

        <ActivityIndicator x:Name="transcribingIndicator"
                           HorizontalOptions="Center"
                           VerticalOptions="Start"
                           WidthRequest="300"
                           IsRunning="False" />
        <Button x:Name="transcribeButton"
                WidthRequest="300"
                HeightRequest="50"
                Text="Transcribe"
                TextColor="White"
                CornerRadius="10"
                BackgroundColor="Green"
                BorderColor="DarkGray"
                BorderWidth="1"
                FontAttributes="Bold"
                HorizontalOptions="Center"
                VerticalOptions="Start"
                Clicked="TranscribeClicked"/>

        <Button x:Name="SpeakBtn"
                WidthRequest="300"
                HeightRequest="50"
                Text="Speak"
                TextColor="White"
                CornerRadius="10"
                BackgroundColor="Red"
                BorderColor="DarkGray"
                BorderWidth="1"
                FontAttributes="Bold"
                HorizontalOptions="Center"
                VerticalOptions="Start"
                Clicked="SpeakBtn_Clicked"/>

    </StackLayout>

</ContentPage>

Code-behind

using System;
using CognitiveSpeechService.Services;   // IMicrophoneService (namespace assumed from the Azure sample)
using Microsoft.CognitiveServices.Speech;
using Plugin.AudioRecorder;              // AudioRecorderService (namespace assumed; diagnostics only)
using Xamarin.Essentials;                // TextToSpeech, SpeechOptions
using Xamarin.Forms;

namespace CognitiveSpeechService
{
    public partial class MyPage : ContentPage
    {
        AudioRecorderService recorder = new AudioRecorderService();

        SpeechRecognizer recognizer;
        IMicrophoneService micService;
        bool isTranscribing = false;

        public MyPage()
        {
            InitializeComponent();

            micService = DependencyService.Resolve<IMicrophoneService>();
        }

        async void TranscribeClicked(object sender, EventArgs e)
        {
            bool isMicEnabled = await micService.GetPermissionAsync();

            // EARLY OUT: make sure mic is accessible
            if (!isMicEnabled)
            {
                UpdateTranscription("Please grant access to the microphone!");
                return;
            }

            // initialize speech recognizer
            if (recognizer == null)
            {
                var config = SpeechConfig.FromSubscription(Constants.CognitiveServicesApiKey, Constants.CognitiveServicesRegion);
                recognizer = new SpeechRecognizer(config);
                recognizer.Recognized += (obj, args) =>
                {
                    UpdateTranscription(args.Result.Text);
                };
            }

            // if already transcribing, stop speech recognizer
            if (isTranscribing)
            {
                StopSpeechRecognition();
            }
            // if not transcribing, start speech recognizer
            else
            {
                Device.BeginInvokeOnMainThread(() =>
                {
                    InsertDateTimeRecord();
                });
                try
                {
                    await recognizer.StartContinuousRecognitionAsync();
                }
                catch (Exception ex)
                {
                    UpdateTranscription(ex.Message);
                }
                isTranscribing = true;
            }
            UpdateDisplayState();
        }

        // https://***.com/questions/56514413/volume-has-dropped-significantly-in-text-to-speech-since-adding-speech-to-text
        private async void StopSpeechRecognition()
        {
            if (recognizer != null)
            {
                try
                {
                    await recognizer.StopContinuousRecognitionAsync();
                    Console.WriteLine($"IsRecording: {recorder.IsRecording}");
                }
                catch (Exception ex)
                {
                    UpdateTranscription(ex.Message);
                }
                isTranscribing = false;
                UpdateDisplayState();
            }
        }

        void UpdateTranscription(string newText)
        {
            Device.BeginInvokeOnMainThread(() =>
            {
                if (!string.IsNullOrWhiteSpace(newText))
                {
                    // compare against a lower-case literal, since newText is lower-cased first
                    if (newText.ToLower().Contains("secret command"))
                    {
                        Console.WriteLine("heart rate voice command detected");

                        // stop speech recognition
                        StopSpeechRecognition();

                        // do callout
                        string success = "this works!";

                        var settings = new SpeechOptions()
                        {
                            Volume = 1.0f,
                        };

                        TextToSpeech.SpeakAsync(success, settings);

                        // start speech recognition
                    }
                    else
                    {
                        transcribedText.Text += $"{newText}\n";
                    }
                }
            });
        }

        void InsertDateTimeRecord()
        {
            var msg = $"=================\n{DateTime.Now.ToString()}\n=================";
            UpdateTranscription(msg);
        }

        void UpdateDisplayState()
        {
            Device.BeginInvokeOnMainThread(() =>
            {
                if (isTranscribing)
                {
                    transcribeButton.Text = "Stop";
                    transcribeButton.BackgroundColor = Color.Red;
                    transcribingIndicator.IsRunning = true;
                }
                else
                {
                    transcribeButton.Text = "Transcribe";
                    transcribeButton.BackgroundColor = Color.Green;
                    transcribingIndicator.IsRunning = false;
                }
            });
        }

        async void SpeakBtn_Clicked(object sender, EventArgs e)
        {
            await TextToSpeech.SpeakAsync("Sample audio line. Blah blah blah. ");
        }
    }
}

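Thanks for your help!

Comments:

What is the code of AudioRecorderService? If it is convenient for you, could you post a basic demo on GitHub or OneDrive so that we can test it?

@JessieZhang-MSFT AudioRecorderService comes from a plugin I was using to diagnose the problem. Please ignore it. I have created a test repo that shows the problem more clearly, here (github.com/TketEZ/xamarin-forms-samples). Hope this is clearer.

Answer 1: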

Found a workable solution. Posting it below for anyone it may help, including future me.

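I noticed that this problem only happens on iOS, not on Android, and that it is related to the AVAudioSession category that gets set when STT is enabled. As far as I can tell, once STT is enabled, audio ducking is turned on for any audio that is not related to STT.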

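You can work around this by setting the correct category programmatically with the AVAudioSession Xamarin.iOS API.

To make this work in a Xamarin.Forms project, you need to use the Dependency Service so that the Xamarin.iOS code can be called from the shared project code.

I have listed the relevant parts of the code that worked for me below.

A complete working example can be found in the solution branch of the GitHub repo mentioned in the comments above.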

Main page (where the STT and TTS services are used)

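    public partial class MainPage : ContentPage
    {
        IMicrophoneService micService;
        IAudioSessionService audioService;

        public MainPage()
        {
            InitializeComponent();

            micService = DependencyService.Resolve<IMicrophoneService>();

            // only iOS needs the audio session switch; on other platforms audioService stays null
            if (Device.RuntimePlatform == Device.iOS)
            {
                audioService = DependencyService.Resolve<IAudioSessionService>();
            }
        }

        public void SpeechToText()
        {
            // wherever STT is required, call this first to set the right audio category
            audioService?.ActivateAudioRecordingSession();
        }

        // async Task (System.Threading.Tasks) so that SpeakAsync can be awaited
        public async Task TextToSpeech()
        {
            // wherever TTS is required, let the OS know that you're playing audio so TTS interrupts instead of ducking.
            audioService?.ActivateAudioPlaybackSession();

            // fully qualified because this method is itself named TextToSpeech;
            // TextForTextToSpeechAfterSpeechToText and settings are placeholders for your own text and SpeechOptions
            await Xamarin.Essentials.TextToSpeech.SpeakAsync(TextForTextToSpeechAfterSpeechToText, settings);

            // set audio session back to recording mode ready for STT
            audioService?.ActivateAudioRecordingSession();
        }
    }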
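
One thing the snippet above does not cover is what happens if the TTS call throws: the session would stay in playback mode and the next STT attempt might misbehave. A minimal sketch of one way to guard against that is to wrap the switch in try/finally. `AudioSessionHelper`, `SpeakInPlaybackSessionAsync` and `FakeAudioSessionService` are hypothetical names made up for this illustration (they are not in the repo), and the fake only records calls so the sketch runs as a plain console program without iOS.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Same shape as the IAudioSessionService interface used in this answer.
public interface IAudioSessionService
{
    void ActivateAudioPlaybackSession();
    void ActivateAudioRecordingSession();
}

// Hypothetical stand-in that only records which session was requested.
public class FakeAudioSessionService : IAudioSessionService
{
    public List<string> Calls { get; } = new List<string>();
    public void ActivateAudioPlaybackSession() => Calls.Add("playback");
    public void ActivateAudioRecordingSession() => Calls.Add("recording");
}

public static class AudioSessionHelper
{
    // Hypothetical helper: switch to playback, run the speak delegate,
    // and always switch back to recording, even if speaking fails.
    public static async Task SpeakInPlaybackSessionAsync(
        IAudioSessionService audioService, Func<Task> speak)
    {
        audioService?.ActivateAudioPlaybackSession();
        try
        {
            await speak();
        }
        finally
        {
            audioService?.ActivateAudioRecordingSession();
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var audio = new FakeAudioSessionService();

        // In the app the delegate would be something like
        // () => TextToSpeech.SpeakAsync(text, settings); a delay stands in here.
        await AudioSessionHelper.SpeakInPlaybackSessionAsync(audio, () => Task.Delay(10));

        Console.WriteLine(string.Join(" -> ", audio.Calls)); // prints: playback -> recording
    }
}
```

Passing `null` as the service (as happens on Android in the page above, where `audioService` is never resolved) simply runs the delegate without touching any session.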

IAudioSessionService

// this interface should be in your shared project
namespace CognitiveSpeechService.Services
{
    public interface IAudioSessionService
    {
        void ActivateAudioPlaybackSession();
        void ActivateAudioRecordingSession();
    }
}

Project.Android/AndroidAudioSessionService

using System;
using CognitiveSpeechService.Services;
using Xamarin.Forms;

[assembly: Dependency(typeof(CognitiveSpeechService.Droid.Services.AndroidAudioSessionService))]
namespace CognitiveSpeechService.Droid.Services
{
    public class AndroidAudioSessionService : IAudioSessionService
    {
        public void ActivateAudioPlaybackSession()
        {
            // do nothing as not required on Android
        }

        public void ActivateAudioRecordingSession()
        {
            // do nothing as not required on Android
        }
    }
}

Project.iOS/IOSAudioSessionService

using System;
using AVFoundation;
using CognitiveSpeechService.Services;
using Foundation;
using Xamarin.Forms;

[assembly: Dependency(typeof(CognitiveSpeechService.iOS.Services.IOSAudioSessionService))]
namespace CognitiveSpeechService.iOS.Services
{
    public class IOSAudioSessionService : IAudioSessionService
    {
        public void ActivateAudioPlaybackSession()
        {
            // playback category with spoken-audio mode, so TTS plays at normal volume
            var session = AVAudioSession.SharedInstance();
            session.SetCategory(AVAudioSessionCategory.Playback, AVAudioSessionCategoryOptions.DuckOthers);
            session.SetMode(AVAudioSession.ModeSpokenAudio, out NSError error);
            session.SetActive(true);
        }

        public void ActivateAudioRecordingSession()
        {
            // switch back to the record category on a background thread
            new System.Threading.Thread(new System.Threading.ThreadStart(() =>
            {
                // try/catch lives inside the thread body; a catch around Start()
                // would not see exceptions thrown on the new thread
                try
                {
                    var session = AVAudioSession.SharedInstance();
                    session.SetCategory(AVAudioSessionCategory.Record);
                    session.SetActive(true);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            })).Start();
        }
    }
}

Answer 2:

ProgrammingPractice, good to know about the iOS settings change! I had the same problem. This should be marked as the solution.
