Determining Latency in Audio Processing

Posted 2023-02-25


【Title】Determining Latency in Audio Processing 【Posted】2012-07-27 18:23:18 【Question】:

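I have been working on implementing a real-time audio capture and analysis system within an existing music software project. The goal of the system is to begin capturing audio when the user presses the record button (or after a specified count-in period), determine the notes the user sings or plays, and notate those notes on a musical staff. The gist of my approach is to use one thread to capture chunks of audio data and put them into a queue, and another thread to remove the data from the queue and perform the analysis.

This scheme works well, but I am having trouble quantifying the latency between the start of audio capture and the start of playback of the MIDI backing instruments. Audio capture begins before the MIDI instruments start playing, and the user will presumably synchronize his or her performance with the MIDI instruments. I therefore need to ignore the audio data captured before the backing MIDI instruments begin playing and analyze only the audio data collected after that point.

Playback of the backing track is handled by a body of code that has been around for quite a while and is maintained by someone else, so I would like to avoid refactoring the whole program if possible. Audio capture is controlled by a Timer object and a class that extends TimerTask, instances of which are created in a lumbering (roughly 25k lines) class called Notate. Notate also, by the way, keeps tabs on the objects that handle playback of the backing track. The Timer's .scheduleAtFixedRate() method is used to control the period of audio capture, and the TimerTask notifies the capture thread to begin by calling .notify() on the queue (an ArrayBlockingQueue).

My current strategy for calculating the time gap between the initialization of these two processes is to subtract the timestamp taken just before capture begins (in milliseconds) from the timestamp taken when playback starts, which I define as the moment the .start() method is called on the Java Sequencer object in charge of the MIDI backing track. I then use the result to determine the number of audio samples I expect to have been captured during this interval (n), and ignore the first n * 2 bytes in the array of captured audio data (n * 2 because I am capturing 16-bit samples, whereas the data is stored as a byte array, so 2 bytes per sample).

However, this method is not giving me accurate results. The calculated offset is always smaller than I expect, so a non-trivial (and unfortunately varying) amount of "empty" space remains in the audio data after analysis begins at the designated position. This causes the program to attempt to analyze audio data collected when the user had not yet begun to play along with the backing MIDI instruments, effectively adding rests (the absence of musical notes) at the beginning of the user's passage and ruining the rhythm values calculated for all subsequent notes.

Below is the code for my audio capture thread, which also determines the latency and the corresponding position offset for the array of captured audio data. Can anyone offer insight into why my method for determining latency is not working correctly?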

public class CaptureThread extends Thread
{
    public void run()
    {
        //number of bytes to capture before putting data in the queue.
        //determined via the sample rate, tempo, and # of "beats" in 1 "measure"
        int bytesToCapture = (int) ((SAMPLE_RATE * 2.) / (score.getTempo()
                / score.getMetre()[0] / 60.));
        //temporary buffer - will be added to ByteArrayOutputStream upon filling.
        byte tempBuffer[] = new byte[target.getBufferSize() / 5];

        int limit = (int) (bytesToCapture / tempBuffer.length);

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(bytesToCapture);
        int bytesRead;

        try
        { //Loop until stopCapture is set.
            while (!stopCapture)
            { //first, wait for notification from TimerTask
                synchronized (thisCapture)
                {
                    thisCapture.wait();
                }

                if (!processingStarted)
                { //the time at which audio capture begins
                    startTime = System.currentTimeMillis();
                }

                //start the TargetDataLine, from which audio data is read
                target.start();

                //collect 1 captureInterval's worth of data
                for (int n = 0; n < limit; n++)
                {
                    bytesRead = target.read(tempBuffer, 0, tempBuffer.length);
                    if (bytesRead > 0)
                    {   //Append data to output stream.
                        outputStream.write(tempBuffer, 0, bytesRead);
                    }
                }

                if (!processingStarted)
                {
                    long difference = (midiSynth.getPlaybackStartTime()
                            + score.getCountInTime() * 1000 - startTime);

                    positionOffset = (int) ((difference / 1000.)
                            * SAMPLE_RATE * 2.);

                    if (positionOffset % 2 != 0)
                    { //1 sample = 2 bytes, so positionOffset must be even
                        positionOffset += 1;
                    }
                }
                if (outputStream.size() > 0)
                {   //package data collected in the output stream into a byte array
                    byte[] capturedAudioData = outputStream.toByteArray();
                    //add captured data to the queue for processing
                    processingQueue.add(capturedAudioData);

                    synchronized (processingQueue)
                    {
                        try
                        { //notify the analysis thread that data is in the queue
                            processingQueue.notify();
                        } catch (Exception e)
                        {
                            //handle the error
                        }
                    }

                    outputStream.reset(); //reset the output stream
                }
            }
        } catch (Exception e)
        {
            //handle error
        }
    }
}
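I am looking into using a Mixer object to synchronize the TargetDataLine that accepts data from the microphone with the Line that handles playback from the MIDI instruments. Now to find the Line that handles playback... Any ideas?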
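As a side note on the two-thread design described in the question: an ArrayBlockingQueue already blocks its consumer until data arrives, so the hand-off can be expressed with put() and take() instead of a separate notify() on the queue object. The following is a minimal sketch under that assumption. The class `ChunkHandOff`, the dummy buffers, and the empty-array end marker are all hypothetical stand-ins for the real capture and analysis code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ChunkHandOff {
    /** Pushes dummy chunks from a producer thread and totals their sizes in the caller. */
    static int transfer(int chunks, int chunkBytes) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < chunks; i++) {
                    queue.put(new byte[chunkBytes]); // blocks while the queue is full
                }
                queue.put(new byte[0]); // empty array marks end of stream (sketch convention)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int total = 0;
        try {
            // take() blocks until a chunk is available, so no wait()/notify() is needed
            for (byte[] chunk = queue.take(); chunk.length > 0; chunk = queue.take()) {
                total += chunk.length; // the pitch analysis would run here
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during hand-off", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(transfer(10, 512));
    }
}
```

This does not address the latency question itself; it only removes one source of timing uncertainty, because a notify() that fires before the consumer is waiting is otherwise lost.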

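【Comments】:

You mention that the timer starts when the playback thread starts. Could that not be the moment the audio output starts? Is there some delay before the audio playback actually produces sound? Can you print a console message when the playback timer starts, to make sure that is the moment sound is actually produced?

@Gray I am afraid that is the case. The times I am getting seem accurate, but I am not sure they correspond to the exact moment sound is produced (to within 1 ms, of course). Is there a practical way to test this?

Again, can you print a console message and then watch and listen to make sure the message and the sound occur at the same time?

Yes, the message appears to be displayed at the same moment as the sound.

Java audio latency is not small, and it varies by platform. You need to use the device clock as the master clock rather than the system clock. You need to ask questions like "how many audio samples have been played?", not "when did I start pushing audio to the device?". However, I do not recall whether the device time is available in Java.

【Answer 1】: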

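Google has an excellent open-source app called AudioBufferSize that you may be familiar with. I modified this app to test one-way latency, that is, the time between the user pressing a button and the audio API playing the sound. Here is the code I added to AudioBufferSize to achieve this. Could you use this approach to obtain the time difference between an event and the moment the user perceives it?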

final Button latencyButton = (Button) findViewById(R.id.latencyButton);
latencyButton.setOnClickListener(new OnClickListener() {
    public void onClick(View v) {
        mLatencyStartTime = getCurrentTime();
        latencyButton.setEnabled(false);

        // Do the latency calculation, play a 440 hz sound for 250 msec
        AudioTrack sound = generateTone(440, 250);
        // count is the sample count computed in generateTone (presumably kept
        // in a field in the modified app); count / 2 is the stereo frame count
        sound.setNotificationMarkerPosition(count / 2); // Listen for the end of the sample

        sound.setPlaybackPositionUpdateListener(new OnPlaybackPositionUpdateListener() {
            public void onPeriodicNotification(AudioTrack sound) { }
            public void onMarkerReached(AudioTrack sound) {
                // The sound has finished playing, so record the time
                mLatencyStopTime = getCurrentTime();
                diff = mLatencyStopTime - mLatencyStartTime;
                // Update the latency result
                TextView lat = (TextView) findViewById(R.id.latency);
                lat.setText(diff + " ms");
                latencyButton.setEnabled(true);
                logUI("Latency test result= " + diff + " ms");
            }
        });
        sound.play();
    }
});

The code refers to generateTone, which looks like this:

private AudioTrack generateTone(double freqHz, int durationMs) {
    // total number of 16-bit samples (2 channels), forced to an even count
    int count = (int)(44100.0 * 2.0 * (durationMs / 1000.0)) & ~1;
    short[] samples = new short[count];
    for (int i = 0; i < count; i += 2) {
        // i indexes interleaved stereo samples, so the frame index is i / 2;
        // using i directly would produce a tone at twice freqHz
        short sample = (short)(Math.sin(2 * Math.PI * (i / 2) / (44100.0 / freqHz)) * 0x7FFF);
        samples[i + 0] = sample; // left channel
        samples[i + 1] = sample; // right channel
    }
    AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
            AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
            count * (Short.SIZE / 8), AudioTrack.MODE_STATIC);
    track.write(samples, 0, count);
    return track;
}

I just realized that this question is several years old. Sorry about that; maybe someone will still find it useful.
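One of the comments on the question suggests using the audio device's clock rather than the system clock, that is, asking how many samples have been captured instead of when capture was requested. In Java Sound, DataLine.getLongFramePosition() reports the frames captured since the line was opened, and Synthesizer.getLatency() reports a synthesizer's worst-case processing latency in microseconds. The following is a minimal sketch of how those two values might be combined. The class `FrameClock` and its methods are hypothetical, it assumes the capture line and the synthesizer behind the sequencer are both accessible, and the accuracy of both values depends on the platform's implementation.

```java
import javax.sound.midi.Synthesizer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.TargetDataLine;

final class FrameClock {
    /** Pure arithmetic: frames already captured plus synth latency, as a byte count. */
    static long bytesToSkip(long framesAtPlaybackStart, long synthLatencyMicros,
                            int frameRateHz, int frameSizeBytes) {
        long frames = Math.max(0L, framesAtPlaybackStart);
        // Convert the reported latency from microseconds to whole frames.
        long latencyFrames = Math.max(0L, synthLatencyMicros) * frameRateHz / 1_000_000L;
        return (frames + latencyFrames) * frameSizeBytes;
    }

    /**
     * Reads the capture line's own frame counter. Intended to be called
     * immediately after the sequencer's start() call (an assumption of this sketch).
     */
    static long bytesToSkipNow(TargetDataLine line, Synthesizer synth) {
        AudioFormat format = line.getFormat();
        return bytesToSkip(line.getLongFramePosition(), synth.getLatency(),
                Math.round(format.getFrameRate()), format.getFrameSize());
    }
}
```

Because the frame counter belongs to the capture device, the result does not depend on how long target.start() took to begin delivering data, which a pair of system timestamps cannot account for.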
