Mixing down two files together using Extended Audio File Services

Posted 2023-02-23


[Title] Mixing down two files together using Extended Audio File Services [Asked] 2010-11-11 21:55:42 [Question]:

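I'm doing some custom audio post-processing using audio units. I have two files that I'm merging together (links below), but I'm getting some weird noise in the output. What am I doing wrong?

I have verified that, prior to this step, the 2 files (workTrack1 and workTrack2) are in a proper state and sound good. No errors are hit during the process either.

Buffer processing code: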

- (BOOL)mixBuffersWithBuffer1:(const int16_t *)buffer1 buffer2:(const int16_t *)buffer2 outBuffer:(int16_t *)mixbuffer outBufferNumSamples:(int)mixbufferNumSamples {
    BOOL clipping = NO;

    for (int i = 0; i < mixbufferNumSamples; i++) {
        int32_t s1 = buffer1[i];
        int32_t s2 = buffer2[i];
        int32_t mixed = s1 + s2;

        if ((mixed < -32768) || (mixed > 32767)) {
            clipping = YES; // don't break here because we don't want to lose data, only to warn the user
        }

        mixbuffer[i] = (int16_t) mixed;
    }
    return clipping;
}

Mix-down code:

////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////      PHASE 4      ////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// In phase 4, open workTrack1 and workTrack2 for reading,
// mix together, and write out to outfile.

// open the outfile for writing -- this will erase the infile if they are the same, but its ok cause we are done with it
err = [self openExtAudioFileForWriting:outPath audioFileRefPtr:&outputAudioFileRef numChannels:numChannels];
if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

// setup vars
framesRead = 0;
totalFrames = [self totalFrames:mixAudioFile1Ref]; // the long one.
NSLog(@"Mix-down phase, %d frames (%0.2f secs)", totalFrames, totalFrames / RECORD_SAMPLES_PER_SECOND);

moreToProcess = YES;
while (moreToProcess) {

    conversionBuffer1.mBuffers[0].mDataByteSize = LOOPER_BUFFER_SIZE;
    conversionBuffer2.mBuffers[0].mDataByteSize = LOOPER_BUFFER_SIZE;

    UInt32 frameCount1 = framesInBuffer;
    UInt32 frameCount2 = framesInBuffer;

    // Read a buffer of input samples up to AND INCLUDING totalFrames
    int numFramesRemaining = totalFrames - framesRead; // Todo see if we are off by 1 here.  Might have to add 1
    if (numFramesRemaining == 0) {
        moreToProcess = NO; // If no frames are to be read, then this phase is finished

    } else {
        if (numFramesRemaining < frameCount1) { // see if we are near the end
            frameCount1 = numFramesRemaining;
            frameCount2 = numFramesRemaining;
            conversionBuffer1.mBuffers[0].mDataByteSize = (frameCount1 * bytesPerFrame);
            conversionBuffer2.mBuffers[0].mDataByteSize = (frameCount2 * bytesPerFrame);
        }

        NSLog(@"Attempting to read %d frames from mixAudioFile1Ref", (int)frameCount1);
        err = ExtAudioFileRead(mixAudioFile1Ref, &frameCount1, &conversionBuffer1);
        if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

        NSLog(@"Attempting to read %d frames from mixAudioFile2Ref", (int)frameCount2);
        err = ExtAudioFileRead(mixAudioFile2Ref, &frameCount2, &conversionBuffer2);
        if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

        NSLog(@"Read %d frames from mixAudioFile1Ref in mix-down phase", (int)frameCount1);
        NSLog(@"Read %d frames from mixAudioFile2Ref in mix-down phase", (int)frameCount2);

        // If no frames were returned, phase is finished
        if (frameCount1 == 0) {
            moreToProcess = NO;

        } else { // Process pcm data

            // if buffer2 was not filled, fill with zeros
            if (frameCount2 < frameCount1) {
                bzero(inBuffer2 + frameCount2, (frameCount1 - frameCount2));
                frameCount2 = frameCount1;
            }

            const int numSamples = (frameCount1 * bytesPerFrame) / sizeof(int16_t);

            if ([self mixBuffersWithBuffer1:(const int16_t *)inBuffer1
                                    buffer2:(const int16_t *)inBuffer2
                                  outBuffer:(int16_t *)outBuffer
                        outBufferNumSamples:numSamples]) {
                NSLog(@"Clipping");
            }

            // Write pcm data to the main output file
            conversionOutBuffer.mBuffers[0].mDataByteSize = (frameCount1 * bytesPerFrame);
            err = ExtAudioFileWrite(outputAudioFileRef, frameCount1, &conversionOutBuffer);

            framesRead += frameCount1;
        } // frame count
    } // else

    if (err) {
        moreToProcess = NO;
    }
} // while moreToProcess

// Check for errors
TTDASSERT(framesRead == totalFrames);
if (err) {
    if (error) *error = [NSError errorWithDomain:kUAAudioselfCrossFaderErrorDomain
                                            code:UAAudioSelfCrossFaderErrorTypeMixDown
                                        userInfo:[NSDictionary dictionaryWithObjectsAndKeys:[NSNumber numberWithInt:err],@"Underlying Error Code",[self commonExtAudioResultCode:err],@"Underlying Error Name",nil]];
    [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err];
    return NO;
}

NSLog(@"Done with mix-down phase");

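Assumptions

- mixAudioFile1Ref is always longer than mixAudioFile2Ref.
- After mixAudioFile2Ref runs out of bytes, outputAudioFileRef should sound exactly the same as mixAudioFile1Ref, because only silence is being mixed in from that point on.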

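The expected sound should mix the fade-out with the fade-in at the beginning, producing a self-crossfade when the track loops. Please listen to the output, look at the code, and let me know where I'm going wrong.

Source sound: http://cl.ly/2g2F2A3k1r3S36210V23
Resulting sound: http://cl.ly/3q2w3S3Y0x0M3i2a1W3v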


[Answer 1]:

It turns out there were two problems here.

Buffer processing code

int32_t mixed = s1 + s2; causes clipping. A better approach is to divide by the number of channels being mixed: int32_t mixed = (s1 + s2)/2; and then normalize in a separate pass.
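As a minimal sketch of the divide-by-two approach, written in plain C rather than as an Objective-C method: mix_average is a hypothetical helper name that does not appear in the original code, and it assumes both inputs hold at least num_samples 16-bit samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mix two 16-bit PCM buffers by averaging, so the result cannot overflow.
 * mix_average is a hypothetical name used only for this illustration. */
void mix_average(const int16_t *a, const int16_t *b,
                 int16_t *out, size_t num_samples)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < num_samples; i++) {
        /* Widen to 32 bits before adding; the sum of two int16_t values
         * always fits in an int32_t. */
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        /* Dividing by the number of mixed tracks (2) keeps the result
         * inside the int16_t range [-32768, 32767]. */
        out[i] = (int16_t)(sum / 2);
    }
}
```

The trade-off is that every sample, clipping or not, comes out at half level, which is why a later normalization pass is needed to bring the volume back up.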

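Frames != bytes

When zeroing out the second track's buffer once its audio ran out, I mistakenly specified the offset and the length in frames rather than in bytes. That left garbage in the buffer and produced the noise you hear periodically. Easy to fix: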

if (frameCount2 < frameCount1) {
    bzero(inBuffer2 + (frameCount2 * bytesPerFrame), (frameCount1 - frameCount2) * bytesPerFrame);
    frameCount2 = frameCount1;
}

Now the sample sounds great: http://cl.ly/1E2q1L441s2b3e2X2z0J
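The frames-versus-bytes conversion can be isolated in a small helper, sketched here in plain C. pad_with_silence is a hypothetical name, and the sketch assumes the buffer is addressed as bytes (uint8_t *) so that pointer arithmetic advances one byte at a time; with an int16_t pointer the offset would have to be expressed in samples instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the tail of a byte buffer that holds frames_read valid frames so it
 * holds frames_wanted frames in total. Hypothetical helper for illustration. */
void pad_with_silence(uint8_t *buffer, size_t frames_read,
                      size_t frames_wanted, size_t bytes_per_frame)
{
    assert(buffer != NULL && bytes_per_frame > 0);
    if (frames_read >= frames_wanted) {
        return; /* buffer is already full, nothing to pad */
    }
    /* Convert both the offset and the length from frames to bytes. */
    size_t offset_bytes = frames_read * bytes_per_frame;
    size_t length_bytes = (frames_wanted - frames_read) * bytes_per_frame;
    memset(buffer + offset_bytes, 0, length_bytes);
}
```

For example, with 4 bytes per frame (16-bit stereo), padding from 1 frame to 3 frames zeroes bytes 4 through 11 and leaves everything else untouched.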

[Discussion]:

I'm trying to do the same thing but can't get it to work. Can you help me?

[Answer 2]:

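The answer you posted looks good; I can see only one small problem. Your fix for clipping, dividing by 2, will help, but it is also equivalent to applying a 50% gain reduction. That is not the same as normalization. Normalization is the process of looking at the entire audio file, finding the highest peak, and applying a gain adjustment so that this peak reaches a given level (usually 0.0 dB). The upshot is that in the normal (i.e. non-clipping) case, the output signal will be quite low and will need to be boosted again.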
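The peak normalization described above can be sketched as two steps over an in-memory buffer: find the largest absolute sample, then scale everything by the same ratio. The names find_peak and normalize_to_peak are hypothetical, and the target is given as a raw sample amplitude rather than in dB (32767 corresponds to 0 dBFS for 16-bit audio). For a real file, both steps would have to run over the whole file, not over one read buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the largest absolute sample value in the buffer. */
int32_t find_peak(const int16_t *samples, size_t num_samples)
{
    assert(samples != NULL || num_samples == 0);
    int32_t peak = 0;
    for (size_t i = 0; i < num_samples; i++) {
        int32_t v = samples[i];          /* widen so that -(-32768) is safe */
        int32_t mag = v < 0 ? -v : v;
        if (mag > peak) {
            peak = mag;
        }
    }
    return peak;
}

/* Scale every sample by target_peak / peak so the loudest sample lands on
 * target_peak. Integer math truncates toward zero. */
void normalize_to_peak(int16_t *samples, size_t num_samples, int32_t target_peak)
{
    assert(target_peak > 0 && target_peak <= 32767);
    int32_t peak = find_peak(samples, num_samples);
    if (peak == 0) {
        return; /* pure silence: nothing to scale */
    }
    for (size_t i = 0; i < num_samples; i++) {
        /* 64-bit intermediate avoids overflow in sample * target_peak. */
        int64_t scaled = (int64_t)samples[i] * target_peak / peak;
        samples[i] = (int16_t)scaled;
    }
}
```

Because no sample is larger in magnitude than the peak, the scaled values can never exceed target_peak, so this step cannot introduce clipping on its own.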

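During the mix-down you were undoubtedly hitting overflow, which causes distortion because the value wraps around and makes the signal jump. What you want is a technique called a "brick-wall limiter", which essentially applies a hard ceiling to any sample that would clip. The simplest way to do it is: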

int32_t mixed = s1 + s2;
if (mixed >= 32767) {
  mixed = 32767;
} else if (mixed <= -32767) {
  mixed = -32767;
}

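The result of this technique is that you will hear some distortion around the samples that were clipped, but the sound will not be completely mangled the way it is with integer overflow. The distortion is there, but it does not ruin the listening experience.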

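[Discussion]:

I know that what I'm doing is not normalization; I apply normalization in a separate pass by scaling every sample up by the ratio that brings the maximum peak to -0.5 dB. Rereading my earlier answer, it was ambiguous. Edited. Depending on the implementation, a brick-wall limiter is the same thing as clipping; it comes down to how the overflow is handled. Either way, it will sound bad. I would reduce the gain by 50% before adding the samples and then normalize afterwards.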
