Playback duration from MF SinkWriter mp4 file is half the time when adding an audio sample; the playback speed of the images is also twice as fast

Posted 2023-02-16


Originally posted: 2017-10-12 13:43:28

Question:

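I created a managed C++ library for my C# project that encodes images and audio into an MP4 container, following the MSDN SinkWriter tutorial. To test whether the output is correct, I wrote a method that supplies 600 frames, representing a 10-second video at 60 frames per second.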

The image I supply changes every second, and my audio file contains a voice counting to 10.

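The problem is that the output video is actually only 5 seconds long. The video's metadata says it is 10 seconds, but it isn't. The voice also barely gets to 5.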

If I write only the image samples, without the audio part, the video has the expected duration of 10 seconds.

What am I missing here?

Here are some parts of my application.

This is the C# part I use to create the 600 frames; it calls the PushFrame method, which is also on the C# side.

var videoFrameCount = 10 * FPS;
SetBinaryImage();

// 600 frames at 60 FPS: i = 0 .. videoFrameCount - 1
for (int i = 0; i < videoFrameCount; i++)
{
    // New picture every second
    if (i > 0 && i % FPS == 0)
    {
        SetBinaryImage();
    }

    PushFrame();
}

The PushFrame method copies the image and audio data to the pointers provided by the SinkWriter and then calls the SinkWriter's PushFrame method.

private void PushFrame()
{
    GCHandle frameBufferHandler = default(GCHandle);
    GCHandle audioBufferHandler = default(GCHandle);

    try
    {
        encodeStopwatch.Reset();
        encodeStopwatch.Start();

        // Video: pin the managed frame buffer and copy the image into it
        frameBufferHandler = GCHandle.Alloc(frameBuffer, GCHandleType.Pinned);
        frameBufferPtr = frameBufferHandler.AddrOfPinnedObject();
        CopyImageDataToPointer(BinaryImage, ScreenWidth, ScreenHeight, frameBufferPtr);

        // Audio: pin the managed audio buffer and copy the next chunk into it
        audioBufferHandler = GCHandle.Alloc(audioBuffer, GCHandleType.Pinned);
        audioBufferPtr = audioBufferHandler.AddrOfPinnedObject();
        var readLength = audioBuffer.Length;

        if (BinaryAudio.Length - (audioOffset + audioBuffer.Length) < 0)
        {
            // Last (partial) chunk of the audio file
            readLength = BinaryAudio.Length - audioOffset;
        }

        if (!EndOfFile)
        {
            Marshal.Copy(BinaryAudio, audioOffset, (IntPtr)audioBufferPtr, readLength);
            audioOffset += audioBuffer.Length;
        }

        if (readLength < audioBuffer.Length && !EndOfFile)
        {
            EndOfFile = true;
        }

        unsafe
        {
            // Copy video data
            var yuv = SinkWriter.VideoCapturerBuffer();
            SinkWriter.Encode((byte*)frameBufferPtr, ScreenWidth, ScreenHeight, (int)SWPF.SWPF_RGB, yuv);

            // Copy audio data
            var audioDestPtr = SinkWriter.AudioCapturerBuffer();
            SinkWriter.EncodeAudio((byte*)audioBufferPtr, audioDestPtr);

            SinkWriter.PushFrame();
        }

        encodeStopwatch.Stop();
        Console.WriteLine($"YUV frame generated in: {encodeStopwatch.Elapsed.TotalMilliseconds} ms");
    }
    catch (Exception ex)
    {
        // (handler body not shown)
    }
    finally
    {
        // Unpin the managed buffers so the GC can move them again
        if (frameBufferHandler.IsAllocated) frameBufferHandler.Free();
        if (audioBufferHandler.IsAllocated) audioBufferHandler.Free();
    }
}
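The audio half of PushFrame hands BinaryAudio to the SinkWriter one fixed-size chunk per video frame. As posted, the tail of audioBuffer is not cleared after the final partial read, and once EndOfFile is set the buffer is no longer refreshed, so the same bytes are passed to EncodeAudio for every remaining frame. A minimal C++ sketch of the slicing step, assuming the PCM data is already in memory as a byte vector; NextAudioChunk is a hypothetical name, not part of Media Foundation or the poster's library:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies the next chunk of PCM bytes from `pcm` into `chunk`,
// starting at `offset`, and advances `offset` by the number of bytes copied.
// Returns the number of real audio bytes copied. The rest of `chunk` is zero-filled
// (silence for signed 16-bit PCM; 8-bit unsigned PCM would need 0x80 instead),
// so no stale data from a previous frame is reused.
std::size_t NextAudioChunk(const std::vector<std::uint8_t>& pcm,
                           std::size_t& offset,
                           std::vector<std::uint8_t>& chunk)
{
    assert(!chunk.empty());
    const std::size_t available = offset < pcm.size() ? pcm.size() - offset : 0;
    const std::size_t toCopy = std::min(available, chunk.size());

    if (toCopy > 0)
    {
        std::copy_n(pcm.data() + offset, toCopy, chunk.data());
    }
    std::fill(chunk.data() + toCopy, chunk.data() + chunk.size(), std::uint8_t{0});

    offset += toCopy;
    return toCopy;
}
```

Called once per video frame, a return value smaller than the chunk size plays the role of the C# EndOfFile flag, and every later call yields a chunk of silence.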

These are some of the parts I added to the SinkWriter in C++. I assume the MediaTypes for the audio stream are fine, because audio playback itself works correctly.

rtStart and rtDuration are defined as follows:

LONGLONG rtStart = 0;
UINT64 rtDuration;
MFFrameRateToAverageTimePerFrame(fps, 1, &rtDuration);
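MFFrameRateToAverageTimePerFrame reports the frame duration in 100-nanosecond units, and PushFrame advances rtStart by adding that integer duration once per frame. Because 10,000,000 / 60 is not a whole number, repeated addition carries the rounding error forward. One alternative is to derive each timestamp from the frame index, as in this standalone sketch (FrameTime and FrameDuration are hypothetical helpers). Over 600 frames the accumulated error is only tens of microseconds whichever way the duration is rounded, so this alone would not explain a halved duration; it only keeps the sample times exact.

```cpp
#include <cassert>
#include <cstdint>

// 100-ns units per second, the time base used by IMFSample::SetSampleTime.
constexpr std::int64_t kHnsPerSecond = 10000000;

// Hypothetical helper: start time of frame `index` at fpsNum/fpsDen frames per second,
// computed directly from the index so rounding errors do not accumulate.
std::int64_t FrameTime(std::int64_t index, std::int64_t fpsNum, std::int64_t fpsDen)
{
    assert(fpsNum > 0 && fpsDen > 0);
    return index * kHnsPerSecond * fpsDen / fpsNum;
}

// Duration of frame `index`: the gap to the next frame's start time.
// At 60 fps this alternates between 166666 and 166667.
std::int64_t FrameDuration(std::int64_t index, std::int64_t fpsNum, std::int64_t fpsDen)
{
    return FrameTime(index + 1, fpsNum, fpsDen) - FrameTime(index, fpsNum, fpsDen);
}
```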

The encoder's two buffers are used like this:

int SinkWriter::Encode(Byte * rgbBuf, int w, int h, int pxFormat, Byte * yufBuf)
{
    const LONG cbWidth = 4 * VIDEO_WIDTH;
    const DWORD cbBuffer = cbWidth * VIDEO_HEIGHT;

    // Create a new memory buffer.
    HRESULT hr = MFCreateMemoryBuffer(cbBuffer, &pFrameBuffer);

    // Lock the buffer; yufBuf now points at the buffer's memory.
    bool locked = false;
    if (SUCCEEDED(hr))
    {
        hr = pFrameBuffer->Lock(&yufBuf, NULL, NULL);
        locked = SUCCEEDED(hr);
    }

    if (SUCCEEDED(hr))
    {
        // Calculate the stride
        DWORD bitsPerPixel = GetBitsPerPixel(pxFormat);
        DWORD bytesPerPixel = bitsPerPixel / 8;
        DWORD stride = w * bytesPerPixel;

        // Copy the image into the locked buffer
        hr = MFCopyImage(
            yufBuf,     // Destination buffer.
            stride,     // Destination stride.
            rgbBuf,     // First row in source image.
            stride,     // Source stride.
            stride,     // Image width in bytes.
            h           // Image height in pixels.
        );
    }

    // Only unlock a buffer that was actually locked.
    if (locked)
    {
        pFrameBuffer->Unlock();
    }

    // Set the data length of the buffer.
    if (SUCCEEDED(hr))
    {
        hr = pFrameBuffer->SetCurrentLength(cbBuffer);
    }

    return SUCCEEDED(hr) ? 0 : -1;
}
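MFCopyImage copies dwLines rows of dwWidthInBytes bytes each, stepping the source and destination pointers by their own strides between rows. The portable sketch below mimics that behavior (it is an illustration, not the Media Foundation implementation; CopyImageRows and PackedStride are hypothetical names) and can help check the stride arithmetic in Encode. As far as the posted code shows, Encode sizes the buffer as 4 * VIDEO_WIDTH * VIDEO_HEIGHT but derives the stride from pxFormat and w, so the two only agree for a 4-byte-per-pixel format with w and h equal to VIDEO_WIDTH and VIDEO_HEIGHT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustration of MFCopyImage-style copying: copy `lines` rows of `widthInBytes`
// bytes, where row r starts at dest + r * destStride and src + r * srcStride.
// Strides are signed, as in MFCopyImage, so bottom-up layouts can be expressed.
void CopyImageRows(std::uint8_t* dest, std::ptrdiff_t destStride,
                   const std::uint8_t* src, std::ptrdiff_t srcStride,
                   std::size_t widthInBytes, std::size_t lines)
{
    for (std::size_t row = 0; row < lines; ++row)
    {
        const std::ptrdiff_t r = static_cast<std::ptrdiff_t>(row);
        std::memcpy(dest + r * destStride, src + r * srcStride, widthInBytes);
    }
}

// Stride of a tightly packed row, computed the same way as in Encode.
std::size_t PackedStride(std::size_t width, std::size_t bitsPerPixel)
{
    assert(bitsPerPixel % 8 == 0);
    return width * (bitsPerPixel / 8);
}
```

For example, a 1920-pixel-wide RGB32 row is 7,680 bytes, while the same width in RGB24 is 5,760 bytes.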


int SinkWriter::EncodeAudio(Byte * src, Byte * dest)
{
    DWORD samplePerSecond = AUDIO_SAMPLES_PER_SECOND * AUDIO_BITS_PER_SAMPLE * AUDIO_NUM_CHANNELS;
    DWORD cbBuffer = samplePerSecond / 1000;

    // Create a new memory buffer.
    HRESULT hr = MFCreateMemoryBuffer(cbBuffer, &pAudioBuffer);

    // Lock the buffer and copy the audio data into it.
    if (SUCCEEDED(hr))
    {
        hr = pAudioBuffer->Lock(&dest, NULL, NULL);
    }

    // Only copy into and unlock a buffer that was actually locked.
    if (SUCCEEDED(hr))
    {
        CopyMemory(dest, src, cbBuffer);
        pAudioBuffer->Unlock();
    }

    // Set the data length of the buffer.
    if (SUCCEEDED(hr))
    {
        hr = pAudioBuffer->SetCurrentLength(cbBuffer);
    }

    return SUCCEEDED(hr) ? 0 : -1;
}
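EncodeAudio multiplies by AUDIO_BITS_PER_SAMPLE. If that constant holds a bit count such as 16 (an assumption, since its value is not shown), samplePerSecond is bits per second and cbBuffer is bits, not bytes, per millisecond. A small sketch of the byte arithmetic for PCM, following the WAVEFORMATEX definitions of nBlockAlign and nAvgBytesPerSec; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// PCM byte arithmetic, following the WAVEFORMATEX definitions:
//   nBlockAlign     = nChannels * wBitsPerSample / 8
//   nAvgBytesPerSec = nSamplesPerSec * nBlockAlign
std::uint32_t BlockAlign(std::uint32_t channels, std::uint32_t bitsPerSample)
{
    assert(bitsPerSample % 8 == 0);
    return channels * (bitsPerSample / 8);
}

std::uint32_t AvgBytesPerSecond(std::uint32_t sampleRate, std::uint32_t channels,
                                std::uint32_t bitsPerSample)
{
    return sampleRate * BlockAlign(channels, bitsPerSample);
}

// The quantity EncodeAudio computes (multiplying by bits rather than bytes),
// kept here only for comparison.
std::uint32_t BitsPerMillisecond(std::uint32_t sampleRate, std::uint32_t channels,
                                 std::uint32_t bitsPerSample)
{
    return sampleRate * bitsPerSample * channels / 1000;
}
```

Assuming 44.1 kHz, 16-bit stereo, the posted formula gives 1,411 per millisecond, while the byte rate is 176,400 per second, or 176 bytes per millisecond after integer division. Since EncodeAudio always copies cbBuffer bytes from src, whatever buffer the C# side passes in must be at least cbBuffer bytes long.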

This is the SinkWriter's PushFrame method, which passes the SinkWriter, the video and audio stream indices, rtStart, and rtDuration to WriteFrame.

int SinkWriter::PushFrame()
{
    if (initialized)
    {
        HRESULT hr = WriteFrame(ptrSinkWriter, stream, audio, rtStart, rtDuration);
        if (FAILED(hr))
        {
            return -1;
        }

        rtStart += rtDuration;

        return 0;
    }

    return -1;
}

And here is the WriteFrame method, which writes both the video sample and the audio sample.

HRESULT SinkWriter::WriteFrame(IMFSinkWriter *pWriter, DWORD streamIndex, DWORD audioStreamIndex, const LONGLONG& rtStart, const LONGLONG& rtDuration)
{
    IMFSample *pVideoSample = NULL;

    // Create a media sample and add the buffer to the sample.
    HRESULT hr = MFCreateSample(&pVideoSample);

    if (SUCCEEDED(hr))
    {
        hr = pVideoSample->AddBuffer(pFrameBuffer);
    }
    if (SUCCEEDED(hr))
    {
        hr = pVideoSample->SetUINT32(MFSampleExtension_Discontinuity, FALSE);
    }

    // Set the time stamp and the duration.
    if (SUCCEEDED(hr))
    {
        hr = pVideoSample->SetSampleTime(rtStart);
    }
    if (SUCCEEDED(hr))
    {
        hr = pVideoSample->SetSampleDuration(rtDuration);
    }

    // Send the sample to the Sink Writer.
    if (SUCCEEDED(hr))
    {
        hr = pWriter->WriteSample(streamIndex, pVideoSample);
    }

    // Audio
    IMFSample *pAudioSample = NULL;

    if (SUCCEEDED(hr))
    {
        hr = MFCreateSample(&pAudioSample);
    }
    if (SUCCEEDED(hr))
    {
        hr = pAudioSample->AddBuffer(pAudioBuffer);
    }

    // Set the time stamp and the duration.
    if (SUCCEEDED(hr))
    {
        hr = pAudioSample->SetSampleTime(rtStart);
    }
    if (SUCCEEDED(hr))
    {
        hr = pAudioSample->SetSampleDuration(rtDuration);
    }

    // Send the sample to the Sink Writer.
    if (SUCCEEDED(hr))
    {
        hr = pWriter->WriteSample(audioStreamIndex, pAudioSample);
    }

    SafeRelease(&pVideoSample);
    SafeRelease(&pFrameBuffer);
    SafeRelease(&pAudioSample);
    SafeRelease(&pAudioBuffer);
    return hr;
}
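WriteFrame stamps the audio sample with the same rtStart and rtDuration as the video frame, regardless of how many audio bytes the buffer holds. For PCM, the time a buffer actually represents follows from its length: bytes / nAvgBytesPerSec seconds, or bytes * 10,000,000 / nAvgBytesPerSec in the 100-ns units that SetSampleDuration expects. A sketch of that conversion; PcmDurationHns is a hypothetical helper, and 44.1 kHz, 16-bit stereo (176,400 bytes per second) is only an example format.

```cpp
#include <cassert>
#include <cstdint>

// 100-ns units per second (the Media Foundation time base).
constexpr std::int64_t kHnsPerSecond = 10000000;

// Hypothetical helper: playback duration of `bytes` of PCM audio, in 100-ns units,
// given the stream's average byte rate (nAvgBytesPerSec). Truncates toward zero.
std::int64_t PcmDurationHns(std::int64_t bytes, std::int64_t avgBytesPerSecond)
{
    assert(bytes >= 0 && avgBytesPerSecond > 0);
    return bytes * kHnsPerSecond / avgBytesPerSecond;
}
```

With that example format, a 176-byte buffer (about 1 ms) works out to 9,977 units, whereas 1/60 s of audio (2,940 bytes) is 166,666 units, so labeling the smaller buffer with the video frame's duration claims far more time than it contains.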

Comments:

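The argument to pVideoSample->SetSampleTime might be half of what you want. At the very least, you should check it in the debugger and rule that out.

SetSampleTime should be fine, because when I remove the AudioSample, the video's duration and timing are correct. But I'll try doubling the time. Edit: I tried doubling it, but then video creation no longer works: after several frames, it takes a very long time to generate the remaining frames.

Answer 1: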

The problem was an incorrect calculation of the audio buffer size. This is the correct calculation:

var avgBytesPerSecond = sampleRate * 2 * channels;
var avgBytesPerMillisecond = avgBytesPerSecond / 1000;
var bufferSize = avgBytesPerMillisecond * (1000 / 60);
audioBuffer = new byte[bufferSize];

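In my question, the buffer held only one millisecond of audio, so it seems the MF framework sped up the images to make the audio sound right. After I fixed the buffer size, the video's duration was exactly what I expected, and the audio played without errors.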
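In the answer's formula, avgBytesPerSecond / 1000 and 1000 / 60 are both integer divisions, so at 44.1 kHz, 16-bit stereo each frame gets 176 * 16 = 2,816 bytes (about 16 ms) rather than a full 1/60 s of audio (2,940 bytes). A sketch that instead counts whole sample frames per video frame, so the size is always a multiple of the block align and the running total never drifts from the video clock; AudioBytesForFrame and the example formats are assumptions, not part of the answer:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of PCM bytes belonging to video frame `index`
// at `fps` frames per second. It works in whole sample frames, so the result is
// always a multiple of blockAlign, and it spreads any remainder
// (e.g. 44100 / 24 = 1837.5 samples) across frames so the total stays exact.
std::uint64_t AudioBytesForFrame(std::uint64_t index, std::uint64_t sampleRate,
                                 std::uint64_t fps, std::uint64_t blockAlign)
{
    assert(fps > 0);
    const std::uint64_t firstSample = index * sampleRate / fps;
    const std::uint64_t nextSample = (index + 1) * sampleRate / fps;
    return (nextSample - firstSample) * blockAlign;
}
```

At 60 fps with 44.1 kHz or 48 kHz audio the division is exact (735 or 800 samples per frame), so a fixed-size buffer works; for rates that do not divide evenly, the per-frame size would vary by one sample frame and the buffer passed to EncodeAudio would have to be sized per call.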

