First audio CMSampleBuffer lost when reading mp4 file using AVAssetReader

Posted: 2020-02-21 23:04:36

Problem description: I'm using AVAssetWriter to write audio CMSampleBuffers to an mp4 file, but when I later read that file back with AVAssetReader, the initial chunk of data appears to be missing.
Here's the debug description of the first CMSampleBuffer passed to the writer input's append method (note the priming duration attachment of 1024/44_100):
CMSampleBuffer 0x102ea5b60 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
TrimDurationAtStart =
epoch = 0;
flags = 1;
timescale = 44100;
value = 1024;
formatDescription = <CMAudioFormatDescription 0x281fd9720 [0x1c061f840]>
mediaType:'soun'
mediaSubType:'aac '
mediaSpecific:
ASBD:
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x2
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0
cookie: <CFData 0x2805f50a0 [0x1c061f840]>length = 39, capacity = 39, bytes = 0x03808080220000000480808014401400 ... 1210068080800102
ACL: (null)
FormatList Array:
Index: 0
ChannelLayoutTag: 0x650002
ASBD:
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0
extensions: (null)
sbufToTrackReadiness = 0x0
numSamples = 1
outputPTS = 6683542167/44100 = 151554.244, rounded(based on cachedOutputPresentationTimeStamp)
sampleTimingArray[1] =
PTS = 6683541143/44100 = 151554.221, rounded, DTS = 6683541143/44100 = 151554.221, rounded, duration = 1024/44100 = 0.023,
sampleSizeArray[1] =
sampleSize = 163,
dataBuffer = 0x281cc7a80
And here's the debug description of the second CMSampleBuffer (note the priming duration attachment of 1088/44_100, which combined with the previous trim duration adds up to the standard value of 2112):
CMSampleBuffer 0x102e584f0 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
TrimDurationAtStart =
epoch = 0;
flags = 1;
timescale = 44100;
value = 1088;
formatDescription = <CMAudioFormatDescription 0x281fd9720 [0x1c061f840]>
mediaType:'soun'
mediaSubType:'aac '
mediaSpecific:
ASBD:
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x2
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0
cookie: <CFData 0x2805f50a0 [0x1c061f840]>length = 39, capacity = 39, bytes = 0x03808080220000000480808014401400 ... 1210068080800102
ACL: (null)
FormatList Array:
Index: 0
ChannelLayoutTag: 0x650002
ASBD:
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0
extensions: (null)
sbufToTrackReadiness = 0x0
numSamples = 1
outputPTS = 6683543255/44100 = 151554.269, rounded(based on cachedOutputPresentationTimeStamp)
sampleTimingArray[1] =
PTS = 6683542167/44100 = 151554.244, rounded, DTS = 6683542167/44100 = 151554.244, rounded, duration = 1024/44100 = 0.023,
sampleSizeArray[1] =
sampleSize = 179,
dataBuffer = 0x281cc4750
Now, when I read the audio track back using AVAssetReader, the first CMSampleBuffer I get is this:
CMSampleBuffer 0x102ed7b20 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
EmptyMedia(P) = true
formatDescription = (null)
sbufToTrackReadiness = 0x0
numSamples = 0
outputPTS = 0/1 = 0.000(based on outputPresentationTimeStamp)
sampleTimingArray[1] =
PTS = 0/1 = 0.000, DTS = INVALID, duration = 0/1 = 0.000,
dataBuffer = 0x0
followed by this one, which contains the 1088/44_100 priming info:
CMSampleBuffer 0x10318bc00 retainCount: 7 allocator: 0x1c061f840
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
FillDiscontinuitiesWithSilence(P) = true
GradualDecoderRefresh(P) = 1
TrimDurationAtStart(P) =
epoch = 0;
flags = 1;
timescale = 44100;
value = 1088;
IsGradualDecoderRefreshAuthoritative(P) = false
formatDescription = <CMAudioFormatDescription 0x281fdcaa0 [0x1c061f840]>
mediaType:'soun'
mediaSubType:'aac '
mediaSpecific:
ASBD:
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0
cookie: <CFData 0x2805f3800 [0x1c061f840]>length = 39, capacity = 39, bytes = 0x03808080220000000480808014401400 ... 1210068080800102
ACL: Stereo (L R)
FormatList Array:
Index: 0
ChannelLayoutTag: 0x650002
ASBD:
mSampleRate: 44100.000000
mFormatID: 'aac '
mFormatFlags: 0x0
mBytesPerPacket: 0
mFramesPerPacket: 1024
mBytesPerFrame: 0
mChannelsPerFrame: 2
mBitsPerChannel: 0
extensions:
VerbatimISOSampleEntry = length = 87, bytes = 0x00000057 6d703461 00000000 00000001 ... 12100680 80800102 ;
sbufToTrackReadiness = 0x0
numSamples = 43
outputPTS = 83/600 = 0.138(based on outputPresentationTimeStamp)
sampleTimingArray[1] =
PTS = 1024/44100 = 0.023, DTS = 1024/44100 = 0.023, duration = 1024/44100 = 0.023,
sampleSizeArray[43] =
sampleSize = 179,
sampleSize = 173,
sampleSize = 178,
sampleSize = 172,
sampleSize = 172,
sampleSize = 159,
sampleSize = 180,
sampleSize = 200,
sampleSize = 187,
sampleSize = 189,
sampleSize = 206,
sampleSize = 192,
sampleSize = 195,
sampleSize = 186,
sampleSize = 183,
sampleSize = 189,
sampleSize = 211,
sampleSize = 198,
sampleSize = 204,
sampleSize = 211,
sampleSize = 204,
sampleSize = 202,
sampleSize = 218,
sampleSize = 210,
sampleSize = 206,
sampleSize = 207,
sampleSize = 221,
sampleSize = 219,
sampleSize = 236,
sampleSize = 219,
sampleSize = 227,
sampleSize = 225,
sampleSize = 225,
sampleSize = 229,
sampleSize = 225,
sampleSize = 236,
sampleSize = 233,
sampleSize = 231,
sampleSize = 249,
sampleSize = 234,
sampleSize = 250,
sampleSize = 249,
sampleSize = 259,
dataBuffer = 0x281cde370
The input's append method keeps returning true, which in principle means that all sample buffers were appended, yet the reader for some reason skips the first chunk of data. Am I doing anything wrong here?
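For reference, the writing side is essentially a passthrough AVAssetWriterInput fed with these already-encoded buffers; the sketch below is simplified and the names are illustrative, not my exact production code:

import AVFoundation
import CoreMedia

// Passthrough audio input: nil outputSettings means the already-encoded AAC
// packets are written as-is, so a sourceFormatHint is required.
func makeAudioInput(formatHint: CMAudioFormatDescription) -> AVAssetWriterInput {
    let input = AVAssetWriterInput(mediaType: .audio,
                                   outputSettings: nil,
                                   sourceFormatHint: formatHint)
    input.expectsMediaDataInRealTime = true
    return input
}

// Per sample buffer, once the writer session has started:
// if input.isReadyForMoreMediaData {
//     let appended = input.append(sampleBuffer) // keeps returning true
// }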
I'm using the following code to read the file:
let asset = AVAsset(url: fileURL)

guard let assetReader = try? AVAssetReader(asset: asset) else {
    return
}

asset.loadValuesAsynchronously(forKeys: ["tracks"]) {
    guard let audioTrack = asset.tracks(withMediaType: .audio).first else { return }
    let audioOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: nil)
    assetReader.add(audioOutput) // attach the output before reading starts
    assetReader.startReading()
    while assetReader.status == .reading {
        if let sampleBuffer = audioOutput.copyNextSampleBuffer() {
            // do something
        }
    }
}
Answer 1:

First some pedantry: you're not losing the first sample buffer, you're losing the first packet within the first sample buffer.
On iOS 13 and macOS 10.15 (Catalina), the behaviour of AVAssetReader with nil outputSettings changed when reading AAC packet data.

Previously you would get the first AAC packet, that packet's presentation timestamp (zero), and a trim attachment instructing you to discard the usual first 2112 frames of decoded audio.

Now [iOS 13, macOS 10.15] AVAssetReader seems to drop the first packet, leaving you with the second one, whose presentation timestamp is 1024, and you only have to discard 2112 - 1024 = 1088 of the decoded frames.
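If you want to see this for yourself, you can read the trim attachment straight off the buffers the reader vends; a minimal sketch (the helper name is mine, the attachment key and CMTime conversion are standard CoreMedia):

import CoreMedia

// Returns the number of decoded frames the trim attachment asks you to drop
// (2112 before iOS 13, 1088 after, for a file like the one in the question).
func primingFramesToDrop(from sampleBuffer: CMSampleBuffer) -> Int64 {
    guard let attachment = CMGetAttachment(sampleBuffer,
                                           key: kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                                           attachmentModeOut: nil) else {
        return 0 // no trim requested
    }
    // The attachment is the CFDictionary representation of a CMTime whose
    // timescale here matches the 44100 Hz sample rate.
    let trim = CMTimeMakeFromDictionary((attachment as! CFDictionary))
    return trim.value
}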
Something that may not be immediately obvious in the above situation is that AVAssetReader is talking about two timelines, not one. The packet timestamps are referenced to one of them, the untrimmed timeline, and the trim instruction implies the existence of another: the trimmed timeline.

The transformation from untrimmed to trimmed timestamps is very simple, it is usually trimmed = untrimmed - 2112.
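Plugging in the numbers from the reader dump above makes the two timelines concrete (a worked example, not code from the question):

// 44.1 kHz, standard AAC priming of 2112 frames
let priming: Int64 = 2112
let untrimmedPTS: Int64 = 1024            // PTS of the first buffer the reader now vends
let trimmedPTS = untrimmedPTS - priming   // = -1088: the decoded output starts 1088 frames
                                          // before trimmed time zero, which is exactly what
                                          // the 1088-frame trim attachment tells you to discard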
So is the new behaviour a bug? If you decode to LPCM and correctly follow the trim instructions, then you should still get the same audio, which leads me to believe the change is intentional (N.B. I haven't personally confirmed that the LPCM samples are identical).
However, the documentation says:

A value of nil for outputSettings configures the output to vend samples in their original format as stored by the specified track.

I don't think you can simultaneously drop packets [even the first one, which is basically a constant] and claim to be vending samples in their "original format", so from that point of view I think the change has a bug-like quality to it.
I also think it's an unfortunate change, because I used to consider a nil-outputSettings AVAssetReader to be a kind of "raw" mode, while now it assumes your only use case is decoding to LPCM.
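For comparison, if decoding to LPCM is all you need, you can let the reader do the trimming for you by passing explicit output settings instead of nil; a minimal sketch (the particular PCM settings are illustrative, audioTrack as in the question's snippet):

import AVFoundation

// Ask the track output to decode to 16-bit interleaved integer PCM; the reader
// then applies the trim internally and the packet-level priming never reaches you.
let pcmSettings: [String: Any] = [
    AVFormatIDKey: kAudioFormatLinearPCM,
    AVLinearPCMBitDepthKey: 16,
    AVLinearPCMIsFloatKey: false,
    AVLinearPCMIsNonInterleaved: false
]
let audioOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: pcmSettings)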
Only one thing would downgrade "unfortunate" to "serious bug" for me, and that is if this new "let's pretend the first AAC packet doesn't exist" approach extends to the files created with AVAssetWriter, because that would break interoperability with non-AVAssetReader code, where the number of frames to trim has congealed into the constant 2112. I haven't personally confirmed this either. Do you have a file created from the sample buffers above that you could share?
p.s. I don't think your input sample buffers are relevant here; I think you would lose the first packet reading from any AAC file. However, your input sample buffers do seem slightly unusual in that they carry host-time [capture session?] style timestamps yet are AAC, and one packet per sample buffer isn't very many, and seems like a lot of overhead for 23 ms of audio. Are you creating them yourself in an AVCaptureSession -> AVAudioConverter chain?
Comments:

I just noticed your reply, thank you so much for taking the time to explain and elaborate. I'm not an expert in audio conversion and processing, so my terminology is certainly off at times. I did however find a way around the problem and shared my findings in this blog post: medium.com/fandom-engineering/… Let me know if you have any feedback :)

Also, regarding the blog post: I'm not claiming it's the right way to solve this, but it did make my problem go away, so...

I think your blog post and solution are correct! I don't think I gave you an answer, but I did say quite a lot. Your question was "Am I doing anything wrong here?", and I think the answer is "no".