AAC audio encoding: principles and settings
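AAC (Advanced Audio Coding) appeared in 1997 as an audio coding technology based on MPEG-2. It was developed jointly by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony, and other companies, with the aim of replacing the MP3 format. After the MPEG-4 standard appeared in 2000, AAC incorporated its features and added the SBR and PS technologies; to distinguish it from the traditional MPEG-2 AAC, it is also called MPEG-4 AAC.
iOS supports AAC encoding, mainly through the AudioConverter API in AudioToolbox. I built an AAC encoder because I was working on an HLS feature: HLS requires TS files in which the video is encoded as H.264 and the audio as AAC. H.264 can use a hardware or a software encoder, as covered earlier. AAC can likewise be encoded in hardware or in software, and iOS supports both.
First, create a converter, which here acts as the AAC encoder, using the following interface: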
extern OSStatus
AudioConverterNew( const AudioStreamBasicDescription* inSourceFormat,
const AudioStreamBasicDescription* inDestinationFormat,
AudioConverterRef* outAudioConverter) __OSX_AVAILABLE_STARTING(__MAC_10_1,__IPHONE_2_0);
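The input parameters are the source and destination data formats.
For AAC encoding, the source format is the captured PCM data and the destination format is AAC.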
AudioStreamBasicDescription inAudioStreamBasicDescription;
// FillOutASBDForLPCM()
inAudioStreamBasicDescription.mFormatID = kAudioFormatLinearPCM;
inAudioStreamBasicDescription.mSampleRate = 44100;
inAudioStreamBasicDescription.mBitsPerChannel = 16;
inAudioStreamBasicDescription.mFramesPerPacket = 1;
inAudioStreamBasicDescription.mBytesPerFrame = 2;
inAudioStreamBasicDescription.mBytesPerPacket = inAudioStreamBasicDescription.mBytesPerFrame * inAudioStreamBasicDescription.mFramesPerPacket;
inAudioStreamBasicDescription.mChannelsPerFrame = 1;
inAudioStreamBasicDescription.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsNonInterleaved;
inAudioStreamBasicDescription.mReserved = 0;
AudioStreamBasicDescription outAudioStreamBasicDescription = {0}; // Always zero-initialize a new audio stream basic description before filling in its fields.
outAudioStreamBasicDescription.mChannelsPerFrame = 1;
outAudioStreamBasicDescription.mFormatID = kAudioFormatMPEG4AAC;
UInt32 size = sizeof(outAudioStreamBasicDescription);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &outAudioStreamBasicDescription);
OSStatus status = AudioConverterNew(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, &_audioConverter);
if(status != 0) NSLog(@"setup converter failed: %d", (int)status);
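The byte-size fields of the PCM description follow directly from the bit depth and channel count. As a minimal sketch in plain C (the `PcmLayout` type and `pcm_layout` function are hypothetical names for illustration, not AudioToolbox API), assuming packed linear PCM with one frame per packet:

```c
#include <stdint.h>

/* Hypothetical helper type: mirrors three ASBD fields for packed linear PCM. */
typedef struct {
    uint32_t bytes_per_frame;
    uint32_t frames_per_packet;
    uint32_t bytes_per_packet;
} PcmLayout;

/* Interleaved PCM stores one sample per channel in every frame.
 * In the non-interleaved layout each buffer carries a single channel,
 * so the frame size counts one sample only. */
static PcmLayout pcm_layout(uint32_t bits_per_channel, uint32_t channels, int interleaved)
{
    PcmLayout l;
    uint32_t bytes_per_sample = bits_per_channel / 8;
    l.bytes_per_frame = interleaved ? bytes_per_sample * channels : bytes_per_sample;
    l.frames_per_packet = 1; /* uncompressed audio: one frame per packet */
    l.bytes_per_packet = l.bytes_per_frame * l.frames_per_packet;
    return l;
}
```

For the 16-bit mono input used here, `pcm_layout(16, 1, 0)` gives 2 bytes per frame and 2 bytes per packet, matching the values assigned by hand in the listing.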
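This creates the AAC encoder. By default, Apple creates a hardware encoder and falls back to a software encoder if the hardware is unavailable.
In my tests, the hardware AAC encoder has high encoding latency: it needs to buffer about 2 seconds of data before it starts encoding. The software encoder's latency is normal: it starts encoding as soon as it has been fed 1024 samples.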
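To put the 1024-sample figure in perspective, the duration of one such block can be worked out from the sample rate. The helper below is a hypothetical illustration in plain C, not part of the encoder:

```c
/* Duration in milliseconds of one block of `frames` PCM frames. */
static double block_duration_ms(unsigned frames, double sample_rate_hz)
{
    return 1000.0 * (double)frames / sample_rate_hz;
}
```

At 44100 Hz a 1024-sample block lasts roughly 23.2 ms, and at 16000 Hz it lasts 64 ms, both far below the roughly 2 seconds of buffering observed with the hardware encoder.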
So how do you request the software encoder at creation time? Use the following helper:
- (AudioClassDescription *)getAudioClassDescriptionWithType:(UInt32)type
                                           fromManufacturer:(UInt32)manufacturer
{
    static AudioClassDescription desc;
    UInt32 encoderSpecifier = type;
    OSStatus st;
    UInt32 size;
    // Ask how many bytes the list of matching encoders occupies.
    st = AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders,
                                    sizeof(encoderSpecifier),
                                    &encoderSpecifier,
                                    &size);
    if (st) {
        NSLog(@"error getting audio format property info: %d", (int)(st));
        return nil;
    }
    unsigned int count = size / sizeof(AudioClassDescription);
    AudioClassDescription descriptions[count];
    // Fetch the encoder list itself.
    st = AudioFormatGetProperty(kAudioFormatProperty_Encoders,
                                sizeof(encoderSpecifier),
                                &encoderSpecifier,
                                &size,
                                descriptions);
    if (st) {
        NSLog(@"error getting audio format property: %d", (int)(st));
        return nil;
    }
    // Return the first encoder that matches both the format and the manufacturer.
    for (unsigned int i = 0; i < count; i++) {
        if ((type == descriptions[i].mSubType) &&
            (manufacturer == descriptions[i].mManufacturer)) {
            memcpy(&desc, &(descriptions[i]), sizeof(desc));
            return &desc;
        }
    }
    return nil;
}
AudioClassDescription *desc = [self getAudioClassDescriptionWithType:kAudioFormatMPEG4AAC
fromManufacturer:kAppleSoftwareAudioCodecManufacturer];
OSStatus status = AudioConverterNewSpecific(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, 1, desc, &_audioConverter);
For encoding to work correctly, the encoding bit rate must be set. Otherwise encoding returns error code 560226676 ('!dat').
UInt32 ulBitRate = 64000;
UInt32 ulSize = sizeof(ulBitRate);
status = AudioConverterSetProperty(_audioConverter, kAudioConverterEncodeBitRate, ulSize, &ulBitRate);
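A numeric status such as 560226676 is easier to recognize as a four-character code. The sketch below is plain C; `status_to_fourcc` is a hypothetical helper name. It prints the four bytes of the value, most significant first:

```c
#include <stdint.h>

/* Writes the four bytes of `status` (most significant first) into `out`,
 * which must hold 5 chars. Non-printable bytes are shown as '?'. */
static void status_to_fourcc(int32_t status, char out[5])
{
    uint32_t v = (uint32_t)status;
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(v >> (24 - 8 * i));
        out[i] = (c >= 32 && c < 127) ? (char)c : '?';
    }
    out[4] = '\0';
}
```

Passing 560226676 (0x21646174) yields the string `!dat`. Small negative statuses such as -50 are not four-character codes and will mostly come out as `?`.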
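Note that AAC does not support arbitrary bit rates. For example, if the PCM sample rate is 44.1 kHz (44100 Hz), the bit rate can be set to 64000 bps; at 16 kHz it can be set to 32000 bps.
After creating the converter and setting the bit rate, query the maximum encoded output size, which is needed later.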
UInt32 value = 0;
size = sizeof(value);
AudioConverterGetProperty(_audioConverter, kAudioConverterPropertyMaximumOutputPacketSize, &size, &value);
The returned value is the maximum size, in bytes, of a packet the encoder can output.
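As a rough sanity check on that maximum, the average packet size can be estimated from the bit rate. This is a back-of-the-envelope sketch in plain C with a hypothetical helper name, assuming 1024 PCM frames per AAC packet as described earlier:

```c
/* Average encoded bytes per packet at a given constant bit rate. */
static double avg_packet_bytes(double bit_rate_bps, double sample_rate_hz,
                               unsigned frames_per_packet)
{
    double packets_per_second = sample_rate_hz / (double)frames_per_packet;
    return bit_rate_bps / 8.0 / packets_per_second;
}
```

At 64000 bps and 44100 Hz the average comes to about 186 bytes per packet, and at 32000 bps and 16000 Hz it is 256 bytes. Individual packets vary in size, so the queried maximum would be expected to be larger than these averages.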
Then call AudioConverterFillComplexBuffer to encode:
AudioBufferList outAudioBufferList = {0};
outAudioBufferList.mNumberBuffers = 1;
outAudioBufferList.mBuffers[0].mNumberChannels = 1;
outAudioBufferList.mBuffers[0].mDataByteSize = value; // value is the maximum packet size queried above
outAudioBufferList.mBuffers[0].mData = malloc(value); // free() this buffer once the packet has been consumed
UInt32 ioOutputDataPacketSize = 1;
status = AudioConverterFillComplexBuffer(_audioConverter, inInputDataProc, (__bridge void *)(self), &ioOutputDataPacketSize, &outAudioBufferList, NULL);
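In the encoding call, inInputDataProc is an input-data callback that feeds PCM data to the converter. Setting ioOutputDataPacketSize to 1 means the call returns as soon as one packet (one AAC frame) has been produced. outAudioBufferList receives the encoded data.
inInputDataProc is implemented as follows: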
static OSStatus inInputDataProc(AudioConverterRef inAudioConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, void *inUserData)
{
    AACEncoder *encoder = (__bridge AACEncoder *)(inUserData);
    UInt32 requestedPackets = *ioNumberDataPackets;
    uint8_t *buffer;
    uint32_t bufferLength = requestedPackets * 2; // 2 bytes per 16-bit mono sample
    uint32_t bufferRead;
    bufferRead = [encoder.pcmPool readBuffer:&buffer withLength:bufferLength];
    if (bufferRead == 0) {
        // No PCM data available: report zero packets and stop this pass.
        *ioNumberDataPackets = 0;
        return -1;
    }
    ioData->mBuffers[0].mData = buffer;
    ioData->mBuffers[0].mDataByteSize = bufferRead;
    ioData->mNumberBuffers = 1;
    ioData->mBuffers[0].mNumberChannels = 1;
    *ioNumberDataPackets = bufferRead >> 1; // bytes to 16-bit samples
    return noErr;
}
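pcmPool is a ring buffer that holds PCM data.
Because each capture callback does not necessarily deliver 1024 samples, the data can be buffered and the encoder called once 1024 samples are available.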
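The article does not show how pcmPool is implemented. The following is only a sketch of the buffering idea in plain C: the `PcmPool` type, its functions, and the 4096-sample capacity are assumptions made for illustration, and a real capture pipeline would also need thread synchronization.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_CAPACITY 4096  /* samples; assumed size for this sketch */

/* Hypothetical stand-in for pcmPool: a ring of 16-bit samples. */
typedef struct {
    int16_t data[POOL_CAPACITY];
    size_t head;   /* index of the oldest sample */
    size_t count;  /* samples currently stored */
} PcmPool;

static void pool_init(PcmPool *p) { p->head = 0; p->count = 0; }

/* Appends n samples; returns 0 on success, -1 if they do not fit. */
static int pool_write(PcmPool *p, const int16_t *src, size_t n)
{
    if (n > POOL_CAPACITY - p->count) return -1;
    size_t tail = (p->head + p->count) % POOL_CAPACITY;
    for (size_t i = 0; i < n; i++)
        p->data[(tail + i) % POOL_CAPACITY] = src[i];
    p->count += n;
    return 0;
}

/* Pops exactly n samples into dst; returns 0 on success, or -1 (leaving
 * the pool untouched) when fewer than n samples are buffered. */
static int pool_read(PcmPool *p, int16_t *dst, size_t n)
{
    if (n > p->count) return -1;
    for (size_t i = 0; i < n; i++)
        dst[i] = p->data[(p->head + i) % POOL_CAPACITY];
    p->head = (p->head + n) % POOL_CAPACITY;
    p->count -= n;
    return 0;
}
```

The capture side calls `pool_write` with whatever it receives, and the encoding side calls `pool_read(&pool, block, 1024)` and only invokes the encoder when that returns 0.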
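In addition, for TS files each AAC packet needs an ADTS header prepended. The ADTS header is 7 bytes long (when no CRC is present); it carries the encoding parameters of the AAC data so that a decoder can decode it.
The ADTS header is computed as follows: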
- (NSData*) adtsDataForPacketLength:(NSUInteger)packetLength
{
    int adtsLength = 7;
    char *packet = (char *)malloc(sizeof(char) * adtsLength);
    // Variables Recycled by addADTStoPacket
    int profile = 2; //AAC LC
    //39=MediaCodecInfo.CodecProfileLevel.AACObjectELD;
    int freqIdx = 8; //16KHz; must match the encoder's sample rate (use 4 for 44100 Hz)
    int chanCfg = 1; //MPEG-4 Audio Channel Configuration. 1 Channel front-center
    NSUInteger fullLength = adtsLength + packetLength; // the length field counts the header too
    // fill in ADTS data
    packet[0] = (char)0xFF; // 11111111 = syncword
    packet[1] = (char)0xF9; // 1111 1 00 1 = syncword MPEG-2 Layer CRC
    packet[2] = (char)(((profile-1)<<6) + (freqIdx<<2) +(chanCfg>>2));
    packet[3] = (char)(((chanCfg&3)<<6) + (fullLength>>11));
    packet[4] = (char)((fullLength&0x7FF) >> 3);
    packet[5] = (char)(((fullLength&7)<<5) + 0x1F);
    packet[6] = (char)0xFC;
    // NSData takes ownership of the malloc'd bytes and frees them.
    NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
    return data;
}
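One way to check a header built this way is to read the fields back out. The sketch below is plain C and assumes the 7-byte, no-CRC layout used above; `AdtsInfo`, `kAdtsRates`, and `adts_parse` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int object_type;     /* 2 = AAC LC */
    int freq_index;      /* sampling frequency index, 0..15 */
    int sample_rate_hz;  /* 0 when the index is reserved or explicit */
    int channel_config;
    int frame_length;    /* header plus payload, in bytes */
} AdtsInfo;

/* Sample rates for sampling frequency indices 0..12; 13..15 have no fixed rate. */
static const int kAdtsRates[16] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050,
    16000, 12000, 11025, 8000, 7350, 0, 0, 0
};

/* Returns 0 on success, -1 if fewer than 7 bytes are given or the
 * 12-bit syncword is missing. */
static int adts_parse(const uint8_t *h, size_t len, AdtsInfo *out)
{
    if (len < 7 || h[0] != 0xFF || (h[1] & 0xF0) != 0xF0) return -1;
    out->object_type    = ((h[2] >> 6) & 0x3) + 1;
    out->freq_index     = (h[2] >> 2) & 0xF;
    out->sample_rate_hz = kAdtsRates[out->freq_index];
    out->channel_config = ((h[2] & 0x1) << 2) | ((h[3] >> 6) & 0x3);
    out->frame_length   = ((h[3] & 0x3) << 11) | (h[4] << 3) | ((h[5] >> 5) & 0x7);
    return 0;
}
```

For a 200-byte AAC packet with profile 2, frequency index 8, and channel configuration 1, the header bytes are FF F9 60 40 19 FF FC, and parsing them returns a frame length of 207.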
Encoding audio to AAC on iOS
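When I covered AAC decoding earlier, I used faad and mentioned that faac, fdk-aac, and similar libraries can be used to encode to AAC. However, fdk-aac and the like are all software encoders, and their CPU cost is noticeably higher than that of hardware encoding.
The advantage of hardware encoding is that it uses functionality built into the chip, completing the encoding quickly and with low power consumption.
iOS also provides hardware encoding; an app only needs to call the corresponding SDK interface.
That SDK interface is AudioConverter.
This article describes how to call AudioConverter on iOS to perform hardware AAC encoding.
As the name suggests, AudioConverter is a format converter; here it is used to convert PCM data into AAC data.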
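For background on media formats (codec formats and container formats), readers can follow the "广州小程" WeChat public account and look up the related articles under the "Audio/Video -> Basic Concepts and Workflow" menu.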
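AudioConverter performs the conversion in memory and does not need to write a file. The ExtAudioFile interface, by contrast, operates on files and uses AudioConverter internally to convert formats, so in some scenarios ExtAudioFile can be used instead.
How is AudioConverter used? In general, calling these interfaces requires reading the corresponding header files and their documentation comments.
Here is a demonstration of converting PCM data to AAC data.
Only a brief explanation follows the code; readers who need the details should read through the code and adapt it to their own scenario.
The example below converts PCM to AAC (for instance, saving recorded audio as AAC).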
typedef struct
{
void *source;
UInt32 sourceSize;
UInt32 channelCount;
AudioStreamPacketDescription *packetDescriptions;
}FillComplexInputParam;
// Supply the source data, i.e. the PCM data
OSStatus audioConverterComplexInputDataProc( AudioConverterRef inAudioConverter,
UInt32* ioNumberDataPackets,
AudioBufferList* ioData,
AudioStreamPacketDescription** outDataPacketDescription,
void* inUserData)
{
FillComplexInputParam* param = (FillComplexInputParam*)inUserData;
if (param->sourceSize <= 0) {
*ioNumberDataPackets = 0;
return -1;
}
ioData->mBuffers[0].mData = param->source;
ioData->mBuffers[0].mNumberChannels = param->channelCount;
ioData->mBuffers[0].mDataByteSize = param->sourceSize;
*ioNumberDataPackets = 1;
param->sourceSize = 0;
param->source = NULL;
return noErr;
}
typedef struct _tagConvertContext {
AudioConverterRef converter;
int samplerate;
int channels;
}ConvertContext;
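// init
// Creates the converter with AudioConverterNewSpecific, stores it in a ConvertContext, and sets properties such as the bit rate.
// Note: the lookup below matches kAppleSoftwareAudioCodecManufacturer, which selects Apple's software codec;
// match kAppleHardwareAudioCodecManufacturer instead to request the hardware encoder.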
void* convert_init(int sample_rate, int channel_count)
{
AudioStreamBasicDescription sourceDes;
memset(&sourceDes, 0, sizeof(sourceDes));
sourceDes.mSampleRate = sample_rate;
sourceDes.mFormatID = kAudioFormatLinearPCM;
sourceDes.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger;
sourceDes.mChannelsPerFrame = channel_count;
sourceDes.mBitsPerChannel = 16;
sourceDes.mBytesPerFrame = sourceDes.mBitsPerChannel/8*sourceDes.mChannelsPerFrame;
sourceDes.mBytesPerPacket = sourceDes.mBytesPerFrame;
sourceDes.mFramesPerPacket = 1;
sourceDes.mReserved = 0;
AudioStreamBasicDescription targetDes;
memset(&targetDes, 0, sizeof(targetDes));
targetDes.mFormatID = kAudioFormatMPEG4AAC;
targetDes.mSampleRate = sample_rate;
targetDes.mChannelsPerFrame = channel_count;
UInt32 size = sizeof(targetDes);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &targetDes);
AudioClassDescription audioClassDes;
memset(&audioClassDes, 0, sizeof(AudioClassDescription));
AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders, sizeof(targetDes.mFormatID), &targetDes.mFormatID, &size);
int encoderCount = size / sizeof(AudioClassDescription);
AudioClassDescription descriptions[encoderCount];
AudioFormatGetProperty(kAudioFormatProperty_Encoders, sizeof(targetDes.mFormatID), &targetDes.mFormatID, &size, descriptions);
for (int pos = 0; pos < encoderCount; pos ++) {
if (targetDes.mFormatID == descriptions[pos].mSubType && descriptions[pos].mManufacturer == kAppleSoftwareAudioCodecManufacturer) {
memcpy(&audioClassDes, &descriptions[pos], sizeof(AudioClassDescription));
break;
}
}
ConvertContext *convertContex = malloc(sizeof(ConvertContext));
// Record the format so that convert() can read it back later.
convertContex->samplerate = sample_rate;
convertContex->channels = channel_count;
OSStatus ret = AudioConverterNewSpecific(&sourceDes, &targetDes, 1, &audioClassDes, &convertContex->converter);
if (ret == noErr) {
AudioConverterRef converter = convertContex->converter;
UInt32 tmp = kAudioConverterQuality_High;
AudioConverterSetProperty(converter, kAudioConverterCodecQuality, sizeof(tmp), &tmp);
UInt32 bitRate = 96000;
UInt32 size = sizeof(bitRate);
ret = AudioConverterSetProperty(converter, kAudioConverterEncodeBitRate, size, &bitRate);
}
else {
free(convertContex);
convertContex = NULL;
}
return convertContex;
}
// converting
void convert(void* convertContext, void* srcdata, int srclen, void** outdata, int* outlen)
{
ConvertContext* convertCxt = (ConvertContext*)convertContext;
if (convertCxt && convertCxt->converter) {
UInt32 theOuputBufSize = srclen;
UInt32 packetSize = 1;
void *outBuffer = malloc(theOuputBufSize);
memset(outBuffer, 0, theOuputBufSize);
AudioStreamPacketDescription *outputPacketDescriptions = NULL;
outputPacketDescriptions = (AudioStreamPacketDescription*)malloc(sizeof(AudioStreamPacketDescription) * packetSize);
FillComplexInputParam userParam;
userParam.source = srcdata;
userParam.sourceSize = srclen;
userParam.channelCount = convertCxt->channels;
userParam.packetDescriptions = NULL;
OSStatus ret = noErr;
// A stack-allocated list holds one buffer, which is all that is needed here (avoids a leaked malloc).
AudioBufferList outputBuffers;
outputBuffers.mNumberBuffers = 1;
outputBuffers.mBuffers[0].mNumberChannels = convertCxt->channels;
outputBuffers.mBuffers[0].mData = outBuffer;
outputBuffers.mBuffers[0].mDataByteSize = theOuputBufSize;
ret = AudioConverterFillComplexBuffer(convertCxt->converter, audioConverterComplexInputDataProc, &userParam, &packetSize, &outputBuffers, outputPacketDescriptions);
if (ret == noErr) {
if (outputBuffers.mBuffers[0].mDataByteSize > 0) {
NSData* rawAAC = [NSData dataWithBytes:outputBuffers.mBuffers[0].mData length:outputBuffers.mBuffers[0].mDataByteSize];
*outdata = malloc([rawAAC length]);
memcpy(*outdata, [rawAAC bytes], [rawAAC length]);
*outlen = (int)[rawAAC length];
// Test the converted AAC data by saving it as an ADTS AAC file
#if 1
int headerLength = 0;
char* packetHeader = newAdtsDataForPacketLength((int)[rawAAC length], convertCxt->samplerate, convertCxt->channels, &headerLength);
NSData* adtsPacketHeader = [NSData dataWithBytes:packetHeader length:headerLength];
free(packetHeader);
NSMutableData* fullData = [NSMutableData dataWithData:adtsPacketHeader];
[fullData appendData:rawAAC];
NSFileManager *fileMgr = [NSFileManager defaultManager];
NSString *filepath = [NSHomeDirectory() stringByAppendingFormat:@"/Documents/test%p.aac", convertCxt->converter];
NSFileHandle *file = nil;
if (![fileMgr fileExistsAtPath:filepath]) {
[fileMgr createFileAtPath:filepath contents:nil attributes:nil];
}
file = [NSFileHandle fileHandleForWritingAtPath:filepath];
[file seekToEndOfFile];
[file writeData:fullData];
[file closeFile];
#endif
}
}
free(outBuffer);
if (outputPacketDescriptions) {
free(outputPacketDescriptions);
}
}
}
// uninit
// ...
int freqIdxForAdtsHeader(int samplerate)
{
/**
0: 96000 Hz
1: 88200 Hz
2: 64000 Hz
3: 48000 Hz
4: 44100 Hz
5: 32000 Hz
6: 24000 Hz
7: 22050 Hz
8: 16000 Hz
9: 12000 Hz
10: 11025 Hz
11: 8000 Hz
12: 7350 Hz
13: Reserved
14: Reserved
15: frequency is written explicitly
*/
int idx = 4;
if (samplerate >= 7350 && samplerate < 8000) {
idx = 12;
}
else if (samplerate >= 8000 && samplerate < 11025) {
idx = 11;
}
else if (samplerate >= 11025 && samplerate < 12000) {
idx = 10;
}
else if (samplerate >= 12000 && samplerate < 16000) {
idx = 9;
}
else if (samplerate >= 16000 && samplerate < 22050) {
idx = 8;
}
else if (samplerate >= 22050 && samplerate < 24000) {
idx = 7;
}
else if (samplerate >= 24000 && samplerate < 32000) {
idx = 6;
}
else if (samplerate >= 32000 && samplerate < 44100) {
idx = 5;
}
else if (samplerate >= 44100 && samplerate < 48000) {
idx = 4;
}
else if (samplerate >= 48000 && samplerate < 64000) {
idx = 3;
}
else if (samplerate >= 64000 && samplerate < 88200) {
idx = 2;
}
else if (samplerate >= 88200 && samplerate < 96000) {
idx = 1;
}
else if (samplerate >= 96000) {
idx = 0;
}
return idx;
}
int channelIdxForAdtsHeader(int channelCount)
{
/**
0: Defined in AOT Specific Config
1: 1 channel: front-center
2: 2 channels: front-left, front-right
3: 3 channels: front-center, front-left, front-right
4: 4 channels: front-center, front-left, front-right, back-center
5: 5 channels: front-center, front-left, front-right, back-left, back-right
6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
8-15: Reserved
*/
int ret = 2;
if (channelCount == 1) {
ret = 1;
}
else if (channelCount == 2) {
ret = 2;
}
return ret;
}
/**
* Add ADTS header at the beginning of each and every AAC packet.
* This is needed as MediaCodec encoder generates a packet of raw
* AAC data.
*
* Note the packetLen must count in the ADTS header itself.
* See: http://wiki.multimedia.cx/index.php?title=ADTS
* Also: http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Channel_Configurations
**/
char* newAdtsDataForPacketLength(int packetLength, int samplerate, int channelCount, int* ioHeaderLen) {
int adtsLength = 7;
char *packet = malloc(sizeof(char) * adtsLength);
// Variables Recycled by addADTStoPacket
int profile = 2; //AAC LC
//39=MediaCodecInfo.CodecProfileLevel.AACObjectELD;
int freqIdx = freqIdxForAdtsHeader(samplerate);
int chanCfg = channelIdxForAdtsHeader(channelCount); //MPEG-4 Audio Channel Configuration.
NSUInteger fullLength = adtsLength + packetLength;
// fill in ADTS data
packet[0] = (char)0xFF;
// 11111111 = syncword
packet[1] = (char)0xF9;
// 1111 1 00 1 = syncword MPEG-2 Layer CRC
packet[2] = (char)(((profile-1)<<6) + (freqIdx<<2) +(chanCfg>>2));
packet[3] = (char)(((chanCfg&3)<<6) + (fullLength>>11));
packet[4] = (char)((fullLength&0x7FF) >> 3);
packet[5] = (char)(((fullLength&7)<<5) + 0x1F);
packet[6] = (char)0xFC;
*ioHeaderLen = adtsLength;
return packet;
}
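In the code above, two functions matter most. The init function creates the AudioConverterRef; the convert function is meant to be called repeatedly, once for each chunk of PCM data.
The example also saves the AAC data produced from the PCM input; the saved file can be played back.
Note that AudioConverter always outputs raw audio data. Whether to wrap it as ADTS AAC or package it into Apple's m4a container is up to the program.
To explain: ADTS AAC is one representation of AAC data in which a frame header (including the frame length, sample rate, channel count, and so on) is prepended to each raw AAC frame. With the header in place, each AAC frame can be decoded on its own. ADTS AAC is not a container: it has no file header and no file-level structure.
ADTS stands for Audio Data Transport Stream.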
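Because every ADTS frame states its own length, a saved ADTS AAC buffer can be walked frame by frame without decoding anything. The sketch below is plain C and assumes 7-byte headers without CRC; `adts_count_frames` is a hypothetical helper for illustration, not part of any SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the ADTS frames in buf. Returns -1 on a missing syncword, a
 * frame length that is too small or runs past the buffer, or leftover
 * bytes after the last complete frame. */
static int adts_count_frames(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int frames = 0;
    while (pos + 7 <= len) {
        const uint8_t *h = buf + pos;
        if (h[0] != 0xFF || (h[1] & 0xF0) != 0xF0) return -1;
        /* 13-bit frame length: 2 bits in byte 3, 8 in byte 4, 3 in byte 5. */
        size_t frame_len = ((size_t)(h[3] & 0x3) << 11)
                         | ((size_t)h[4] << 3)
                         | (size_t)(h[5] >> 5);
        if (frame_len < 7 || pos + frame_len > len) return -1;
        pos += frame_len;
        frames++;
    }
    return pos == len ? frames : -1;
}
```

A result of -1 on a file written by the test code suggests that a header was built with the wrong length or that a write was cut short.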
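Readers can of course also package the converted AAC data as m4a. This container starts with the file header, followed by the raw audio data:
{packet-table}{audio_data}{trailer}: after the header information comes the raw audio data, which carries no per-packet information.
That completes the walkthrough of converting PCM to AAC on iOS.
To summarize, this article showed how to use the AudioConverter interface provided by iOS to convert PCM data to AAC. It also showed how to save the result as an ADTS AAC file, which readers can use to check that the converted AAC data is correct.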