How to generate an audio waveform programmatically while recording voice in iOS?
Posted: 2013-05-30 15:20:10

【Question】: How can I generate an audio waveform programmatically while recording voice in iOS?
I am working on voice-modulation audio in iOS... Everything works fine... I just need a good, simple way to generate an audio waveform that reflects the detected sound...
Please don't point me to the code tutorials for SpeakHere and aurioTouch... I need some concrete advice from native app developers.
I have already recorded the audio, and it plays back after recording. I have created a waveform and attached a screenshot of it. But the waveform has to be drawn in the view while the audio recording is in progress. The code I am using so far is below.
-(UIImage *) audioImageGraph:(SInt16 *) samples
                normalizeMax:(SInt16) normalizeMax
                 sampleCount:(NSInteger) sampleCount
                channelCount:(NSInteger) channelCount
                 imageHeight:(float) imageHeight
{
    CGSize imageSize = CGSizeMake(sampleCount, imageHeight);
    UIGraphicsBeginImageContext(imageSize);
    CGContextRef context = UIGraphicsGetCurrentContext();

    CGContextSetFillColorWithColor(context, [UIColor blackColor].CGColor);
    CGContextSetAlpha(context, 1.0);
    CGRect rect;
    rect.size = imageSize;
    rect.origin.x = 0;
    rect.origin.y = 0;

    CGColorRef leftcolor = [[UIColor whiteColor] CGColor];
    CGColorRef rightcolor = [[UIColor redColor] CGColor];

    CGContextFillRect(context, rect);
    CGContextSetLineWidth(context, 1.0);

    float halfGraphHeight = (imageHeight / 2) / (float) channelCount;
    float centerLeft = halfGraphHeight;
    float centerRight = (halfGraphHeight * 3);
    float sampleAdjustmentFactor = (imageHeight / (float) channelCount) / (float) normalizeMax;

    for (NSInteger intSample = 0; intSample < sampleCount; intSample++) {
        SInt16 left = *samples++;
        float pixels = (float) left;
        pixels *= sampleAdjustmentFactor;
        CGContextMoveToPoint(context, intSample, centerLeft - pixels);
        CGContextAddLineToPoint(context, intSample, centerLeft + pixels);
        CGContextSetStrokeColorWithColor(context, leftcolor);
        CGContextStrokePath(context);

        if (channelCount == 2) {
            SInt16 right = *samples++;
            float pixels = (float) right;
            pixels *= sampleAdjustmentFactor;
            CGContextMoveToPoint(context, intSample, centerRight - pixels);
            CGContextAddLineToPoint(context, intSample, centerRight + pixels);
            CGContextSetStrokeColorWithColor(context, rightcolor);
            CGContextStrokePath(context);
        }
    }

    // Create new image
    UIImage *newImage = UIGraphicsGetImageFromCurrentImageContext();

    // Tidy up
    UIGraphicsEndImageContext();

    return newImage;
}
Next is a method that takes an AVURLAsset and returns PNG data:
- (NSData *) renderPNGAudioPictogramForAssett:(AVURLAsset *)songAsset
{
    NSError *error = nil;
    AVAssetReader *reader = [[AVAssetReader alloc] initWithAsset:songAsset error:&error];
    AVAssetTrack *songTrack = [songAsset.tracks objectAtIndex:0];

    NSDictionary *outputSettingsDict = [[NSDictionary alloc] initWithObjectsAndKeys:
        [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
        // [NSNumber numberWithInt:44100.0], AVSampleRateKey, /*Not Supported*/
        // [NSNumber numberWithInt:2], AVNumberOfChannelsKey, /*Not Supported*/
        [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
        [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
        [NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
        [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
        nil];

    AVAssetReaderTrackOutput *output = [[AVAssetReaderTrackOutput alloc] initWithTrack:songTrack outputSettings:outputSettingsDict];
    [reader addOutput:output];
    [output release];

    UInt32 sampleRate, channelCount;

    NSArray *formatDesc = songTrack.formatDescriptions;
    for (unsigned int i = 0; i < [formatDesc count]; ++i) {
        CMAudioFormatDescriptionRef item = (CMAudioFormatDescriptionRef)[formatDesc objectAtIndex:i];
        const AudioStreamBasicDescription *fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item);
        if (fmtDesc) {
            sampleRate = fmtDesc->mSampleRate;
            channelCount = fmtDesc->mChannelsPerFrame;
            // NSLog(@"channels:%u, bytes/packet: %u, sampleRate %f", fmtDesc->mChannelsPerFrame, fmtDesc->mBytesPerPacket, fmtDesc->mSampleRate);
        }
    }

    UInt32 bytesPerSample = 2 * channelCount;   // 16-bit samples, interleaved
    SInt16 normalizeMax = 0;

    NSMutableData *fullSongData = [[NSMutableData alloc] init];
    [reader startReading];

    UInt64 totalBytes = 0;
    SInt64 totalLeft = 0;
    SInt64 totalRight = 0;
    NSInteger sampleTally = 0;

    // Average roughly 1/50th of a second of audio into each drawn sample.
    NSInteger samplesPerPixel = sampleRate / 50;

    while (reader.status == AVAssetReaderStatusReading) {

        AVAssetReaderTrackOutput *trackOutput = (AVAssetReaderTrackOutput *)[reader.outputs objectAtIndex:0];
        CMSampleBufferRef sampleBufferRef = [trackOutput copyNextSampleBuffer];

        if (sampleBufferRef) {
            CMBlockBufferRef blockBufferRef = CMSampleBufferGetDataBuffer(sampleBufferRef);
            size_t length = CMBlockBufferGetDataLength(blockBufferRef);
            totalBytes += length;

            NSAutoreleasePool *wader = [[NSAutoreleasePool alloc] init];

            NSMutableData *data = [NSMutableData dataWithLength:length];
            CMBlockBufferCopyDataBytes(blockBufferRef, 0, length, data.mutableBytes);

            SInt16 *samples = (SInt16 *) data.mutableBytes;
            int sampleCount = length / bytesPerSample;
            for (int i = 0; i < sampleCount; i++) {

                SInt16 left = *samples++;
                totalLeft += left;

                SInt16 right;
                if (channelCount == 2) {
                    right = *samples++;
                    totalRight += right;
                }

                sampleTally++;

                if (sampleTally > samplesPerPixel) {

                    left = totalLeft / sampleTally;
                    SInt16 fix = abs(left);
                    if (fix > normalizeMax) {
                        normalizeMax = fix;
                    }
                    [fullSongData appendBytes:&left length:sizeof(left)];

                    if (channelCount == 2) {
                        right = totalRight / sampleTally;
                        SInt16 fix = abs(right);
                        if (fix > normalizeMax) {
                            normalizeMax = fix;
                        }
                        [fullSongData appendBytes:&right length:sizeof(right)];
                    }

                    totalLeft = 0;
                    totalRight = 0;
                    sampleTally = 0;
                }
            }

            [wader drain];

            CMSampleBufferInvalidate(sampleBufferRef);
            CFRelease(sampleBufferRef);
        }
    }

    NSData *finalData = nil;

    if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown) {
        // Something went wrong. return nil
        return nil;
    }

    if (reader.status == AVAssetReaderStatusCompleted) {
        NSLog(@"rendering output graphics using normalizeMax %d", normalizeMax);

        // fullSongData holds one averaged SInt16 per channel per point (4 bytes per stereo point).
        UIImage *test = [self audioImageGraph:(SInt16 *) fullSongData.bytes
                                 normalizeMax:normalizeMax
                                  sampleCount:fullSongData.length / 4
                                 channelCount:2
                                  imageHeight:100];

        finalData = imageToData(test);   // imageToData() is a helper defined elsewhere (e.g. wrapping UIImagePNGRepresentation)
    }

    [fullSongData release];
    [reader release];
    return finalData;
}
【Comments on the question】:
Take a look at this, it may help: developer.apple.com/library/ios/#samplecode/aurioTouch2/…

If you have a specific problem with your implementation it is easy to get help, but "I want a waveform" just gets people pointing you at the standard samples.

@Vignesh: I attached a screenshot to my question. That is the output I need, but it has to be drawn at the moment the recording is in progress. Thanks.

@iVenky, sorry, I meant to ask: what have you tried in order to implement it? Where are you stuck?

This link will help you. Have a nice day: github.com/ioslovers/ATTabandHoldAudioRecord

【Answer 1】: If you want real-time graphics derived from the microphone input, use the RemoteIO Audio Unit, which is what most native iOS app developers use for low-latency audio, and Metal or OpenGL for drawing the waveform, which will give you the highest frame rates. You will need completely different code from what is shown in the question, because the AVAsset-based recording, Core Graphics line drawing and PNG rendering approach is far too slow for this.
Update: on iOS 8 and newer, the Metal API may be able to render graphics visualizations with even higher performance than OpenGL.
Update 2: Here are some code snippets for recording live audio using Audio Units and drawing bitmaps with Metal, in Swift 3: https://gist.github.com/hotpaw2/f108a3c785c7287293d7e1e81390c20b
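For reference, here is a rough Objective-C sketch of the capture side of this approach. It is only an illustration, not code from the answer above: the mono 16-bit 44.1 kHz format, the buffer size, the global unit variable and the function names (MicInputCallback, StartMicCapture) are all assumptions, error handling is omitted, and the Metal/OpenGL drawing that would consume the peak values is not shown.

#import <AVFoundation/AVFoundation.h>
#import <AudioToolbox/AudioToolbox.h>

static AudioUnit gRemoteIOUnit;   // illustrative global; a real app would own this in a class

// Input callback: runs on the real-time audio thread.
// Pull the microphone samples and reduce them to one peak value per buffer.
static OSStatus MicInputCallback(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData)
{
    static SInt16 samples[4096];
    if (inNumberFrames > 4096) return noErr;          // keep the sketch simple

    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize = inNumberFrames * sizeof(SInt16);
    bufferList.mBuffers[0].mData = samples;

    OSStatus err = AudioUnitRender(gRemoteIOUnit, ioActionFlags, inTimeStamp,
                                   inBusNumber, inNumberFrames, &bufferList);
    if (err != noErr) return err;

    SInt16 peak = 0;
    for (UInt32 i = 0; i < inNumberFrames; i++) {
        SInt16 v = (samples[i] < 0) ? -samples[i] : samples[i];
        if (v > peak) peak = v;
    }
    // Hand `peak` (or the whole buffer) to the drawing code; do no UI work on this thread.
    return noErr;
}

static void StartMicCapture(void)
{
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
    [[AVAudioSession sharedInstance] setActive:YES error:nil];

    AudioComponentDescription desc = {0};
    desc.componentType         = kAudioUnitType_Output;
    desc.componentSubType      = kAudioUnitSubType_RemoteIO;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;
    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    AudioComponentInstanceNew(comp, &gRemoteIOUnit);

    UInt32 enable = 1;   // enable microphone input on bus 1
    AudioUnitSetProperty(gRemoteIOUnit, kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input, 1, &enable, sizeof(enable));

    AudioStreamBasicDescription asbd = {0};   // assumed format: 16-bit signed mono PCM, 44.1 kHz
    asbd.mSampleRate       = 44100.0;
    asbd.mFormatID         = kAudioFormatLinearPCM;
    asbd.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    asbd.mChannelsPerFrame = 1;
    asbd.mBitsPerChannel   = 16;
    asbd.mBytesPerFrame    = 2;
    asbd.mBytesPerPacket   = 2;
    asbd.mFramesPerPacket  = 1;
    AudioUnitSetProperty(gRemoteIOUnit, kAudioUnitProperty_StreamFormat,
                         kAudioUnitScope_Output, 1, &asbd, sizeof(asbd));

    AURenderCallbackStruct cb = { MicInputCallback, NULL };
    AudioUnitSetProperty(gRemoteIOUnit, kAudioOutputUnitProperty_SetInputCallback,
                         kAudioUnitScope_Global, 1, &cb, sizeof(cb));

    AudioUnitInitialize(gRemoteIOUnit);
    AudioOutputUnitStart(gRemoteIOUnit);
}

The drawing side would then consume these per-buffer peak values at the display frame rate, using Metal or OpenGL ES rather than Core Graphics, as the answer suggests.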
【Discussion】:
It would help if you could share some code snippets for getting real-time graphics that draw the waveform with OpenGL. Thank you very much for your support.

If this is iOS, then you would use OpenGL ES. You can choose between the fixed-function (ES1) or shader (ES2) pipeline; I don't know whether there is any benefit to using shaders for drawing this kind of thing. The iOS sample app aurioTouch has an example of drawing short buffers from the microphone in OpenGL. If I remember that example correctly, they draw one sample per pixel. However, to draw the whole waveform of a longer recording (millions of samples), you have to downsample so the drawn samples fit on the screen, and you should use some kind of peak or RMS calculation for that.

Hey @hotpaw2, I tried this code and it works! Thanks for sharing. I would like to package it as a Swift package and, after some small modifications, make it easier for people to use. I wanted to check with you whether that is OK.

【Answer 2】: You should look at EZAudio (https://github.com/syedhali/EZAudio), specifically EZRecorder and EZAudioPlot (or the GPU-accelerated EZAudioPlotGL).
There is also an example project that does exactly what you need: https://github.com/syedhali/EZAudio/tree/master/EZAudioExamples/iOS/EZAudioRecordExample
Edit: here is the code inline:
/// In your interface

/**
 Use a OpenGL based plot to visualize the data coming in
 */
@property (nonatomic, weak) IBOutlet EZAudioPlotGL *audioPlot;
/**
 The microphone component
 */
@property (nonatomic, strong) EZMicrophone *microphone;
/**
 The recorder component
 */
@property (nonatomic, strong) EZRecorder *recorder;

...

/// In your implementation

// Create an instance of the microphone and tell it to use this view controller instance as the delegate
- (void)viewDidLoad
{
    self.microphone = [EZMicrophone microphoneWithDelegate:self startsImmediately:YES];
}

// EZMicrophoneDelegate will provide these callbacks
- (void)microphone:(EZMicrophone *)microphone
  hasAudioReceived:(float **)buffer
    withBufferSize:(UInt32)bufferSize
withNumberOfChannels:(UInt32)numberOfChannels
{
    dispatch_async(dispatch_get_main_queue(), ^{
        // Updates the audio plot with the waveform data
        [self.audioPlot updateBuffer:buffer[0] withBufferSize:bufferSize];
    });
}

- (void)microphone:(EZMicrophone *)microphone hasAudioStreamBasicDescription:(AudioStreamBasicDescription)audioStreamBasicDescription
{
    // The AudioStreamBasicDescription of the microphone stream. This is useful when configuring
    // the EZRecorder or telling another component what audio format type to expect.
    // We can initialize the recorder with this ASBD.
    self.recorder = [EZRecorder recorderWithDestinationURL:[self testFilePathURL]
                                           andSourceFormat:audioStreamBasicDescription];
}

- (void)microphone:(EZMicrophone *)microphone
     hasBufferList:(AudioBufferList *)bufferList
    withBufferSize:(UInt32)bufferSize
withNumberOfChannels:(UInt32)numberOfChannels
{
    // Getting audio data as a buffer list that can be directly fed into the EZRecorder.
    // This is happening on the audio thread - any UI updating needs a GCD main queue block.
    // This will keep appending data to the tail of the audio file.
    if (self.isRecording) {
        [self.recorder appendDataFromBufferList:bufferList
                                 withBufferSize:bufferSize];
    }
}
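Note that the last delegate method only appends audio while self.isRecording is YES, and that flag is not declared anywhere in the snippet above. A minimal way to wire it up (the BOOL property and the button action below are assumptions for illustration, not part of the EZAudio example) might be:

// In the interface (assumed):
@property (nonatomic, assign) BOOL isRecording;

// In the implementation - a hypothetical record button action:
- (IBAction)toggleRecording:(id)sender
{
    self.isRecording = !self.isRecording;
}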
【Discussion】:
Hey... you should not post link-only answers.

【Answer 3】: I was looking for the same thing (producing a waveform from the voice recorder's data). I found a few libraries that might be useful, and it is worth reading their code to understand the logic behind them.
The calculations are all based on sine functions and simple math formulas. If you take a look at the code, it is quite straightforward!
https://github.com/stefanceriu/SCSiriWaveformView
or
https://github.com/raffael/SISinusWaveView
These are just a couple of the examples you can find on the web.
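As a rough illustration of that idea (this is not code from either library - the SineWaveView class name, its level property and the 1.5-period shape are made up for the example), a view can simply redraw an amplitude-scaled sine curve whenever a new microphone level comes in:

#import <UIKit/UIKit.h>

@interface SineWaveView : UIView
@property (nonatomic, assign) CGFloat level;   // 0.0 ... 1.0, e.g. from metering
@end

@implementation SineWaveView

- (void)setLevel:(CGFloat)level
{
    _level = level;
    [self setNeedsDisplay];                    // redraw whenever the level changes
}

- (void)drawRect:(CGRect)rect
{
    CGFloat width = CGRectGetWidth(self.bounds);
    CGFloat midY = CGRectGetMidY(self.bounds);
    CGFloat maxAmplitude = CGRectGetHeight(self.bounds) / 2.0;

    UIBezierPath *path = [UIBezierPath bezierPath];
    [path moveToPoint:CGPointMake(0, midY)];
    for (CGFloat x = 0; x <= width; x += 1.0) {
        // One and a half sine periods across the view, scaled by the current level.
        CGFloat y = midY + self.level * maxAmplitude * sin(2.0 * M_PI * 1.5 * x / width);
        [path addLineToPoint:CGPointMake(x, y)];
    }
    path.lineWidth = 2.0;
    [[UIColor whiteColor] setStroke];
    [path stroke];
}

@end

The level value would typically come from AVAudioRecorder metering (updateMeters / averagePowerForChannel: polled on a timer, with meteringEnabled set to YES) or from an audio callback like the ones shown in the other answers.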