I’ve had a nasty old time trying to get some audio stuff going on the iPhone, no thanks to Apple’s lack of documentation. If you’re an iPhone developer interested in getting RemoteIO/IO Remote/whatever it’s called working on the iPhone… Do I have good news for you. Read on.
Wanna skip the Core Audio learning curve and start writing code straight away? Check out my new project:
Update: Thanks to Joel Reymont, we now have an explanation for the “CrashIfClientProvidedBogusAudioBufferList” iPhone simulator bug: The simulator doesn’t like mono audio. Thanks, Joel!
Update: Happily, Apple have now created some excellent documentation on Remote IO, with some good sample projects. I recommend using that as a resource, now that it’s there, as that will continue to be updated.
Update: Tom Zicarelli has created a very extensive sample app that demonstrates the use of AUGraph, with all sorts of goodies.
So, we need to obtain an instance of the RemoteIO audio unit, configure it, and hook it up to a recording callback, which is used to notify you that there is data ready to be grabbed, and where you pull the data from the audio unit.
Overview
- Identify the audio component (kAudioUnitType_Output / kAudioUnitSubType_RemoteIO / kAudioUnitManufacturer_Apple)
- Use AudioComponentFindNext(NULL, &descriptionOfAudioComponent) to obtain the AudioComponent, which is like the factory with which you obtain the audio unit
- Use AudioComponentInstanceNew(ourComponent, &audioUnit) to make an instance of the audio unit
- Enable IO for recording and possibly playback with AudioUnitSetProperty
- Describe the audio format in an AudioStreamBasicDescription structure, and apply the format using AudioUnitSetProperty
- Provide a callback for recording, and possibly playback, again using AudioUnitSetProperty
- Allocate some buffers
- Initialise the audio unit
- Start the audio unit
- Rejoice
Here’s my code: I’m using both recording and playback. Use what applies to you!
Initialisation
Initialisation looks like this. We have a member variable of type AudioComponentInstance which will contain our audio unit.
The audio format described below uses SInt16 for samples (i.e. signed, 16 bits per sample)
#import <AudioUnit/AudioUnit.h>

#define kOutputBus 0
#define kInputBus 1

// ...

OSStatus status;
AudioComponentInstance audioUnit;
AudioStreamBasicDescription audioFormat;

// Describe audio component
AudioComponentDescription desc;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_RemoteIO;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;

// Get component
AudioComponent inputComponent = AudioComponentFindNext(NULL, &desc);

// Get audio units
status = AudioComponentInstanceNew(inputComponent, &audioUnit);
checkStatus(status);

// Enable IO for recording
UInt32 flag = 1;
status = AudioUnitSetProperty(audioUnit,
                              kAudioOutputUnitProperty_EnableIO,
                              kAudioUnitScope_Input,
                              kInputBus,
                              &flag,
                              sizeof(flag));
checkStatus(status);

// Enable IO for playback
status = AudioUnitSetProperty(audioUnit,
                              kAudioOutputUnitProperty_EnableIO,
                              kAudioUnitScope_Output,
                              kOutputBus,
                              &flag,
                              sizeof(flag));
checkStatus(status);

// Describe format
audioFormat.mSampleRate = 44100.00;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 1;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = 2;
audioFormat.mBytesPerFrame = 2;

// Apply format
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_StreamFormat,
                              kAudioUnitScope_Output,
                              kInputBus,
                              &audioFormat,
                              sizeof(audioFormat));
checkStatus(status);
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_StreamFormat,
                              kAudioUnitScope_Input,
                              kOutputBus,
                              &audioFormat,
                              sizeof(audioFormat));
checkStatus(status);

// Set input callback
AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc = recordingCallback;
callbackStruct.inputProcRefCon = self;
status = AudioUnitSetProperty(audioUnit,
                              kAudioOutputUnitProperty_SetInputCallback,
                              kAudioUnitScope_Global,
                              kInputBus,
                              &callbackStruct,
                              sizeof(callbackStruct));
checkStatus(status);

// Set output callback
callbackStruct.inputProc = playbackCallback;
callbackStruct.inputProcRefCon = self;
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_SetRenderCallback,
                              kAudioUnitScope_Global,
                              kOutputBus,
                              &callbackStruct,
                              sizeof(callbackStruct));
checkStatus(status);

// Disable buffer allocation for the recorder (optional - do this if we want to pass in our own)
flag = 0;
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_ShouldAllocateBuffer,
                              kAudioUnitScope_Output,
                              kInputBus,
                              &flag,
                              sizeof(flag));
checkStatus(status);

// TODO: Allocate our own buffers if we want

// Initialise
status = AudioUnitInitialize(audioUnit);
checkStatus(status);
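The two byte counts in the format above aren't arbitrary: for packed, interleaved linear PCM they follow from the channel count and bit depth. Here's a minimal sketch of that arithmetic. The `PCMFormatSizes` struct and `pcm_format_sizes` function are made up for illustration (they stand in for the size fields of AudioStreamBasicDescription so this compiles without Core Audio).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the size-related fields of
   AudioStreamBasicDescription; not a Core Audio type. */
typedef struct {
    uint32_t channelsPerFrame;
    uint32_t bitsPerChannel;
    uint32_t framesPerPacket;
    uint32_t bytesPerFrame;
    uint32_t bytesPerPacket;
} PCMFormatSizes;

/* Derive the byte counts for packed, interleaved linear PCM. */
static PCMFormatSizes pcm_format_sizes(uint32_t channels, uint32_t bitsPerChannel) {
    assert(channels > 0 && bitsPerChannel > 0 && bitsPerChannel % 8 == 0);
    PCMFormatSizes f;
    f.channelsPerFrame = channels;
    f.bitsPerChannel = bitsPerChannel;
    f.framesPerPacket = 1; /* uncompressed PCM: one frame per packet */
    f.bytesPerFrame = channels * (bitsPerChannel / 8);
    f.bytesPerPacket = f.bytesPerFrame * f.framesPerPacket;
    return f;
}
```

For the mono SInt16 format used here, that gives 2 bytes per frame and 2 bytes per packet. If you switch to stereo, both become 4.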
Then, when you’re ready to start:
OSStatus status = AudioOutputUnitStart(audioUnit);
checkStatus(status);
And to stop:
OSStatus status = AudioOutputUnitStop(audioUnit);
checkStatus(status);
Then, when we’re finished:
AudioUnitUninitialize(audioUnit);
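The snippets above call checkStatus, which isn't shown in this post. Here's one possible version, as a sketch only. It assumes you just want failures logged. Many Core Audio error codes are four-character codes packed into the 32-bit status (for example 'fmt?' for an unsupported format), so it prints those as text and everything else as a plain number. The `OSStatus` typedef is a stand-in so the sketch compiles without the SDK, and `format_status` is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int32_t OSStatus; /* stand-in: the real OSStatus is also a signed 32-bit integer */

/* Write a readable form of `status` into `out`: a quoted four-character
   code if all four bytes are printable ASCII, otherwise the decimal value. */
static void format_status(OSStatus status, char out[16]) {
    assert(out != NULL);
    uint32_t u = (uint32_t)status;
    unsigned char c[4] = {
        (unsigned char)(u >> 24), (unsigned char)(u >> 16),
        (unsigned char)(u >> 8),  (unsigned char)u
    };
    int printable = 1;
    for (int i = 0; i < 4; i++) {
        if (c[i] < 32 || c[i] > 126) printable = 0;
    }
    if (printable) {
        snprintf(out, 16, "'%c%c%c%c'", c[0], c[1], c[2], c[3]);
    } else {
        snprintf(out, 16, "%d", (int)status);
    }
}

/* Hypothetical checkStatus: log any non-zero status and carry on. */
static void checkStatus(OSStatus status) {
    if (status != 0) {
        char text[16];
        format_status(status, text);
        fprintf(stderr, "Core Audio call failed: %s\n", text);
    }
}
```

Whether you log, assert, or bail out on failure is up to you. During development it's worth at least printing the code, since a failed AudioUnitSetProperty is otherwise silent.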
And now for our callbacks.
Recording
static OSStatus recordingCallback(void *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp *inTimeStamp,
                                  UInt32 inBusNumber,
                                  UInt32 inNumberFrames,
                                  AudioBufferList *ioData) {

    // TODO: Use inRefCon to access our interface object to do stuff
    // Then, use inNumberFrames to figure out how much data is available, and make
    // that much space available in buffers in an AudioBufferList.

    AudioBufferList *bufferList; // <- Fill this up with buffers (you will want to malloc it, as it's a dynamic-length list)

    // Then:
    // Obtain recorded samples

    OSStatus status;

    status = AudioUnitRender([audioInterface audioUnit],
                             ioActionFlags,
                             inTimeStamp,
                             inBusNumber,
                             inNumberFrames,
                             bufferList);
    checkStatus(status);

    // Now, we have the samples we just read sitting in buffers in bufferList
    DoStuffWithTheRecordedAudio(bufferList);
    return noErr;
}
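The "fill this up with buffers" step in the recording callback could look something like the sketch below. It assumes the mono SInt16 format from the initialisation code, so the list holds a single buffer of inNumberFrames * 2 bytes. The two typedefs mirror the layout of Core Audio's AudioBuffer and AudioBufferList so the sketch compiles on its own. In a real project, drop them and use the SDK's types. `make_mono_buffer_list` and `free_buffer_list` are made-up helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins mirroring the Core Audio struct layouts; use the SDK's own
   AudioBuffer / AudioBufferList in a real project. */
typedef struct {
    uint32_t mNumberChannels;
    uint32_t mDataByteSize;
    void *mData;
} AudioBuffer;

typedef struct {
    uint32_t mNumberBuffers;
    AudioBuffer mBuffers[1];
} AudioBufferList;

/* Allocate a one-buffer list big enough for `frames` frames of mono audio. */
static AudioBufferList *make_mono_buffer_list(uint32_t frames, uint32_t bytesPerFrame) {
    assert(frames > 0 && bytesPerFrame > 0);
    AudioBufferList *list = malloc(sizeof(AudioBufferList));
    if (list == NULL) return NULL;
    list->mNumberBuffers = 1;
    list->mBuffers[0].mNumberChannels = 1;
    list->mBuffers[0].mDataByteSize = frames * bytesPerFrame;
    list->mBuffers[0].mData = malloc(list->mBuffers[0].mDataByteSize);
    if (list->mBuffers[0].mData == NULL) {
        free(list);
        return NULL;
    }
    return list;
}

/* Release a list created by make_mono_buffer_list. */
static void free_buffer_list(AudioBufferList *list) {
    if (list == NULL) return;
    free(list->mBuffers[0].mData);
    free(list);
}
```

Since the callback runs on a realtime thread, you may prefer to allocate one list up front, sized for the largest inNumberFrames you expect, and reuse it on every call (resetting mDataByteSize each time) rather than calling malloc inside the callback.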
Playback
static OSStatus playbackCallback(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData) {
    // Notes: ioData contains buffers (may be more than one!)
    // Fill them up as much as you can. Remember to set the size value in each buffer to match how
    // much data is in the buffer.
    return noErr;
}
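To make the notes in the playback callback a bit more concrete, here's a rough sketch of the "fill them up as much as you can" step. It assumes mono SInt16 audio (one sample per frame) and that you already have samples waiting in some source array. The typedefs again mirror the Core Audio layouts so this compiles standalone, and `fill_playback_buffers` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins mirroring the Core Audio struct layouts. */
typedef struct {
    uint32_t mNumberChannels;
    uint32_t mDataByteSize;
    void *mData;
} AudioBuffer;

typedef struct {
    uint32_t mNumberBuffers;
    AudioBuffer mBuffers[1];
} AudioBufferList;

/* Copy up to `sourceFrames` mono 16-bit samples into the buffers in ioData.
   Each buffer's mDataByteSize is updated to the number of bytes actually
   written. Returns the number of samples consumed from `source`. */
static size_t fill_playback_buffers(AudioBufferList *ioData,
                                    const int16_t *source,
                                    size_t sourceFrames) {
    assert(ioData != NULL && source != NULL);
    size_t consumed = 0;
    for (uint32_t i = 0; i < ioData->mNumberBuffers; i++) {
        AudioBuffer *buffer = &ioData->mBuffers[i];
        size_t capacity = buffer->mDataByteSize / sizeof(int16_t);
        size_t remaining = sourceFrames - consumed;
        size_t n = remaining < capacity ? remaining : capacity;
        memcpy(buffer->mData, source + consumed, n * sizeof(int16_t));
        buffer->mDataByteSize = (uint32_t)(n * sizeof(int16_t));
        consumed += n;
    }
    return consumed;
}
```

Inside playbackCallback you'd call this with ioData and whatever audio you have queued up, then return noErr.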
Finally, rejoice with me in this discovery ;)
Resources that helped
- http://pastie.org/pastes/219616
- http://developer.apple.com/samplecode/CAPlayThrough/listing8.html
- http://listas.apesol.org/pipermail/svn-libsdl.org/2008-July/000797.html
No thanks at all to Apple for their lack of accessible documentation on this topic – They really have a long way to go here! Also boo to them with their lack of search engine, and refusal to open up their docs to Google. It’s a jungle out there!
Update: You can adjust the latency of RemoteIO (and, in fact, any other audio framework) by setting the kAudioSessionProperty_PreferredHardwareIOBufferDuration property:
float aBufferLength = 0.005; // In seconds
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration, sizeof(aBufferLength), &aBufferLength);
This adjusts the length of buffers that’re passed to you – if buffer length was originally, say, 1024 samples, then halving the number of samples halves the amount of time taken to process them.
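The relationship between buffer length in samples and buffer length in seconds is just a division by the sample rate. A tiny sketch of the arithmetic, with made-up helper names:

```c
#include <assert.h>

/* How long, in seconds, a buffer of `frames` frames lasts at `sampleRate` Hz. */
static double buffer_duration_seconds(unsigned frames, double sampleRate) {
    assert(sampleRate > 0.0);
    return frames / sampleRate;
}

/* How many frames fit in `seconds` at `sampleRate` Hz (may be fractional). */
static double frames_for_duration(double seconds, double sampleRate) {
    assert(sampleRate > 0.0);
    return seconds * sampleRate;
}
```

At 44100 Hz, a 1024-frame buffer lasts about 23.2 ms and a 512-frame buffer about 11.6 ms. Going the other way, the 0.005 seconds requested above works out to 220.5 frames, which isn't a whole number, so treat the value as a preference: the buffer size you actually get will be whatever nearby size the hardware picks.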
216 Comments
Hi Michael.
After I have put together all this code, how would I get the actual audio data? Where exactly is it saved?
And once all this code is done, do I have to put it all into one method and then call it, or should I only call the -start method?
What operation should be taken to get the real-time data?
I have spent days trying to understand it but I couldn't. how exactly i
Hi, Michael – it sounds like another tutorial might be in order. Please stay tuned, I’ll put one together over the next week or two and post it on the blog.
Jake asked back on November 13, 2010 if there was a way to get
“…input from iPhone mic and play back on bluetooth speaker…”
Is there? If not, why not?
Hi Peter,
I’m not certain – I haven’t played with bluetooth much. I do know that iOS’s audio routing capabilities are pretty limited, so it could go either way. My suggestion is to go check out the audio session documentation, and see what’s there. If it lets you connect a bluetooth speaker independently of the input system, then you should be good to go.
I am still looking for a way to play sounds based on the notes of the pentagram. Can you help?
Hello Michael,
This tutorial has helped me a lot. Thank you for that. My question: I am currently working on a VoIP application and I want to use SPEEX as a speech coder. This coder specifically asks for an audio buffer of 20 ms, 160 samples and a sample rate of 8000Hz. However, I don’t think it is possible to set the buffer length to exactly 20 ms, or am I missing something? And if I set the sample rate to 8000 for the Remote IO unit I get inNumberFrames = 93 or 92. If you do the math, for a buffer of 20 ms and a sample rate of 8000, I should get exactly 160 samples. Important note: I am still working in the simulator. Another thing: do you think that maybe Audio Queues would be a better solution for a VoIP application?
Thanks a lot, Stefan
Hi Stefan,
Core Audio’s never that exact – it tries to find the closest parameters to what you request, but it’ll never be exact. If you need exactly 160 samples at a time for the SPEEX conversion, then use a circular buffer to store the audio and process it in chunks of the required size.
No, Audio Queues isn’t as low-latency as Remote IO. You definitely want Remote IO for a VoIP app.
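As a rough illustration of processing in fixed-size chunks: the sketch below collects whatever number of samples each callback delivers and hands them on in blocks of exactly 160. To keep it short it uses a plain accumulator rather than a real circular buffer, and it assumes mono 16-bit samples. All the names (`Chunker`, `chunker_push`, `count_chunk`) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_FRAMES 160 /* 20 ms at 8000 Hz */

/* Called once for every complete 160-sample chunk. */
typedef void (*ChunkHandler)(const int16_t *chunk, void *context);

typedef struct {
    int16_t pending[CHUNK_FRAMES]; /* samples waiting for a full chunk */
    size_t count;                  /* how many samples are in `pending` */
} Chunker;

/* Append `n` samples; invoke `handler` each time a full chunk is ready.
   Leftover samples stay in the chunker until the next call. */
static void chunker_push(Chunker *c, const int16_t *samples, size_t n,
                         ChunkHandler handler, void *context) {
    assert(c != NULL && handler != NULL);
    while (n > 0) {
        size_t space = CHUNK_FRAMES - c->count;
        size_t take = n < space ? n : space;
        memcpy(c->pending + c->count, samples, take * sizeof(int16_t));
        c->count += take;
        samples += take;
        n -= take;
        if (c->count == CHUNK_FRAMES) {
            handler(c->pending, context);
            c->count = 0;
        }
    }
}

/* Example handler: just counts chunks (an encoder call would go here). */
static void count_chunk(const int16_t *chunk, void *context) {
    (void)chunk;
    (*(size_t *)context)++;
}
```

With callbacks delivering 93, 93 and 92 samples, the handler fires once (after the second callback) and 118 samples are left waiting for the next one. If the encoder isn't safe to run on the realtime thread, push into a circular buffer in the callback instead and do the chunking on another thread.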
StefanS, I tried AudioQueue for VoIP. 160 samples works at 8000Hz, but the callback fires every 1-2 ms rather than every 20 ms. After 25 calls it pauses for 512 ms, then fires every 1-2 ms again, in a cycle. Michael is right.
Thank you both. I’ll work with audio units (not audio queues) and I’ll try to implement the circular buffer created by Michael. As my work proceeds, I may have some additional questions, I am fairly new to iOS development. :)
Hello Michael,
I'm trying to use a circular buffer to store audio data and process it in 20 ms chunks for a VoIP app, but I can’t get it working: I need an accurate 20 ms interval to send each data packet to the network, and I don’t know how to do that because NSTimer is not accurate enough. Can you help with this question? Which timer is better for pulling chunks of audio data from your buffer at a small interval?
Why would you need to use a timer? Why not just process 20 ms of samples at a time, as they become available in the buffer?
My question is, If calling of the callback exactly every 20ms is not possible then it wouldn’t be ok to just put the coder into the callback and process the audio -> there’s the problem of synchronization. The codec should be called more frequently than the callback. Where should I put the codec and how to schedule it? Anyone? :)
Actually, I am still having problems setting the hardware sample rate. I set it as 8000Hz, and when I initialize the audio session I get its value (it says 8000Hz so that’s ok) but then somehow my application changes this value to 44100Hz (I hear it). So, in my callbacks the inNumberFrames is 512 (for 44100) and if I try to set the Audio Unit’s sample rate to 8000 the inNumberFrames becomes 93 93 92 (the value is not constant). Does anyone have any idea how this happens? Could this be a Simulator related problem? For 8000Hz and a buffer duration of 20ms one should get exactly 160 samples.
Thanks a lot, Stefan
The simulator can behave very differently to the device. When working with audio, always have a device handy, because you’ll see dramatically different effects. You can use the simulator sometimes, but unless you’re doing most of your testing using the device, you’re just making life insanely hard for yourself.
As for processing the buffer, just process it in blocks of 20 ms (160 samples). I don’t really understand why there’s a synchronisation problem…Or why you’re limited to processing just one 20 ms block per callback. Just loop!
Whether you do it on the realtime thread in the callback, or in an offline processing thread is up to you – it depends on whether the coder is suitable for use in a realtime context (ie. whether it holds locks, allocates memory, takes a long time, etc.).
Is audio data passed to the buffer via the callback at intervals of 20 ms? That would solve the problem. Or does audio data arrive in the buffer at varying intervals (20 ms, give or take 5-10 ms)?
Hello,
Does anyone here know how the iLBC codec is used? Apparently, my converter does not accept it when the mFormatID of the AudioStreamBasicDescription is set to kAudioFormatiLBC.
Thanks, Stefan
I'm trying to use this code, however I get undeclared identifiers for almost all data types. I have looked them up and they seem to be in the AudioUnit.framework; however, that framework is added to my Link Binary With Libraries list, so I don't understand why the data types aren't recognized. For example, at the very top (the first 2 lines), AudioComponentInstance audioUnit; AudioComponentDescription desc; are both “undeclared identifier”
answer: you not only need to include it in the link libraries page but also add
import
Hello Michael,
About this:
// Disable buffer allocation for the recorder (optional – do this if we want to pass in our own) flag = 0; status = AudioUnitSetProperty(audioUnit, kAudioUnitProperty_ShouldAllocateBuffer, kAudioUnitScope_Output, kInputBus, &flag, sizeof(flag));
What buffer does it refer to? I see no difference in the behavior of my application if I decide to disable it or not. I use temporary AudioBuffer and AudioBufferList to store the input data and then copy this data to the Circular buffer you have provided.
Another question: About this Voice-Processing IO Audio Unit and its acoustic echo cancellation. Do I simply use it in my code and this nice echo cancellation effect “magically” appears, or should I do some special configuration beforehand? Does it work at all?
Thank you, you’ve been such a help to me and my beginnings in iOS Audio development. Stefan
Hi Stefan,
That refers to the audio unit’s own internal buffer – it’s really quite a minor detail, but it saves a little memory allocation if you’re providing your own buffers instead. If in doubt, it’s safe to leave it out, though.
Yep, you’ll get echo cancellation for free, as soon as you start using VPIO.
You’re welcome =)
Hello Michael,
I finally got the chance to try my application on a device and not just the simulator. It all works perfectly, except a minor delay, which I will look into.
My question: how can I output the audio through the speakers (the loud ones, so I can achieve handsfree functionality)? So far, I can only hear the audio through the headphones or if I press my ear against the phone as in a standard conversation.
As always, Thanks:) Stefan
I have struggled to get this to work for a while now and it's driving me nuts. However, now I am kinda worried, because after reading through the comments I see you posted: “Hey Rarejai – The ULaw format is for storage only, for use with things like the Audio File Services. Remote IO only works with PCM.” Which is what I'm trying to do (stream u-law audio from the mic). I guess my question is, if that's the case, then what would this do?
AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = 8000.00; //44100.00;
audioFormat.mFormatID = kAudioFormatULaw;
// audioFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 1;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = 2;
audioFormat.mBytesPerFrame = 2;
I also have this question open http://stackoverflow.com/questions/10501236/stream-media-from-iphone because when I use a variation of this code, nothing happens: void audioDataReceiver (AudioBufferList bufferList) { double *q = (double *)(&bufferList)->mBuffers[0].mData;
for(int i=0; i < strlen((const char *)(&bufferList)->mBuffers[0].mData); i++) {
// NSData * dataBuffer =[NSData dataWithBytes:(&bufferList)->mBuffers[0].mData length:sizeof((&bufferList)->mBuffers[0].mData)];
// [formData appendPartWithFileData:self.audioHandler.dataBuffer name:@"micaudio" fileName:@"sound.caf" mimeType:@"audio/basic"];
// NSLog(@”request: %@”,request); // NSLog(@”client: %@”,client); }]; [request setValue:@"audio/basic" forHTTPHeaderField:@"content-type"]; [request setValue:@"99999" forHTTPHeaderField:@"Content-Length"]; [request setValue:@"Keep-Alive" forHTTPHeaderField:@"Connection"]; [request setValue:@"no-cache" forHTTPHeaderField:@"Cache-Control"];
// NSLog(@”queue: %@”,queue); }
}
Sorry here is the pastebin for easier reading http://pastebin.com/hFSNnJct