I’ve had a nasty old time trying to get some audio stuff going on the iPhone, no thanks to Apple’s lack of documentation. If you’re an iPhone developer interested in getting RemoteIO/IO Remote/whatever it’s called working on the iPhone… Do I have good news for you. Read on.
Wanna skip the Core Audio learning curve and start writing code straight away? Check out my new project:
Update: Thanks to Joel Reymont, we now have an explanation for the “CrashIfClientProvidedBogusAudioBufferList” iPhone simulator bug: The simulator doesn’t like mono audio. Thanks, Joel!
Update: Happily, Apple have now created some excellent documentation on Remote IO, with some good sample projects. I recommend using that as a resource, now that it’s there, as that will continue to be updated.
Update: Tom Zicarelli has created a very extensive sample app that demonstrates the use of AUGraph, with all sorts of goodies.
So, we need to obtain an instance of the RemoteIO audio unit, configure it, and hook it up to a recording callback, which is used to notify you that there is data ready to be grabbed, and where you pull the data from the audio unit.
Overview
- Identify the audio component (kAudioUnitType_Output / kAudioUnitSubType_RemoteIO / kAudioUnitManufacturer_Apple)
- Use AudioComponentFindNext(NULL, &descriptionOfAudioComponent) to obtain the AudioComponent, which is like the factory with which you obtain the audio unit
- Use AudioComponentInstanceNew(ourComponent, &audioUnit) to make an instance of the audio unit
- Enable IO for recording and possibly playback with AudioUnitSetProperty
- Describe the audio format in an AudioStreamBasicDescription structure, and apply the format using AudioUnitSetProperty
- Provide a callback for recording, and possibly playback, again using AudioUnitSetProperty
- Allocate some buffers
- Initialise the audio unit
- Start the audio unit
- Rejoice
Here’s my code: I’m using both recording and playback. Use what applies to you!
Initialisation
Initialisation looks like this. We have a member variable of type AudioComponentInstance which will contain our audio unit.
The audio format described below uses SInt16 for samples (i.e. signed, 16 bits per sample)
#define kOutputBus 0
#define kInputBus 1

// ...

OSStatus status;
AudioComponentInstance audioUnit;

// Describe audio component
AudioComponentDescription desc;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_RemoteIO;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;

// Get component
AudioComponent inputComponent = AudioComponentFindNext(NULL, &desc);

// Get audio units
status = AudioComponentInstanceNew(inputComponent, &audioUnit);
checkStatus(status);

// Enable IO for recording
UInt32 flag = 1;
status = AudioUnitSetProperty(audioUnit,
                              kAudioOutputUnitProperty_EnableIO,
                              kAudioUnitScope_Input,
                              kInputBus,
                              &flag,
                              sizeof(flag));
checkStatus(status);

// Enable IO for playback
status = AudioUnitSetProperty(audioUnit,
                              kAudioOutputUnitProperty_EnableIO,
                              kAudioUnitScope_Output,
                              kOutputBus,
                              &flag,
                              sizeof(flag));
checkStatus(status);

// Describe format (signed 16-bit, mono, packed linear PCM)
AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = 44100.00;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 1;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = 2;
audioFormat.mBytesPerFrame = 2;

// Apply format
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_StreamFormat,
                              kAudioUnitScope_Output,
                              kInputBus,
                              &audioFormat,
                              sizeof(audioFormat));
checkStatus(status);
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_StreamFormat,
                              kAudioUnitScope_Input,
                              kOutputBus,
                              &audioFormat,
                              sizeof(audioFormat));
checkStatus(status);

// Set input callback
AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc = recordingCallback;
callbackStruct.inputProcRefCon = self;
status = AudioUnitSetProperty(audioUnit,
                              kAudioOutputUnitProperty_SetInputCallback,
                              kAudioUnitScope_Global,
                              kInputBus,
                              &callbackStruct,
                              sizeof(callbackStruct));
checkStatus(status);

// Set output callback
callbackStruct.inputProc = playbackCallback;
callbackStruct.inputProcRefCon = self;
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_SetRenderCallback,
                              kAudioUnitScope_Global,
                              kOutputBus,
                              &callbackStruct,
                              sizeof(callbackStruct));
checkStatus(status);

// Disable buffer allocation for the recorder (optional - do this if we want to pass in our own)
flag = 0;
status = AudioUnitSetProperty(audioUnit,
                              kAudioUnitProperty_ShouldAllocateBuffer,
                              kAudioUnitScope_Output,
                              kInputBus,
                              &flag,
                              sizeof(flag));

// TODO: Allocate our own buffers if we want

// Initialise
status = AudioUnitInitialize(audioUnit);
checkStatus(status);
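The listing calls checkStatus after every Core Audio call but never defines it. A minimal sketch of what such a helper might look like follows, in plain C so it compiles without the Apple headers. The names SketchStatus, format_status and check_status are made up for illustration, and SketchStatus merely stands in for OSStatus (a signed 32-bit integer). Many Core Audio errors are four-character codes and others are plain numbers such as -50, so the sketch prints whichever form is readable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for Core Audio's OSStatus (a signed 32-bit integer), so this
   sketch compiles without the Apple headers. Hypothetical name. */
typedef int32_t SketchStatus;

/* Writes a readable form of `status` into `out`: a quoted four-character
   code such as 'fmt?' when all four bytes are printable, otherwise the
   plain decimal value such as -50. */
static void format_status(SketchStatus status, char *out, size_t outSize) {
    assert(out != NULL && outSize >= 12);
    uint32_t bits = (uint32_t)status;
    unsigned char c[4] = {
        (unsigned char)(bits >> 24), (unsigned char)(bits >> 16),
        (unsigned char)(bits >> 8),  (unsigned char)bits
    };
    if (isprint(c[0]) && isprint(c[1]) && isprint(c[2]) && isprint(c[3])) {
        snprintf(out, outSize, "'%c%c%c%c'", c[0], c[1], c[2], c[3]);
    } else {
        snprintf(out, outSize, "%d", (int)status);
    }
}

/* Hypothetical checkStatus replacement: logs any non-zero status and
   returns 1 on success, 0 on failure, so the caller decides what to do. */
static int check_status(SketchStatus status) {
    if (status == 0) return 1;
    char text[16];
    format_status(status, text, sizeof text);
    fprintf(stderr, "Core Audio call failed: %s\n", text);
    return 0;
}
```

In the real project the parameter type would be OSStatus, and you might prefer to abort or return early on failure rather than just log.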
Then, when you’re ready to start:
OSStatus status = AudioOutputUnitStart(audioUnit);
checkStatus(status);
And to stop:
OSStatus status = AudioOutputUnitStop(audioUnit);
checkStatus(status);
Then, when we’re finished:
AudioComponentInstanceDispose(audioUnit);
And now for our callbacks.
Recording
static OSStatus recordingCallback(void *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp *inTimeStamp,
                                  UInt32 inBusNumber,
                                  UInt32 inNumberFrames,
                                  AudioBufferList *ioData) {

    // TODO: Use inRefCon to access our interface object to do stuff
    // Then, use inNumberFrames to figure out how much data is available, and make
    // that much space available in buffers in an AudioBufferList.

    AudioBufferList *bufferList; // <- Fill this up with buffers (you will want to malloc it, as it's a dynamic-length list)

    // Then:
    // Obtain recorded samples

    OSStatus status;

    status = AudioUnitRender([audioInterface audioUnit],
                             ioActionFlags,
                             inTimeStamp,
                             inBusNumber,
                             inNumberFrames,
                             bufferList);
    checkStatus(status);

    // Now, we have the samples we just read sitting in buffers in bufferList
    DoStuffWithTheRecordedAudio(bufferList);
    return noErr;
}
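The recording callback leaves the buffer list as a TODO and notes that it is a dynamic-length list. Here is one way the allocation might be sketched, in plain C. SketchBuffer and SketchBufferList are local stand-ins that mirror the field layout of AudioBuffer and AudioBufferList so the example compiles without the Apple headers, and alloc_buffer_list and free_buffer_list are hypothetical helper names. The point it shows is the size calculation: the struct declares one trailing buffer entry, so a list with more buffers needs room for the header plus one entry per buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins mirroring the layout of Core Audio's AudioBuffer and
   AudioBufferList. Hypothetical names, for illustration only. */
typedef struct {
    uint32_t mNumberChannels;
    uint32_t mDataByteSize;
    void    *mData;
} SketchBuffer;

typedef struct {
    uint32_t     mNumberBuffers;
    SketchBuffer mBuffers[1];   /* really mNumberBuffers entries long */
} SketchBufferList;

/* Frees every sample buffer in the list, then the list itself. */
static void free_buffer_list(SketchBufferList *list) {
    if (!list) return;
    for (uint32_t i = 0; i < list->mNumberBuffers; i++) {
        free(list->mBuffers[i].mData);
    }
    free(list);
}

/* Allocates a list with numBuffers entries, each owning bytesPerBuffer
   bytes of sample storage. Returns NULL if any allocation fails. */
static SketchBufferList *alloc_buffer_list(uint32_t numBuffers,
                                           uint32_t channelsPerBuffer,
                                           uint32_t bytesPerBuffer) {
    assert(numBuffers > 0 && bytesPerBuffer > 0);
    /* Header plus one SketchBuffer per buffer. */
    size_t size = offsetof(SketchBufferList, mBuffers)
                + (size_t)numBuffers * sizeof(SketchBuffer);
    SketchBufferList *list = calloc(1, size);
    if (!list) return NULL;
    list->mNumberBuffers = numBuffers;
    for (uint32_t i = 0; i < numBuffers; i++) {
        list->mBuffers[i].mNumberChannels = channelsPerBuffer;
        list->mBuffers[i].mDataByteSize = bytesPerBuffer;
        list->mBuffers[i].mData = malloc(bytesPerBuffer);
        if (!list->mBuffers[i].mData) {
            free_buffer_list(list);   /* unfilled entries are NULL from calloc */
            return NULL;
        }
    }
    return list;
}
```

For the mono 16-bit format used in this post, a single buffer of inNumberFrames * 2 bytes would be enough. It is probably wiser to allocate once up front, sized for the largest buffer you expect, than to call malloc inside the callback itself.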
Playback
static OSStatus playbackCallback(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData) {
    // Notes: ioData contains buffers (may be more than one!)
    // Fill them up as much as you can. Remember to set the size value in each buffer to match how
    // much data is in the buffer.
    return noErr;
}
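The playback callback above only describes what to do with ioData. As a rough sketch of the "fill them up as much as you can" step, the plain C helper below copies whatever mono 16-bit samples are available into an output buffer and pads the remainder with silence. The name fill_output and the idea of padding with zeros are assumptions of this sketch, not something the post prescribes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies up to dstFrames mono 16-bit samples from src into dst and pads
   whatever is left with silence. Returns the number of frames copied. */
static size_t fill_output(int16_t *dst, size_t dstFrames,
                          const int16_t *src, size_t srcFrames) {
    assert(dst != NULL);
    assert(src != NULL || srcFrames == 0);
    size_t n = srcFrames < dstFrames ? srcFrames : dstFrames;
    if (n > 0) {
        memcpy(dst, src, n * sizeof(int16_t));
    }
    if (n < dstFrames) {
        /* Not enough source audio: play silence for the rest. */
        memset(dst + n, 0, (dstFrames - n) * sizeof(int16_t));
    }
    return n;
}
```

Inside the real callback, dst would be ioData->mBuffers[i].mData and dstFrames would be inNumberFrames, and you would then set mDataByteSize for that buffer to the number of bytes written.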
Finally, rejoice with me in this discovery ;)
Resources that helped
- http://pastie.org/pastes/219616
- http://developer.apple.com/samplecode/CAPlayThrough/listing8.html
- http://listas.apesol.org/pipermail/svn-libsdl.org/2008-July/000797.html
No thanks at all to Apple for their lack of accessible documentation on this topic – They really have a long way to go here! Also boo to them with their lack of search engine, and refusal to open up their docs to Google. It’s a jungle out there!
Update: You can adjust the latency of RemoteIO (and, in fact, any other audio framework) by setting the kAudioSessionProperty_PreferredHardwareIOBufferDuration property:
float aBufferLength = 0.005; // In seconds
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,
                        sizeof(aBufferLength), &aBufferLength);
This adjusts the length of the buffers that are passed to you. If the buffer length was originally, say, 1024 samples, then halving it to 512 samples halves the time each buffer represents, and with it the latency.
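To put numbers on the relationship between buffer length and latency, here is a small arithmetic sketch in plain C. The helper names frames_to_ms and seconds_to_frames are invented for this example, and the 44100 Hz rate is simply the one used in the format earlier in the post.

```c
#include <assert.h>

/* Duration, in milliseconds, of a buffer holding `frames` sample frames. */
static double frames_to_ms(unsigned frames, double sampleRate) {
    assert(sampleRate > 0.0);
    return 1000.0 * frames / sampleRate;
}

/* Number of whole frames that fit in `seconds` of audio. */
static unsigned seconds_to_frames(double seconds, double sampleRate) {
    assert(seconds >= 0.0 && sampleRate > 0.0);
    return (unsigned)(seconds * sampleRate);
}
```

At 44100 Hz, 1024 frames last roughly 23.2 ms and 512 frames roughly 11.6 ms, while a 0.005 second request corresponds to about 220 frames. The property is a preferred duration, so the frame count that actually reaches your callback may differ from the request.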
Another Update: In the comments below, Florian Bomers pointed out that I was incorrectly using AudioUnitUninitialize to clean up the Audio Unit. This is incorrect, and should in fact be AudioComponentInstanceDispose. Further discussion here. Cheers Florian!
I’d like to meter the volume of people speaking near the iPhone’s mic.
I thought I’d use peakPowerForChannel or averagePowerForChannel…
I have run some tests, and the maximum (0) is returned very quickly: I say a letter into the mic in a normal voice, and 0 is reached almost at once…
If I shout loudly, 0 is returned…
I get -160 as a minimum (silence) and 0 as the maximum…
That’s OK for me, but the maximum is reached as soon as I talk.
I’d like to have something like:
-160 no one is talking
-100 someone is talking
-50 someone is shouting
0 someone is shouting loudly
while I have now
-160 silence
0 someone is talking
Will I have better luck with the method described here?
Or will the mic saturate, so that the sound metering is always at the maximum?
Hi, without giving too much code, I wonder whether you are taking into consideration the square of the samples, or at least their absolute value. Summing all the samples without doing this will eventually result in a mean of 0. Also remember to normalize and cast the values at the right time, i.e.
int sum = 0; sum += (short)audioBuffer[frameIndex];
then sum / nbFrames;
(trivial, I know…) By the way, I don’t know if the iPhone applies a limiter on the input by default; you should check this too, but I don’t think so. If I had to write such a talk detector, I would do it as follows: 1. take the input, 2. take its absolute value, 3. filter it through a simple one-pole lowpass, i.e.: value[n] = alpha*value[n] + (1.-alpha)*value[n-1];
or in integer values you can try: value[x] = (value[x] + value[x-1]*15) >> 4;
alpha being in [0.0, 1.0]. If alpha is near 0 the inertia is higher, i.e. more robust to noise but slower to detect talking. Just put a threshold, or better a Schmitt trigger, on this new filtered buffer in order to detect whether people are talking or not… Good luck
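The three steps suggested above (absolute value, one-pole lowpass, Schmitt trigger) can be sketched in plain C roughly as follows. It uses the integer form of the filter from the comment, which amounts to alpha = 1/16. The TalkDetector type, the function names and the two threshold values in the usage note are all assumptions made for illustration; real thresholds would need tuning against actual microphone input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t envelope;   /* smoothed absolute level, 0..32768 */
    int     talking;    /* current detector state: 1 = talking */
    int32_t onLevel;    /* envelope must rise above this to switch on */
    int32_t offLevel;   /* envelope must fall below this to switch off */
} TalkDetector;

static void talk_detector_init(TalkDetector *d, int32_t onLevel, int32_t offLevel) {
    assert(d != NULL && offLevel <= onLevel);
    d->envelope = 0;
    d->talking = 0;
    d->onLevel = onLevel;
    d->offLevel = offLevel;
}

/* Feeds `count` signed 16-bit samples through the detector and returns
   the state (1 = talking, 0 = quiet) after the last sample. */
static int talk_detector_process(TalkDetector *d, const int16_t *samples, size_t count) {
    assert(d != NULL && (samples != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        int32_t s = samples[i];
        int32_t mag = s < 0 ? -s : s;                   /* step 2: absolute value */
        d->envelope = (mag + d->envelope * 15) >> 4;    /* step 3: one-pole lowpass */
        /* Schmitt trigger: separate on/off levels avoid flickering. */
        if (!d->talking && d->envelope > d->onLevel) {
            d->talking = 1;
        } else if (d->talking && d->envelope < d->offLevel) {
            d->talking = 0;
        }
    }
    return d->talking;
}
```

You would call talk_detector_init once, for example with an on level of 2000 and an off level of 500, and then call talk_detector_process on each recorded buffer. Because the state lives in the struct, the envelope carries over smoothly from one buffer to the next.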
How to change volume ?
Hi, unlike OpenAL, there is no setting that changes the volume for you. You need to do it yourself. This is achieved by multiplying the sound stream by a constant, usually bounded by [0, 1]. For example you can simply do something like:
audioStream[i] = (short)((float)audioStream[i] * volume); /* volume being a float bounded by [0,1] */
but this is still bad in terms of optimization, as the process involves casting a short to a float. I would strongly recommend staying in “integer space” and finding an appropriate ratio to match your volume. Then do something like this: audioStream[i] = (short)((int)audioStream[i] * 65536 / invRatio); /* where invRatio ~= 65536 / volume, so volume must be non-zero */
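As a sketch of the integer-space suggestion above, the plain C below converts the volume to a 16.16 fixed-point gain once, then scales each sample with an integer multiply and divide. This is a multiply-by-gain variant rather than the divide-by-inverse-ratio form in the comment, chosen here because it also handles a volume of 0. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts a volume in [0, 1] to a 16.16 fixed-point gain (65536 == unity). */
static int32_t volume_to_gain(float volume) {
    assert(volume >= 0.0f && volume <= 1.0f);
    return (int32_t)(volume * 65536.0f + 0.5f);
}

/* Scales signed 16-bit samples in place by a fixed-point gain in [0, 65536]. */
static void apply_gain(int16_t *samples, size_t count, int32_t gain) {
    assert(samples != NULL || count == 0);
    assert(gain >= 0 && gain <= 65536);
    for (size_t i = 0; i < count; i++) {
        /* The product fits in 32 bits because |sample| <= 32768 and gain <= 65536. */
        samples[i] = (int16_t)(((int32_t)samples[i] * gain) / 65536);
    }
}
```

The float-to-integer conversion happens once per volume change rather than once per sample, which is the saving the comment is after. The gain is capped at unity here, so the result can never clip.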
Hi,
I tried the Remote I/O audio unit sample code, but in the callback function I am getting a null value for the AudioBufferList. Why would we get a null value? Please help me resolve it.
Thanks.
It may be best if you post your code, to save us guessing – you could put it on pastebin.com, for example. You may be using an incorrect sample format, or perhaps haven’t properly initialised the audio unit, or a number of other possibilities.
Hi Michael,
Your post looks very useful, but I am a complete novice when it comes to CoreAudio. I have been teaching myself for a few months now, and I am at the stage where I understand what you’re doing in your code. But I cannot figure out how to implement the code within a project, and how it all links together. I’ve looked at the SpeakHere example, but this does not seem to help. I was wondering if you could upload an xcode project, just so I can teach myself how you’ve implemented the code?
My situation is, that i need to record (up to 30 seconds) of audio directly into one audio buffer, so that i can process the signal (performing convolution algorithms). Looking at your post above, this will help me learn how to record audio into a buffer, which would be extremely useful in my case.
If you don’t want to upload a project I completely understand.
Many Thanks
Sam
Sorry…but I don’t see a declaration for audioFormat…can you let us know what type it is?
Looks like it’s an AudioStreamBasicDescription?
Hi
Thanks for the great post.
if I just want to stream from the microphone and process it but never play it, do I still need an output? Hmm, the render call back would be on the input of the output wouldn’t it?
Have I just answered my own question lol?
Thanks
Anim
That’s right, Anim, you don’t need an output – you can just drop that half (the output callback and the output enable call).
Thanks, will give it a go.
Anim
Don’t know if anyone has tried this, but I want to use remoteIO along with the AVFoundation library.
Basically I have an app that captures the audio from the mic, does some effect in the callback etc. But I want to be able to record this post effect audio buffer in an AVCaptureSession. I am not sure AVFoundation is compatible with audio units and from what I can see you can only use a fixed AVCaptureDeviceInput with AVCaptureSession rather than specifying your own buffer.
Has anyone else tried something similar?
Thanks
help please!!
Has no effect in simulator as well as on iphone. inNumberFrames is still 2048. tried framesperslice, no effect either. iphone firmware 4.0.
Please help,
thank you,
nonono
Hullo,
I would like to play a number of simultaneous sounds of different frequencies, at volumes controlled programmatically. I would like to have an object for each frequency that plays its own sound at the given volume: how do I achieve this simple task without going deep into the Core Audio technology?
Thanks, Fabrizio
When I use Remote IO to record and play back, I always get noise. Someone suggested that I switch off the speaker, but I don’t know where to do that. Please explain what the problem could be.
Awesome thank you so much for this! After messing round with AudioQueue for days attempting to make it responsive, I managed to get things going with RemoteIO in a couple of hours. In comparison it’s been a breeze.
Much appreciated thanks!
Awesome!! Thanks for you great work!! I’ve been working on this for ages!!
The only problem left is, how can I keep playing the recording (realtime) in the background??
Any ideas?
Is there any sample code on how to add start/stop recording to file? I’m aiming at taking the playback buffer and saving it to file with a start/stop record button. I’m confused about how to go about getting the buffer to a saved file though.
The best example is the IOHost example, available from the WWDC session.
And how/where can I find/get that?
I found the WWDC session that shows IOHost, but IOHost only shows panning of audio, it doesn’t relate to saving audio to a file at all.
Any specific reason why you want to record and store in file using Remote IO?
Is there any specific reason you are not opting AVAudioRecorder?
Yeah, AVAudioRecorder records input only, whereas I’m trying to record output. A lot of music audio applications seem to have this ability, where a user creates beats/sequences/loops, etc. and is then able to render to file. This can’t be done with AVAudioRecorder though, and since the buffer is already being filled by RemoteIO/Audio Units, I imagine the buffer can somehow be passed to a file, but I’m confused about how to do so.
Hi, how do I play the recorded sound?
Hi Michael,
I tried using the RemoteIO unit but I do not get any output.
What I want to achieve is simple: hear through the speaker whatever I speak into the microphone.
I have uploaded my code to Pastebin :
http://pastebin.com/UX9vVpbr
Please can you have a look at it and let me know any hints to make it work?
Thanks,
Akshay.
Hi Akshay,
Use AUGraphConnectNodeInput to make a connection from the input of the remote IO node to its output.
Hi Michael,
Thanks for your input !
That pin pointed the problem.
However, I made the following changes to make it work:
1. stereoStreamFormat.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical;
2. Added AUGraphConnectNodeInput(mProcessingGraph, mIONode, 1, mIONode, 0);
3. Did not change the bytesPerPacket value.
And it worked!!
Glad to hear you got it sorted.
Beware though that if you specify an invalid audio stream format, as you have, it’s not at all guaranteed to work on all devices, even if the device you’re currently testing on can figure out what you mean.
It’s always safest to just provide a valid value, instead of taking a stab in the dark and hoping for the best – just multiply your bytesPerPacket by two, and it’ll be much safer, trust me.
Hi Michael,
If I change to
bytesPerPacket = bytesPerSample*2
I do not get any audio.
Any idea why?
Oh, yes, you have to set bytesPerFrame to bytesPerSample*2 as well
Still the same result…
Ah, I hate mucking about with these settings; it’s always finicky. I suggest either sticking at it until you get the settings right (maybe I missed something), or taking the easier path of grabbing a sample format from the sample projects and using that. Here’s the format I use:
Hi Michael,
I am trying to achieve this:
Input from Mic and a guitar file ————–> Multichannel Mixer Unit –> Speaker
I get both of them on speaker.. but with lot of noise with audio coming from mic.
From the debugging and analyzing the code, I found that:
For Input from Mic using VoiceProcessing AudioUnit, ASBD requires following parameter set as
mStereoStreamFormatForVPUnit.mFormatFlags = kAudioFormatFlagsCanonical;
mStereoStreamFormatForVPUnit.mBytesPerPacket = bytesPerSample * 2;
mStereoStreamFormatForVPUnit.mBytesPerFrame = bytesPerSample * 2;
And For Guitar file the ASBD requires following parameters set
mStereoStreamFormat.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical;
mStereoStreamFormat.mBytesPerPacket = bytesPerSample;
mStereoStreamFormat.mBytesPerFrame = bytesPerSample;
If any of the above values are altered, the graph initialization fails.
And providing mStereoStreamFormat to mixer input to 2 buses gives a lot of noise.
I have uploaded my code at “http://pastebin.com/N1zfFmqz”
In this code if you change (1) (2) (3) and (4) to mStereoStreamFormatForVPUnit,
The graph initialization would give error.
Can you please throw some light on this?
Help appreciated.
Thanks,
Akshay Shah.
Hi Michael,
Can you please help me on this?
Appreciate your help..
Thanks,
Akshay Shah.
Hi Akshay,
Off the top of my head, I think you should be setting kAudioUnitProperty_StreamFormat for the VP unit’s input element/output bus, to tell it what format you want the incoming audio in. Also, it’s mono, so you shouldn’t need to be specifying a stereo format, unless something else is wrong. I’d be experimenting with those stream formats some more, but other than that, I can’t think of anything else.
If all else fails, perhaps try posting the question on one of the dev forums where more people’ll see it. I’m currently driving from Scotland to Wales so I’m a bit distracted currently =)
What is mono? and where is it specified?
I assume to keep both the guitar as well as the input audio to be stereo.
Mono: http://en.wikipedia.org/wiki/Monaural
Just specify 1 channel instead of 2 (with the appropriate adjustments to the fields) for the input.
The input’s not stereo, cos the mic isn’t
Crystal clear. Definitely the best kick off.
Thanks.
Thanks for the good example. Can we play a WAV file using the audio queue method?
Hi.
I notice in your example that you are using just one audio-unit to do both capture and playout.
I have been experimenting with opening up two audio-units, one for capture and one for playout, but following your approach, I only get static on my capture. (The playout is OK).
Do you think it is basically unsupported to use two audio-units for this? Should I only use one? Can you make your example work with two?
Thanks!
Hi Havard,
Honestly, I’ve never seen two audio units used for the one device – I doubt it’s supported usage. Why not just use one, the normal way?
Any chance someone could post an example of how to get this working with background audio (when the screen is locked and when the app is closed in the background)?
How do I modify this sample to play a .caf file?
Recording from mic input appears to be broken on devices running iOS 5. In recordingCallback, AudioUnitRender returns an OSStatus of -50.
Hi Michael,
This is a little off-topic, apologies.
I’m trying to avoid hard-coding samplerate values in my AU. Do you know how to query the hosting DAW to obtain the current samplerate setting?
Sorry if this is an obvious question – I can’t find documentation on this anywhere!
Cheers
Hi James,
kAudioSessionProperty_CurrentHardwareSampleRate is what you want (use it with AudioSessionGetProperty)
Ah great! However, I’m writing my audio unit in c++ as purely an audio unit, do you know how to access kAudioSessionProperty_CurrentHardwareSampleRate in that context?
Oh, if you’re writing an audio unit, I can’t help you – I’ve only got experience with using audio units. Maybe you should seek help in the apple dev forums?
Okay Michael, thanks again! I’ll check there…
Is it possible capture the OUTPUT (say from the iPod player or another Audio App) and direct that through my AudioUnit graph?
Currently I have a AUFilePlayer->Mixer->iPodSpeed->RemoteIO-Output… this works well, but I want to use files from the iPod Library, or another app.
Thanks
I’m afraid not, muzzkat, no. The only way to capture device output is to use a loopback audio device to run the output back through into the input.
My code seems to work on the simulator but not on a device and I cannot work out why. AudioUnitRender returns -50 on a device. I am pretty sure it has something to do with the buffers but I am not sure. My init code:
AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = 44100.0;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 1;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = 2;
audioFormat.mBytesPerFrame = 2;
And my render code:
AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mNumberChannels = 1;
bufferList.mBuffers[0].mData = NULL;
bufferList.mBuffers[0].mDataByteSize = inNumberFrames * sizeof(SInt16) * 2;
Ah, -50, my favourite error ;-)
Firstly, I notice that you’re setting mDataByteSize of the render buffer to inNumberFrames * sizeof(SInt16) * 2 – it may make no difference, but that implies stereo audio (that is, frames * bytes per sample * 2 samples per frame). Either set your mChannelsPerFrame to 2 if stereo audio is what you actually want, or get rid of the * 2.
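The arithmetic behind that advice (frames times bytes per sample times channels) can be written out as a short plain C sketch. It assumes interleaved, packed linear PCM like the format in the post, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per frame for interleaved, packed linear PCM:
   one sample per channel, each bitsPerChannel wide. */
static uint32_t bytes_per_frame(uint32_t channels, uint32_t bitsPerChannel) {
    assert(channels > 0);
    assert(bitsPerChannel > 0 && bitsPerChannel % 8 == 0);
    return channels * (bitsPerChannel / 8);
}

/* Byte size of a buffer that holds `frames` frames in that format. */
static uint32_t buffer_byte_size(uint32_t frames, uint32_t channels,
                                 uint32_t bitsPerChannel) {
    return frames * bytes_per_frame(channels, bitsPerChannel);
}
```

For the mono SInt16 format in the init code, bytes_per_frame(1, 16) is 2, which matches mBytesPerFrame, and a 512-frame buffer needs 1024 bytes. The stereo equivalent would be 4 bytes per frame and 2048 bytes, which is what the extra * 2 in the render code implies.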
Also, it couldn’t hurt to make absolutely certain viewController.ioUnit is initialised.
Otherwise, if that doesn’t do it, perhaps it’s worth asking on the Core Audio mailing list, where they’re always very helpful.
Hi Michael,
First of all, thanks for the great article.
Just want to tell that when you’re finished it is also good to call
AudioComponentInstanceDispose(audioUnit);