Using RemoteIO audio unit

I’ve had a nasty old time trying to get some audio stuff going on the iPhone, no thanks to Apple’s lack of documentation. If you’re an iPhone developer interested in getting RemoteIO/IO Remote/whatever it’s called working on the iPhone… do I have good news for you. Read on.

Wanna skip the Core Audio learning curve and start writing code straight away? Check out my new project:

The Amazing Audio Engine: Core Audio, Cordially

Update: Thanks to Joel Reymont, we now have an explanation for the “CrashIfClientProvidedBogusAudioBufferList” iPhone simulator bug: The simulator doesn’t like mono audio. Thanks, Joel!

Update: Happily, Apple have now created some excellent documentation on Remote IO, with some good sample projects. I recommend using that as a resource, now that it’s there, as that will continue to be updated.

Update: Tom Zicarelli has created a very extensive sample app that demonstrates the use of AUGraph, with all sorts of goodies.

So, we need to obtain an instance of the RemoteIO audio unit, configure it, and hook it up to a recording callback. The callback notifies you that there is data ready to be grabbed, and it is also where you pull that data from the audio unit.


Overview

  1. Identify the audio component (kAudioUnitType_Output/ kAudioUnitSubType_RemoteIO/ kAudioUnitManufacturer_Apple)
  2. Use AudioComponentFindNext(NULL, &descriptionOfAudioComponent) to obtain the AudioComponent, which is like the factory with which you obtain the audio unit
  3. Use AudioComponentInstanceNew(ourComponent, &audioUnit) to make an instance of the audio unit
  4. Enable IO for recording and possibly playback with AudioUnitSetProperty
  5. Describe the audio format in an AudioStreamBasicDescription structure, and apply the format using AudioUnitSetProperty
  6. Provide a callback for recording, and possibly playback, again using AudioUnitSetProperty
  7. Allocate some buffers
  8. Initialise the audio unit
  9. Start the audio unit
  10. Rejoice

Here’s my code: I’m using both recording and playback. Use what applies to you!

Initialisation

Initialisation looks like this. We have a member variable of type AudioComponentInstance which will contain our audio unit.

The audio format described below uses SInt16 for samples (i.e. signed, 16 bits per sample).

#import <AudioToolbox/AudioToolbox.h>
 
#define kOutputBus 0
#define kInputBus 1
 
// ...
 
 
OSStatus status;
AudioComponentInstance audioUnit;
 
// Describe audio component
AudioComponentDescription desc;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_RemoteIO;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
 
// Get component
AudioComponent inputComponent = AudioComponentFindNext(NULL, &desc);
 
// Get audio units
status = AudioComponentInstanceNew(inputComponent, &audioUnit);
checkStatus(status);
 
// Enable IO for recording
UInt32 flag = 1;
status = AudioUnitSetProperty(audioUnit, 
                              kAudioOutputUnitProperty_EnableIO, 
                              kAudioUnitScope_Input, 
                              kInputBus,
                              &flag, 
                              sizeof(flag));
checkStatus(status);
 
// Enable IO for playback
status = AudioUnitSetProperty(audioUnit, 
                              kAudioOutputUnitProperty_EnableIO, 
                              kAudioUnitScope_Output, 
                              kOutputBus,
                              &flag, 
                              sizeof(flag));
checkStatus(status);
 
// Describe format
AudioStreamBasicDescription audioFormat = {0};
audioFormat.mSampleRate			= 44100.00;
audioFormat.mFormatID			= kAudioFormatLinearPCM;
audioFormat.mFormatFlags		= kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
audioFormat.mFramesPerPacket	= 1;
audioFormat.mChannelsPerFrame	= 1;
audioFormat.mBitsPerChannel		= 16;
audioFormat.mBytesPerPacket		= 2;
audioFormat.mBytesPerFrame		= 2;
 
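// Apply format to the side of each bus that faces our code:
// the output scope of the input bus (recorded audio coming to us), and
// the input scope of the output bus (audio we supply for playback)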
status = AudioUnitSetProperty(audioUnit, 
                              kAudioUnitProperty_StreamFormat, 
                              kAudioUnitScope_Output, 
                              kInputBus, 
                              &audioFormat, 
                              sizeof(audioFormat));
checkStatus(status);
status = AudioUnitSetProperty(audioUnit, 
                              kAudioUnitProperty_StreamFormat, 
                              kAudioUnitScope_Input, 
                              kOutputBus, 
                              &audioFormat, 
                              sizeof(audioFormat));
checkStatus(status);
 
 
// Set input callback
AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc = recordingCallback;
callbackStruct.inputProcRefCon = self;
status = AudioUnitSetProperty(audioUnit, 
                              kAudioOutputUnitProperty_SetInputCallback, 
                              kAudioUnitScope_Global, 
                              kInputBus, 
                              &callbackStruct, 
                              sizeof(callbackStruct));
checkStatus(status);
 
// Set output callback
callbackStruct.inputProc = playbackCallback;
callbackStruct.inputProcRefCon = self;
status = AudioUnitSetProperty(audioUnit, 
                              kAudioUnitProperty_SetRenderCallback, 
                              kAudioUnitScope_Global, 
                              kOutputBus,
                              &callbackStruct, 
                              sizeof(callbackStruct));
checkStatus(status);
 
// Disable buffer allocation for the recorder (optional - do this if we want to pass in our own)
flag = 0;
status = AudioUnitSetProperty(audioUnit, 
                              kAudioUnitProperty_ShouldAllocateBuffer,
                              kAudioUnitScope_Output, 
                              kInputBus,
                              &flag, 
                              sizeof(flag));
checkStatus(status);
 
// TODO: Allocate our own buffers if we want
 
// Initialise
status = AudioUnitInitialize(audioUnit);
checkStatus(status);

Then, when you’re ready to start:

OSStatus status = AudioOutputUnitStart(audioUnit);
checkStatus(status);

And to stop:

OSStatus status = AudioOutputUnitStop(audioUnit);
checkStatus(status);

Then, when we’re finished:

AudioUnitUninitialize(audioUnit);
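
The snippets above call checkStatus without defining it. Here is a minimal sketch of what such a helper might look like, written in plain C so it can be tried anywhere. The function names and the int return value are my own assumptions, and int32_t stands in for OSStatus; swap in the real type when building against the iOS SDK. Many Core Audio errors are four-character codes (for example 'fmt?' for an unsupported format), so the sketch prints those as text and everything else as a number.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

// Writes a readable form of a Core Audio status code into `out`.
// Codes made of four printable characters are shown as text, others as integers.
const char *formatStatus(int32_t status, char *out, size_t outSize) {
    uint32_t code = (uint32_t)status;
    unsigned char c[4] = {
        (unsigned char)(code >> 24), (unsigned char)(code >> 16),
        (unsigned char)(code >> 8),  (unsigned char)code
    };
    if (isprint(c[0]) && isprint(c[1]) && isprint(c[2]) && isprint(c[3])) {
        snprintf(out, outSize, "'%c%c%c%c'", c[0], c[1], c[2], c[3]);
    } else {
        snprintf(out, outSize, "%d", (int)status);
    }
    return out;
}

// Hypothetical checkStatus: logs a failure and returns 0, or returns 1 on success.
// int32_t stands in for OSStatus so the sketch compiles without the iOS SDK.
int checkStatus(int32_t status) {
    if (status == 0) return 1; // 0 is noErr
    char text[16];
    fprintf(stderr, "Core Audio call failed: %s\n",
            formatStatus(status, text, sizeof text));
    return 0;
}
```

While you are getting things working, you may prefer to stop at the first failed call rather than carry on with a half-configured audio unit.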

And now for our callbacks.

Recording

static OSStatus recordingCallback(void *inRefCon, 
                                  AudioUnitRenderActionFlags *ioActionFlags, 
                                  const AudioTimeStamp *inTimeStamp, 
                                  UInt32 inBusNumber, 
                                  UInt32 inNumberFrames, 
                                  AudioBufferList *ioData) {
 
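    // inRefCon is the object we registered as inputProcRefCon (self, in the setup code).
    // AudioInterface stands in for your own class, which needs to expose the audio unit.
    // Under ARC, use a bridged cast: (__bridge AudioInterface *)inRefCon
    AudioInterface *audioInterface = (AudioInterface *)inRefCon;
 
    // ioData is NULL for an input callback, so we supply our own buffer list.
    // One buffer is enough for the mono, 16-bit format configured earlier, and a
    // stack buffer avoids calling malloc on the realtime audio thread.
    SInt16 samples[inNumberFrames];
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize = inNumberFrames * sizeof(SInt16);
    bufferList.mBuffers[0].mData = samples;
 
    // Obtain recorded samples
    OSStatus status;
 
    status = AudioUnitRender([audioInterface audioUnit], 
                             ioActionFlags, 
                             inTimeStamp, 
                             inBusNumber, 
                             inNumberFrames, 
                             &bufferList);
    checkStatus(status);
 
    // Now, we have the samples we just read sitting in buffers in bufferList
    DoStuffWithTheRecordedAudio(&bufferList);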
    return noErr;
}
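
As for what "doing stuff" with the recorded audio might look like: with the mono SInt16 format used here, the samples sit in the first buffer's mData field, and its mDataByteSize divided by sizeof(SInt16) gives the number of samples. The following is only a sketch of one simple thing you could do with them, a peak level meter. The function name peakLevel is made up for illustration, and it uses plain C types (int16_t in place of SInt16) so it compiles outside Xcode.

```c
#include <stdint.h>

// Returns the largest absolute sample value in the block, scaled to 0.0 .. 1.0.
float peakLevel(const int16_t *samples, uint32_t count) {
    int32_t peak = 0;
    for (uint32_t i = 0; i < count; i++) {
        int32_t v = samples[i];
        if (v < 0) v = -v;      // widen first, so -32768 does not overflow
        if (v > peak) peak = v;
    }
    return (float)peak / 32768.0f;
}
```

It does no allocation or locking, so something like this should be safe to call straight from the recording callback.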

Playback

static OSStatus playbackCallback(void *inRefCon, 
                                  AudioUnitRenderActionFlags *ioActionFlags, 
                                  const AudioTimeStamp *inTimeStamp, 
                                  UInt32 inBusNumber, 
                                  UInt32 inNumberFrames, 
                                  AudioBufferList *ioData) {    
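    // Notes: ioData contains buffers (may be more than one!)
    // Fill them up as much as you can. Remember to set the size value in each buffer to match how
    // much data is in the buffer.
    // Until you have real audio to supply, write silence so stale buffer contents aren't played:
    for (UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
        memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
    }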
    return noErr;
}
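
To hear something other than silence while testing, you could fill the output buffer with a generated tone. This is a rough sketch, assuming the mono SInt16 format from the initialisation code; fillSquareWave and its parameters are names I made up for illustration, not part of Core Audio. A square wave keeps the example free of any maths library.

```c
#include <stdint.h>

// Fills `out` with `frames` mono 16-bit samples of a square wave.
// `phase` is the position within the current cycle (0.0 up to 1.0). It is
// updated in place so that the next call carries on without a click.
// Returns the number of bytes written.
uint32_t fillSquareWave(int16_t *out, uint32_t frames, double frequency,
                        double sampleRate, int16_t amplitude, double *phase) {
    double step = frequency / sampleRate;   // fraction of a cycle per frame
    for (uint32_t i = 0; i < frames; i++) {
        out[i] = (*phase < 0.5) ? amplitude : (int16_t)-amplitude;
        *phase += step;
        if (*phase >= 1.0) *phase -= 1.0;   // wrap at the end of each cycle
    }
    return frames * (uint32_t)sizeof(int16_t);
}
```

In the playback callback you would pass ioData->mBuffers[0].mData and inNumberFrames, store the return value in mDataByteSize, and keep the phase variable in the object you reach through inRefCon so that it survives between callbacks.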

Finally, rejoice with me in this discovery ;)

Resources that helped

No thanks at all to Apple for their lack of accessible documentation on this topic – They really have a long way to go here! Also boo to them with their lack of search engine, and refusal to open up their docs to Google. It’s a jungle out there!

Update: You can adjust the latency of RemoteIO (and, in fact, any other audio framework) by setting the kAudioSessionProperty_PreferredHardwareIOBufferDuration property:

float aBufferLength = 0.005; // In seconds
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration, 
                        sizeof(aBufferLength), &aBufferLength);

This adjusts the length of the buffers that are passed to you. If the buffer length was originally, say, 1024 samples, then halving it to 512 halves the amount of audio each buffer holds (from about 23 ms to about 12 ms at 44.1 kHz), and so halves the delay before that audio reaches you.
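
The arithmetic behind this is simple enough to sketch. The helper names below are hypothetical. Note too that the property is only a preference: a request for 0.005 seconds at 44.1 kHz works out to about 220.5 frames, and the buffer size you actually get may differ, so check inNumberFrames in your callbacks rather than assuming.

```c
#include <stdint.h>

// Number of frames that fit in `seconds` at `sampleRate`, rounded to the nearest frame.
uint32_t framesForDuration(double seconds, double sampleRate) {
    return (uint32_t)(seconds * sampleRate + 0.5);
}

// Length in milliseconds of a buffer holding `frames` frames.
double millisecondsForFrames(uint32_t frames, double sampleRate) {
    return 1000.0 * frames / sampleRate;
}
```

For example, millisecondsForFrames(1024, 44100.0) comes to a little over 23 ms, and a 512-frame buffer lasts exactly half of that.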


216 Comments

  1. Michael
    Posted January 1, 2012 at 3:50 pm | Permalink

    Hi Michael.

    After I have put together all this code, how would I get the actual audio data? Where exactly is it saved?

    And when all this code is done, do I have to put it all into one method and then call it? Or should I only call the -start method?

    What operation should be taken to get the real-time data?

    I have spent days trying to understand it but I couldn't. how exactly i

    • Posted January 8, 2012 at 12:02 pm | Permalink

      Hi, Michael – it sounds like another tutorial might be in order. Please stay tuned, I’ll put one together over the next week or two and post it on the blog.

  2. Peter Kramer
    Posted March 12, 2012 at 11:35 pm | Permalink

    Jake asked back on November 13, 2010 if there was a way to get

    “…input from iPhone mic and play back on bluetooth speaker…”

    Is there? If not, why not?

    • Posted March 13, 2012 at 10:07 am | Permalink

      Hi Peter,

      I’m not certain – I haven’t played with bluetooth much. I do know that iOS’s audio routing capabilities are pretty limited, so it could go either way. My suggestion is to go check out the audio session documentation, and see what’s there. If it lets you connect a bluetooth speaker independently of the input system, then you should be good to go.

  3. Posted March 21, 2012 at 1:31 pm | Permalink

    I am still looking for a way to play sounds based on the notes of the pentagram. Can you help?

  4. StefanS
    Posted April 6, 2012 at 9:44 am | Permalink

    Hello Michael,

    This tutorial has helped me a lot. Thank you for that. My question: I am currently working on a VoIP application and I want to use SPEEX as a speech coder. This coder specifically asks for an audio buffer of 20 ms, 160 samples and a sample rate of 8000Hz. However, I don’t think it is possible to set the buffer length to exactly 20 ms, or am I missing something? And if I set the sample rate to 8000 for the Remote IO unit I get inNumberFrames = 93 or 92. If you do the math, for a buffer of 20 ms and a sample rate of 8000, I should get exactly 160 samples. Important note: I am still working in the simulator. Another thing: do you think that maybe Audio Queues would be a better solution for a VoIP application?

    Thanks a lot, Stefan

    • Posted April 6, 2012 at 10:40 am | Permalink

      Hi Stefan,

      Core Audio’s never that exact – it tries to find the closest parameters to what you request, but it’ll never be exact. If you need exactly 160 samples at a time for the SPEEX conversion, then use a circular buffer to store the audio and process it in chunks of the required size.

      No, Audio Queues isn’t as low-latency as Remote IO. You definitely want Remote IO for a VoIP app.
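
A minimal sketch of that chunking idea, under a few assumptions: this is not the circular buffer implementation mentioned elsewhere in this thread, all of the names are hypothetical, and it uses a simple accumulation buffer rather than a true ring. Each callback pushes however many samples it received, and the handler is called once for every complete 160-sample frame.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 160   // 20 ms at 8000 Hz

typedef struct {
    int16_t  pending[FRAME_SAMPLES]; // samples waiting for a full frame
    uint32_t count;                  // how many of them are valid
} FrameChunker;

typedef void (*FrameHandler)(const int16_t *frame, void *context);

// Feeds `count` new samples in, calling `handler` once per complete frame.
// Returns the number of frames handed to the handler during this call.
uint32_t chunkerPush(FrameChunker *c, const int16_t *samples, uint32_t count,
                     FrameHandler handler, void *context) {
    uint32_t emitted = 0;
    while (count > 0) {
        uint32_t space = FRAME_SAMPLES - c->count;
        uint32_t n = count < space ? count : space;
        memcpy(c->pending + c->count, samples, n * sizeof(int16_t));
        c->count += n;
        samples  += n;
        count    -= n;
        if (c->count == FRAME_SAMPLES) {
            handler(c->pending, context);
            c->count = 0;
            emitted++;
        }
    }
    return emitted;
}

// Example handler that just counts frames. A real one would pass the frame to the encoder.
void countFrame(const int16_t *frame, void *context) {
    (void)frame;
    (*(int *)context)++;
}
```

The handler runs on whichever thread calls chunkerPush, so if the encoder is not suitable for the realtime thread, push from a separate processing thread instead.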

  5. XXX
    Posted April 8, 2012 at 12:30 am | Permalink

    StefanS, I tried AudioQueue for VoIP. 160 samples works for 8000Hz, but the callback is called every 1-2 ms rather than at 20 ms intervals. After 25 calls it pauses for 512 ms, then gets called every 1-2 ms again, in a cycle. Michael is right.

    • StefanS
      Posted April 8, 2012 at 9:42 am | Permalink

      Thank you both. I’ll work with audio units (not audio queues) and I’ll try to implement the circular buffer created by Michael. As my work proceeds, I may have some additional questions, I am fairly new to iOS development. :)

  6. XXX
    Posted April 11, 2012 at 10:43 am | Permalink

    Hello Michael,

    I tried to use a circular buffer to store audio data and process it in chunks of 20 ms for a VoIP app, but I can’t do it, because I need an accurate 20 ms interval for sending data packets to the network and I don’t know how to get one: NSTimer is not accurate enough. Can you help with this question? Which timer is better for pulling chunks of audio data from your buffer at a small interval?

    • Posted April 11, 2012 at 10:45 am | Permalink

      Why would you need to use a timer? Why not just process 20 ms of samples at a time, as they become available in the buffer?

      • StefanS
        Posted April 11, 2012 at 11:20 am | Permalink

        My question is, If calling of the callback exactly every 20ms is not possible then it wouldn’t be ok to just put the coder into the callback and process the audio -> there’s the problem of synchronization. The codec should be called more frequently than the callback. Where should I put the codec and how to schedule it? Anyone? :)

        Actually, I am still having problems setting the hardware sample rate. I set it as 8000Hz, and when I initialize the audio session I get its value (it says 8000Hz so that’s ok) but then somehow my application changes this value to 44100Hz (I hear it). So, in my callbacks the inNumberFrames is 512 (for 44100) and if I try to set the Audio Unit’s sample rate to 8000 the inNumberFrames becomes 93 93 92 (the value is not constant). Does anyone have any idea how this happens? Could this be a Simulator related problem? For 8000Hz and a buffer duration of 20ms one should get exactly 160 samples.

        Thanks a lot, Stefan

        • Posted April 11, 2012 at 11:26 am | Permalink

          The simulator can behave very differently to the device. When working with audio, always have a device handy, because you’ll see dramatically different effects. You can use the simulator sometimes, but unless you’re doing most of your testing using the device, you’re just making life insanely hard for yourself.

          As for processing the buffer, just process it in blocks of 20 ms. I don’t really understand why there’s a synchronisation problem… or why you’re limited to processing just one 20 ms block per callback. Just loop!

          Whether you do it on the realtime thread in the callback, or in an offline processing thread is up to you – it depends on whether the coder is suitable for use in a realtime context (ie. whether it holds locks, allocates memory, takes a long time, etc.).

      • XXX
        Posted April 11, 2012 at 12:31 pm | Permalink

        Is audio data passed to the buffer via the callback at intervals of 20 ms? That would solve the problem. Or maybe the audio data arrives in the buffer at varying intervals (20 ms, give or take 5-10 ms)?

  7. StefanS
    Posted April 17, 2012 at 9:54 am | Permalink

    Hello,

    Does anyone here know how the iLBC codec is used? Apparently, my Convertor does not accept when the mFormatID from the AudioStreamBasicDescription is set to kAudioFormatiLBC.

    Thanks, Stefan

  8. owen
    Posted May 10, 2012 at 4:03 pm | Permalink

    I’m trying to use this code, however I get undeclared identifiers for almost all data types. I have looked them up and they seem to be in the AudioUnit.framework, however that framework is added to my “Link Binary With Libraries” list, so I don’t understand why the data types aren’t recognized. For example, at the very top (the first 2 lines), AudioComponentInstance audioUnit; and AudioComponentDescription desc; are both “undeclared identifier”.

    • owen
      Posted May 10, 2012 at 6:10 pm | Permalink

      answer: you not only need to include it in the link libraries page but also add

      import

  9. StefanS
    Posted May 11, 2012 at 8:37 am | Permalink

    Hello Michael,

    About this:

    // Disable buffer allocation for the recorder (optional - do this if we want to pass in our own)
    flag = 0;
    status = AudioUnitSetProperty(audioUnit, kAudioUnitProperty_ShouldAllocateBuffer, kAudioUnitScope_Output, kInputBus, &flag, sizeof(flag));

    What buffer does it refer to? I see no difference in the behavior of my application if I decide to disable it or not. I use temporary AudioBuffer and AudioBufferList to store the input data and then copy this data to the Circular buffer you have provided.

    Another question: About this Voice-Processing IO Audio Unit and its acoustic echo cancellation. Do I simply use it in my code and this nice echo cancellation effect “magically” appears, or should I do some special configuration beforehand? Does it work at all?

    Thank you, you’ve been such a help to me and my beginnings in iOS Audio development. Stefan

    • Posted May 11, 2012 at 10:12 am | Permalink

      Hi Stefan,

      That refers to the audio unit’s own internal buffer – it’s really quite a minor detail, but it saves a little memory allocation if you’re providing your own buffers instead. If in doubt, it’s safe to leave it out, though.

      Yep, you’ll get echo cancellation for free, as soon as you start using VPIO.

      You’re welcome =)

  10. StefanS
    Posted May 15, 2012 at 3:12 pm | Permalink

    Hello Michael,

    I finally got the chance to try my application on a device and not just the simulator. It all works perfectly, except a minor delay, which I will look into.

    My question: how can I output the audio through the speakers (the loud ones, so I can achieve a handsfree functionality)? So far, I can only hear the audio through the headphones or If I press my ear against the phone as in a standard conversation.

    As always, Thanks:) Stefan

  11. owen
    Posted May 15, 2012 at 4:37 pm | Permalink

    I have struggled to get this to work for a while now and it’s driving me nuts. However, now I am kind of worried, because after reading through the comments I see you posted: “Hey Rarejai – The ULaw format is for storage only, for use with things like the Audio File Services. Remote IO only works with PCM.” Which is what I’m trying to do (stream u-law audio from the mic). I guess my question is, if that’s the case, then what would this do?

    AudioStreamBasicDescription audioFormat;
    audioFormat.mSampleRate = 8000.00;//44100.00;
    audioFormat.mFormatID = kAudioFormatULaw;
    // audioFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    audioFormat.mFramesPerPacket = 1;
    audioFormat.mChannelsPerFrame = 1;
    audioFormat.mBitsPerChannel = 16;
    audioFormat.mBytesPerPacket = 2;
    audioFormat.mBytesPerFrame = 2;

    I also have this question open http://stackoverflow.com/questions/10501236/stream-media-from-iphone because when I use a variation of this code, nothing happens:

    void audioDataReceiver (AudioBufferList bufferList) {
        double *q = (double *)(&bufferList)->mBuffers[0].mData;

        queue = [[NSOperationQueue alloc]init];

        for(int i=0; i < strlen((const char *)(&bufferList)->mBuffers[0].mData); i++) {

            NSData * dataBuffer = [NSData dataWithBytes:&q[i] length:sizeof(double)];

            // NSData * dataBuffer =[NSData dataWithBytes:(&bufferList)->mBuffers[0].mData length:sizeof((&bufferList)->mBuffers[0].mData)];

            client= [AFHTTPClient clientWithBaseURL:[NSURL URLWithString:[NSString stringWithFormat:@"https://%@/",serverAddress]]];

            NSMutableDictionary *parameters = [NSMutableDictionary dictionary];

            NSMutableURLRequest * request = [client multipartFormRequestWithMethod:@"POST" path:[NSString stringWithFormat:@"transmitaudio?id=%@", sessionID] parameters:parameters constructingBodyWithBlock: ^(id <AFMultipartFormData>formData)
            {
                // [formData appendPartWithFileData:self.audioHandler.dataBuffer name:@"micaudio" fileName:@"sound.caf" mimeType:@"audio/basic"];

                [formData appendData:dataBuffer];

                // NSLog(@"request: %@",request);
                // NSLog(@"client: %@",client);
            }];
            [request setValue:@"audio/basic" forHTTPHeaderField:@"content-type"];
            [request setValue:@"99999" forHTTPHeaderField:@"Content-Length"];
            [request setValue:@"Keep-Alive" forHTTPHeaderField:@"Connection"];
            [request setValue:@"no-cache" forHTTPHeaderField:@"Cache-Control"];

            AFHTTPRequestOperation *operation = [[AFHTTPRequestOperation alloc] initWithRequest:request];
            [queue addOperation:operation];

            // NSLog(@"queue: %@",queue);
        }

    }

  12. owen
    Posted May 15, 2012 at 4:41 pm | Permalink

    Sorry here is the pastebin for easier reading http://pastebin.com/hFSNnJct

3 Trackbacks

  1. By [Time code]; on January 17, 2009 at 7:12 am

    [...] complexity and move on to Audio Toolbox (or perhaps even Core Audio… a DevForums thread and a blog by developer Michael Tyson report extremely low latency by using the RemoteIO audio unit [...]

  2. [...] I also wouldn’t have gotten anywhere on VocaForm without Michael Tyson’s post on using the remoteIO AU. [...]

  3. [...] where you can do a lot of useful stuff, and then at the lowest level there are two types of Audio Unit: Remote I/O (or remoteio) and the Voice Processing Audio Unit [...]
