A Tasty Pixel » Audio

Some updates to TPCircularBuffer

I’ve recently made some updates to TPCircularBuffer (on GitHub), my C circular/ring buffer implementation, which add a memory barrier on read and write, inline the main functions for a potential performance boost, and add support for use within C++ projects.

If you’re using TPCircularBuffer at all, I recommend updating!

Circular (ring) buffer plus neat virtual memory mapping trick

I’ve just updated my C circular buffer implementation, adopting the trick originally proposed by Philip Howard and adapted to Darwin by Kurt Revis: A virtual copy of the buffer is inserted directly after the end of the buffer, so that you can write past the end of the buffer, but have your writes automatically wrapped around to the start — no need to manually implement buffer wrapping logic.

This dramatically simplifies the use of a circular buffer — you can use chunks of the buffer without any need to worry about where the wrap point is.

See the new implementation, which is thread-safe with one consumer and one producer, with no need for locks, making it perfect for use with high-priority Core Audio threads, on GitHub: TPCircularBuffer.

There’s a basic example of its use over on the original post.

Experiments with precise timing in iOS

iOS is by no means a realtime operating system, but I’m aware that NSTimer and NSObject’s performSelector:withObject:afterDelay: mechanism aren’t particularly accurate, and I was curious to see whether I could do better.

Hands up, backing away

Disclaimer: I am not at all an expert in realtime programming, or Mach, or iOS-device optimisation, so this is pretty much a fumble in the dark. I won’t be at all offended if anyone wishes to shoot me down and offer a more sensible solution — in fact, please do! Until then, watch as I stumble on…

Also note that there are often ways to eliminate the need for precise timing of this nature, by architecting code appropriately — when it comes to audio, for example, CoreAudio provides a very accurate time base in render callbacks. For things like metronomes or audio synthesizers, it’s always better to establish a starting time, and use the difference between the current time and the starting time in order to determine state, rather than using a timer to advance the state. Still, sometimes, you just need a timer…

What the blazes?

So, I’m working on an update to Loopy, which uses a shared clock object to synchronise tracks and a variety of events (like user interface updates or timed track manipulations). A tester noted that the mute/unmute quantisation feature that I’ve recently implemented, which will mute or unmute a loop at its starting point (rather than whenever you tap it), tends to overshoot a little, resulting in a small part of the beginning of the loop being audible.

Of course, there are other solutions to this particular problem (like stopping or starting playback from the audio render callback, and using Core Audio’s timestamps for exact timing), but I use timers in other places outside Core Audio’s domain, which makes Core Audio’s timing mechanism unavailable, and I wanted to see how accurate I could get the timing.

Our friend, mach_wait_until

I read in several places mention of the Mach API utility mach_wait_until (from mach/mach_time.h), which is very low-level and supposedly fairly accurate. So, based on that lead, I put together an Objective-C singleton class that launches a high-priority thread, and uses said thread to schedule events.

An NSArray of events are maintained, and a scheduleAction:target:inTimeInterval: routine creates and adds events to this array, then pokes the thread.

The thread grabs the next event in sequence, then uses mach_wait_until to sleep until the time of the next event arrives, then performs the specified action on the target. It’s kinda a DIY NSRunLoop.

Here’s a comparison between this technique, and just using performSelector:withObject:afterDelay: (which schedules a timer on the NSRunLoop), observed while performing various scheduled events within Loopy running on my iPhone 4 with the debugger, and derived by comparing the time of event execution with the event’s scheduled time:

Mechanism	Average discrepancy	Minimum discrepancy	Maximum discrepancy
NSRunLoop	16.9ms	0.25ms	153.7ms
TPPreciseTimer	5.5ms	0.033ms	72.0ms

That was attempt number 1: This seems to give us about 11.4ms better accuracy on average (three times more accurate).

Not bad, but it turns out mach_wait_until isn’t really that accurate, particularly if there’s a bunch of other stuff going on in other threads.

Spinning, for fun and profit

For my second attempt, the thread performs a mach_wait_until until just before the event is due, then performs a spin lock until the time arrives, using mach_absolute_time to compare the current time with the target time.

This gave further improved results — here’s that table again, but with the new scheme added, with a few different spin lock times:

Mechanism	Average discrepancy	Minimum discrepancy	Maximum discrepancy
NSRunLoop	16.9ms	0.25ms	153.7ms
TPPreciseTimer (original)	5.5ms	0.033ms	72.0ms
TPPreciseTimer (10ms spinlock)	6.0ms	0.002ms	76.5ms
TPPreciseTimer (100ms spinlock)	3.7ms	0.002ms	44.8ms
TPPreciseTimer (200ms spinlock)	2.91ms	0.002ms	74.1ms

It appears that the more stuff there is going on in other threads, the more likely the mach_absolute_time call is to overshoot. So, the more time spent in the spin lock, the more leeway mach_absolute_time has to wait too long. Of course, that’s at the cost of making the CPU twiddle its thumbs for the duration.

Better than a punch in the knee

The results weren’t quite as fantastic as I’d hoped — still within the same order of magnitude, that’s for sure — but the average case for the 200ms spinlock approach is 14ms, or 5.8 times, more accurate than the traditional approach, and the minimum case is dramatically better.

You know, I think if I was aware of the results in advance, I might not bother, but I’ll stick with my hard-won 14ms now that I’m here (that’s 617 audio samples, I’ll have you know).

If anyone’s curious about the implementation (or wants to take a stab at doing better), here it is, along with a wildly simplistic commandline test app: TPPreciseTimer.zip

Now to get back to some real work.

Addendum: GCD follow-up

Chris in the comments below suggested trying a GCD-based approach, using dispatch_after. Curious, I rigged it up, and these are the stats, collected the same way as above, added to the prior table:

Mechanism	Average discrepancy	Minimum discrepancy	Maximum discrepancy
NSRunLoop	16.9ms	0.25ms	153.7ms
TPPreciseTimer (original)	5.5ms	0.033ms	72.0ms
TPPreciseTimer (10ms spinlock)	6.0ms	0.002ms	76.5ms
TPPreciseTimer (100ms spinlock)	3.7ms	0.002ms	44.8ms
TPPreciseTimer (200ms spinlock)	2.91ms	0.002ms	74.1ms
dispatch_after (main queue)	14.8ms	0.16ms	161.2ms
dispatch_after (dedicated queue)	19.2ms	0.1ms	174.9ms
dispatch_after (dedicated queue + 100ms spinlock)	22.4ms	0.002ms	306.8ms

So, they appear pretty much the same as the NSRunLoop stats.

Easy AAC compressed audio conversion on iOS

From the iPhone 3Gs up, it’s possible to encode compressed AAC audio from PCM audio data. That means great things for apps that deal with audio sharing and transmission, as the audio can be sent in compressed form, rather than sending huge PCM audio files over the network.

Apple’s produced some sample code (iPhoneExtAudioFileConvertTest), which demonstrates how it’s done, but their implementation isn’t particularly easy to use in existing projects, as it requires some wrapping to make it play nice.

For my upcoming looper app Loopy, I’ve put together a simple Objective-C class that performs the conversion of any audio file to an AAC-encoded m4a, asynchronously with a delegate, or converts any audio provided by a data source class (which provides for recording straight to AAC) and I thought I’d share it.

A simple, fast circular buffer implementation for audio processing

Circular buffers are pretty much what they sound like – arrays that wrap around. They’re fantastically useful as scratch space for audio processing, and generally passing audio around efficiently.

They’re designed for FIFO (first-in-first-out) use, like storing audio coming in the microphone for later playback or processing.

Consider a naive alternative: You copy the incoming audio into an NSData you allocate, and then pass that NSData off. This means you’re allocating memory each time, and deallocating the memory later once you’re done processing. That allocation incurs a penalty, which can be a show-stopper when part of an audio pipeline – The Core Audio documentation advises against any allocations when within a render callback, for example.

Alternatively, you can allocate space in advance, and write to that, but that has problems too: Either you have a synchronisation nightmare, or you spend lots of time moving bytes around so that the unprocessed audio is always at the beginning of the array.

A better solution is to use a circular buffer, where data goes in at the head, and is read from the tail. When you produce data at the head, the head moves up the array, and wraps around at the end. When you consume at the tail, the tail moves up too, so the tail chases the head around the circle.

Here’s a simple C implementation I recently put together for my app Loopy: TPCircularBuffer

What I’ve been up to: Loopy 2 (track importing)

Here’s the result of the last few days’ work: Loopy 2 now has track importing. Drag audio files into Loopy’s documents folder in iTunes, then import into tracks. Loops are automatically time-fitted for perfect synchronisation, using the frankly awesome Dirac audio processing library.

A quick-and-dirty audio sample mixing technique to avoid clipping

In the real world, when you hear two sounds at once, what you’re hearing is the combination (in the “+” sense) of the two noises. If you put five hundred drummers in the same room and, avoiding the obvious drummer jokes for now, told them all to play, you’d get drummer 1 + drummer 2 + … + drummer 500 (also bleeding ears).

With digital audio though, the volume doesn’t go up to oh-god-please-make-them-stop – it’s limited to a small dynamic range.

So, digital mixing actually requires a little thought in order to avoid overflowing these bounds and clipping. I recently came across this when writing some mixing routines for my upcoming app Loopy 2, and found a very useful discussion on mixing digital audio by software developer and author Viktor Toth.

The basic concept is to mix in such a way that we stay within the dynamic range of the target audio format, while representing the dynamics of the mixed signals as faithfully as possible.

Pushing MultiChannelMixer to the limit

A friend made an interesting suggestion to an issue I’m facing in the upcoming Loopy 2, and I thought I’d do some investigation: How many tracks can the MultiChannelMixer (kAudioUnitSubType_MultiChannelMixer) manage at once?

He was quite optimistic, and as it turns out, he was right: It’s rather capable.

I modified the iPhoneMultichannelMixerTest sample project to add a bunch of channels, and measured how my iPhone 4 performed. It looks pretty linear: there’s pretty much a 1:1 relationship between number of channels, and the CPU usage, actually.

Of course, this is on the newest-most powerful iPhone, but there was no stuttering, and the interface (admittedly simple as it is) was fully responsive, including setting output volume, even with 100 channels. You’d probably want to stick with a maximum number of channels around the 75-100 mark, less for targeting lesser devices, but that’s a pretty generous limit.

Not bad.

Update: Not such great news for the iPhone 3G I just tested this on, though — it freaks at anything more than 20 channels, and isn’t too responsive with 20. The 3Gs seems to behave almost as well as the iPhone 4, but the CPU:channels relationship is more like 2:1.