Experiments with precise timing in iOS
iOS is by no means a realtime operating system, but I’m aware that NSTimer and NSObject’s performSelector:withObject:afterDelay: mechanism aren’t particularly accurate, and I was curious to see whether I could do better.
Hands up, backing away
Disclaimer: I am not at all an expert in realtime programming, or Mach, or iOS-device optimisation, so this is pretty much a fumble in the dark. I won’t be at all offended if anyone wishes to shoot me down and offer a more sensible solution — in fact, please do! Until then, watch as I stumble on…
Also note that there are often ways to eliminate the need for precise timing of this nature, by architecting code appropriately — when it comes to audio, for example, CoreAudio provides a very accurate time base in render callbacks. For things like metronomes or audio synthesizers, it’s always better to establish a starting time, and use the difference between the current time and the starting time in order to determine state, rather than using a timer to advance the state. Still, sometimes, you just need a timer…
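To make the "derive state from elapsed time" idea concrete, here's a minimal C sketch. The `beat_position` struct and `beat_position_at` function are names made up for illustration (they aren't from Loopy), and the timestamps are assumed to be nanoseconds from any monotonic clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: where we are in a metronome pattern. */
typedef struct {
    uint64_t beat_index;    /* whole beats elapsed since the start time */
    double   beat_fraction; /* position within the current beat, 0.0 up to (not including) 1.0 */
} beat_position;

/* Work out the current beat purely from timestamps, so a late or early
   caller still gets the right answer. Times are in nanoseconds. */
static beat_position beat_position_at(uint64_t start_ns, uint64_t now_ns, double bpm) {
    assert(bpm > 0.0 && now_ns >= start_ns);

    double beat_length_ns = 60.0e9 / bpm;  /* nanoseconds per beat */
    double beats_elapsed = (double)(now_ns - start_ns) / beat_length_ns;

    beat_position pos;
    pos.beat_index = (uint64_t)beats_elapsed;                    /* truncate to whole beats */
    pos.beat_fraction = beats_elapsed - (double)pos.beat_index;  /* leftover part of a beat */
    return pos;
}
```

With a start time of 0, a tempo of 120bpm and a current time of 1.25 seconds, this reports beat index 2, halfway through the beat, no matter how late or early the caller happened to run.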
What the blazes?
So, I’m working on an update to [Loopy](http://loopyapp.com), which uses a shared clock object to synchronise tracks and a variety of events (like user interface updates or timed track manipulations). A tester noted that the mute/unmute quantisation feature that I’ve recently implemented, which will mute or unmute a loop at its starting point (rather than whenever you tap it), tends to overshoot a little, resulting in a small part of the beginning of the loop being audible.
Of course, there are other solutions to this particular problem (like stopping or starting playback from the audio render callback, and using Core Audio’s timestamps for exact timing), but I use timers in other places outside Core Audio’s domain, which makes Core Audio’s timing mechanism unavailable, and I wanted to see how accurate I could get the timing.
Our friend, mach_wait_until
In several places I read mentions of the Mach API function mach_wait_until (declared in mach/mach_time.h), which is very low-level and supposedly fairly accurate. So, based on that lead, I put together an Objective-C singleton class that launches a high-priority thread and uses that thread to schedule events.
An NSArray of events is maintained, and a scheduleAction:target:inTimeInterval: routine creates events, adds them to this array, then pokes the thread.
The thread grabs the next event in sequence, then uses mach_wait_until to sleep until the time of the next event arrives, then performs the specified action on the target. It’s kinda a DIY NSRunLoop.
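The bookkeeping half of that can be sketched in plain C roughly as follows. This is not the TPPreciseTimer code: it swaps the NSArray for a sorted linked list, and `timed_event`, `event_schedule`, `event_run_due` and `count_fired` are hypothetical names. It also leaves out the locking and thread-waking that a real multi-threaded version would need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*event_action)(void *context);

/* One scheduled event in a singly linked list, kept sorted by fire time. */
typedef struct timed_event {
    uint64_t fire_time_ns;      /* when the action should run */
    event_action action;        /* what to run */
    void *context;              /* passed to the action */
    struct timed_event *next;
} timed_event;

/* Insert an event so the list stays sorted, earliest first. Events with
   equal fire times keep their insertion order. Returns the new head. */
static timed_event *event_schedule(timed_event *head, uint64_t fire_time_ns,
                                   event_action action, void *context) {
    timed_event *ev = malloc(sizeof *ev);
    assert(ev != NULL);
    ev->fire_time_ns = fire_time_ns;
    ev->action = action;
    ev->context = context;

    timed_event **link = &head;
    while (*link != NULL && (*link)->fire_time_ns <= fire_time_ns) {
        link = &(*link)->next;
    }
    ev->next = *link;
    *link = ev;
    return head;
}

/* Run and free every event due at or before now_ns. Returns the new head,
   whose fire time is what the timer thread would sleep until next. */
static timed_event *event_run_due(timed_event *head, uint64_t now_ns) {
    while (head != NULL && head->fire_time_ns <= now_ns) {
        timed_event *ev = head;
        head = ev->next;
        ev->action(ev->context);
        free(ev);
    }
    return head;
}

/* Example action for illustration: bump an int counter. */
static void count_fired(void *context) {
    ++*(int *)context;
}
```

Keeping the list sorted means the thread only ever has to look at the head to know how long to sleep.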
Here’s a comparison between this technique, and just using performSelector:withObject:afterDelay: (which schedules a timer on the NSRunLoop), observed while performing various scheduled events within Loopy running on my iPhone 4 with the debugger, and derived by comparing the time of event execution with the event’s scheduled time:
Mechanism | Average discrepancy | Minimum discrepancy | Maximum discrepancy |
---|---|---|---|
NSRunLoop | 16.9ms | 0.25ms | 153.7ms |
TPPreciseTimer | 5.5ms | 0.033ms | 72.0ms |
That was attempt number 1. It gives about 11.4ms better accuracy on average (roughly three times more accurate).
Not bad, but it turns out mach_wait_until isn’t really that accurate, particularly if there’s a bunch of other stuff going on in other threads.
Spinning, for fun and profit
For my second attempt, the thread calls mach_wait_until to sleep until just before the event is due, then spins in a busy loop (the “spin lock” in the tables below) until the time arrives, using mach_absolute_time to compare the current time with the target time.
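Here's a rough C sketch of that sleep-then-spin idea; it is not the actual TPPreciseTimer implementation. On Apple platforms it uses mach_wait_until and mach_absolute_time, with mach_timebase_info for unit conversion. The non-Apple branch (clock_nanosleep) is only a stand-in so the sketch builds elsewhere. The names `now_ns`, `sleep_until_ns` and `wait_then_spin` are made up, and thread priority isn't touched here.

```c
#if !defined(__APPLE__)
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_nanosleep in the fallback branch */
#endif
#endif

#include <assert.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <mach/mach_time.h>

/* Current time in nanoseconds, converted from Mach absolute time units. */
static uint64_t now_ns(void) {
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);
    return mach_absolute_time() * tb.numer / tb.denom;
}

/* Sleep until a deadline given in nanoseconds on the now_ns() clock. */
static void sleep_until_ns(uint64_t deadline_ns) {
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);
    mach_wait_until(deadline_ns * tb.denom / tb.numer);
}
#else
#include <errno.h>
#include <time.h>

/* Stand-in for platforms without the Mach API: POSIX monotonic clock. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

static void sleep_until_ns(uint64_t deadline_ns) {
    struct timespec ts;
    ts.tv_sec = (time_t)(deadline_ns / 1000000000ULL);
    ts.tv_nsec = (long)(deadline_ns % 1000000000ULL);
    /* Retry if a signal interrupts the absolute-time sleep. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR) { }
}
#endif

/* Sleep until spin_ns before the deadline, then busy-wait the rest.
   Returns how late we were, in nanoseconds, when the spin loop exited. */
static uint64_t wait_then_spin(uint64_t deadline_ns, uint64_t spin_ns) {
    assert(spin_ns <= deadline_ns);

    uint64_t now = now_ns();
    if (deadline_ns > now + spin_ns) {
        sleep_until_ns(deadline_ns - spin_ns);  /* cheap, but may wake late */
    }
    while ((now = now_ns()) < deadline_ns) {
        /* burn CPU until the deadline passes */
    }
    return now - deadline_ns;
}
```

A call like `wait_then_spin(now_ns() + 5000000ULL, 1000000ULL)` would sleep for roughly 4ms and spin for the final 1ms. The larger the spin window, the more sleep overshoot it can absorb, at the price of a busy CPU.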
This gave further improved results — here’s that table again, but with the new scheme added, with a few different spin lock times:
Mechanism | Average discrepancy | Minimum discrepancy | Maximum discrepancy |
---|---|---|---|
NSRunLoop | 16.9ms | 0.25ms | 153.7ms |
TPPreciseTimer (original) | 5.5ms | 0.033ms | 72.0ms |
TPPreciseTimer (10ms spinlock) | 6.0ms | 0.002ms | 76.5ms |
TPPreciseTimer (100ms spinlock) | 3.7ms | 0.002ms | 44.8ms |
TPPreciseTimer (200ms spinlock) | 2.91ms | 0.002ms | 74.1ms |
It appears that the more stuff there is going on in other threads, the more likely the mach_wait_until call is to overshoot. So, the more time spent in the spin lock, the more leeway mach_wait_until has to wake up late without missing the deadline. Of course, that’s at the cost of making the CPU twiddle its thumbs for the duration.
Better than a punch in the knee
The results weren’t quite as fantastic as I’d hoped — still within the same order of magnitude, that’s for sure — but the average case for the 200ms spinlock approach is 14ms, or 5.8 times, more accurate than the traditional approach, and the minimum case is dramatically better.
You know, I think if I’d known the results in advance, I might not have bothered, but I’ll stick with my hard-won 14ms now that I’m here (that’s 617 audio samples at 44.1kHz, I’ll have you know).
If anyone’s curious about the implementation (or wants to take a stab at doing better), here it is, along with a wildly simplistic command-line test app: TPPreciseTimer.zip
Now to get back to some real work.
Addendum: GCD follow-up
Chris in the comments below suggested trying a GCD-based approach, using dispatch_after. Curious, I rigged it up, and these are the stats, collected the same way as above, added to the prior table:
Mechanism | Average discrepancy | Minimum discrepancy | Maximum discrepancy |
---|---|---|---|
NSRunLoop | 16.9ms | 0.25ms | 153.7ms |
TPPreciseTimer (original) | 5.5ms | 0.033ms | 72.0ms |
TPPreciseTimer (10ms spinlock) | 6.0ms | 0.002ms | 76.5ms |
TPPreciseTimer (100ms spinlock) | 3.7ms | 0.002ms | 44.8ms |
TPPreciseTimer (200ms spinlock) | 2.91ms | 0.002ms | 74.1ms |
dispatch_after (main queue) | 14.8ms | 0.16ms | 161.2ms |
dispatch_after (dedicated queue) | 19.2ms | 0.1ms | 174.9ms |
dispatch_after (dedicated queue + 100ms spinlock) | 22.4ms | 0.002ms | 306.8ms |
So, they appear pretty much the same as the NSRunLoop stats.