Let’s consider the demands of a phase vocoder, commonly used for audio time-stretching (in the Rubber Band Library for example). To make a phase vocoder you chop up a sampled audio signal into short overlapping frames, take the short-time Fourier transform of each, convert its complex output to polar form, modify the phases, transform back again, and glue back together with a different amount of overlap from the original, thus changing the overall duration. Most of the overhead typically is in the forward and inverse Fourier transforms and in Cartesian-to-polar conversion (arctangents are slow).
If you have a mono signal sampled at 44.1kHz and you’re using a simple design with a 2048-point Fast Fourier Transform (FFT) with 75% overlap, you’ll need to go through this process at least 86 times a second to run at the equivalent of real-time. Budgeting maybe a third of the overall time for FFTs, and allowing equal time for forward and inverse transforms, I suppose you want to be able to do at least 500 transforms per second to have a decent chance of running nicely.
These are the implementations I tried. (In parentheses are the labels I’ve used to identify them in the following tables.)
- Multi-lingual FFT by Nayuki (nayuki)
- The Nayuki code, but rearranged into an object that does sin/cos table generation once on construction: code is here (nayuki-obj)
- FFT.js by Jens Nockert (nockert)
- JSFFT by Nick Jones (dntj)
- KissFFT, a small but reasonably sophisticated implementation by Mark Borgerding (kissfft)
- Don Cross’s simple public domain implementation (cross)
Plus a couple that came up in searches but that I did not end up testing:
- Timbre.js by mohayonao — this is designed to pipeline with other classes in the same framework rather than running standalone
- dsp.js by Corban Brook — only provides an API to retrieve the magnitude spectrum rather than complex frequency-domain output
I timed only the real-to-complex forward transform. This favours those implementations that optimise for real inputs, which of the above I think means only KissFFT and FFTW3. You can find all the test code in this code project, and I put up a page here which runs the tests in your browser.
These figures are for the number of 2048-point forward transforms per second: higher numbers are better.
Most of these results are from desktop browsers, but I’ve included a couple of mobile devices (thanks to James and Andy for the iPhone numbers). I’ve also included timings of a sample native-code build, made using FFTW3 on Linux, Accelerate on OSX and iOS, and a 32-bit IPP build on Windows.
|Lenovo Yoga 2 Pro, Firefox 41, Linux||5200||8000||2900||850||23000||18000||10000||94000|
|Lenovo Yoga 2 Pro, Chromium 46, Linux||5200||6500||3600||2500||15000||7000||5800||94000|
|Dell XPS 13 2015, Firefox 41, Windows 10||4600||7300||2300||660||18000||11000||7400||43000|
|Dell XPS 13 2015, MS Edge, Windows 10||5000||6900||1800||990||13000||1900||4900||43000|
|MacBook Air 11″ 2012, Firefox 40, OSX 10.10||4000||5200||2200||720||8000||12000||7400||54000|
|MacBook Air 11″ 2012, Safari 8, OSX 10.10||3000||3500||1300||890||10000||2800||6000||54000|
|Nokia Lumia 620, IE, WP8.1||380||390||190||94||710||310|
|Alcatel OneTouch Fire E, Firefox OS 2.0||270||310||150||87||790||270||320|
|Apple iPad mini 2012, Safari, iOS 9||300||340||150||120||710||330||390||2000|
|Apple iPhone 6, iOS UIWebView, iOS 9||72||72||90||71||180||95||61|
|Apple iPhone 6, Safari, iOS 9||3800||4300||960||710||8700||2000||4800|
|Apple iPhone 6S, Safari, iOS 9||6000||7000||1600||1000||9900||2500||7400|
The FFTW run didn’t seem to want to produce a result on the WP8 phone.
If code size is a primary concern, the Nayuki code appears to be an effective implementation of the core algorithm.
Overall we’re seeing between 2.4 and 5.4 times slowdown compared to a fast native-code version. This may not be a great use of CPU and battery if signal processing is your application’s main job, but good enough to make some interesting things.