Let’s consider the demands of a phase vocoder, commonly used for audio time-stretching (in the Rubber Band Library for example). To make a phase vocoder you chop up a sampled audio signal into short overlapping frames, take the short-time Fourier transform of each, convert its complex output to polar form, modify the phases, transform back again, and glue back together with a different amount of overlap from the original, thus changing the overall duration. Most of the overhead typically is in the forward and inverse Fourier transforms and in Cartesian-to-polar conversion (arctangents are slow).
If you have a mono signal sampled at 44.1kHz and you’re using a simple design with a 2048-point Fast Fourier Transform (FFT) with 75% overlap, you’ll need to go through this process at least 86 times a second to run at the equivalent of real-time. Budgeting maybe a third of the overall time for FFTs, and allowing equal time for forward and inverse transforms, I suppose you want to be able to do at least 500 transforms per second to have a decent chance of running nicely.
These are the implementations I tried. (In parentheses are the labels I’ve used to identify them in the following tables.)
- Multi-lingual FFT by Nayuki (nayuki)
- The Nayuki code, but rearranged into an object that does sin/cos table generation once on construction: code is here (nayuki-obj)
- FFT.js by Jens Nockert (nockert)
- JSFFT by Nick Jones (dntj)
- KissFFT, a small but reasonably sophisticated implementation by Mark Borgerding (kissfft)
- Don Cross’s simple public domain implementation (cross)
Plus a couple that came up in searches but that I did not end up testing:
- Timbre.js by mohayonao — this is designed to pipeline with other classes in the same framework rather than running standalone
- dsp.js by Corban Brook — only provides an API to retrieve the magnitude spectrum rather than complex frequency-domain output
I timed only the real-to-complex forward transform. This favours those implementations that optimise for real inputs, which of the above I think means only KissFFT and FFTW3. You can find all the test code in this code project, and I put up a page here which runs the tests in your browser.
These figures are for the number of 2048-point forward transforms per second: higher numbers are better.
Most of these results are from desktop browsers, but I’ve included a couple of mobile devices (thanks to James and Andy for the iPhone numbers). I’ve also included timings of a sample native-code build, made using FFTW3 on Linux, Accelerate on OSX and iOS, and a 32-bit IPP build on Windows.
|Lenovo Yoga 2 Pro, Firefox 41, Linux||5200||8000||2900||850||23000||18000||10000||94000|
|Lenovo Yoga 2 Pro, Chromium 46, Linux||5200||6500||3600||2500||15000||7000||5800||94000|
|Dell XPS 13 2015, Firefox 41, Windows 10||4600||7300||2300||660||18000||11000||7400||43000|
|Dell XPS 13 2015, MS Edge, Windows 10||5000||6900||1800||990||13000||1900||4900||43000|
|MacBook Air 11″ 2012, Firefox 40, OSX 10.10||4000||5200||2200||720||8000||12000||7400||54000|
|MacBook Air 11″ 2012, Safari 8, OSX 10.10||3000||3500||1300||890||10000||2800||6000||54000|
|Nokia Lumia 620, IE, WP8.1||380||390||190||94||710||310|
|Alcatel OneTouch Fire E, Firefox OS 2.0||270||310||150||87||790||270||320|
|Apple iPad mini 2012, Safari, iOS 9||300||340||150||120||710||330||390||2000|
|Apple iPhone 6, iOS UIWebView, iOS 9||72||72||90||71||180||95||61|
|Apple iPhone 6, Safari, iOS 9||3800||4300||960||710||8700||2000||4800|
|Apple iPhone 6S, Safari, iOS 9||6000||7000||1600||1000||9900||2500||7400|
The FFTW run didn’t seem to want to produce a result on the WP8 phone.
If code size is a primary concern, the Nayuki code appears to be an effective implementation of the core algorithm.
Overall we’re seeing between 2.4 and 5.4 times slowdown compared to a fast native-code version. This may not be a great use of CPU and battery if signal processing is your application’s main job, but good enough to make some interesting things.
Thanks for benchmarking my FFT code. I appreciate the results and analysis that you have published.
Now you made me curious – what would happen if you benchmarked my fft.c code compiled with Emscripten? I wonder how much of the speedup of the C versions are due to type arrays (etc.), or actually due to better algorithms.
That’s a really good idea. I might be able to find a moment to do that tomorrow.
OK, I’ve added that to the test page at http://all-day-breakfast.com/js-dsp-test/fft/. I haven’t run it on all the devices listed above, but the first couple of tests are:
This is with the sin/cos tables precalculated (through a quick & dirty hack). A difference from KissFFT is that the latter is built with single-precision floats by default; I did do a quick comparison (against a version of fft.c compiled with floats) and reckoned that it didn’t make a significant difference, at least on a 64-bit host, but the results above are with a double-precision fft.c.
So the difference is largely algorithmic it seems. A big chunk of that can be accounted for by the use of KissFFT’s dedicated real-to-complex forward transform, which the docs suggest should “save about 45% cpu” over complex-complex. It would be interesting to test the KissFFT complex-complex as well and check this figure (maybe another time).
It turned out I couldn’t just leave the two questions hanging in that last comment… so I’ve added single-precision fft.c and complex-complex KissFFT to the test page as well.
The results are much as expected, single-precision is a bit slower than double (at least in a desktop browser) and real-complex is a bit less than twice as fast as complex-complex.
did you test the web audio api’s built in ffts using the AnalyserNode? they should run at native speed if I’m not mistaken
I did not — if I ever knew that existed, I’d forgotten. Thanks for the prompt.
Looking at the API docs, it seems it only returns FFT magnitudes rather than the original complex values? It’s often necessary to have the phase information as well. (I mentioned above that I excluded another implementation because it only returned the magnitudes.)
Comments are closed.