Code · Work

… and an FFT in Standard ML

While writing my earlier post on Javascript FFTs, I also (for fun) adapted the Nayuki FFT code into Standard ML. You can find it here.

The original idea was to see how performance of SML compiled to native code, and SML compiled to Javascript using smltojs, compared with the previously-tested Javascript implementations and with any other SML versions I could find. (There’s FFT and DFT code from Rory McGuire here, probably written for clarity rather than speed, plus versions I haven’t tried in the SML/NJ and MLKit test libraries.)

I didn’t get as far as doing a real comparison, but I did note that it ran at more or less the same speed when compiled natively with MLton as the Javascript version does when run in Firefox, and that compiling to JS with smltojs produced much slower code. I haven’t checked where the overhead lies.

Code · Programs for Music

FFTs in Javascript

Javascript engines are quite fast these days. Can we get away with doing serious signal-processing in Javascript yet?

People are doing things like image processing tools and audio spectrum visualisation in Javascript, so the answer must sometimes be yes, but I wanted to get an idea how well you’d get on with more demanding tasks like spectral audio processing. Although I’ve done this in many languages, Javascript has not yet been among them. And given that one reason you would do this would be for portability, the question is connected with the performance of target platforms such as mobile browsers.

Let’s consider the demands of a phase vocoder, commonly used for audio time-stretching (in the Rubber Band Library for example). To make a phase vocoder you chop up a sampled audio signal into short overlapping frames, take the short-time Fourier transform of each, convert its complex output to polar form, modify the phases, transform back again, and glue back together with a different amount of overlap from the original, thus changing the overall duration. Most of the overhead typically is in the forward and inverse Fourier transforms and in Cartesian-to-polar conversion (arctangents are slow).

If you have a mono signal sampled at 44.1kHz and you’re using a simple design with a 2048-point Fast Fourier Transform (FFT) with 75% overlap, you’ll need to go through this process at least 86 times a second to run at the equivalent of real-time. Budgeting maybe a third of the overall time for FFTs, and allowing equal time for forward and inverse transforms, I suppose you want to be able to do at least 500 transforms per second to have a decent chance of running nicely.

I decided to try some existing FFT implementations in Javascript and see how they went.

The candidates

These are the implementations I tried. (In parentheses are the labels I’ve used to identify them in the following tables.)

  • Multi-lingual FFT by Nayuki (nayuki)
  • The Nayuki code, but rearranged into an object that does sin/cos table generation once on construction: code is here (nayuki-obj)
  • FFT.js by Jens Nockert (nockert)
  • JSFFT by Nick Jones (dntj)

I also took three existing C libraries and compiled them to Javascript with Emscripten:

  • KissFFT, a small but reasonably sophisticated implementation by Mark Borgerding (kissfft)
  • FFTW3, a big library by Matteo Frigo et al (fftw). This library is so focused on native platform-specific performance tricks that it seems crazy to cross-compile it to Javascript, and I would never have thought to do so if it wasn’t for the fact that KissFFT worked better than I’d expected
  • Don Cross’s simple public domain implementation (cross)

Plus a couple that came up in searches but that I did not end up testing:

  • Timbre.js by mohayonao — this is designed to pipeline with other classes in the same framework rather than running standalone
  • dsp.js by Corban Brook — only provides an API to retrieve the magnitude spectrum rather than complex frequency-domain output

I timed only the real-to-complex forward transform. This favours those implementations that optimise for real inputs, which of the above I think means only KissFFT and FFTW3. You can find all the test code in this code project, and I put up a page here which runs the tests in your browser.

Some numbers

These figures are for the number of 2048-point forward transforms per second: higher numbers are better.

Most of these results are from desktop browsers, but I’ve included a couple of mobile devices (thanks to James and Andy for the iPhone numbers). I’ve also included timings of a sample native-code build, made using FFTW3 on Linux, Accelerate on OSX and iOS, and a 32-bit IPP build on Windows.

For each platform I’ve highlighted the fastest “real Javascript” version, and the fastest version overall apart from the native code, in bold.

Javascript C/Emscripten Native
nayuki nayuki-obj nockert dntj kissfft fftw cross
Lenovo Yoga 2 Pro, Firefox 41, Linux 5200 8000 2900 850 23000 18000 10000 94000
Lenovo Yoga 2 Pro, Chromium 46, Linux 5200 6500 3600 2500 15000 7000 5800 94000
Dell XPS 13 2015, Firefox 41, Windows 10 4600 7300 2300 660 18000 11000 7400 43000
Dell XPS 13 2015, MS Edge, Windows 10 5000 6900 1800 990 13000 1900 4900 43000
MacBook Air 11″ 2012, Firefox 40, OSX 10.10 4000 5200 2200 720 8000 12000 7400 54000
MacBook Air 11″ 2012, Safari 8, OSX 10.10 3000 3500 1300 890 10000 2800 6000 54000
Nokia Lumia 620, IE, WP8.1 380 390 190 94 710 310
Alcatel OneTouch Fire E, Firefox OS 2.0 270 310 150 87 790 270 320
Apple iPad mini 2012, Safari, iOS 9 300 340 150 120 710 330 390 2000
Apple iPhone 6, iOS UIWebView, iOS 9 72 72 90 71 180 95 61
Apple iPhone 6, Safari, iOS 9 3800 4300 960 710 8700 2000 4800
Apple iPhone 6S, Safari, iOS 9 6000 7000 1600 1000 9900 2500 7400

The FFTW run didn’t seem to want to produce a result on the WP8 phone.

Some observations

On every platform there is at least one implementation that makes our 500 transforms/sec budget, although some have little to spare. (The exception is the iOS UIWebView, which doesn’t include a modern Javascript engine, but since iOS 8 it seems hybrid apps are allowed to use the fast WebKit web view instead. I haven’t included any Android results, but a similar divide exists between Google Chrome and the built-in browser on earlier Android versions.)

Usually the cross-compiled KissFFT is the fastest option. Although it seems culturally strange for the fastest Javascript version to be one that is not written in Javascript at all, I think this is a positive result: KissFFT simply has faster algorithms than any of the plain Javascript versions, and the compilation process is robust enough to preserve this advantage. The case of FFTW is stranger, but may have something to do with its relatively large girth.

If code size is a primary concern, the Nayuki code appears to be an effective implementation of the core algorithm.

The code produced by Emscripten, in the versions compiled from C, is asm.js. Firefox gives this restricted subset of Javascript special treatment, and it appears to be possible to see how much difference that makes: if you enable the developer console, a message appears warning you that asm.js optimisations are disabled while the console is active. The results are then 10% or so slower.

Overall we’re seeing between 2.4 and 5.4 times slowdown compared to a fast native-code version. This may not be a great use of CPU and battery if signal processing is your application’s main job, but good enough to make some interesting things.

Any thoughts?