FFTs in Javascript

Javascript engines are quite fast these days. Can we get away with doing serious signal-processing in Javascript yet?

People are doing things like image-processing tools and audio spectrum visualisation in Javascript, so the answer must sometimes be yes, but I wanted to get an idea of how well you’d get on with more demanding tasks like spectral audio processing. Although I’ve done this in many languages, Javascript has not yet been among them. And since one reason for doing this in Javascript would be portability, the question is bound up with the performance of target platforms such as mobile browsers.

Let’s consider the demands of a phase vocoder, commonly used for audio time-stretching (in the Rubber Band Library for example). To make a phase vocoder you chop up a sampled audio signal into short overlapping frames, take the short-time Fourier transform of each, convert its complex output to polar form, modify the phases, transform back again, and glue back together with a different amount of overlap from the original, thus changing the overall duration. Most of the overhead typically is in the forward and inverse Fourier transforms and in Cartesian-to-polar conversion (arctangents are slow).
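
To make the shape of that work concrete, here is a rough Javascript sketch of a single analysis/resynthesis frame. This is not the code benchmarked below: forwardFFT, inverseFFT and advancePhases are hypothetical stand-ins for whichever FFT implementation and phase-update logic you actually use.

// Sketch only: forwardFFT, inverseFFT and advancePhases are placeholders,
// not any particular library's API
function processFrame(timeFrame, previousPhases, analysisHop, synthesisHop) {

    // Forward transform: real time-domain frame in, complex spectrum out
    var spectrum = forwardFFT(timeFrame);   // assume { real: [...], imag: [...] }
    var real = spectrum.real, imag = spectrum.imag;
    var n = real.length;

    // Cartesian-to-polar conversion: the atan2 calls are a big part of the cost
    var magnitudes = new Float64Array(n);
    var phases = new Float64Array(n);
    for (var i = 0; i < n; ++i) {
        magnitudes[i] = Math.sqrt(real[i] * real[i] + imag[i] * imag[i]);
        phases[i] = Math.atan2(imag[i], real[i]);
    }

    // Scale the per-bin phase advance for the new hop size (phase unwrapping
    // and bookkeeping omitted)
    var newPhases = advancePhases(phases, previousPhases, analysisHop, synthesisHop);

    // Back to Cartesian, then inverse transform; the caller overlap-adds the
    // result at the synthesis hop
    for (var i = 0; i < n; ++i) {
        real[i] = magnitudes[i] * Math.cos(newPhases[i]);
        imag[i] = magnitudes[i] * Math.sin(newPhases[i]);
    }
    return inverseFFT(real, imag);
}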

If you have a mono signal sampled at 44.1kHz and you’re using a simple design with a 2048-point Fast Fourier Transform (FFT) with 75% overlap, you’ll need to go through this process at least 86 times a second to run at the equivalent of real-time. Budgeting maybe a third of the overall time for FFTs, and allowing equal time for forward and inverse transforms, I suppose you want to be able to do at least 500 transforms per second to have a decent chance of running nicely.
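
To spell out the arithmetic (the numbers are the ones above; this is just a scratch calculation):

// 2048-point frames with 75% overlap advance by 512 samples each time
var sampleRate = 44100;
var fftSize = 2048;
var hop = fftSize * (1 - 0.75);            // 512 samples between frames

var framesPerSec = sampleRate / hop;       // ~86 analysis frames per second
var transformsPerSec = framesPerSec * 2;   // one forward + one inverse each

// If the FFTs are only allowed a third of the time budget, they have to
// run three times faster than that
var needed = transformsPerSec * 3;         // ~517 transforms per second

console.log(Math.round(framesPerSec), Math.round(needed));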

I decided to try some existing FFT implementations in Javascript and see how they went.

The candidates

These are the implementations I tried. (In parentheses are the labels I’ve used to identify them in the results table below.)

  • Multi-lingual FFT by Nayuki (nayuki)
  • The Nayuki code, but rearranged into an object that does sin/cos table generation once on construction: code is here (nayuki-obj)
  • FFT.js by Jens Nockert (nockert)
  • JSFFT by Nick Jones (dntj)

I also took three existing C libraries and compiled them to Javascript with Emscripten:

  • KissFFT, a small but reasonably sophisticated implementation by Mark Borgerding (kissfft)
  • FFTW3, a big library by Matteo Frigo et al (fftw). This library is so focused on native platform-specific performance tricks that it seems crazy to cross-compile it to Javascript, and I would never have thought to do so if it wasn’t for the fact that KissFFT worked better than I’d expected
  • Don Cross’s simple public domain implementation (cross)

Plus a couple that came up in searches but that I did not end up testing:

  • Timbre.js by mohayonao — this is designed to pipeline with other classes in the same framework rather than running standalone
  • dsp.js by Corban Brook — only provides an API to retrieve the magnitude spectrum rather than complex frequency-domain output

I timed only the real-to-complex forward transform. This favours those implementations that optimise for real inputs, which of the above I think means only KissFFT and FFTW3. You can find all the test code in this code project, and I put up a page here which runs the tests in your browser.
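
The timing itself needs nothing elaborate. A minimal sketch of the sort of loop involved might look like this, where fft just stands for whichever implementation’s forward transform is being measured (this is not the actual test code linked above):

// Minimal timing-loop sketch: "fft" is a placeholder for the forward
// transform of whichever implementation is under test
function measureTransformsPerSecond(fft, size, iterations) {
    var input = new Float64Array(size);
    for (var i = 0; i < size; ++i) {
        input[i] = Math.sin(i * 0.1);            // arbitrary test signal
    }
    var start = Date.now();
    for (var j = 0; j < iterations; ++j) {
        fft(input);                              // real-to-complex forward transform
    }
    var elapsed = (Date.now() - start) / 1000;   // seconds
    return Math.round(iterations / elapsed);
}

A real harness would also want to do something with the output, so that a clever engine can’t optimise the transform calls away.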

Some numbers

These figures are for the number of 2048-point forward transforms per second: higher numbers are better.

Most of these results are from desktop browsers, but I’ve included a couple of mobile devices (thanks to James and Andy for the iPhone numbers). I’ve also included timings of a sample native-code build, made using FFTW3 on Linux, Accelerate on OSX and iOS, and a 32-bit IPP build on Windows.

For each platform I’ve marked the fastest “real Javascript” version, and the fastest version overall apart from the native code, with an asterisk.

Here nayuki, nayuki-obj, nockert and dntj are plain Javascript; kissfft, fftw and cross are C compiled via Emscripten; the final column is the native-code comparison. A dash means no result was obtained.

| Platform | nayuki | nayuki-obj | nockert | dntj | kissfft | fftw | cross | Native |
|----------|--------|------------|---------|------|---------|------|-------|--------|
| Lenovo Yoga 2 Pro, Firefox 41, Linux | 5200 | 8000* | 2900 | 850 | 23000* | 18000 | 10000 | 94000 |
| Lenovo Yoga 2 Pro, Chromium 46, Linux | 5200 | 6500* | 3600 | 2500 | 15000* | 7000 | 5800 | 94000 |
| Dell XPS 13 2015, Firefox 41, Windows 10 | 4600 | 7300* | 2300 | 660 | 18000* | 11000 | 7400 | 43000 |
| Dell XPS 13 2015, MS Edge, Windows 10 | 5000 | 6900* | 1800 | 990 | 13000* | 1900 | 4900 | 43000 |
| MacBook Air 11″ 2012, Firefox 40, OSX 10.10 | 4000 | 5200* | 2200 | 720 | 8000 | 12000* | 7400 | 54000 |
| MacBook Air 11″ 2012, Safari 8, OSX 10.10 | 3000 | 3500* | 1300 | 890 | 10000* | 2800 | 6000 | 54000 |
| Nokia Lumia 620, IE, WP8.1 | 380 | 390* | 190 | 94 | 710* | - | 310 | - |
| Alcatel OneTouch Fire E, Firefox OS 2.0 | 270 | 310* | 150 | 87 | 790* | 270 | 320 | - |
| Apple iPad mini 2012, Safari, iOS 9 | 300 | 340* | 150 | 120 | 710* | 330 | 390 | 2000 |
| Apple iPhone 6, iOS UIWebView, iOS 9 | 72 | 72 | 90* | 71 | 180* | 95 | 61 | - |
| Apple iPhone 6, Safari, iOS 9 | 3800 | 4300* | 960 | 710 | 8700* | 2000 | 4800 | - |
| Apple iPhone 6S, Safari, iOS 9 | 6000 | 7000* | 1600 | 1000 | 9900* | 2500 | 7400 | - |

The FFTW run didn’t seem to want to produce a result on the WP8 phone.

Some observations

On almost every platform there is at least one implementation that makes our 500 transforms/sec budget, although some have little to spare. (The one exception is the iOS UIWebView, which doesn’t include a modern Javascript engine; but since iOS 8 it seems hybrid apps are allowed to use the fast WebKit web view instead. I haven’t included any Android results, but a similar divide exists between Google Chrome and the built-in browser on earlier Android versions.)

Usually the cross-compiled KissFFT is the fastest option. Although it seems culturally strange for the fastest Javascript version to be one that is not written in Javascript at all, I think this is a positive result: KissFFT simply has faster algorithms than any of the plain Javascript versions, and the compilation process is robust enough to preserve this advantage. The case of FFTW is stranger, but may have something to do with its relatively large girth.

If code size is a primary concern, the Nayuki code appears to be an effective implementation of the core algorithm.

The code produced by Emscripten, in the versions compiled from C, is asm.js. Firefox gives this restricted subset of Javascript special treatment, and it appears to be possible to see how much difference that makes: if you enable the developer console, a message appears warning you that asm.js optimisations are disabled while the console is active. The results are then 10% or so slower.

Overall we’re seeing between 2.4 and 5.4 times slowdown compared to a fast native-code version. This may not be a great use of CPU and battery if signal processing is your application’s main job, but good enough to make some interesting things.

Any thoughts?

New software releases all around

A few months ago (in February!!) I wrote a post called Unreleased project pile-up that gave a pretty long list of software projects I’d been working on that could benefit from a proper release. It ended: let’s see how many of these I can tidy up & release during the next few weeks. The answer: very few.

During the past couple of weeks I’ve finally managed to make a bit of a dent, crossing off these applications from the list:

along with these earlier in the year:

and one update that wasn’t on the list:

Apart from the Python Vamp host, those all fall into the category of “overdue updates”. I haven’t managed to release as much of the actually new software on the list.

One “overdue update” that keeps getting pushed back is the next release of Sonic Visualiser. This is going to be quite a major release, featuring audio recording (a feature I once swore I would never add), proper support for retina and hi-dpi displays with retina rendering in the view layers and a new set of scalable icons, support for very long audio files (beyond the 32-bit WAV limit), a unit conversion window to help convert between units such as Hz and MIDI pitch, and a few other significant features. There’s a little way to go before release yet though.

Rosegarden v15.08

D. Michael McIntyre today announced the release of version 15.08 of Rosegarden, an audio and MIDI sequencer and music notation editor.

Rosegarden is a slightly crazy piece of work.

As a project it has existed for more than two decades, and the repository containing its current code was initialised in April 2000. It’s not a huge program, but it is quite complicated, and during its most active period it was run by three argumentative developers all trying to accomplish slightly different things. I wanted to replace Sibelius, and typeset string quartets. Richard wanted to replace Logic and run his home studio with it. Guillaume wanted to replace Band-in-a-Box and make jazz guitar arrangements. We ended up with something that is essentially a MIDI sequencer, but with some audio recording and arrangement capacity and a lot of interesting (fragile) logic for adjusting score layout of music that is stored as MIDI-plus-notation-metadata rather than directly as notation.

Rosegarden has all sorts of rather wild features which even its developers routinely forget. It has a “triggered segment” feature that replaces single notes with algorithmically expanded sequences at playback time, intended for use in playing ornaments from notation but also potentially usable for simple algorithmic compositions. It knows the playable range and transpositions of large numbers of real instruments, and can transpose between them and warn when a part’s notes are out of range. It has a note-timing quantizer that aims to produce notation as well as possible from performed MIDI recordings, but that doesn’t change the underlying recorded MIDI, instead trying (futilely?) to keep tabs on the adaptations necessary to make the raw MIDI appear well on the score. It can record audio, play back MIDI through audio synth plugins, apply effects, and do basic audio editing and timestretching. It has a feature, which surely nobody except me has ever used, that allows you to tap along on a MIDI keyboard to the beats in an audio recording; it then inserts tempo events into your MIDI (assuming it represents the same score as the audio you were tapping along to) so that it plays back with the same tempo changes as the audio.

Rosegarden contains about 300,000 lines of C++, excluding all its library dependencies, with (ahem) no tests of any sort. It has seen well over 10,000 commits from about 40 contributors, in a single Subversion repository hosted at SourceForge. (Previously it was in CVS, but the move from CVS to Subversion was hard enough that it has never moved again. Some of its current developers use git, but they do so through a bridge to the Subversion repository.) Although the code is moderately portable, and lightly-supported ports to Windows and OS/X have appeared, the only platform ever officially supported is Linux and the code has only been officially published in source code form—it is assumed that Linux distributions will handle compilation and packaging.

Despite its complexities and disadvantages, Rosegarden has survived reasonably well; it appears still to be one of the more widely-used programs of its type. Admittedly this is in a tiny pond—Linux-only audio and music users—but it has persisted in spite of all of its early active developers having left the project. Here are the top three committers per year since 2000, by number of commits:

| Year | 1st | 2nd | 3rd |
|------|-----|-----|-----|
| 2000 | Guillaume Laurent | Chris Cannam | |
| 2001 | Guillaume Laurent | Chris Cannam | Richard Bown |
| 2002 | Richard Bown | Guillaume Laurent | Chris Cannam |
| 2003 | Chris Cannam | Guillaume Laurent | Richard Bown |
| 2004 | Chris Cannam | Guillaume Laurent | Richard Bown |
| 2005 | Guillaume Laurent | Chris Cannam | D. Michael McIntyre |
| 2006 | Chris Cannam | Pedro Lopez-Cabanillas | Guillaume Laurent |
| 2007 | Chris Cannam | Heikki Junes | Guillaume Laurent |
| 2008 | D. Michael McIntyre | Chris Cannam | Heikki Junes |
| 2009 | D. Michael McIntyre | Chris Cannam | Jani Frilander |
| 2010 | D. Michael McIntyre | Julie Swango | Chris Cannam |
| 2011 | D. Michael McIntyre | Ted Felix | Yves Guillemot |
| 2012 | Ted Felix | D. Michael McIntyre | Tom Breton |
| 2013 | D. Michael McIntyre | Ted Felix | Tom Breton |
| 2014 | Ted Felix | D. Michael McIntyre | Tom Breton |
| 2015 | Ted Felix | D. Michael McIntyre | Tom Breton |

Some developers (Tom Breton for example) flatten numerous commits from git into single Subversion commits for the official repo and are probably under-represented, but this gives the general shape. Richard Bown mostly retired from this project in 2005, although his 794 commits in 2002 seems still to be the record. (Ted Felix has made 231 so far this year.) Guillaume Laurent forcefully moved to OS/X in 2007, and I faded out in 2010 after a big push to port Rosegarden from Qt3 to Qt4. What is most notable is the unifying thread provided by D. Michael McIntyre.

 

Rubber Band Audio v1.9.0

Some three years after its last release (!), I present to you version 1.9.0 of Rubber Band Audio:

[Screenshot: Rubber Band Audio 1.9.0 on Windows]

Rubber Band Audio is a little audio-processing application that does time-stretching and pitch-shifting using the Rubber Band Library. It’s quite a neat tool for adjusting loops, adapting recordings to a new key, or slowing down sections for closer listening.

The previous version still works perfectly fine — I don’t really subscribe to the idea that you need to update an application constantly even though it’s basically done. But this one does add a couple of nice features. It now includes little start and end playback markers that you can drag to set a looping range within the recording — useful when studying something closely, and a feature users have asked for. It works better with retina displays (on OS/X) and hi-dpi ones (on Windows). Audio file input and output have been updated to support a wider range of formats on Windows and to save a bit faster.

Rubber Band Audio is available direct from my little company (for Windows or Mac, with a pretty full-featured demo version available for both) or through the Mac App Store, for £7.99 or currency equivalent.

Replacing the GNOME Shell font in GNOME 3.16

[Edit: see the comment by Hugo Roy after the article, describing a much simpler, more “official” way to achieve this]

When using Linux on a touchscreen laptop, I generally use the GNOME 3 desktop — logical design, big touch targets, good support for high-resolution displays, nice to look at.

While I mostly approve of the design decisions made for GNOME 3, there is one thing about it that I don’t get on with, and that’s its use of the Cantarell font. Cantarell is clear and readable, and a fine default for most UI text, but at the middle of the top of the screen there lives a digital clock:

[Image: the clock, in Cantarell]

I find this strangely annoying to look at. I think it has a lot to do with the excessively round zero. Since it’s always there, it annoys me a surprising amount.

Until GNOME 3.15, it was easy to change the font used throughout GNOME Shell by editing one property in a CSS file. Unfortunately GNOME 3.16 moved that CSS file into an opaque resource bundle and made it accessible only through some awkwardly-designed tools. I can’t fathom how that appeared to be a good idea, but there it is.

Anyway, with help from this forum post I knocked out a script to update this resource file so as to make it prefer the Fira Sans font from FirefoxOS. It makes a copy of the existing resource file with a .dist suffix.

This may be specific to Arch Linux (the distro I’m using), so caution advised if you refer to this for any reason. It’s necessary to have the glib2 and otf-fira-sans packages installed for this to work.

#!/bin/bash

set -eu

rname=gnome-shell-theme.gresource
resource="/usr/share/gnome-shell/$rname"

ext="$(date +%s)$$"
tmpdir="./fix_$ext"
mkdir "$tmpdir"
trap "rm -f $tmpdir/* ; rmdir $tmpdir" 0

manifest="$rname.xml"
cat > "$tmpdir/$manifest" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<gresources>
<gresource prefix="/org/gnome/shell/theme">
EOF

for file in $(gresource list "$resource"); do
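    # Pull each file out of the existing bundle and record it in the new manifest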
    base=$(basename "$file")
    out="$tmpdir/$base"
    gresource extract "$resource" "$file" > "$out"
    echo "<file>$base</file>" >> "$tmpdir/$manifest"
done

cat >> "$tmpdir/$manifest" <<EOF
</gresource>
</gresources>
EOF

(
    cd "$tmpdir"
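    # Point the theme at Fira Sans (falling back to Cantarell), then rebuild the bundle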
    perl -i -p -e 's/font-family:.*;/font-family: "Fira Sans", Cantarell, Sans-Serif;/' gnome-shell.css
    glib-compile-resources "$manifest"
)

sudo cp "$resource" "$resource.dist.$ext"
sudo cp "$tmpdir/$rname" "$resource"

Of course every time an update comes along, it overwrites the resource file and I have to run this again. Which is one reason I’m posting this as a reminder to myself.

Standard ML and how I’m compiling it

I mentioned in an earlier post that I was starting to use Standard ML for a (modest) real project. An early problem I encountered was how to manage builds, when using third-party library modules and multiple files of my own code.

I’m not talking here about anything advanced; I don’t even care yet about incremental compilation or packaging source for other people to build. My requirements are:

  1. produce an efficient native binary
  2. but get error reports quickly, for a fast build cycle
  3. and make it possible to load my code into a REPL for interactive experiments
  4. while using a small amount of third-party library code.

Several implementations of Standard ML exist (I listed some in my last post) and they’re broadly compatible at the core language level, but they disagree on how to compile things.

There’s little consensus between implementations about how to describe module dependencies, pull in third-party libraries, or compile a program that consists of more than one file. The standard says nothing about how different source files interact. There’s no include or import directive and no link between filesystem and module naming. Some implementations do extend the standard (most notably SML/NJ adds a few things) but there isn’t any standard way for a program to detect what language features are available to it. Implementations differ in purpose as well: some SML systems are primarily interactive environments, others primarily compilers.

So here’s what I found myself doing. I hope it might be useful to someone. But before that, I hope that somebody will post a comment that suggests a better way and makes the whole post obsolete.

My general principle was to set up a “default build” that used the MLton compiler, because that has the best combination of standards compliance and generating fast binaries; but then hack up a script to make the build also possible with other compilers, because MLton is slow to run and provides no REPL so is less useful during development.

Let’s build that up from a simple program, starting with Hello World. (I’m using Poly/ML as the “other” compiler—I’ll explain why later on.)

What compilers expect

I’m going to use a function for my Hello World, rather than just calling print at the top level. It’ll make things clearer in a moment. Like any effectively no-argument function in SML, it takes a single argument of unit type, ().

(* hello.sml *)
 
fun hello () =
    print "Hello, world!\n"

val _ = hello ()

And to compile it:

$ mlton hello.sml
$ ./hello
Hello, world!
$

When MLton compiles a program, it produces an executable that will evaluate the ML source you provide. There is no “main” function in the ML itself; instead the program will evaluate the bindings that appear at top level—in this case the final call to hello.

Compiling with MLton is not fast. Hello World takes more than two seconds to build, which is beyond the “immediate” threshold you need for a good compile/edit feedback loop.

Poly/ML compiles much faster, but it doesn’t like this code:

$ polyc hello.sml
Hello, world!
Error-Value or constructor (main) has not been declared
Found near PolyML.export ("/tmp/polyobj.15826.o", main)
Static Errors
$

The limitations of the Standard in Standard ML are quickly reached! It doesn’t define any kind of entry point for a compiled program.

Most SML compilers—including the Poly/ML compiler, but not MLton—work by reading the program into the interactive environment and then dumping it out as object code. Anything evaluated at top-level in the program gets executed in the interactive environment while reading the code in, rather than being exposed as an entry point in the resulting executable. We can see this happening in our example, as Poly/ML prints out “Hello, world!” before the compile error.

Instead Poly/ML expects to have a separate function called main, which will be the entry point for the program:

(* hello.sml *)
 
fun hello () =
    print "Hello, world!\n"

fun main () = hello ()

That works…

$ polyc hello.sml
$ ./a.out
Hello, world!
$

… and with Poly/ML it only takes a small fraction of a second to compile. But now it won’t work with MLton:

$ mlton hello.sml
$ ./hello
$

The main function is never called. We’re going to need to do something different for the two compilers, using a two-file setup something like this:

(* hello.sml *)

fun hello () =
    print "Hello, world!\n"

fun main () = hello ()

(* main.sml *)

val _ = main ()

Then when using MLton, we compile both hello.sml and main.sml; when using Poly/ML, we compile only hello.sml. In both cases we will end up with a single native binary executable that calls the hello function when invoked.

Using Basis files to drive the compiler

So now we have two files instead of one. How do we compile two files?

MLton’s multi-file compilation is driven by what it calls Basis files. These have an extension of .mlb, and in their simplest form just consist of a list of .sml filenames which are evaluated one after another into a single environment:

(* hello.mlb *)
hello.sml
main.sml

This file format is specific to the MLton compiler; it’s not part of the SML language.

The above example isn’t quite enough for a complete build with MLton—whereas the compiler automatically introduces the standard Basis library into scope when building a single .sml file, it doesn’t do so when building from a .mlb description. So any functions we use from the standard Basis library (such as print) will be missing.

We need to add a line at the start to include the standard library:

(* hello.mlb *)
$(SML_LIB)/basis/basis.mlb
hello.sml
main.sml

and then

$ mlton hello.mlb 
$ ./hello 
Hello, world!
$

This is simple enough so far. It doesn’t work directly with Poly/ML, because the polyc compiler doesn’t support the .mlb format. But polyc is itself just a shell script that loads a .sml file into Poly/ML and exports the result. So we can make our own script that reads the filenames from a simple .mlb file and feeds them to Poly/ML in order (omitting the MLton-specific line that requests the standard Basis library, as Poly/ML loads that automatically).

You can find such a script here. At the moment I’m saving this in the project directory as a file named polybuild, so:

$ ./polybuild hello.mlb
$ ./hello
Hello, world!
$

The main.sml file gives no indication of where the main function it calls might live. The question of how to organise SML files together into an application is left completely outside the scope of the SML standard.

Now what if we introduce a dependency on a third-party library other than the standard one?

Third-party library dependencies

I was surprised to find no associative container type in the Standard ML Basis library—these containers have many applications and are often built in to languages nowadays. Let’s consider introducing one of them.

It happens that MLton (but not Poly/ML) ships with a port of the SML/NJ library which includes a couple of associative map containers. Here’s a little program that uses one:

(* dict.sml *)

structure Dict = SplayMapFn (struct
    type ord_key = string
    val compare = String.compare
    end)

val dict =
    let val d = Dict.empty
        val d = Dict.insert (d, "eggs", 5)
        val d = Dict.insert (d, "bacon", 2)
        val d = Dict.insert (d, "tomatoes", 3)
    in d
    end

fun main () =
    print ("Dictionary contains " ^
           (Int.toString (Dict.numItems dict)) ^ " items\n")

As before, the source code contains no indication of where SplayMapFn is to be found. (It’s a functor in the SML/NJ library—this is a type of function that, in this case, converts the generic map structure into a structure specialised for a key of type string, a bit like a template instantiation in C++. “Functor” is one of those terms that means something different in every language that uses it.)

Here’s a MLton-compatible .mlb file for this, in which we resolve the question of where to find our map type:

(* dict.mlb *)

$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/smlnj-lib/Util/smlnj-lib.mlb
dict.sml
main.sml

$ mlton dict.mlb
$ ./dict
Dictionary contains 3 items
$

This doesn’t work with our polybuild script, as it includes another .mlb file and we have only dealt with files that include .sml files. We can’t fix this by just reading in the .mlb file it includes, because (you’ll see this if you have a look at the smlnj-lib.mlb file being included) it isn’t just a list of source files—the MLton developers have put it together carefully, using a more advanced syntax, to make sure no extraneous names are exported.

It’s still possible to concoct a Basis file for our program that refers only to other .sml files (and the Basis library). We just start by adding a reference to the pivotal file from the library that we are calling into, in this case splay-map-fn.sml, and then when we run the build and get an error, we add another file for each undefined symbol. The end result is this:

(* dict.mlb *)
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/smlnj-lib/Util/lib-base-sig.sml
$(SML_LIB)/smlnj-lib/Util/lib-base.sml
$(SML_LIB)/smlnj-lib/Util/ord-key-sig.sml
$(SML_LIB)/smlnj-lib/Util/ord-map-sig.sml
$(SML_LIB)/smlnj-lib/Util/splaytree-sig.sml
$(SML_LIB)/smlnj-lib/Util/splaytree.sml
$(SML_LIB)/smlnj-lib/Util/splay-map-fn.sml
dict.sml
main.sml

I fear problems with bigger builds, but this does work with our polybuild script.

As a way to introduce a simple associative array into a program, this looks a bit crazy. Every modern language has a map or hash type either built in to the language, or available with minimal syntax in a standard library. This program “should be” a two-liner with a one-line compiler invocation.

I do like the feeling of building my program from nice uniform bricks. Should I want to replace a container implementation with a different one, it’s clear how I would do so and it wouldn’t involve changing the calling code or losing type safety—that does feel good.

But such overhead, for something as basic as this: we’ve just got to hope that it works out well in the end, as we compose structures and the scale gets bigger. Will it?

Interaction with the REPL

Along with the polybuild script, I made a script called polyrepl that starts the Poly/ML interactive environment (remember MLton doesn’t have one) and loads the contents of an .mlb file into it for further investigations. You can find that one in the same repo, here.

Notes on other compilers

(I summarised some of the Standard ML compilers available in my previous post.)

  • MLKit supports .mlb files directly, but it doesn’t understand the extended syntax used in the MLton smlnj-lib.mlb, and it doesn’t seem to like compiling files found in locations it doesn’t have write permission for (such as system directories).
  • Moscow ML doesn’t support .mlb files directly, and seems harder than Poly/ML to use with existing module code, because it appears to enforce structure/signature separation (signatures have to be separated out into .sig files, I think).
  • SML/NJ has its own compilation manager (called Compilation Manager) which seems rather complex, and I don’t really want to go there at the moment.