Bowie

Here’s a playlist of good David Bowie songs that I had never heard until after he died last week.

Spotify playlist
YouTube links:
Dead Against It (1993)
Up The Hill Backwards (1980)
Move On (1979)
Dancing With The Big Boys (1984) (with Iggy Pop)
I Would Be Your Slave (2002)
Girl Loves Me (2016)
You’ve Been Around [Dangers 12″] (1993) (Jack Dangers remix)
Nite Flights (1993) (Scott Walker cover)
No Control (1995)
Bring Me The Disco King (2003)
I’m Deranged (1995)
5:15 The Angels Have Gone (2002)

Most of these came out after the peak of his popularity, but they aren’t obscure at all — I was just never a fan.

The first Bowie songs I remember hearing on the radio were Modern Love and Let’s Dance, both released in 1983 when I was eleven. I thought those two were fine, though they weren’t the sort of thing I liked to think I was into at the time. (I had a big Motörhead war-pig patch on my little denim jacket. Lemmy’s death was also the end of an era.)

A few years later, a cousin introduced me to some of the Spiders from Mars period songs like Rebel Rebel and Suffragette City. I was a bit puzzled because I thought I knew Bowie as a smooth, modern 80s-sounding chap. But I didn’t get the appeal either: too much like everything else from the early 70s. Rebel Rebel even sounded like the Stones, which was definitely my dad’s music.

Back in the real timeline of the 80s, Bowie was releasing Never Let Me Down, an album seen everywhere (it had one of his several awful record covers) but seldom played, and then launching the drearily adult Tin Machine.

His next album, Black Tie White Noise, didn’t come out until 1993, when I was briefly in Berlin as a student and mostly listening to industrial music and obscure things I read about in Usenet groups. If I had been aware that David Bowie had an album out, I would certainly have ignored it. By the time of 1997’s Earthling, a jungle-influenced album released a whole two years after peak jungle with a dreadful Cool Britannia cover, it felt socially impossible to admit to liking a Bowie song ever again. And that was pretty much the end of that.

There’s been a David Bowie album, collaboration, tour, or retrospective for almost every year of my life, and I’ve never taken more than a passing interest in any of them.

I was taken by surprise, then, by how emotional I felt about his death.

***

What did eventually make me notice David Bowie as a figure was the connection with Iggy Pop. I think Iggy is brilliant, and I’d been a bit of a fan for a while before I eventually twigged what it was that his most interesting stuff had in common. That made me aware of the famously dramatic and productive spell those two had in Berlin in the late 70s (the only albums of Bowie’s that I ever actually bought are from this period), and it also opened up a bit of a web of interesting collaborations and influences.

(Going back during the past week and filling in a lot of the songs of Bowie’s that I’ve missed during the last few decades, it’s been particularly fun to hear Iggy Pop numbers, er, pop up all over the place. China Girl — always an Iggy song to me — is well known, but there are at least three other albums that recycle songs previously recorded by him, including a straight cover of the flop lead single from Iggy’s most foolish album. A sustained friendship.)

***

So something of the emotion for me has to do with all that Berlin stuff. There are two aspects to that. One is the grubbily romantic idea of “pressure-cooker” West Berlin, seen from a distance as a place of hideouts, drugs, spying, longing, separation, and any other melodrama that “we” could project onto it. I’m sure this version of the city was overstated for lyrical purposes, but it probably did exist to a degree. The Berlin that fascinated and frightened me in 1993 was already a very different city, and both versions are hardly visible in today’s shiny metropolis.

The other aspect is the notion that moving to a different town in a different country could give you a new life and make your past disappear, even for someone already so celebrated — that it could really be so simple. What makes that idea available here is that Bowie didn’t just go, but then produced such different work after going that it really could appear as if his past had not gone with him.

This impression of self-effacement alongside all the self-promotion, the ability to erase the past, is a very attractive one for a pop star, and it fits also with the amount of collaborative work Bowie did. From some of the videos you can imagine that he was never happier than when playing keyboards or doing tour production for Iggy, singing backing vocals in a one-off with Pink Floyd, or playing second fiddle to Freddie Mercury or Mick Jagger.

Perl 6

I see the official release of the Perl 6 language specification happened on Christmas day.

The first piece of commercial web development I did was in Perl 5. A lot of people can probably say the same thing. This one was a content-management system led by James Elson in 1999 at PSWeb Ltd, a small agency in Farringdon that renamed itself to Citria and expanded rapidly during 1999-2001 before deflating even more rapidly when the dotcom bust arrived.

My recollection was that this particular CMS only ever had one customer, the BBC, who used it only for their very small Digital Radio site. But I still have a copy of the code and on inspection it turns out to have some comments that must have been added during a later project, so perhaps it did get deployed elsewhere. It was a neat, unambitious system (that’s a good thing, James is a tasteful guy) that presented a dynamic inline-editing blocks-based admin interface on a backend URL while generating static pages at the front end.

I remember there was an open question, for a time, about whether the company should pursue a product strategy and make this first CMS, or something like it, the basis of its business, or else take up a project strategy and use whatever technology from whichever provider seemed most appropriate for each client. The latter approach won out. It’s interesting to speculate about the other option.

(I like to imagine that the release of Perl 6 is sparking tiresome reminiscences like this from ageing programmers across the world.)

Perl 6 looks like an interesting language. (It’s a different language from Perl 5, not compatible with it.) The great strength of Perl was of course its text-processing capacity, and for all the fun/cute/radically-unmaintainable syntax showcased on the Perl 6 site, it’s clear that that’s still a big focus. Perl 6 appears to be the first language to attempt to do the right thing with Unicode from the ground up: that is, to represent text as Unicode graphemes, rather than byte strings (like C/C++), awkward UTF-16 sequences (Java), or ISO-10646 codepoint sequences (Python 3). This could, in principle, mean your ad-hoc botched-together text processing script in Perl 6 has a better chance of working reliably across multilingual inputs than it would in any other language. Since plenty of ad-hoc botched-together text processing still goes on in companies around the world, that has the feel of a good thing.
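To make the distinction concrete, here is a small illustration in Javascript rather than Perl 6 (my own example): Javascript, like Java, exposes strings as UTF-16 code units, so the lengths it reports often have little to do with what a reader would call a character.

```javascript
// Rough illustration of code units and codepoints versus user-perceived
// characters (graphemes); Perl 6's grapheme-level strings aim to make the
// obvious thing the correct thing here.

const decomposed = "e\u0301";          // "é" written as 'e' plus a combining acute
console.log(decomposed.length);        // 2 -- UTF-16 code units
console.log([...decomposed].length);   // 2 -- codepoints; still not 1 grapheme

const flag = "\u{1F1EC}\u{1F1E7}";     // a regional-indicator flag, two codepoints
console.log(flag.length);              // 4 -- each codepoint needs two code units
console.log([...flag].length);         // 2 -- but a reader sees a single character

// Naive per-unit operations then mangle perfectly ordinary text:
console.log([...decomposed].reverse().join("")); // the accent detaches from its 'e'
```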

MIREX 2015 submissions

For the past three years now, we’ve taken a number of Vamp audio analysis plugins published by the Centre for Digital Music and submitted them to the annual MIREX evaluation. The motivation is to give other methods a baseline to compare against, to compare one year’s evaluation metrics and datasets against the next year’s, and to give our group a bit of visibility. See my posts about this process in 2014 and in 2013.

Here are this year’s outcomes. All these categories are ones we had submitted to before, but I managed to miss a couple of category deadlines last year, so in total we covered more categories this year than in either 2013 or 2014.

Structural Segmentation

Results for the four datasets are here, here, here, and here. This is one of the categories I missed last year and, although I find the evaluations quite hard to understand, it’s clear that the competition has moved on a bit.

Our own submissions, the Segmentino plugin from Matthias Mauch and the much older QM Segmenter from Mark Levy, produced the expected results (identical to 2013 for Segmentino; similar for QM Segmenter, which has a random initialisation step). As before, Segmentino obtains the better scores. There was only one other submission this year, a convolutional neural network based approach from Thomas Grill and Jan Schlüter which (I think) outperformed both of ours by some margin, particularly on the segment boundary measure.

Multiple Fundamental Frequency Estimation and Tracking

Results here and here. In addition to last year’s submission for the note tracking task of this category, this year I also scripted up a submission for the multiple fundamental frequency estimation task. Emmanouil Benetos and I had made some tweaks to the Silvet plugin during the year, and we also submitted a new fast “live mode” version of it. The evaluation also includes a new test dataset this year.

Our updated Silvet plugin scores better than last year’s version in every test they have in common, and the “live mode” version is actually not all that far off, considering that it’s very much written for speed. (Nice to see a report of run times in the results page — Silvet live mode is 15-20 times as fast as the default Silvet mode and more than twice as fast as any other submission.) Emmanouil’s more recent research method does substantially better, but this is still a pleasing result.

This category is an extremely difficult one, and it’s also painfully difficult to get good test data for it. There’s plenty of potential here, but it’s worth noting that a couple of the authors of the best submissions from last year were not represented this year — in particular, if Elowsson and Friberg’s 2014 method had appeared again this year, it looks as if it would still be at the top.

Audio Onset Detection

Results here. Although the top scores haven’t improved since last year, the field has broadened a bit — it’s no longer only Sebastian Böck vs the world. Our two submissions, both venerable methods, are now placed last and second-last.

Oddly, our OnsetsDS submission gets slightly better results than last year despite being the same, deterministic, implementation (indeed exactly the same plugin binary) run on the same dataset. I should probably check this with the task captain.

Audio Beat Tracking

Results here, here, and here. Again the other submissions are moving well ahead and our BeatRoot and QM Tempo Tracker submissions, producing unchanged results from last year and the year before, are now languishing toward the back. (Next year will see BeatRoot’s 15th birthday, by the way.) The top of the leaderboard is largely occupied by a set of neural network based methods from Sebastian Böck and Florian Krebs.

This is a more interesting category than it gets credit for, I think — still improving and still with potential. Some MIREX categories have very simplistic test datasets, but this category introduced an intentionally difficult test set in 2012 and it’s notable that the best new submissions are doing far better here than the older ones. I’m not quite clear on how the evaluation process handles the problem of what the ground truth represents, and I’d love to know what a reasonable upper bound on F-measure might be.

Audio Tempo Estimation

Results here. This is another category I missed last year, but we get the same results for the QM Tempo Tracker as we did in 2013. It still does tolerably well considering its output isn’t well fitted to the evaluation metric (which rewards estimators that produce best and second-best estimates across the whole piece).

The top scorer here is a neural network approach (spotting a theme here?) from Sebastian Böck, just as for beat tracking.

Audio Key Detection

Results here and here. The second dataset is new.

The QM Key Detector gets the same results as last year for the dataset that existed then. It scores much worse on the new dataset, which suggests that the new one may be a more realistic test. Again there were no other submissions in this category — a pity now that it has a second dataset. Does nobody like key estimation? (I realise it’s a problematic task from a musical point of view, but it does have its applications.)

Audio Chord Estimation

Poor results for Chordino because of a bug which I went over at agonising length in my previous post. This problem is now fixed in Chordino v1.1, so hopefully it’ll be back to its earlier standard in 2016!

Some notes

Neural networks

… are widely used this year. Several categories contained at least one submission whose abstract referred to a convolutional or recurrent neural network or to deep learning, and in at least five categories I think a neural network method can reasonably be said to have “won” the category. (Yes, I know, MIREX isn’t a competition…)

  • Structural segmentation: convolutional NN performed best
  • Beat tracking: NNs all over the place, definitely performing best
  • Tempo estimation: NN performed best
  • Onset detection: NN performed best
  • Multi-F0: no NNs I think, but it does look as if last year’s “deep learning” submission would have performed better than any of this year’s
  • Chord estimation: NNs present, but not yet quite at the top
  • Key detection: no NNs, indeed no other submissions at all

Categories I missed

  • Audio downbeat estimation: I think I just overlooked this one, for the second year in a row. As last year, I should have submitted the QM Bar & Beat Tracker plugin from Matthew Davies.
  • Real-time audio to score alignment: I nearly submitted the MATCH Vamp Plugin for this, but actually it only produces a reverse path (offline alignment) and not a real-time output, even though it’s a real-time-capable method internally.

Other submissions from the Centre for Digital Music

Emmanouil Benetos submitted a well-performing method, mentioned above, in the Multiple Fundamental Frequency Estimation & Tracking category.

Apart from that, there appear to be none.

This feels like a pity — evaluation is always a pain and it’s nice to get someone else to do some of it.

It’s also a pity because several of the plugins I’m submitting are getting a bit old and are falling to the bottom of the results tables. There are very sound reasons for submitting them (though I may drop some of the less well performing categories next year, assuming I do this again) but it would be good if they didn’t constitute the only visibility QM researchers have in MIREX.

Why would this be the case? I don’t really know. The answer presumably must include some or all of

  • not working on music informatics signal-processing research at all
  • working on research that builds on feature extractors, rather than building the feature extractors themselves
  • doing research that isn’t congruent with MIREX tasks (e.g. looking at dynamics or articulations rather than, say, notes or chords)
  • using similar methods, but not on mainstream music recordings (e.g. solo singing, animal sounds)
  • considering the state of the art good enough already
  • lacking the background to compete with current methods (e.g. the wave of NNs) and so sticking with progressive enhancements of existing models
  • lacking the data to compete with current methods
  • not being aware of MIREX
  • not having it prioritised by a supervisor

The last four reasons would be a problem, but the rest might not be. It could really be that MIREX isn’t very relevant to the people in this group at the moment. I’ll have to see what I can find out.

Chordino troubles

On September the 9th, I released a v1.0 build of the Chordino and NNLS Chroma Vamp plugin. This plugin analyses audio recordings of music and calculates some harmonic features, including an estimated chord transcription. When used with Sonic Visualiser, Chordino is potentially very useful for anyone who likes to play along with songs, as well as for research.

Chordino was written by Matthias Mauch, based on his own research. Although I made this 1.0 release, my work on it only really extended as far as fixing some bugs found in earlier releases using the Vamp Plugin Tester.

Unfortunately, with one of those fixes, I broke the plugin. The supposedly more reliable 1.0 update was substantially less accurate at identifying both chord-change boundaries and the chords themselves than any previous version.

I didn’t notice. Nor did Matthias, who had recently left our research group and was busy starting at a new job. One colleague sent me an email saying he had problems with the new release, but I jumped to the completely wrong conclusion that this had something to do with parameter settings having changed since the last release, and suggested he raise it with Matthias.

I only realised what had happened after we submitted the plugin to MIREX evaluation, something we do routinely every year for plugins published by C4DM, when the MIREX task captain Johan Pauwels emailed to ask whether I had expected its scores to drop by 15% from the previous year’s submission (see results pages for 2014, 2015). By that time, the broken plugin had been available for over a month.

This is obviously hugely embarrassing—perhaps the most unambiguous screwup of my whole programming career so far. As the supposed professional software developer of my research group, I took someone else’s working code, broke it, published and promoted the broken version with his name all over it, and then submitted it to a public evaluation, again with his name on it, where its brokenness was finally pointed out to me, a month later, by someone else. Any regression test, even on only a single audio file, would have shown up the problem immediately. Regression-testing this sort of software can be tricky, but the simplest possible test would have worked here. And a particularly nice irony is provided by the fact that I’ve just come from a four-year project whose aims included trying to improve the way software is tested in academia.

I’ve now published a fixed version of the plugin (v1.1), available for download here. This one has been regression tested against known-good output, and the tests are in the repository for future use. The broken version is actually gone from the download page (though of course it is still tagged in the source repository), to avoid anyone getting the wrong one by accident.

I’m also working on a way to make simple regression tests easier to provide and run, for the other plugins I work on.
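The principle is nothing fancier than the following sketch (my own illustration, in Javascript for consistency with the later posts here, rather than the actual tests, which live in the plugin repository): keep the output of a build known to be good, and fail loudly if a rebuilt version’s output on the same fixed input drifts from it.

```javascript
// Sketch of a known-good-output regression test. Not the actual Chordino
// tests; just the general shape.

function outputsMatch(actual, expected, tolerance = 1e-6) {
    if (actual.length !== expected.length) return false;
    for (let i = 0; i < expected.length; ++i) {
        if (Math.abs(actual[i] - expected[i]) > tolerance) return false;
    }
    return true;
}

// In a real test, `knownGood` would be loaded from a file checked in to the
// repository and `fresh` produced by running the current build on a fixed
// test recording. Here they are just placeholder values.
const knownGood = [0.00, 0.50, 0.25];
const fresh     = [0.00, 0.50, 0.25];
console.log(outputsMatch(fresh, knownGood) ? "PASS" : "REGRESSION DETECTED");
```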

That’s all for the “public service announcement” bit of this post; read on only if you’re interested in the details.

What was the change that broke it? Well, it was a change I made after running the plugin through the Vamp Plugin Tester, a sort of automated fuzz-testing tool that helps you find problems with your code. (Again, there’s an irony here. Using this tool is undoubtedly a good practice, as it can show up all sorts of problems that might not be apparent to developers otherwise. Even so, I should have remembered how common it is to introduce bugs while fixing things flagged by compiler warnings and static analysis.)

The problem I was trying to fix here was that intermediate floating-point divisions sometimes overflowed, resulting in infinity values in the output. This only happened for unusual inputs, so it appeared reasonable to fix it by clamping intermediate values when they appeared to be blowing up out of the expected range. But I set the threshold too low, so that many intermediate values from legitimate inputs were also being mangled. I then also made a stupid typo that made the results a bit worse still (you can see the change in question around line 500 of the file in this diff).

Note that this only broke the output from the Chordino chord estimator, not the other features calculated by NNLS Chroma.

A digression. An ongoing topic of debate in the world of the Research Software Engineer is whether software development resources in academia should be localised within research groups, or centralised.

The localised approach, which my research group has taken with my own position, employs developers directly within a research subject. The centralised approach, typified by the Research Software Development group at UCL, proposes a group of software developers who are loaned or hired out to research groups according to need and availability.

In theory, the localised approach can be simpler to manage and should increase the likelihood of developers being available to help with small pieces of work requiring subject knowledge at short notice. The centralised approach has the advantage that all developers can share the non-subject-specific parts of their workload and knowhow.

I believe that in general a localised approach is useful, and I suspect it is easier to hire developers for a specific research group than to find developers good enough to be able to parachute in to anywhere from a central team.

In a case like this, though, the localised approach makes for quite a lonely situation.

Companies that produce large software products that work do so not because they employ amazing developers but because they have systems in place to support them: code review, unit testing, regression tests, continuous integration, user acceptance tests.

But for me as a lone professional developer in a research group, it’s essentially my responsibility to provide those safety nets as well as to use them. I had some of them in place for most of the code I work on, but there was a big hole for this particular project. I broke the code, and I didn’t notice because I didn’t have the right tests ready. Neither did the researcher who wrote most of this code, but that wasn’t his job. When some software goes out from this group that I have worked on, it’s my responsibility to make sure that the code aspects of it (as opposed to the underlying methods) work correctly. Part of my job has to be to assume that nobody else will be in a position to help.

 

… and an FFT in Standard ML

While writing my earlier post on Javascript FFTs, I also (for fun) adapted the Nayuki FFT code into Standard ML. You can find it here.

The original idea was to see how performance of SML compiled to native code, and SML compiled to Javascript using smltojs, compared with the previously-tested Javascript implementations and with any other SML versions I could find. (There’s FFT and DFT code from Rory McGuire here, probably written for clarity rather than speed, plus versions I haven’t tried in the SML/NJ and MLKit test libraries.)

I didn’t get as far as doing a real comparison, but I did note that it ran at more or less the same speed when compiled natively with MLton as the Javascript version does when run in Firefox, and that compiling to JS with smltojs produced much slower code. I haven’t checked where the overhead lies.

FFTs in Javascript

Javascript engines are quite fast these days. Can we get away with doing serious signal-processing in Javascript yet?

People are doing things like image processing tools and audio spectrum visualisation in Javascript, so the answer must sometimes be yes, but I wanted to get an idea how well you’d get on with more demanding tasks like spectral audio processing. Although I’ve done this in many languages, Javascript has not yet been among them. And given that one reason you would do this would be for portability, the question is connected with the performance of target platforms such as mobile browsers.

Let’s consider the demands of a phase vocoder, commonly used for audio time-stretching (in the Rubber Band Library for example). To make a phase vocoder you chop up a sampled audio signal into short overlapping frames, take the short-time Fourier transform of each, convert its complex output to polar form, modify the phases, transform back again, and glue back together with a different amount of overlap from the original, thus changing the overall duration. Most of the overhead typically is in the forward and inverse Fourier transforms and in Cartesian-to-polar conversion (arctangents are slow).
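As a rough illustration of the analysis half of that process (not anything from Rubber Band, just a sketch of my own, with the transform itself left as a forwardFFT function supplied by the caller), the per-frame work looks something like this:

```javascript
// Analysis side of a phase vocoder: window each overlapping frame, take the
// FFT, convert to magnitude and phase. The resynthesis side (modify phases,
// inverse transform, overlap-add) is omitted; the point is just to show where
// the FFT and atan2 costs come from. forwardFFT is assumed to take a
// real-valued array and return { real, imag } arrays.

function analyseFrames(signal, fftSize, hop, forwardFFT) {
    const window = new Float64Array(fftSize);
    for (let i = 0; i < fftSize; ++i) {                // periodic Hann window
        window[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / fftSize);
    }

    const frames = [];
    for (let start = 0; start + fftSize <= signal.length; start += hop) {
        const frame = new Float64Array(fftSize);
        for (let i = 0; i < fftSize; ++i) {
            frame[i] = signal[start + i] * window[i];
        }
        const { real, imag } = forwardFFT(frame);      // the expensive bit...
        const mag = new Float64Array(real.length);
        const phase = new Float64Array(real.length);
        for (let i = 0; i < real.length; ++i) {
            mag[i] = Math.hypot(real[i], imag[i]);
            phase[i] = Math.atan2(imag[i], real[i]);   // ...and the other one
        }
        frames.push({ mag, phase });
    }
    return frames;
}
```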

If you have a mono signal sampled at 44.1kHz and you’re using a simple design with a 2048-point Fast Fourier Transform (FFT) with 75% overlap, you’ll need to go through this process at least 86 times a second to run at the equivalent of real-time. Budgeting maybe a third of the overall time for FFTs, and allowing equal time for forward and inverse transforms, I suppose you want to be able to do at least 500 transforms per second to have a decent chance of running nicely.
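To check that arithmetic, here is a back-of-envelope sketch (my own, using only the figures above):

```javascript
// Where the "86 times a second" and "at least 500 transforms per second"
// figures come from.

const sampleRate = 44100;   // Hz, mono input
const fftSize    = 2048;    // points per frame
const overlap    = 0.75;    // 75% overlap between frames

const hop          = fftSize * (1 - overlap);  // 512 samples between frame starts
const framesPerSec = sampleRate / hop;         // about 86 frames per second of audio

// Each frame needs a forward and an inverse transform, and the transforms get
// roughly a third of the overall time budget, so the FFT must run about three
// times faster than the raw frame rate alone would suggest.
const transformsPerSecOfAudio = 2 * framesPerSec;            // about 172
const requiredTransformRate   = 3 * transformsPerSecOfAudio; // about 517, hence "at least 500"

console.log({ hop, framesPerSec, requiredTransformRate });
```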

I decided to try some existing FFT implementations in Javascript and see how they went.

The candidates

These are the implementations I tried. (In parentheses are the labels I’ve used to identify them in the following tables.)

  • Multi-lingual FFT by Nayuki (nayuki)
  • The Nayuki code, but rearranged into an object that does sin/cos table generation once on construction: code is here (nayuki-obj)
  • FFT.js by Jens Nockert (nockert)
  • JSFFT by Nick Jones (dntj)

I also took three existing C libraries and compiled them to Javascript with Emscripten:

  • KissFFT, a small but reasonably sophisticated implementation by Mark Borgerding (kissfft)
  • FFTW3, a big library by Matteo Frigo et al (fftw). This library is so focused on native platform-specific performance tricks that it seems crazy to cross-compile it to Javascript, and I would never have thought to do so if it wasn’t for the fact that KissFFT worked better than I’d expected
  • Don Cross’s simple public domain implementation (cross)

Plus a couple that came up in searches but that I did not end up testing:

  • Timbre.js by mohayonao — this is designed to pipeline with other classes in the same framework rather than running standalone
  • dsp.js by Corban Brook — only provides an API to retrieve the magnitude spectrum rather than complex frequency-domain output

I timed only the real-to-complex forward transform. This favours those implementations that optimise for real inputs, which of the above I think means only KissFFT and FFTW3. You can find all the test code in this code project, and I put up a page here which runs the tests in your browser.
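The timing loop itself is roughly the following shape (a simplified sketch rather than the actual code from that project; it assumes a hypothetical fft object exposing a forward() method that takes a real-valued input array):

```javascript
// Count how many real-to-complex forward transforms of the given size an
// implementation manages within a fixed wall-clock period.

function benchmarkForward(fft, size, seconds) {
    const input = new Float64Array(size);      // contents barely matter for timing
    for (let i = 0; i < size; ++i) {
        input[i] = Math.sin(i * 0.3) + 0.1 * Math.random();
    }

    let count = 0;
    const start = performance.now();
    const end = start + seconds * 1000;
    while (performance.now() < end) {
        fft.forward(input);                    // forward transform only
        ++count;
    }
    const elapsed = (performance.now() - start) / 1000;
    return count / elapsed;                    // transforms per second
}

// e.g. console.log(benchmarkForward(someFft, 2048, 2.0));
```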

Some numbers

These figures are for the number of 2048-point forward transforms per second: higher numbers are better.

Most of these results are from desktop browsers, but I’ve included a couple of mobile devices (thanks to James and Andy for the iPhone numbers). I’ve also included timings of a sample native-code build, made using FFTW3 on Linux, Accelerate on OSX and iOS, and a 32-bit IPP build on Windows.

For each platform I’ve marked the fastest “real Javascript” version with a single asterisk, and the fastest version overall apart from the native code with a double asterisk.

(nayuki, nayuki-obj, nockert, and dntj are the plain Javascript implementations; kissfft, fftw, and cross are the C libraries compiled via Emscripten; the final column is the native-code build for comparison.)

                                             nayuki  nayuki-obj  nockert   dntj  kissfft     fftw  cross  native
Lenovo Yoga 2 Pro, Firefox 41, Linux           5200       8000*     2900    850  23000**    18000  10000   94000
Lenovo Yoga 2 Pro, Chromium 46, Linux          5200       6500*     3600   2500  15000**     7000   5800   94000
Dell XPS 13 2015, Firefox 41, Windows 10       4600       7300*     2300    660  18000**    11000   7400   43000
Dell XPS 13 2015, MS Edge, Windows 10          5000       6900*     1800    990  13000**     1900   4900   43000
MacBook Air 11″ 2012, Firefox 40, OSX 10.10    4000       5200*     2200    720     8000  12000**   7400   54000
MacBook Air 11″ 2012, Safari 8, OSX 10.10      3000       3500*     1300    890  10000**     2800   6000   54000
Nokia Lumia 620, IE, WP8.1                      380        390*      190     94    710**        -    310       -
Alcatel OneTouch Fire E, Firefox OS 2.0         270        310*      150     87    790**      270    320       -
Apple iPad mini 2012, Safari, iOS 9             300        340*      150    120    710**      330    390    2000
Apple iPhone 6, iOS UIWebView, iOS 9             72         72        90*     71    180**       95     61       -
Apple iPhone 6, Safari, iOS 9                  3800       4300*      960    710   8700**     2000   4800       -
Apple iPhone 6S, Safari, iOS 9                 6000       7000*     1600   1000   9900**     2500   7400       -

The FFTW run didn’t seem to want to produce a result on the WP8 phone.

Some observations

On almost every platform there is at least one implementation that makes our 500 transforms/sec budget, although some have little to spare. (The exception is the iOS UIWebView, which doesn’t include a modern Javascript engine; since iOS 8, though, it seems hybrid apps are allowed to use the fast WebKit web view instead. I haven’t included any Android results, but a similar divide exists between Google Chrome and the built-in browser on earlier Android versions.)

Usually the cross-compiled KissFFT is the fastest option. Although it seems culturally strange for the fastest Javascript version to be one that is not written in Javascript at all, I think this is a positive result: KissFFT simply has faster algorithms than any of the plain Javascript versions, and the compilation process is robust enough to preserve this advantage. The case of FFTW is stranger, but may have something to do with its relatively large girth.

If code size is a primary concern, the Nayuki code appears to be an effective implementation of the core algorithm.

The code produced by Emscripten, in the versions compiled from C, is asm.js. Firefox gives this restricted subset of Javascript special treatment, and it appears to be possible to see how much difference that makes: if you enable the developer console, a message appears warning you that asm.js optimisations are disabled while the console is active. The results are then 10% or so slower.

Overall we’re seeing between 2.4 and 5.4 times slowdown compared to a fast native-code version. This may not be a great use of CPU and battery if signal processing is your application’s main job, but good enough to make some interesting things.

Any thoughts?