MIREX 2019 submissions

For the 2019 edition of MIREX, the Music Information Retrieval Evaluation eXchange, we at the Centre for Digital Music once again submitted a set of Vamp audio analysis plugins for evaluation. This is the seventh year in a row in which we’ve done so, and the fourth in which no completely new plugin has been added to the lineup. Although these methods are therefore getting more and more out-of-date, they do provide a potentially useful baseline for other submissions, a sanity check on the evaluation itself, and some historical colour.

Every year I write up the outcomes in a blog post. Like last year, I’m rather late writing this one. That’s partly because the official results page is still lacking a couple of categories, and says “More results are coming” at the top — I’m beginning to think they might not be, and decided not to wait any longer. (MIREX is volunteer-run, so this is just a remark, not a complaint.)

You can find my writeups of past years here: 2018, 2017, 2016, 2015, 2014, and 2013.

Structural Segmentation

Again no results have been published for this task. Last year I speculated that ours might have been the only entry, and since we submit the same one every year, there’s no point in re-running it if nobody else enters. Pity, this ought to be an interesting category.

Multiple Fundamental Frequency Estimation and Tracking

A rebound! Two years ago there were 14 entries here, last year only three: this year we’re back up to 12, including our two (both consisting of the Silvet plugin, in “live” and standard modes).

This category is famously difficult and I think still invites interesting approaches. An impressive submission from Anton Runov (linked abstract is worth reading) uses an approach based on visual object detection using the spectrogram as an image. Treating a spectrogram as an image is typical enough, but this particular method is new to me (having little exposure to rapid object detection algorithms). The code for this has been published, in C++ under the AGPL — I tried it, it seems like good code, builds cleanly, worked for me. Nice job.

Another interesting set of submissions achieving similar performance is that from Steiner, Jalalvand, and Birkholz (abstract also well worth a read) using “echo state networks”. An ESN appears to be like a recurrent neural network in which only the output weights are trained, input and internal weights remaining random.
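To make that description concrete, here is a minimal sketch of the general echo state idea in plain Python. It is not the method from the Steiner, Jalalvand, and Birkholz submission: every name and parameter is hypothetical, the task (predicting the next sample of a sine wave) is a toy, and the readout is trained with a simple least-mean-squares update rather than the regression a real ESN would more likely use.

```python
import math
import random

def make_reservoir(n_units, n_inputs, seed=0):
    """Create fixed random input and internal weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_units)]
    # Assumption: keep each row's absolute sum below 1 so the state update
    # is a contraction and old inputs fade away instead of blowing up.
    bound = 0.9 / n_units
    w_res = [[rng.uniform(-bound, bound) for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_res

def run_reservoir(w_in, w_res, inputs):
    """Drive the reservoir with a sequence of input vectors, collecting states."""
    n_units = len(w_res)
    state = [0.0] * n_units
    states = []
    for u in inputs:
        state = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], u))
                + sum(w * s for w, s in zip(w_res[i], state))
            )
            for i in range(n_units)
        ]
        states.append(state)
    return states

def train_readout(states, targets, epochs=100, rate=0.02):
    """Train only the linear output weights, using an LMS update."""
    w_out = [0.0] * len(states[0])
    for _ in range(epochs):
        for state, target in zip(states, targets):
            error = target - sum(w * s for w, s in zip(w_out, state))
            w_out = [w + rate * error * s for w, s in zip(w_out, state)]
    return w_out

def predict(w_out, states):
    return [sum(w * s for w, s in zip(w_out, state)) for state in states]

def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Toy task: predict the next sample of a sine wave from the current one.
inputs = [[math.sin(0.3 * t)] for t in range(200)]
targets = [math.sin(0.3 * (t + 1)) for t in range(200)]

w_in, w_res = make_reservoir(n_units=20, n_inputs=1)
states = run_reservoir(w_in, w_res, inputs)
w_out = train_readout(states, targets)

mse_before = mean_squared_error([0.0] * len(targets), targets)
mse_after = mean_squared_error(predict(w_out, states), targets)
```

The point to notice is that train_readout never touches w_in or w_res: all the learning happens in the output weights.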

Our own submissions are some way behind these methods, but there’s plenty of room for improvement ahead of them as well: I think the best submissions from 2017’s bumper crop still performed a little better than any from this year, and perfection is still well out of reach. (At least among labs that submit things to MIREX. Who knows what Google are up to by now.)

Results pages are here and here.

Audio Onset Detection

No results have (yet?) been published for this task.

Audio Beat Tracking

Another quiet year, with Sebastian Böck’s repeat submission still ahead. Results are here and here.

Audio Tempo Estimation

No results are yet available for this one either.

We made a tiny change to the submission protocol for our plugin this year. As foreshadowed in my post last year, the second estimate is now double the first, rather than half of it, in cases where the first estimate is below an arbitrary 100 bpm. I was curious what difference it made, and I’ll update this if I notice any results having been published.
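The rule is simple enough to write down. This is only a sketch of the logic described above, not the actual submission script; the function name and signature are made up for illustration.

```python
def tempo_estimates(first_bpm, threshold_bpm=100.0):
    """Return (first, second) tempo estimates for submission.

    The plugin yields a single estimate, so the second one is derived
    from it. Below the threshold we guess that the plugin may have
    locked onto half the true tempo and offer double; otherwise we
    offer half, as in previous years.
    """
    if first_bpm <= 0:
        raise ValueError("tempo must be positive")
    if first_bpm < threshold_bpm:
        second_bpm = first_bpm * 2.0
    else:
        second_bpm = first_bpm / 2.0
    return first_bpm, second_bpm
```

So a first estimate of 70 bpm is now paired with 140 bpm, while a first estimate of 128 bpm is still paired with 64 bpm.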

Audio Key Detection

We actually submitted a “new” plugin for this category: a version of the QM Key Detector containing a fix to chromagram initialisation provided by Daniel Schürmann, working in the Mixxx project. We submitted both “old” (same as last year) and “new” (with fix) versions, and saw significantly better results from the fixed version in all five test sets. So thank you, Daniel.

The most interesting submission, from Jiang, Xia, and Carlton, actually seems to be a presentation of a new(ish?) crowd-annotated dataset, used to train a key detection CRNN. It gets good results, with the rather critical caveat that the crowd-sourced training dataset could overlap with the MIREX test data. It’s not clear from the abstract whether the dataset is publicly available — I think it may be accessible via a developer API from the company (Hooktheory) that put it together.

Results are here.

Audio Chord Estimation

Last year was busy, this year isn’t: it sees only one submission besides ours, a straightforward CNN from the MIR Lab at National Taiwan University, whose performance is roughly comparable to our own Chordino. Results here.

 

MIREX 2018 submissions

The 2018 edition of MIREX, the Music Information Retrieval Evaluation eXchange, was the sixth in a row for which we at the Centre for Digital Music submitted a set of Vamp audio analysis plugins for evaluation. For the third year in a row, the set of plugins we submitted was entirely unchanged — these are increasingly antique methods, but we have continued to submit them with the idea that they could provide a useful year-on-year baseline at least. It also gives me a good reason to take a look at the MIREX results and write this little summary post, although I’m a bit late with it this year, having missed the end of 2018 entirely!

For reference, the past five years’ posts can be found at: 2017, 2016, 2015, 2014, and 2013.

Structural Segmentation

No results appear to have been published for this task in 2018; I don’t know why. Last time around, ours was the only entry. Maybe it was the only entry again, and since it was unchanged, there was no point in running the task.

Multiple Fundamental Frequency Estimation and Tracking

After 2017’s feast with 14 entries, 2018 is a famine with only 3, two of which were ours and the third of which (which I can’t link to, because its abstract is missing) was restricted to a single subtask, in which it got reasonable results. Results pages are here and here.

Audio Onset Detection

Almost as many entries as last time, and a new convolutional network from Axel Röbel et al disrupts the tidy sweep of Sebastian Böck’s group at the top of the results table. Our simpler methods are squarely at the bottom this time around. Röbel’s submission has a nice informative abstract which casts more light on the detailed result sets and is well worth a read. Results here.

Audio Beat Tracking

Pure consolidation: all the 2018 entries are repeats from 2017, and all perform identically (with the methods from Böck et al doing better than our plugins). Every year I say that this doesn’t feel like a solved problem, and it still doesn’t — the results we’re seeing here still don’t seem all that close to human performance, but perhaps there are misleading properties to the evaluation. Results here, here, here.

Audio Tempo Estimation

This is a busier category, with a new dataset and a few new submissions. The new dataset is most intriguing: all of the submissions perform better with the new dataset than the older one, except for our QM Tempo Tracker plugin, which performs much, much worse with the new one than the old!

I believe the new dataset is of electronic dance music, so it’s likely that much of it is high tempo, perhaps tripping our plugin into half-tempo octave errors. We could probe this next time by tweaking the submission protocol a little. Submissions are asked to output two tempo estimates, and the results report whether either of them was correct. Because our plugin only produces one estimate, we lazily submit half of that estimate as our second estimate (with a much lower salience score). But if our single estimate was actually half of the “true” value, as is plausible for fast music, we would see better scores from submitting double instead of half as the second estimate.
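A small sketch shows why the choice of second estimate matters. The scoring function here is a simplification, the 8% tolerance is an assumption rather than something I have checked against the evaluation code, and the 170 bpm track is an invented example.

```python
def within_tolerance(estimate_bpm, true_bpm, tolerance=0.08):
    """True if the estimate is within a relative tolerance of the true tempo."""
    return abs(estimate_bpm - true_bpm) <= tolerance * true_bpm

def either_correct(estimates, true_bpm, tolerance=0.08):
    """True if at least one of the submitted estimates is close enough."""
    return any(within_tolerance(e, true_bpm, tolerance) for e in estimates)

# Hypothetical fast track on which the plugin makes a half-tempo octave error.
true_bpm = 170.0
plugin_bpm = 85.0

halved = (plugin_bpm, plugin_bpm / 2.0)    # what we currently submit
doubled = (plugin_bpm, plugin_bpm * 2.0)   # the proposed tweak

halved_ok = either_correct(halved, true_bpm)
doubled_ok = either_correct(doubled, true_bpm)
```

With these numbers halved_ok is False and doubled_ok is True: the halved pair (85, 42.5) misses the true tempo entirely, while the doubled pair (85, 170) contains it.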

Results are here and here.

Audio Key Detection

Some novelty here from a pair of template-based methods from the Universitat Autònoma de Barcelona, one attributed to Galin and Castells-Rufas and the other to Castells-Rufas and Galin. Their performance is not a million miles away from our own template-based key estimation plugin.

The strongest results appear to be from a neural network method from Korzeniowski et al at JKU, an updated version of one of last year’s better-performing submissions, an implementation of which can be found in the madmom library.

Results are here.

Audio Chord Estimation

A lively (or daunting) category. A team from Fudan University in Shanghai, whence came two of the previous year’s strongest submissions, is back with another new method, an even stronger set of results, and once again a very readable abstract; and the JKU team have an updated model, just as in the key detection category, which also performs extremely impressively. Meanwhile a separate submission from JKU, due to Stefan Gasser and Franz Strasser, would have been at the very top had it been submitted a year earlier, but is now a little way behind. Convolutional neural networks are involved in all of these.

Our Chordino submission can still be described as creditable. Results can be found here.

 

MIREX 2017 submissions

For the fifth year in a row, the Centre for Digital Music has submitted a number of Vamp audio analysis plugins to the MIREX evaluation for “music information retrieval” tasks. This year we submitted the same set of plugins as last year; there were no new implementations, and some of the existing ones are so old as to have celebrated their tenth birthday earlier in the year. So the goal is not to provide state-of-the-art results, but to give other methods a stable baseline for comparison and to check each year’s evaluation metrics and datasets against neighbouring years. I’ve written about this in each of the four previous years: see posts about 2016, 2015, 2014, and 2013.

Obviously, having submitted exactly the same plugins as last year, we expect basically the same results. But the other entries we’re up against will have changed, so here’s a review of how each category went.

(Note: we dropped one category this year, Audio Downbeat Estimation. Last year’s submission was not well prepared for reasons I touched on in last year’s post, and I didn’t find time to rework it.)

Structural Segmentation

Results for the four datasets are here, here, here, and here. Our results, for Segmentino from Matthias Mauch and the older QM Segmenter from Mark Levy, were the same as last year, with the caveat that the QM Segmenter uses random initialisation and so never gets exactly the same results twice.

Surprisingly, nobody else entered anything to this category this year, which seems a pity because it’s an interesting problem. This category seems to have peaked around 2012-2013.

Multiple Fundamental Frequency Estimation and Tracking

An exciting year for this mind-bogglingly difficult category, with 14 entries from ten different sets of authors and a straight fight between template decomposition methods (including our Silvet plugin, from Emmanouil Benetos’s work) and trendy convolutional neural networks. Results are here and here.

With so many entries and evaluations it’s not that easy to get a clear picture, and no single method appears to be overwhelmingly strong. There were fine results in some evaluations for CNN methods from Thickstun et al and Thomé and Ahlbäck, for Pogorelyuk and Rowley’s very intriguing “Dynamic Mode Decomposition”, and for a few others whose abstracts are missing from the entry site and so can’t be linked to.

Silvet, with the same results as last year, does well enough to be interesting, but in most cases it isn’t troubling the best of the newer methods.

Audio Onset Detection

Bit of a puzzle here, as our two plugin submissions both got slightly different results from last year despite being unchanged implementations of deterministic methods invoked in the same way on the same data sets.

Last year saw a big expansion in the number of entries, and this year there were nearly as many. Just as last year, our old plugins performed modestly, but some of the new experiments again fared a bit less well, so we weren’t quite at the bottom. Results here.

Audio Beat Tracking

Same puzzle as in onset detection: while our results were basically similar to last year, they weren’t identical. The 2015 and 2016 results were identical and we would have expected the same again in 2017.

That apart, there’s little to report since last year. Results are here, here, and here.

Audio Tempo Estimation

Last year there were two entries in this category, ours and a much stronger one from Sebastian Böck. This year sees one addition, from Hendrik Schreiber and Meinard Müller, which fares creditably. The results are here.

Audio Key Detection

Two pretty successful new submissions this year, both using convolutional neural networks: one from Korzeniowski, Böck, Krebs and Widmer, and the other from Hendrik Schreiber. Our old plugin (from work by Katy Noland) does not fare tragically, but it’s clear that some other methods are getting much closer to the sort of performance one imagines should be realistic. The results are linked from here.

Intuitively, key estimation seems like the sort of problem that is interesting only so long as you don’t have enough training data. As a 24-way classification with large enough training datasets, it looks a bit mundane. The problem then becomes: what does it mean for a piece of music to be in a particular key anyway? Submissions are not expected to answer that, but presumably it sets an upper bound on performance.
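For anyone wondering where the 24 comes from, it is 12 possible tonics times two modes. The sketch below enumerates those classes and picks one by scoring a 12-bin chroma vector against rotated templates. The template weights are invented for illustration and are not the profiles used by our plugin or by any submission; a trained classifier would replace the scoring step with a learned one.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical toy templates, indexed by semitones above the tonic:
# tonic weighted most, then third and fifth, then the other scale notes.
MAJOR_TEMPLATE = [3, 0, 1, 0, 2, 1, 0, 2, 0, 1, 0, 1]
MINOR_TEMPLATE = [3, 0, 1, 2, 0, 1, 0, 2, 1, 0, 1, 0]

def key_labels():
    """All 24 output classes: 12 tonics in each of two modes."""
    return [f"{pc} {mode}" for mode in ("major", "minor") for pc in PITCH_CLASSES]

def estimate_key(chroma):
    """Return the label whose rotated template best matches the chroma vector."""
    if len(chroma) != 12:
        raise ValueError("expected a 12-bin chroma vector")
    best_label, best_score = None, float("-inf")
    for mode, template in (("major", MAJOR_TEMPLATE), ("minor", MINOR_TEMPLATE)):
        for tonic in range(12):
            # Dot product of the chroma with the template shifted to this tonic.
            score = sum(chroma[(tonic + i) % 12] * template[i] for i in range(12))
            if score > best_score:
                best_label = f"{PITCH_CLASSES[tonic]} {mode}"
                best_score = score
    return best_label

# Invented chroma vectors: white notes only, with one triad emphasised.
c_major_like = [1.0, 0, 0.5, 0, 0.8, 0.5, 0, 0.8, 0, 0.5, 0, 0.5]
a_minor_like = [0.8, 0, 0.5, 0, 0.8, 0.5, 0, 0.5, 0, 1.0, 0, 0.5]
```

The two example vectors contain exactly the same notes and differ only in emphasis, yet they come out as C major and A minor respectively, which is a small illustration of why “what key is this in?” is not always a well-posed question.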

Audio Chord Estimation

Another increase in the number of test datasets, from 5 to 7, and a strong category again. Last year our submission Chordino (by Matthias Mauch) was beginning to trail, though it wasn’t quite at the back. This year some of the weaker submissions have not been repeated, some new entries have appeared, and Chordino is in last place for every evaluation. It’s not far behind — perceptually it’s still a pretty good algorithm — but some of the other methods are very impressive now. Here are the results.

The abstracts accompanying the two submissions from the audio information processing group at Fudan University in Shanghai (one from Jiang, Li and Wu, the other from Wu, Feng and Li) are both well worth a read. The former paper refers closely to Chordino, using the same NNLS Chroma features with a new front-end. Meanwhile, the latter paper proposes a method worth remembering for dinner parties, using deep residual networks trained from MIDI-synchronised constant-Q representations of audio with a bidirectional long short-term memory and conditional random field for labelling.

 

Sonic Visualiser 3.0, at last

Finally!

(See previous posts: Help test the Sonic Visualiser v3.0 beta, A second beta of Sonic Visualiser v3.0, A third beta of Sonic Visualiser v3.0, and Yes, there’s a fourth beta of Sonic Visualiser v3.0 now)

No doubt, now that the official release is out, some horrible problem or other will come to light. It wouldn’t be the first time: Sonic Visualiser v2.4 went through a beta programme before release and still had to be replaced with v2.4.1 after only a week. These things happen and that’s OK, but for now I’m feeling good about this one.

 

Yes, there’s a fourth beta of Sonic Visualiser v3.0 now

Previously I wrote about the third Sonic Visualiser v3.0 beta release:

“This may well be the final beta, so if you’re seeing problems with it, please do report them while there’s still time!”

Well, some very kind people did report problems, so that was not the final beta. A fourth one is now up for download. Here are the download URLs:

Fixes since the third beta

  • Fix a nasty crash in session I/O in the 64-bit Windows build (this is the main reason for the new beta)
  • Provide more log information about audio drivers to the debug log file
  • Fix a very occasional one-sample-too-short error in resampling audio files during load
  • Fix invisible measure tool crosshairs on spectrogram
  • Fix a possible memory leak in the spectrogram

Keep the bug reports coming!

This one really could be the final beta! So please do report any troubles you have with it. Drop me a line, post a comment below this article, or use the SourceForge bug tracker. And thank you!

 

A third beta of Sonic Visualiser v3.0

Update – 23rd Feb: We have a fourth beta now!

After a short break, we have a third beta of the forthcoming v3.0 release of Sonic Visualiser. Downloads here:

Bugs fixed, and other changes made since the second beta

  • Sonic Visualiser could hang when trying to initialise a transform that refused the first choice of initialisation parameters
  • Error handling for problems in running transforms has been improved in general
  • The Colour 3D Plot layer was sometimes pathologically slow to update
  • The “Normalise Visible Area” option in the Colour 3D Plot layer wasn’t working
  • The visual rendering style of some layers has been improved when viewed on high-resolution screens without pixel doubling
  • A new feature has snuck in, under cover of fixing a rendering offset problem in the spectrum layer: it is now possible (although cumbersome) to zoom the spectrum layer in the frequency axis
  • The process of overhauling the Help Reference documentation to properly describe the new release has begun

Let us know what else you find!

This may well be the final beta, so if you’re seeing problems with it, please do report them while there’s still time!

Drop me a line, post a comment below this article, or use the SourceForge bug tracker.

(This post is a follow-up to “Help test the Sonic Visualiser v3.0 beta” and “A second beta of Sonic Visualiser v3.0”.)