MIREX 2018 submissions

The 2018 edition of MIREX, the Music Information Retrieval Evaluation eXchange, was the sixth in a row for which we at the Centre for Digital Music submitted a set of Vamp audio analysis plugins for evaluation. For the third year in a row, the set of plugins we submitted was entirely unchanged — these are increasingly antique methods, but we have continued to submit them with the idea that they could provide a useful year-on-year baseline at least. It also gives me a good reason to take a look at the MIREX results and write this little summary post, although I’m a bit late with it this year, having missed the end of 2018 entirely!

For reference, the past five years’ posts can be found at: 2017, 2016, 2015, 2014, and 2013.

Structural Segmentation

No results appear to have been published for this task in 2018; I don’t know why. Last time around, ours was the only entry. Maybe it was the only entry again, and since it was unchanged, there was no point in running the task.

Multiple Fundamental Frequency Estimation and Tracking

After 2017’s feast with 14 entries, 2018 is a famine with only 3, two of which were ours and the third of which (which I can’t link to, because its abstract is missing) was restricted to a single subtask, in which it got reasonable results. Results pages are here and here.

Audio Onset Detection

Almost as many entries as last time, and a new convolutional network from Axel Röbel et al disrupts the tidy sweep of Sebastian Böck’s group at the top of the results table. Our simpler methods are squarely at the bottom this time around. Röbel’s submission has a nice informative abstract which casts more light on the detailed result sets and is well worth a read. Results here.

Audio Beat Tracking

Pure consolidation: all the 2018 entries are repeats from 2017, and all perform identically (with the methods from Böck et al doing better than our plugins). Every year I say that this doesn’t feel like a solved problem, and it still doesn’t — the results we’re seeing here still don’t seem all that close to human performance, but perhaps there are misleading properties to the evaluation. Results here, here, here.

Audio Tempo Estimation

This is a busier category, with a new dataset and a few new submissions. The new dataset is most intriguing: all of the submissions perform better with the new dataset than the older one, except for our QM Tempo Tracker plugin, which performs much, much worse with the new one than the old!

I believe the new dataset is of electronic dance music, so it’s likely that much of it is high tempo, perhaps tripping our plugin into half-tempo octave errors. We could probe this next time by tweaking the submission protocol a little. Submissions are asked to output two tempo estimates, and the results report whether either of them was correct. Because our plugin only produces one estimate, we lazily submit half of that estimate as our second estimate (with a much lower salience score). But if our single estimate was actually half of the “true” value, as is plausible for fast music, we would see better scores from submitting double instead of half as the second estimate.

Results are here and here.

Audio Key Detection

Some novelty here from a pair of template-based methods from the Universitat Autonoma de Barcelona, one attributed to Galin and Castells-Rufas and the other to Castells-Rufas and Galin. Their performance is not a million miles away from our own template-based key estimation plugin.

The strongest results appear to be from a neural network method from Korzeniowski et al at JKU, an updated version of one of last year’s better-performing submissions, an implementation of which can be found in the madmom library.

Results are here.

Audio Chord Estimation

A lively (or daunting) category. A team from Fudan University in Shanghai, whence came two of the previous year’s strongest submissions, is back with another new method, an even stronger set of results, and once again a very readable abstract; and the JKU team have an updated model, just as in the key detection category, which also performs extremely impressively. Meanwhile a separate submission from JKU, due to Stefan Gasser and Franz Strasser, would have been at the very top had it been submitted a year earlier, but is now a little way behind. Convolutional neural networks are involved in all of these.

Our Chordino submission can still be described as creditable. Results can be found here.