For the 2019 edition of MIREX, the Music Information Retrieval Evaluation eXchange, we at the Centre for Digital Music once again submitted a set of Vamp audio analysis plugins for evaluation. This is the seventh year in a row in which we’ve done so, and the fourth in which no completely new plugin has been added to the lineup. Although these methods are therefore increasingly out of date, they still provide a potentially useful baseline for other submissions, a sanity check on the evaluation itself, and some historical colour.
Every year I write up the outcomes in a blog post. As last year, I’m rather late with this one. That’s partly because the official results page still lacks a couple of categories and says “More results are coming” at the top. I’m beginning to suspect they might not be, so I decided not to wait any longer. (MIREX is volunteer-run, so this is just a remark, not a complaint.)
Again, no results have been published for this task. Last year I speculated that ours might have been the only entry, and since we submit the same one every year, there’s no point in re-running it if nobody else enters. A pity, as this ought to be an interesting category.
Multiple Fundamental Frequency Estimation and Tracking
A rebound! Two years ago there were 14 entries here, last year only three: this year we’re back up to 12, including our two (both consisting of the Silvet plugin, in “live” and standard modes).
This category is famously difficult and I think still invites interesting approaches. An impressive submission from Anton Runov (the linked abstract is worth reading) applies visual object detection to the spectrogram, treated as an image. Treating a spectrogram as an image is common enough, but this particular method was new to me, as I have little exposure to rapid object detection algorithms. The code has been published, in C++ under the AGPL. I tried it: it seems like good code, it builds cleanly, and it worked for me. Nice job.
Another interesting set of submissions, with similar performance, comes from Steiner, Jalalvand, and Birkholz (abstract also well worth a read), using “echo state networks”. An ESN appears to be a kind of recurrent neural network in which only the output weights are trained, with the input and internal weights left random.
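To make that idea a bit more concrete, here is a minimal sketch of a generic echo state network in pure Python. It is not the Steiner, Jalalvand, and Birkholz model, whose details I haven't checked: the reservoir size, the weight scaling, the ridge-regression readout and the toy task (predicting the next sample of a sine wave) are all my own assumptions, chosen for illustration. What it does show is the defining trait: the input and recurrent weights are drawn once at random and never touched again, and "training" is just a linear least-squares fit of the readout on the reservoir states.

```python
import math
import random


class EchoStateNetwork:
    """Illustrative ESN: random fixed reservoir, trained linear readout (assumed design)."""

    def __init__(self, n_reservoir=20, input_scale=0.5, norm_bound=0.9, seed=0):
        rng = random.Random(seed)
        self.n = n_reservoir
        # Input and recurrent weights are random and are never trained.
        self.w_in = [rng.uniform(-input_scale, input_scale) for _ in range(n_reservoir)]
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_reservoir)] for _ in range(n_reservoir)]
        # Scale so the largest absolute row sum is norm_bound (< 1). This bounds the
        # spectral radius and, with tanh, makes the state update a contraction,
        # a conservative way to get fading memory (the "echo state" property).
        max_row = max(sum(abs(v) for v in row) for row in w)
        self.w = [[v * norm_bound / max_row for v in row] for row in w]
        self.w_out = None

    def states(self, inputs):
        """Drive the reservoir with a 1-D input sequence; return one feature row per step."""
        x = [0.0] * self.n
        rows = []
        for u in inputs:
            # New state depends on the current input and the previous state.
            x = [math.tanh(self.w_in[i] * u + sum(self.w[i][j] * x[j] for j in range(self.n)))
                 for i in range(self.n)]
            rows.append([1.0] + x)  # leading 1.0 is a bias feature for the readout
        return rows

    def fit(self, inputs, targets, washout=50, ridge=1e-4):
        """Train only the readout weights, by ridge regression on the reservoir states."""
        rows = self.states(inputs)[washout:]  # discard the initial transient
        ys = targets[washout:]
        d = len(rows[0])
        # Normal equations: (X^T X + ridge * I) w = X^T y
        a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
             for i in range(d)]
        b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
        self.w_out = solve(a, b)
        return self.w_out

    def predict(self, inputs):
        """Linear readout of the reservoir states."""
        return [sum(w * f for w, f in zip(self.w_out, row)) for row in self.states(inputs)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


if __name__ == "__main__":
    # Toy task: one-step-ahead prediction of a sine wave.
    signal = [math.sin(0.2 * t) for t in range(400)]
    inputs, targets = signal[:-1], signal[1:]
    esn = EchoStateNetwork()
    esn.fit(inputs, targets)
    preds = esn.predict(inputs)[50:]
    mse = sum((p - y) ** 2 for p, y in zip(preds, targets[50:])) / len(preds)
    print(f"training MSE: {mse:.2e}")
```

Because only the readout is fitted, training is a single linear solve rather than backpropagation through time, which seems to be much of the appeal of the approach.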
Our own submissions are some way behind these methods, but there’s plenty of room for improvement ahead of them as well: I think the best submissions from 2017’s bumper crop still performed a little better than any from this year, and perfection is still well out of reach. (At least among labs that submit things to MIREX. Who knows what Google are up to by now.)
Audio Onset Detection
No results have (yet?) been published for this task.
Audio Beat Tracking
Audio Tempo Estimation
No results are yet available for this one either.
We made a tiny change to the submission protocol for our plugin this year. As foreshadowed in my post last year, I changed the calculation of the second tempo estimate to be double, rather than half, the first estimate whenever the first was below an arbitrary 100 bpm. I was curious what difference this would make, and I’ll update this post if I notice any results being published.
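Spelled out, the rule amounts to something like the sketch below. The function name and threshold parameter are hypothetical and this is not the plugin's actual code; it also assumes that first estimates at or above 100 bpm still get the old halving treatment, which the change implies but doesn't state outright.

```python
def second_tempo_estimate(first_bpm: float, threshold_bpm: float = 100.0) -> float:
    """Hypothetical helper: derive a second tempo candidate from the first.

    Below the (arbitrary) threshold, offer double the first estimate;
    otherwise, assumed here, keep the earlier behaviour of offering half.
    """
    if first_bpm < threshold_bpm:
        return first_bpm * 2.0
    return first_bpm / 2.0


if __name__ == "__main__":
    for bpm in (80.0, 100.0, 140.0):
        print(bpm, second_tempo_estimate(bpm))
```

The idea is that a slow first estimate is more likely to be a half-tempo error, so the more useful alternative to offer is the faster one.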
Audio Key Detection
We actually submitted a “new” plugin for this category: a version of the QM Key Detector containing a fix to chromagram initialisation contributed by Daniel Schürmann, who works on the Mixxx project. We submitted both “old” (same as last year) and “new” (with fix) versions, and saw significantly better results from the fixed version in all five test sets. So thank you, Daniel.
The most interesting submission, from Jiang, Xia, and Carlton, actually seems to be a presentation of a new(ish?) crowd-annotated dataset, used to train a key detection CRNN. It gets good results, with the rather critical caveat that the crowd-sourced training dataset could overlap with the MIREX test data. It’s not clear from the abstract whether the dataset is publicly available; I think it may be accessible via a developer API from Hooktheory, the company that put it together.
Results are here.
Audio Chord Estimation
Last year was busy, this year isn’t: it sees only one submission besides ours, a straightforward CNN from the MIR Lab at National Taiwan University, whose performance is roughly comparable to our own Chordino. Results here.