Last year, Luís Figueira and I experimentally submitted a batch of audio analysis methods, implemented in Vamp plugins developed over the past few years at the C4DM, to the Music Information Retrieval Evaluation Exchange (MIREX). I found the process interesting and wrote an article about the results.
I wasn’t sure whether to do a repeat submission this year—most of the plugins would be the same—but Simon Dixon persuaded me. The test datasets might change; it might be interesting to see whether results are consistent from one year to the next; and it’s always good to provide one more baseline for other submissions to compare themselves against. So I dusted off last year’s submission scripts, added the new Silvet note transcription plugin, and submitted them.
Here are the outcomes. There is also an overview poster published by MIREX. See last year’s article for more information about what the tasks consist of.
Multiple Fundamental Frequency Estimation and Tracking
The only category we didn’t submit to last year. This is the problem of deducing which notes are being played, and at what times, in music where more than one note may sound at once. I submitted the Silvet plugin, which is based on a method by Emmanouil Benetos that had performed well in an earlier year’s MIREX.
The results for this category are divided into two parts: multiple fundamental frequency estimation and note tracking. I submitted a script only for the note tracking part. I would describe the performance of our plugin as “correct”, in that it was reliably mid-pack across the board, pretty good for piano transcription, and generally marginally better than the MIREX 2012 submission that inspired it.
This was a fairly popular category this year, and one submission in particular improved quite substantially on previous years’ results. It may be no coincidence that its abstract employs the phrase of the moment, “deep learning”.
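If you’d like to try the kind of note transcription Silvet does on your own audio, here is a minimal sketch of running it from Python. It assumes the “vamp” host module and librosa are installed, and that the plugin is available under the usual “silvet:silvet” key with a “notes” output; those names are my assumption rather than anything from the MIREX submission scripts, so check vamp.list_plugins() on your own system.

    import vamp
    import librosa

    # Load the audio at its native sample rate
    audio, rate = librosa.load("example.wav", sr=None, mono=True)

    # Run the plugin over the whole file and gather its "notes" output;
    # note-like outputs come back as a list of timed features
    result = vamp.collect(audio, rate, "silvet:silvet", output="notes")

    for note in result["list"]:
        # each feature carries a start time, a duration, and (if provided) a label
        print(note["timestamp"], note["duration"], note.get("label"))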
Audio Onset Detection
The same two submissions as last year (OnsetsDS and QM Onset Detector) and exactly the same results—the test dataset is unchanged and the plugins are entirely deterministic. Last year I remarked that our methods are quite old and other submissions should improve on them over time, but this year’s top methods were actually no improvement on last year’s.
Audio Beat Tracking
Again the same two submissions as last year (BeatRoot and QM Tempo Tracker) and exactly the same results (1, 2, 3), behind the front-runners but still reasonably competitive. While the best-performing methods continue to advance, it’s clear that beat tracking is still not a solved problem.
Audio Key Detection
Last year we entered a plugin that wasn’t expected to do very well here, and it swept the field. This year everyone else seems to have dropped out, so our repeat submission was in fact the only entry! (It got the same results as last year.)
Audio Chord Estimation
This is interesting partly because our submission (Chordino) performed very well last year, but the evaluation metric has since changed.
Sadly, there were only three submissions this year. Chordino still looks good on all three datasets (1, 2, 3), but it is now ranked second rather than first for all three. I’m a bit disappointed that the new leading submission seems to lack a descriptive abstract.
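For anyone curious to see what Chordino produces, here is a similar sketch using the same Python vamp host module. The “nnls-chroma:chordino” plugin key and its “simplechord” output are my assumptions here, so again verify them with vamp.list_plugins() before relying on them.

    import vamp
    import librosa

    audio, rate = librosa.load("example.wav", sr=None, mono=True)

    # Gather Chordino's chord estimates as a list of timed, labelled events
    chords = vamp.collect(audio, rate, "nnls-chroma:chordino", output="simplechord")

    for event in chords["list"]:
        # each event marks a chord change: a timestamp plus a chord label
        print(event["timestamp"], event["label"])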
Categories we could have entered but didn’t
Audio Melody Extraction
Last year’s submission wasn’t really good enough to repeat.
Audio Downbeat Estimation
I overlooked this task, which was new this year. Otherwise I could have submitted the QM Bar and Beat Tracker plugin.
Audio Tempo Estimation, Structural Segmentation
These categories had an earlier submission deadline than the rest, and stupidly I missed it.