A second beta of Sonic Visualiser v3.0

Here’s a second beta release of Sonic Visualiser v3.0:

Bugs found in the first beta and fixed for the second

  • The peak-frequency spectrogram rendered the entire track into the first 1/8th of its length, and showed nothing after that. (The cause of this might make a marginally interesting technical post in its own right)
  • A similar effect was exhibited by Colour 3D Plot layers, but only at very close zoom levels
  • When the Windows build had been used to view an mp3 file, it would subsequently crash on exit
  • All platforms could hang on startup if certain plugins were installed (the Fan Chirp plugin from the Universidad de la República in Uruguay was one example, though it wasn’t the fault of the plugin)
  • The playback/record level meters were very flickery
  • The source package didn’t build on Fedora Linux

What other problems have you spotted?

Let us know! Drop me a line, post a comment below this article, or use the SourceForge bug tracker.

(This post is a follow-up to “Help test the Sonic Visualiser v3.0 beta“)

Help test the Sonic Visualiser v3.0 beta

A first beta release of Sonic Visualiser v3.0 is now available for download, and we’d love to get your feedback.

Sonic Visualiser v3.0beta1 on Windows

Sonic Visualiser is a free, open-source desktop application for close study and annotation of music audio recordings, developed in the Centre for Digital Music at Queen Mary, University of London. It’s been available for about a decade now, and v3.0 will be one of the most substantial updates it’s ever had. This should be a really good release, but we need to hear about the problems other people have with the beta versions before we can be sure of that.

Get it here

Update – 17th Jan: These are not the latest links any more: there is now a second beta! See here for details.

The first beta can be downloaded from the Sound Software code site:

There will be Linux binaries as well, but I’m still working on packaging for those. Watch this space. (Update: there is now an Ubuntu package linked above. I’d like to be making more options available, not least because I don’t actually use Ubuntu myself, but this is a start.)

Note that the beta pops up a dialog each time you run it to remind you that it’s a beta. Sorry about that, I know it might be annoying.

What’s changed

Here’s the list of changes since the last release.

Besides some new features and a lot of bug fixes, there are a few interesting internal changes:

  • Everything to do with sample indexing now uses 64-bit offsets, and it’s possible to load very long audio files that wouldn’t have worked in the previous release
  • Audio analysis plugins are now run with process separation so a misbehaving plugin should no longer be able to crash the host
  • It’s now possible to record audio as well as play it, and to select the record and playback devices in the preferences
  • The user interface now adapts fully to hi-dpi (“retina”) displays on all three platforms
  • For the first time the Windows version is natively 64-bit (if your Windows installation is, and almost all Windows installations are nowadays) — while still being able to use any 32-bit Vamp plugins you have installed

I’m quite excited about this release, so now I need to hear all your deflating reports about the things that aren’t working!

What we particularly need feedback on

  • Problems installing or running the application at all!
  • Problems running plugins that worked with a previous version
  • Problems playing or recording audio, glitches, error dialogs with complaints about audio drivers
  • Any crashes or other error dialogs
  • Any unexpectedly slow performance while showing analyses or running plugins

Note for Linux users

I mentioned above that I’m still working on packaging for Linux. That process also includes overhauling the INSTALL-file instructions, which are not quite up-to-date. If you look at the series of commands carried out in the Docker script at deploy/linux/docker/Dockerfile.ubuntu64 in the source tree, you’ll get an idea of what needs to be done to compile as things stand.

How to report problems

Use the venerable SourceForge bug tracker, or for quick reports you could just post a comment below, send me an email, tweet at me, etc.

For any problems that arise when using a specific file (audio or annotation), it’s massively helpful if you can attach an example file that exhibits the problem. In general, listing any steps to take to reproduce a bug (even if it seems to you that the bug must be so obvious that nobody could ever have missed it) is very useful indeed.

If you run into something and you’re not sure whether it’s a bug or you’re just being stupid, please do report it anyway. A program that makes you feel stupid is already wrong on some level, though I’m all too aware that Sonic Visualiser can do that sometimes because it is a bit overcomplicated in places.

Things we haven’t done yet

We had hoped to devise an easier way to obtain and install plugins in time for this release, and recent survey feedback suggested this would be a very welcome thing for many prospective users. Sadly we haven’t been able to do anything in that area yet, but I hope we may be able to soon.

MIREX 2016 submissions

This year, for the fourth year in a row, we submitted a number of Vamp audio analysis plugins published by the Centre for Digital Music to the annual MIREX evaluation. The motivation is to give other methods a baseline to compare against, to compare one year’s evaluation metrics and datasets against the next year’s, and to give our group a bit of visibility. See my posts about this process in 2015, 2014, and 2013.

Here’s a review of how we got on this year. We entered an extra category compared to last year, a makeshift entry in the audio downbeat estimation task, making this the widest range of categories we’ve covered with these plugins in MIREX so far.

Structural Segmentation

Results for the four datasets are here, here, here, and here. I don’t find the evaluations any easier to follow than I did last year, but I can see that both of our submissions (Segmentino from Matthias Mauch and the older QM Segmenter from Mark Levy) produced the same results as expected from previous years.

Segmentino actually comes across well in this year’s results, not least because the authors of last year’s best method (Thomas Grill and Jan Schlüter) didn’t submit anything this time.

Multiple Fundamental Frequency Estimation and Tracking

Results here and here. Our Silvet plugin performed much as before: reasonably well, though as usual in such a hard task, with hugely varying results from one test case to another.

Audio Onset Detection

Results here. Many more submissions than last year, which was already a broader field
than the year before. Our two old plugins score the same as they did last year, but are no longer placed last, as three of the new submissions have lower scores.

Audio Beat Tracking

Results here, here, and here. Our BeatRoot and QM Tempo Tracker are once again placed near the back. There’s little change from last year at the top, still occupied by the work of Sebastian Böck and Florian Krebs — work which the authors have, to their great credit, made available as freely-licensed, readable, and well-documented Python code in the madmom library.

Audio Tempo Estimation

Results here. Only two entries this year, our QM Tempo Tracker and Sebastian Böck’s entry from the aforementioned madmom.

Audio Downbeat Estimation

Results here. In this category we submitted the QM Bar and Beat Tracker plugin by Matthew Davies, which has been around for a few years; it’s based on the QM Tempo Tracker with an additional downbeat estimator.

The results don’t come across very well, for varying reasons according to the dataset. The QM Bar and Beat Tracker needs to be prompted with the time signature and (following a last-minute decision to enter the category this year) I submitted a script which assumed fixed 4/4 time. This meant we knowingly threw away the Ballroom category, which was all 3/4, but the plugin was also ill-suited to several of the other categories. Not a strong submission then, but interesting to see.

Audio Key Detection

Results here and here. Last year I lamented the lack of any other entries than ours, since the category had just gained a second (and more realistic) test dataset. So I’m delighted to see a couple of new submissions this year, including one from Gilberto Bernardes and Matthew Davies at INESC in Porto which appears to perform well.

Audio Chord Estimation

Results here, now up to five test datasets. Last year saw a torrid time with a bug in the Chordino plugin, but this year it’s back to normal. Chordino still performs well, but in a strong category this year it’s no longer one of the top performers.


MIREX 2015 submissions

For the past three years now, we’ve taken a number of Vamp audio analysis plugins published by the Centre for Digital Music and submitted them to the annual MIREX evaluation. The motivation is to give other methods a baseline to compare against, to compare one year’s evaluation metrics and datasets against the next year’s, and to give our group a bit of visibility. See my posts about this process in 2014 and in 2013.

Here are this year’s outcomes. All these categories are ones we had submitted to before, but I managed to miss a couple of category deadlines last year, so in total we had more categories than in either 2013 or 2014.

Structural Segmentation

Results for the four datasets are here, here, here, and here. This is one of the categories I missed last year and, although I find the evaluations quite hard to understand, it’s clear that the competition has moved on a bit.

Our own submissions, the Segmentino plugin from Matthias Mauch and the much older QM Segmenter from Mark Levy, produced the expected results (identical to 2013 for Segmentino; similar for QM Segmenter, which has a random initialisation step). As before, Segmentino obtains the better scores. There was only one other submission this year, a convolutional neural network based approach from Thomas Grill and Jan Schlüter which (I think) outperformed both of ours by some margin, particularly on the segment boundary measure.

Multiple Fundamental Frequency Estimation and Tracking

Results here and here. In addition to last year’s submission for the note tracking task of this category, this year I also scripted up a submission for the multiple fundamental frequency estimation task. Emmanouil Benetos and I had made some tweaks to the Silvet plugin during the year, and we also submitted a new fast “live mode” version of it. The evaluation also includes a new test dataset this year.

Our updated Silvet plugin scores better than last year’s version in every test they have in common, and the “live mode” version is actually not all that far off, considering that it’s very much written for speed. (Nice to see a report of run times in the results page — Silvet live mode is 15-20 times as fast as the default Silvet mode and more than twice as fast as any other submission.) Emmanouil’s more recent research method does substantially better, but this is still a pleasing result.

This category is an extremely difficult one, and it’s also painfully difficult to get good test data for it. There’s plenty of potential here, but it’s worth noting that a couple of the authors of the best submissions from last year were not represented this year — in particular, if Elowsson and Friberg’s 2014 method had appeared again this year, it looks as if it would still be at the top.

Audio Onset Detection

Results here. Although the top scores haven’t improved since last year, the field has broadened a bit — it’s no longer only Sebastian Böck vs the world. Our two submissions, both venerable methods, are now placed last and second-last.

Oddly, our OnsetsDS submission gets slightly better results than last year despite being the same, deterministic, implementation (indeed exactly the same plugin binary) run on the same dataset. I should probably check this with the task captain.

Audio Beat Tracking

Results here, here, and here. Again the other submissions are moving well ahead and our BeatRoot and QM Tempo Tracker submissions, producing unchanged results from last year and the year before, are now languishing toward the back. (Next year will see BeatRoot’s 15th birthday, by the way.) The top of the leaderboard is largely occupied by a set of neural network based methods from Sebastian Böck and Florian Krebs.

This is a more interesting category than it gets credit for, I think — still improving and still with potential. Some MIREX categories have very simplistic test datasets, but this category introduced an intentionally difficult test set in 2012 and it’s notable that the best new submissions are doing far better here than the older ones. I’m not quite clear on how the evaluation process handles the problem of what the ground truth represents, and I’d love to know what a reasonable upper bound on F-measure might be.

Audio Tempo Estimation

Results here. This is another category I missed last year, but we get the same results for the QM Tempo Tracker as we did in 2013. It still does tolerably well considering its output isn’t well fitted to the evaluation metric (which rewards estimators that produce best and second-best estimates across the whole piece).

The top scorer here is a neural network approach (spotting a theme here?) from Sebastian Böck, just as for beat tracking.

Audio Key Detection

Results here and here. The second dataset is new.

The QM Key Detector gets the same results as last year for the dataset that existed then. It scores much worse on the new dataset, which suggests that may be a more realistic test. Again there were no other submissions in this category — a pity now that it has a second dataset. Does nobody like key estimation? (I realise it’s a problematic task from a musical point of view, but it does have its applications.)

Audio Chord Estimation

Poor results for Chordino because of a bug which I went over at agonising length in my previous post. This problem is now fixed in Chordino v1.1, so hopefully it’ll be back to its earlier standard in 2016!

Some notes

Neural networks

… are widely-used this year. Several categories contained at least one submission whose abstract referred to a convolutional or recurrent neural network or deep learning, and in at least 5 categories I think a neural network method can reasonably be said to have “won” the category. (Yes I know, MIREX isn’t a competition…)

  • Structural segmentation: convolutional NN performed best
  • Beat tracking: NNs all over the place, definitely performing best
  • Tempo estimation: NN performed best
  • Onset detection: NN performed best
  • Multi-F0: no NNs I think, but it does look as if last year’s “deep learning” submission would have performed better than any of this year’s
  • Chord estimation: NNs present, but not yet quite at the top
  • Key detection: no NNs, indeed no other submissions at all

Categories I missed

  • Audio downbeat estimation: I think I just overlooked this one, for the second year in a row. As last year, I should have submitted the QM Bar & Beat Tracker plugin from Matthew Davies.
  • Real-time audio to score alignment: I nearly submitted the MATCH Vamp Plugin for this, but actually it only produces a reverse path (offline alignment) and not a real-time output, even though it’s a real-time-capable method internally.

Other submissions from the Centre for Digital Music

Emmanouil Benetos submitted a well-performing method, mentioned above, in the Multiple Fundamental Frequency Estimation & Tracking category.

Apart from that, there appear to be none.

This feels like a pity — evaluation is always a pain and it’s nice to get someone else to do some of it.

It’s also a pity because several of the plugins I’m submitting are getting a bit old and are falling to the bottom of the results tables. There are very sound reasons for submitting them (though I may drop some of the less well performing categories next year, assuming I do this again) but it would be good if they didn’t constitute the only visibility QM researchers have in MIREX.

Why would this be the case? I don’t really know. The answer presumably must include some or all of

  • not working on music informatics signal-processing research at all
  • working on research that builds on feature extractors, rather than building the feature extractors themselves
  • research not congruent with MIREX tasks (e.g. looking at dynamics or articulations rather than say notes or chords)
  • research uses similar methods but not on mainstream music recordings (e.g. solo singing, animal sounds)
  • state-of-the-art considered good enough
  • lack the background to compete with current methods (e.g. the wave of NNs) and so sticking with progressive enhancements of existing models
  • lack the data to compete with current methods
  • not aware of MIREX
  • not prioritised by supervisor

The last four reasons would be a problem, but the rest might not be. It could really be that MIREX isn’t very relevant to the people in this group at the moment. I’ll have to see what I can find out.

New software releases all around

A few months ago (in February!!) I wrote a post called Unreleased project pile-up that gave a pretty long list of software projects I’d been working on that could benefit from a proper release. It ended: let’s see how many of these I can tidy up & release during the next few weeks. The answer: very few.

During the past couple of weeks I’ve finally managed to make a bit of a dent, crossing off these applications from the list:

along with these earlier in the year:

and one update that wasn’t on the list:

Apart from the Python Vamp host, those all fall into the category of “overdue updates”. I haven’t managed to release as much of the actually new software on the list.

One “overdue update” that keeps getting pushed back is the next release of Sonic Visualiser. This is going to be quite a major release, featuring audio recording (a feature I once swore I would never add), proper support for retina and hi-dpi displays with retina rendering in the view layers and a new set of scalable icons, support for very long audio files (beyond the 32-bit WAV limit), a unit conversion window to help convert between units such as Hz and MIDI pitch, and a few other significant features. There’s a little way to go before release yet though.

Unreleased project pile-up

Several of the software projects I’ve been working on at the Centre for Digital Music are in need of a new release.

I ran some queries on the SoundSoftware code site, where much of my code lives, to find

  • projects I’m a member of that have seen some work (in the form of repository commits) more recently than have seen a file release, and
  • projects I’m a member of that haven’t seen a file release at all

These returned 28 and 100 projects respectively.

I feel I’ve been a bit lax.

These of course include things that aren’t releasable yet, never will be, or don’t need to be packaged up into releases for one reason or another, as well as projects I don’t actively participate in or am not responsible for making releases of. I eventually eliminated about half of the first list and 93% of the second one.

These are the ones that remained—things that could usefully be released and which I am generally responsible for releasing. Let’s see how many of these I can tidy up & release during the next few weeks:

Existing stuff that could do with a new release

New stuff that hasn’t been released yet

Some of these projects may still be marked private, in which case the links won’t work yet, but they are all planned for release so that should change eventually.