Sonic Visualiser 3.0, at last


(See previous posts: Help test the Sonic Visualiser v3.0 beta, A second beta of Sonic Visualiser v3.0, A third beta of Sonic Visualiser v3.0, and Yes, there’s a fourth beta of Sonic Visualiser v3.0 now)

No doubt, now that the official release is out, some horrible problem or other will come to light. It wouldn’t be the first time: Sonic Visualiser v2.4 went through a beta programme before release and still had to be replaced with v2.4.1 after only a week. These things happen and that’s OK, but for now I’m feeling good about this one.


A third beta of Sonic Visualiser v3.0

Update – 23rd Feb: We have a fourth beta now!

After a short break, we have a third beta of the forthcoming v3.0 release of Sonic Visualiser. Downloads here:

Bugs fixed, and other changes made since the second beta

  • Sonic Visualiser could hang when trying to initialise a transform that refused the first choice of initialisation parameters
  • Error handling for problems in running transforms has been improved in general
  • The Colour 3D Plot layer was sometimes pathologically slow to update
  • The “Normalise Visible Area” option in the Colour 3D Plot layer wasn’t working
  • The visual rendering style of some layers has been improved when viewed on high-resolution screens without pixel doubling
  • A new feature has snuck in, under cover of fixing a rendering offset problem in the spectrum layer: it is now possible (although cumbersome) to zoom the spectrum layer in the frequency axis
  • The process of overhauling the Help Reference documentation to properly describe the new release has begun

Let us know what else you find!

This may well be the final beta, so if you’re seeing problems with it, please do report them while there’s still time!

Drop me a line, post a comment below this article, or use the SourceForge bug tracker.

(This post is a follow-up to “Help test the Sonic Visualiser v3.0 beta” and “A second beta of Sonic Visualiser v3.0”.)

A second beta of Sonic Visualiser v3.0

Update – 9th Feb: There is now a third beta! See here for details.

Here’s a second beta release of Sonic Visualiser v3.0:

Bugs found in the first beta and fixed for the second

  • The peak-frequency spectrogram rendered the entire track into the first 1/8th of its length, and showed nothing after that. (The cause of this might make a marginally interesting technical post in its own right)
  • A similar effect was exhibited by Colour 3D Plot layers, but only at very close zoom levels
  • When the Windows build had been used to view an mp3 file, it would subsequently crash on exit
  • All platforms could hang on startup if certain plugins were installed (the Fan Chirp plugin from the Universidad de la República in Uruguay was one example, though it wasn’t the fault of the plugin)
  • The playback/record level meters were very flickery
  • The source package didn’t build on Fedora Linux

What other problems have you spotted?

Let us know! Drop me a line, post a comment below this article, or use the SourceForge bug tracker.

(This post is a follow-up to “Help test the Sonic Visualiser v3.0 beta”.)

Help test the Sonic Visualiser v3.0 beta

A first beta release of Sonic Visualiser v3.0 is now available for download, and we’d love to get your feedback.

Sonic Visualiser v3.0beta1 on Windows

Sonic Visualiser is a free, open-source desktop application for close study and annotation of music audio recordings, developed in the Centre for Digital Music at Queen Mary, University of London. It’s been available for about a decade now, and v3.0 will be one of the most substantial updates it’s ever had. This should be a really good release, but we need to hear about the problems other people have with the beta versions before we can be sure of that.

Get it here

Update – 17th Jan: These are not the latest links any more: there is now a second beta! See here for details.

The first beta can be downloaded from the Sound Software code site:

There will be Linux binaries as well, but I’m still working on packaging for those. Watch this space. (Update: there is now an Ubuntu package linked above. I’d like to be making more options available, not least because I don’t actually use Ubuntu myself, but this is a start.)

Note that the beta pops up a dialog each time you run it to remind you that it’s a beta. Sorry about that, I know it might be annoying.

What’s changed

Here’s the list of changes since the last release.

Besides some new features and a lot of bug fixes, there are a few interesting internal changes:

  • Everything to do with sample indexing now uses 64-bit offsets, and it’s possible to load very long audio files that wouldn’t have worked in the previous release
  • Audio analysis plugins are now run with process separation so a misbehaving plugin should no longer be able to crash the host
  • It’s now possible to record audio as well as play it, and to select the record and playback devices in the preferences
  • The user interface now adapts fully to hi-dpi (“retina”) displays on all three platforms
  • For the first time the Windows version is natively 64-bit (if your Windows installation is, and almost all Windows installations are nowadays) — while still being able to use any 32-bit Vamp plugins you have installed
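To put the 64-bit indexing point in numbers, here is a rough sketch. It assumes (this is not stated above) that the older code held sample frame indices in signed 32-bit integers; the function name is made up for illustration.

```python
def max_duration_hours(index_bits: int, sample_rate: int) -> float:
    """Longest recording addressable by a signed integer frame index."""
    # Largest value a signed integer of this width can hold
    max_frames = 2 ** (index_bits - 1) - 1
    return max_frames / sample_rate / 3600.0


if __name__ == "__main__":
    for rate in (44100, 96000, 192000):
        hours = max_duration_hours(32, rate)
        print(f"{rate} Hz: about {hours:.1f} hours with a 32-bit index")
```

Under that assumption the ceiling is roughly 13.5 hours at 44.1 kHz and only about 3 hours at 192 kHz, which is within reach of long field recordings. With a 64-bit index the limit is no longer a practical concern.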

I’m quite excited about this release, so now I need to hear all your deflating reports about the things that aren’t working!

What we particularly need feedback on

  • Problems installing or running the application at all!
  • Problems running plugins that worked with a previous version
  • Problems playing or recording audio, glitches, error dialogs with complaints about audio drivers
  • Any crashes or other error dialogs
  • Any unexpectedly slow performance while showing analyses or running plugins

Note for Linux users

I mentioned above that I’m still working on packaging for Linux. That process also includes overhauling the INSTALL-file instructions, which are not quite up to date. If you look at the series of commands carried out in the Docker script at deploy/linux/docker/Dockerfile.ubuntu64 in the source tree, you’ll get an idea of what needs to be done to compile as things stand.

How to report problems

Use the venerable SourceForge bug tracker, or for quick reports you could just post a comment below, send me an email, tweet at me, etc.

For any problems that arise when using a specific file (audio or annotation), it’s massively helpful if you can attach an example file that exhibits the problem. In general, listing any steps to take to reproduce a bug (even if it seems to you that the bug must be so obvious that nobody could ever have missed it) is very useful indeed.

If you run into something and you’re not sure whether it’s a bug or you’re just being stupid, please do report it anyway. A program that makes you feel stupid is already wrong on some level, though I’m all too aware that Sonic Visualiser can do that sometimes because it is a bit overcomplicated in places.

Things we haven’t done yet

We had hoped to devise an easier way to obtain and install plugins in time for this release, and recent survey feedback suggested this would be a very welcome thing for many prospective users. Sadly we haven’t been able to do anything in that area yet, but I hope we may be able to soon.

Mp3 decoding with the MAD library: We’ve all been doing it wrong

The MAD mp3 decoder library is widely used in open source applications that play or edit mp3 audio files.

It’s a respected library that consists of high quality C code, has a fairly friendly API, and was evidently written with great care. It’s now getting old (last updated in 2004) but people trust it.

I discovered this week that I’ve been using this library wrong for many years in a couple of small ways. I checked the code of a few other open source applications that use it, and found that all of them (including widely-used programs like Audacity) suffered at least one of the same problems as mine did. We’ve all been doing it wrong.

Here’s what almost every user of this library seems to be doing wrong:

  1. If an mp3 file starts with a Xing/LAME information frame, they are feeding that frame to the mp3 decoder rather than filtering it out, resulting in an unnecessary 1152 samples of silence at the start of the decoded audio. (This is in addition to the variable mp3 encoder delay. Note that the metadata frame is not the same thing as an ID3 tag: ID3 tags are not actually mp3 frames and so don’t have the same problem.)
  2. More importantly, they aren’t providing the decoder an expected but undocumented small block of zero data at the end of the file. Without this, it loses synchronisation on the last mp3 frame, which is consequently never decoded. This causes the decoded audio to be truncated by up to 1152 samples.
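The two fixes can be sketched roughly as follows, in Python rather than C for brevity. This is an illustration under stated assumptions, not the code from any of the applications mentioned. The guard size of 8 bytes is taken from the `MAD_BUFFER_GUARD` constant in libmad's header, and the tag offsets (13, 21 and 36 bytes into the frame) assume a frame without CRC protection. Both function names are hypothetical.

```python
# Assumption: matches the MAD_BUFFER_GUARD constant defined in libmad's mad.h
MAD_BUFFER_GUARD = 8

# Assumed offsets of the "Xing"/"Info" tag from the start of the frame:
# 4-byte header plus side info of 9, 17 or 32 bytes (no CRC).
_TAG_OFFSETS = (13, 21, 36)


def looks_like_info_frame(frame: bytes) -> bool:
    """Heuristically detect a Xing/LAME information frame.

    Such a frame should be skipped rather than passed to the decoder.
    """
    return any(frame[o:o + 4] in (b"Xing", b"Info") for o in _TAG_OFFSETS)


def pad_for_mad(mp3_bytes: bytes) -> bytes:
    """Append the zero bytes the decoder needs to sync on the final frame."""
    return mp3_bytes + bytes(MAD_BUFFER_GUARD)
```

In a C caller the equivalent would be to copy `MAD_BUFFER_GUARD` zero bytes after the last chunk of file data before handing the buffer to `mad_stream_buffer`, and to check the first frame for the tag before synthesising any audio from it.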

Here’s an example audio file you can use to check an application: (audio file link). This file contains two very short bursts of noise, one right at the start of the file and the other at the end, separated by a second and a half or so of silence.

After decoding with MAD, the first burst should start around 0.025 seconds in, and the second should finish just before the end of the decoded audio.

If you load this in an application that uses MAD and find the first burst starts around 0.05 sec, then you have the first of the above problems. If only one of the two bursts is there, or the second is shorter than the first, then you have the second.
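If you would rather check the decoded samples programmatically than by eye, something like this would do. It is a minimal sketch that assumes you already have the decoded audio as a list of mono floating-point samples; the function name and the silence threshold are arbitrary choices for illustration.

```python
def burst_report(samples, sample_rate, threshold=0.01):
    """Locate the non-silent audio in a list of mono samples.

    Returns (start_sec, gap_at_end_sec): the time of the first sample above
    the threshold, and the length of silence after the last such sample.
    Returns None if nothing exceeds the threshold.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return None
    first, last = loud[0], loud[-1]
    return first / sample_rate, (len(samples) - 1 - last) / sample_rate
```

A start time near 0.05 rather than 0.025 points to the first problem. The difference between the two figures is about one frame: 1152 samples is roughly 0.026 seconds, assuming a 44.1 kHz sample rate.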

My own Sonic Visualiser v2.5 suffers from both problems:


But both are fixed in the repository, and will be fixed in the forthcoming release:


(If both bursts are there and they appear exactly at the start and end of the file without any padding silence at all, then your decoder not only handles these details correctly but also interprets the LAME information frame and accounts for the encoder delay and padding listed in there. Sonic Visualiser doesn’t do that even after this fix, but that could change!)

I’ve also started feeding some fixes to a few other projects (e.g. this pull request for the more serious of those problems in Audacity).

The root of the problem, I think, is that MAD is an mp3 stream decoder and not an mp3 file decoder. These two things are almost the same, as an mp3 file is just a sequence of stream frames with no file header: if you concatenate two mp3 files you get a valid mp3 file containing the concatenation of the two audio streams. But the fact that MAD doesn’t deal with files means that it doesn’t know when a file has ended, and it doesn’t know about file metadata frames, and these turn out to be things you have to handle in the calling code.

Users of the library maybe don’t realise this because the documentation is quite limited. Developers are pointed to an example program (called minimad) which itself fails to deal with either of these things. There is an official program called madplay that handles both of them properly and could serve as an example, but people don’t seem to be all that conscious of it — it isn’t widely packaged for Linux distributions for example, and until this week I had never looked at its source code.

There ought to be lessons here for both library users and library authors, but I’m not completely sure what those lessons are.

Library users should be testing their import code by comparison with expected decoded data, but I was actually already doing that and I still missed both problems. (I allowed for the mp3 encoder delay by accommodating any amount of leading silence in my tests, so I missed that there was more than there should be; and I foolishly checked whether the decoded data matched the expected data throughout its extent rather than the other way around, so missing that it had been truncated.)
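A test that avoids both of those blind spots might look like the sketch below. It is illustrative only: the function name, the tolerances and the string results are made up, and it assumes the reference data begins with a non-silent sample.

```python
def check_decoded(decoded, expected, max_leading_silence,
                  tolerance=1e-3, silence=1e-4):
    """Compare decoded samples with a reference.

    Catches excess leading silence as well as truncation at the end.
    """
    # Count the leading silence instead of skipping an unlimited amount
    offset = 0
    while offset < len(decoded) and abs(decoded[offset]) <= silence:
        offset += 1
    if offset > max_leading_silence:
        return "too much leading silence"

    body = decoded[offset:]
    # Drive the comparison from the expected data, so that a short
    # decode is reported rather than silently accepted
    if len(body) < len(expected):
        return "truncated"
    for got, want in zip(body, expected):
        if abs(got - want) > tolerance:
            return "mismatch"
    return "ok"
```

The two key choices are putting an upper bound on the leading silence, and walking through the expected data rather than the decoded data, so that missing samples at the end count as a failure.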

This is probably also a case for using higher-level libraries like CoreAudio (or GStreamer, except that I think GStreamer also gets this wrong in its MAD plugin). Using format-specific open source libraries gives you consistent portability across platforms from a single codebase, but that doesn’t help much if you are deceived by the differences between different format libraries and end up not using them correctly.

For library authors the lesson really seems to be that people will copy the code you give them expecting it to be a complete example for the most obvious use case. If the two don’t match, there’ll be trouble.

I’d be interested to hear about any examples of open source software that get the MAD decoder right.

MIREX 2016 submissions

This year, for the fourth year in a row, we submitted a number of Vamp audio analysis plugins published by the Centre for Digital Music to the annual MIREX evaluation. The motivation is to give other methods a baseline to compare against, to compare one year’s evaluation metrics and datasets against the next year’s, and to give our group a bit of visibility. See my posts about this process in 2015, 2014, and 2013.

Here’s a review of how we got on this year. We entered an extra category compared to last year, a makeshift entry in the audio downbeat estimation task, making this the widest range of categories we’ve covered with these plugins in MIREX so far.

Structural Segmentation

Results for the four datasets are here, here, here, and here. I don’t find the evaluations any easier to follow than I did last year, but I can see that both of our submissions (Segmentino from Matthias Mauch and the older QM Segmenter from Mark Levy) produced the same results as expected from previous years.

Segmentino actually comes across well in this year’s results, not least because the authors of last year’s best method (Thomas Grill and Jan Schlüter) didn’t submit anything this time.

Multiple Fundamental Frequency Estimation and Tracking

Results here and here. Our Silvet plugin performed much as before: reasonably well, though as usual in such a hard task, with hugely varying results from one test case to another.

Audio Onset Detection

Results here. Many more submissions than last year, which was already a broader field than the year before. Our two old plugins score the same as they did last year, but are no longer placed last, as three of the new submissions have lower scores.

Audio Beat Tracking

Results here, here, and here. Our BeatRoot and QM Tempo Tracker are once again placed near the back. There’s little change from last year at the top, still occupied by the work of Sebastian Böck and Florian Krebs — work which the authors have, to their great credit, made available as freely-licensed, readable, and well-documented Python code in the madmom library.

Audio Tempo Estimation

Results here. Only two entries this year, our QM Tempo Tracker and Sebastian Böck’s entry from the aforementioned madmom.

Audio Downbeat Estimation

Results here. In this category we submitted the QM Bar and Beat Tracker plugin by Matthew Davies, which has been around for a few years; it’s based on the QM Tempo Tracker with an additional downbeat estimator.

The results don’t come across very well, for varying reasons according to the dataset. The QM Bar and Beat Tracker needs to be prompted with the time signature and (following a last-minute decision to enter the category this year) I submitted a script which assumed fixed 4/4 time. This meant we knowingly threw away the Ballroom category, which was all 3/4, but the plugin was also ill-suited to several of the other categories. Not a strong submission then, but interesting to see.

Audio Key Detection

Results here and here. Last year I lamented the lack of any other entries than ours, since the category had just gained a second (and more realistic) test dataset. So I’m delighted to see a couple of new submissions this year, including one from Gilberto Bernardes and Matthew Davies at INESC in Porto which appears to perform well.

Audio Chord Estimation

Results here, now up to five test datasets. Last year saw a torrid time with a bug in the Chordino plugin, but this year it’s back to normal. Chordino still performs well, but in a strong category this year it’s no longer one of the top performers.