Sonic Visualiser v3.2

Another release of Sonic Visualiser is out. This one, version 3.2, has some significant visible changes, in contrast to version 3.1 which was more behind-the-scenes.

The theme of this release could be said to be “oversampling” or “interpolation”.

Waveform interpolation

Ever since the Early Days, the waveform layer in Sonic Visualiser has had one major limitation: you can’t zoom in any closer (horizontally) than one pixel per sample. Here’s what that looks like — this is the closest zoom available in v3.1 or earlier:

[Screenshot: waveform at the closest zoom available in v3.1, one pixel per sample]

This isn’t such a big deal with a lower-resolution display, since you don’t usually want to interact with individual samples anyway (you can’t edit waveforms in Sonic Visualiser). It’s a bigger problem with hi-dpi and “retina” displays, on which individual pixels can’t always be made out.

Why this limitation? It allowed an integer ratio between samples and pixels to be used internally, which made it a bit easier to avoid rounding errors. It also sidestepped any awkward decisions about how, or whether, to show a signal in between the sample points.

(In a waveform editor like Audacity it is necessary to be able to interact with individual samples, so some decision has to be made about what to show between the sample points when zoomed in closely. Older versions of Audacity connected the sample points with straight lines, a decision which attracted criticism as misrepresenting how sampling works. More recent versions show sample points on separate stems without connecting lines.)

In Sonic Visualiser v3.2 it’s now possible to zoom closer than one pixel per sample, and we show the signal oversampled between the sample points using sinc interpolation. Here’s an example from the documentation, showing the case where the sample values are all zero but for a single sample with value 1:

The sample points are the little square dots, and the wiggly line passing through them is the interpolated signal. (The horizontal line is just the x axis.) The principle here is that, although there are infinitely many ways to join the dots, there is only one that is “smooth” enough to be expressible as a sum of sinusoids of no higher frequency than half the sampling rate — which is the condition the original signal must have satisfied if it was sampled without aliasing in the first place. That’s what is shown here.
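
For the curious, here is roughly what that reconstruction amounts to, expressed as a few lines of Python with NumPy. This is only a sketch of the Whittaker–Shannon interpolation formula, not the code Sonic Visualiser actually uses to render waveforms:

```python
import numpy as np

def sinc_interpolate(samples, oversample=8):
    """Evaluate the unique band-limited signal passing through the given
    sample points, at `oversample` points per original sample interval
    (Whittaker-Shannon reconstruction: a sum of shifted sinc functions)."""
    n = np.arange(len(samples))                            # sample indices
    t = np.arange(len(samples) * oversample) / oversample  # finer time grid
    return np.array([np.sum(samples * np.sinc(ti - n)) for ti in t])

# The example above: all zero but for a single sample of value 1
x = np.zeros(32)
x[16] = 1.0
y = sinc_interpolate(x)   # the wiggly curve passing through the sample points
```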

The above artificial example has a nice shape, but in most cases with real music the interpolated signal will not look very different from what you would get by simply joining the dots with straight lines. It’s mostly relevant in extreme cases. Let’s replace the single sample of value 1 above with a pair of consecutive samples of value 0.5:

[Screenshot: two consecutive samples of value 0.5, with the interpolated peak between them]

Now we see that the interpolated signal has a peak between the two samples with a greater level than either sample. The peak sample value is not a safe indication of the peak level of the analogue signal.
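
You can reproduce that inter-sample peak with any band-limited resampler. Here is a sketch using scipy.signal.resample, which does FFT-based resampling and so gives effectively the same interpolation as above (assuming SciPy is available; again, this is not Sonic Visualiser’s own code):

```python
import numpy as np
from scipy.signal import resample

x = np.zeros(32)
x[15] = x[16] = 0.5           # two consecutive samples of value 0.5
y = resample(x, len(x) * 16)  # band-limited 16x oversampling
print(x.max(), y.max())       # 0.5 versus roughly 0.64: an inter-sample peak
```

This is the same reason “true peak” meters oversample a signal before measuring it.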

Incidentally, another new feature in v3.2 is the ability to import audio data from a CSV or similar data file rather than only from standard audio formats. That made it much easier to set up the examples above.

Spectrogram and spectrum oversampling

The other oversampling-related feature added in v3.2 appears in the spectrogram and spectrum layers. These layers now have an option to set an oversampling level, from the default “1x” up to “8x”.

This option increases the length of the short-time Fourier transform used to generate the spectrum, by padding the time-domain signal window with additional zero-valued samples before calculating the transform. This results in an oversampled frequency-domain output, with a higher visual resolution than would have been obtained from the original, un-zero-padded sample window. The result is a smoother spectrum in which the locations of peaks can be seen with a little more accuracy, somewhat like the waveform example above.
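
In NumPy terms the zero-padding amounts to something like the following sketch. This is only an illustration of the idea, not Sonic Visualiser’s own code:

```python
import numpy as np

n = 1024
frame = np.hanning(n) * np.random.randn(n)  # some windowed time-domain frame

spectrum_1x = np.abs(np.fft.rfft(frame))           # 513 bins
spectrum_8x = np.abs(np.fft.rfft(frame, n=8 * n))  # 4097 bins, same 1024 samples
# The 8x spectrum is a smoother, more finely sampled rendering of the same
# underlying transform: the zero-padding adds no new information.
```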

This is nice in principle, but it can be deceiving.

In the case of waveform oversampling, there can be only one “matching” signal, given the sample points we have and the constraints of the sampling theorem. So we can oversample as much as we like, and all that happens is that we approximate the analogue signal more closely.

But in a short-time spectrum or spectrogram, we only use a small window of the original signal for each spectrum or spectrogram-column calculation. There is a tradeoff in the choice of window size (a longer window gives better frequency discrimination at the expense of time discrimination) but the window always exposes only a small part of the original signal, unless that signal is extremely short. Zero-padding and using a longer transform oversamples the output to make it smoother, but it obviously uses no extra information to do it — it still has no access to samples that were not in the original window. A higher-resolution output without any more information at the input can appear more effective at discriminating between frequencies than it really is.

Here’s an example. The signal consists of a mixture of two sine waves one tone apart (440 and 493.9 Hz). A log-log spectrum (i.e. log frequency on x axis, log magnitude on y) with an 8192-point short-time Fourier transform looks like this:

[Screenshot: log-log spectrum with an 8192-point STFT]

A log-log spectrum with a 1024-point STFT looks like this [1]:

[Screenshot: log-log spectrum with a 1024-point STFT]

The 1024-sample input isn’t long enough to discriminate between the two frequencies — they’re close enough that it’s necessary to “hear” a longer fragment than this in order to determine that there are two frequencies at all [2].

Add 8x oversampling to that last example, and it looks like this:

[Screenshot: log-log spectrum with a 1024-point STFT and 8x oversampling]

This is very smooth and looks super detailed, and indeed we can use it to read the peak value with more accuracy — but the peak is deceptive, because it is still merging the two frequency components. In fact most of the detail here consists of the frequency response of the 1024-point windowing function used to shape the time-domain window (it’s a Hann window in this case).
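
Here is a sketch, again in NumPy rather than anything taken from Sonic Visualiser, that compares the level at each of the two tone frequencies with the level at the midpoint between them. The long window shows a clear dip between the tones; the short window shows no dip at all, however much it is oversampled:

```python
import numpy as np

sr = 44100.0
f1, f2 = 440.0, 493.9
t = np.arange(8192) / sr
sig = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

def spectrum_db(x, oversample=1):
    """Hann-windowed, optionally zero-padded magnitude spectrum in dB."""
    n_fft = oversample * len(x)
    mags = np.abs(np.fft.rfft(x * np.hanning(len(x)), n=n_fft))
    freqs = np.arange(len(mags)) * sr / n_fft
    return freqs, 20 * np.log10(mags + 1e-12)

def level_at(freqs, db, f):
    """Level in dB of the bin nearest to frequency f."""
    return db[np.argmin(np.abs(freqs - f))]

for label, (freqs, db) in [
    ("8192-point window    ", spectrum_db(sig)),
    ("1024-point window, 8x", spectrum_db(sig[:1024], oversample=8)),
]:
    levels = [round(level_at(freqs, db, f), 1) for f in (f1, (f1 + f2) / 2, f2)]
    print(label, levels)   # long window: high, low, high; short window: no dip
```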

Also, in the case of peak frequencies, Sonic Visualiser might already provide a way to get the same information more accurately — its peak-frequency identification in both spectrum and spectrogram views uses phase unwrapping instead of spectrum interpolation to estimate the frequencies of stable harmonics, which gives very good results if the sound is indeed harmonic and stable.
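
The general idea behind that kind of estimator, in minimal phase-vocoder terms and without claiming to reproduce Sonic Visualiser’s actual implementation, is to compare the phase of a bin across two overlapping frames: how far the phase actually advanced, against how far it would have advanced if the component sat exactly on the bin frequency:

```python
import numpy as np

def refined_peak_freq(x, sr, n=1024, hop=256):
    """Estimate the frequency of the strongest spectral component from the
    phase advance between two overlapping frames, rather than by
    interpolating the magnitude spectrum (a phase-vocoder-style sketch)."""
    win = np.hanning(n)
    f0 = np.fft.rfft(x[:n] * win)
    f1 = np.fft.rfft(x[hop:hop + n] * win)
    k = np.argmax(np.abs(f1))                         # strongest bin
    expected = 2 * np.pi * k * hop / n                # phase advance if exactly on-bin
    dphi = np.angle(f1[k]) - np.angle(f0[k]) - expected
    dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap to [-pi, pi]
    return (k + dphi * n / (2 * np.pi * hop)) * sr / n

sr = 44100
t = np.arange(4096) / sr
print(refined_peak_freq(np.sin(2 * np.pi * 441.3 * t), sr))  # close to 441.3
```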

Finally, there’s a limitation in Sonic Visualiser’s implementation of this oversampling feature that eliminates one potential use for it, which is to choose the length of the Fourier transform in order to align bin frequencies with known or expected frequency components of the signal. We can’t generally do that here, since Sonic Visualiser still only supports a few fixed multiples of a power-of-two window size.
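
(As a hypothetical illustration of what that would mean: to land a bin exactly on 440 Hz at a 44100 Hz sample rate, you would need a transform length N satisfying 440 = k × 44100 / N for some integer bin number k, such as N = 11025 with k = 110, and a length like that is not among the power-of-two-based sizes Sonic Visualiser offers.)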

In conclusion: interesting if you know what you’re looking at, but use with caution.


[1] Notice that we are connecting sample points in the spectrum with straight lines here — the same thing I characterised as a bad idea in the discussion of waveforms above. I think this is more forgivable here because the short-time transform output is not a sampled version of an original signal spectrum, but it’s still a bit icky.

[2] This is not exactly true, but it works for this example.

Notes from the Audio Developer Conference

I’ve spent the last couple of days at the 2017 Audio Developer Conference organised by ROLI. This is a get-together and technical conference for people who work on audio software and software-driven hardware, in practice mostly people working on music applications.

I don’t go to many conferences these days, despite working in academia. I don’t co-write many papers and I’m no longer funded by a project with a conference budget. I’ve been to a couple that we hosted ourselves at the Centre for Digital Music, but I think the last one I went to anywhere else was the 2014 Linux Audio Conference in Karlsruhe. I don’t mind this situation (I don’t like to travel away from my family anyway); I just mention it to give context for why a long-time academic employee like me should bother to write up a conference at all!


Here are my notes — on things I liked and things I didn’t — in roughly chronological order.

  • The venue is interesting, quite fancy, and completely new to me. (It is called CodeNode.) I’m a bit boggled that there is such a big space right in the middle of the City given over to developer events. I probably shouldn’t be boggling at that any more, but I can’t help it.
  • Nice furniture too.
  • The attendees are amazingly homogeneous. I probably wouldn’t have even noticed this, back when I was tangentially involved in the commercial audio development world, as I was part of the homogeneity. But our research group is a fair bit more diverse and I’m a bit more perceptive now. From the attendance of this event, you would conclude that 98% of audio developers are male and 90% are white people from northern Europe.
    When I have been involved in organising events in academia, we have found it hard to get a speaker lineup that is as diverse as the population of potential attendees (i.e. the classic all-male panel problem). I have failed badly at this, even when trying hard — I am definitely part of the problem when it comes to conference organisation. Here, though, my perception is the other way around: the speakers are a closer reflection of what I perceive as the actual population than the attendees are.

Talks I went to:

Day 2 (i.e. the first day of the talks):

  • The future is wide: SIMD, vector classes and branchless algorithms for audio synthesis by Angus Hewlett of FXpansion (now employed by ROLI). A topic I’m interested in and he has clearly done solid work on (see here), but it quickly reached the realms of tweaks I personally am probably never going to need. The most heartening lesson I learned was that compilers are getting better and better at auto-vectorisation.
  • Exploring time-frequency space with the Gaborator by Andreas Gustafsson. I loved this. It was about computing short-time constant-Q transforms of music audio and presenting the results in an interactive way. This is well-trodden territory: I have worked on more than one implementation of a constant-Q transform myself, and on visualising the results. But I really appreciated his dedication to optimising the transform (which appears to be quicker and more invertible than my best implementation) and his imagination in rendering it (reusing the Leaflet mapping API to display time-frequency “maps”). There is a demo of this here and I like it a lot.
    So I was sitting there thinking “yes! nice work!”, but when it came to the questions, it was apparent that people didn’t really get how nice it was. I wanted to pretend to ask a question, just in order to say “I like it!”. But I didn’t, and then I never managed to work up to introducing myself to Andreas afterwards. I feel bad and I wish I had.
  • The development of Ableton Live by Friedemann Schautz. This talk could only disappoint, after its title. But I had to attend anyway. It was a broad review of observations from the development of Live 10, and although I didn’t learn much, I did like Friedemann and thought I would happily work on a team he was running.
  • The amazing usefulness of band-limited impulse trains by Stefan Stenzel of Waldorf. This was a nice old-school piece. Who can resist an impulse train? Not I.
  • Some interesting phenomena in nonlinear oscillators by André Bergner of Native Instruments. André is a compelling speaker who uses hand-drawn slides (I approve) and this was a neat mathematical talk, though I wasn’t able to stay to the end of it.

Day 3 (second and final day of talks):

  • The human in the musical loop (keynote) by Elaine Chew. Elaine is a professor in my group and I know some of her work quite well, but her keynote was exactly what I needed at this time, first thing in the morning on the second day. After a day of bits-driven talks, this was a piece about performers and listeners from someone who is technologically adept herself, and curious, but talks about music first. Elaine is also very calm, which was useful when the projector hardware gave up during her talk and stopped working for a good few minutes. I think as a result she had to hurry the closing topic (about the heartbeat project) which was a pity, as it could have been fascinating to have expanded on this a bit more.
    Some of what Elaine talked about was more than a decade old, and I think this is one of the purposes of professors: to recall, and to be able to communicate, relevant stuff that happened longer ago than any current research student remembers.
  • The new C++17, and why it is good for you by Timur Doumler. The polar opposite of Elaine’s talk, but I was now well-cushioned for it. C++17 continues down the road of simplifying the “modern-language” capabilities C++ has been acquiring since C++11. Most memorable for me are destructuring bind, guaranteed copy elision on value return, variant types, and filesystem support in the standard library.
    Destructuring bind is interesting and I’ve written about it separately.
  • The use of std::variant in realtime DSP by Ian Hobson. A 50-minute slot, for a talk whose twist Timur Doumler’s earlier talk had already given away! (Yes, you can use std::variant: it doesn’t do any heap allocation.) Ambitious. This was a most satisfying talk anyway, as it was all about performance measurements and other very concrete stuff. No mention of the Expression Problem though.
  • Reactive Extensions (Rx) in JUCE by Martin Finke. I have never used either React or JUCE so I thought this would be perfect for me. I had a question lined up: “What is JUCE?” but I didn’t dare use it. The talk was perfectly comprehensible and quite enlightening though, so my silly bit of attitude was quite misplaced. I may even end up using some of what I learned in it.

 

Sonic Visualiser 3.0, at last

Finally!

(See previous posts: Help test the Sonic Visualiser v3.0 beta, A second beta of Sonic Visualiser v3.0, A third beta of Sonic Visualiser v3.0, and Yes, there’s a fourth beta of Sonic Visualiser v3.0 now)

No doubt, now that the official release is out, some horrible problem or other will come to light. It wouldn’t be the first time: Sonic Visualiser v2.4 went through a beta programme before release and still had to be replaced with v2.4.1 after only a week. These things happen and that’s OK, but for now I’m feeling good about this one.

 

Yes, there’s a fourth beta of Sonic Visualiser v3.0 now

Previously I wrote about the third Sonic Visualiser v3.0 beta release:

“This may well be the final beta, so if you’re seeing problems with it, please do report them while there’s still time!”

Well some very kind people did report problems, and so that was not the final beta. A fourth one is now up for download. Here are the download URLs:

Fixes since the third beta

  • Fix a nasty crash in session I/O in the 64-bit Windows build (this is the main reason for the new beta)
  • Provide more log information about audio drivers to the debug log file
  • Fix a very occasional one-sample-too-short error in resampling audio files during load
  • Fix invisible measure tool crosshairs on spectrogram
  • Fix a possible memory leak in the spectrogram

Keep the bug reports coming!

This one really could be the final beta! So please do report any troubles you have with it. Drop me a line, post a comment below this article, or use the SourceForge bug tracker. And thank you!

 

A third beta of Sonic Visualiser v3.0

Update – 23rd Feb: We have a fourth beta now!

After a short break, we have a third beta of the forthcoming v3.0 release of Sonic Visualiser. Downloads here:

Bugs fixed, and other changes made since the second beta

  • Sonic Visualiser could hang when trying to initialise a transform that refused the first choice of initialisation parameters
  • Error handling for problems in running transforms has been improved in general
  • The Colour 3D Plot layer was sometimes pathologically slow to update
  • The “Normalise Visible Area” option in the Colour 3D Plot layer wasn’t working
  • The visual rendering style of some layers has been improved when viewed on high-resolution screens without pixel doubling
  • A new feature has snuck in, under cover of fixing a rendering offset problem in the spectrum layer: it is now possible (although cumbersome) to zoom the spectrum layer in the frequency axis
  • The process of overhauling the Help Reference documentation to properly describe the new release has begun

Let us know what else you find!

This may well be the final beta, so if you’re seeing problems with it, please do report them while there’s still time!

Drop me a line, post a comment below this article, or use the SourceForge bug tracker.

(This post is a follow-up to “Help test the Sonic Visualiser v3.0 beta” and “A second beta of Sonic Visualiser v3.0”.)

A second beta of Sonic Visualiser v3.0

Update – 9th Feb: There is now a third beta! See here for details.

Here’s a second beta release of Sonic Visualiser v3.0:

Bugs found in the first beta and fixed for the second

  • The peak-frequency spectrogram rendered the entire track into the first 1/8th of its length, and showed nothing after that. (The cause of this might make a marginally interesting technical post in its own right)
  • A similar effect was exhibited by Colour 3D Plot layers, but only at very close zoom levels
  • When the Windows build had been used to view an mp3 file, it would subsequently crash on exit
  • All platforms could hang on startup if certain plugins were installed (the Fan Chirp plugin from the Universidad de la República in Uruguay was one example, though it wasn’t the fault of the plugin)
  • The playback/record level meters were very flickery
  • The source package didn’t build on Fedora Linux

What other problems have you spotted?

Let us know! Drop me a line, post a comment below this article, or use the SourceForge bug tracker.

(This post is a follow-up to “Help test the Sonic Visualiser v3.0 beta”.)