
Performance improvements in Rubber Band Library

Today marks the release of version 3.1 of the Rubber Band audio time-stretching and pitch-shifting library. This release focuses primarily on performance improvements.

In version 3.0 we introduced a totally new, higher-quality processing engine, which I’ll refer to as the R3 engine. The older one is still included, and I’ll call that R2.

Although the output of R3 typically sounds much better than that of R2, it uses a lot more CPU power. Measuring sustained throughput in frames per second for common fixed stretch factors, we find R2 to be typically about three times as fast as R3. Both are eminently usable in real time on hardware from the last decade, but the extra headroom available with R2 can make a big difference.

It would be nice to do better, but the R3 code was already quite heavily optimised before release — it is simply a fairly CPU-intensive method. Still, as it turns out, there are a few things we can do.

Measuring performance

Sustained throughput is not the only measure. Rubber Band is often used in real-time situations where the worst-case time per processed block is what matters most.

To measure this, I set up a test case that simulates a typical sound processing callback: it passes a music recording through a stretcher, emitting a fixed block of 512 sample frames from each processing cycle, while varying the time and pitch ratios and measuring how long each cycle takes to return. The stretcher is initialised with typical parameters for this activity (in code terms, OptionProcessRealTime | OptionPitchHighConsistency | OptionFormantPreserved) and is primed with an initial pad before entering the cycle loop, since otherwise the first call would dominate the results.
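
Here's a rough sketch of the kind of cycle loop involved. This is not the actual benchmark code: the sample rate, channel count and cycle count are arbitrary example values, and fetchInput is a made-up placeholder for reading the next frames of the test recording.

    #include <rubberband/RubberBandStretcher.h>

    #include <chrono>
    #include <vector>

    using RubberBand::RubberBandStretcher;

    int main()
    {
        const int rate = 44100, channels = 2, blockSize = 512;

        RubberBandStretcher stretcher
            (rate, channels,
             RubberBandStretcher::OptionProcessRealTime |
             RubberBandStretcher::OptionPitchHighConsistency |
             RubberBandStretcher::OptionFormantPreserved);

        // Channel-pointer scaffolding for the de-interleaved float API
        std::vector<std::vector<float>> in(channels, std::vector<float>(blockSize, 0.f));
        std::vector<std::vector<float>> out(channels, std::vector<float>(blockSize, 0.f));
        std::vector<float *> inPtrs, outPtrs;
        for (int c = 0; c < channels; ++c) {
            inPtrs.push_back(in[c].data());
            outPtrs.push_back(out[c].data());
        }

        // Prime the stretcher with an initial pad of silence, so the first
        // timed cycle is not dominated by start-up buffering
        std::vector<std::vector<float>> pad(channels,
            std::vector<float>(stretcher.getPreferredStartPad(), 0.f));
        std::vector<float *> padPtrs;
        for (int c = 0; c < channels; ++c) padPtrs.push_back(pad[c].data());
        stretcher.process(padPtrs.data(), pad[0].size(), false);

        for (int cycle = 0; cycle < 10000; ++cycle) {

            // ... vary stretcher.setTimeRatio(...) / setPitchScale(...) here ...

            auto t0 = std::chrono::steady_clock::now();

            while (stretcher.available() < blockSize) {
                // fetchInput(inPtrs, blockSize) would go here: a hypothetical
                // function reading the next 512 frames of the recording.
                // (A stricter harness would consult getSamplesRequired() and
                // feed only as much as the stretcher asks for.)
                stretcher.process(inPtrs.data(), blockSize, false);
            }
            stretcher.retrieve(outPtrs.data(), blockSize);

            auto t1 = std::chrono::steady_clock::now();
            // record (t1 - t0) as the time taken for this 512-frame cycle
        }
    }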

The results for R2 and R3, as of the 3.0 release, look like this:

This is a graph of processing cycle count (x-axis) against time taken per 512-frame cycle (y-axis). The y-axis is linear in time with zero at the bottom, so lower is better. No units are shown because they are totally system-dependent — this is purely a comparative visualisation; we’re only interested in the relative heights. Obviously the relative heights may also vary from system to system, so this is still quite tentative.

The test runs in four consecutive phases with different pitch and time modifications, and so the x-axis is divided into four (uneven) quadrants: raising pitch, lowering pitch, slowing down, and speeding up.

In the first quadrant, the pitch rises smoothly and then falls again, reaching a peak at two octaves up; in the second it falls smoothly and then rises again, reaching a trough at two octaves down; in the third the pitch is unchanged but the tempo slows to just under a third of the original speed and then returns to normal; and in the fourth quadrant the tempo gradually speeds up to 8x the original speed and then returns to normal.

The plots for R2 (orange) and R3 (purple) reveal significant differences in behaviour:

  • R2 is usually faster, sometimes much faster, especially for modest stretch factors.
  • R3’s long internal processing buffers and step size mean that it hops between “modes” depending on how many processing increments (1, 2, 3, 4 or occasionally 0) are required for each output block.
  • R2 has less widely-spaced distinct “modes”, because it uses smaller increments. It’s still faster because it does so much less work for each increment.
  • R2’s processing time becomes very variable, and relatively high, when speeding up the audio by a large factor (above about 3x). This may be because it continues to perform transient detection and adjust its input and output steps accordingly, and at those rates our test file contains a lot of transients. R3 is very predictable in this area by comparison.
  • Both stretchers consume more CPU the further the pitch is shifted upward, but not when shifting down.

The last point happens because we are using OptionPitchHighConsistency. This option ensures that the resampler used for the pitch-shift part of the operation is always engaged, so that there are no discontinuities when changing ratio (particularly to or from the 1x ratio). We’ll come back to that later.

A Draft Mode for the R3 Engine

The main novelty in version 3.1 is an option to deactivate R3’s multi-window processing system, dropping down to a single shorter processing window and potentially running much faster, while retaining its more advanced signal analysis and some of its output characteristics.

This is enabled using the OptionWindowShort flag when constructing a stretcher, or the --window-short argument to the command-line tool. It’s an option that already existed in R2, and conceptually it does something similar there, but the effect on performance is much greater with R3.
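
A minimal sketch of the construction call, assuming the R3 engine flag (OptionEngineFiner) and arbitrary sample rate and channel count:

    #include <rubberband/RubberBandStretcher.h>

    using RubberBand::RubberBandStretcher;

    // R3 engine, with the single-window "draft" mode discussed here
    RubberBandStretcher stretcher
        (44100, 2,
         RubberBandStretcher::OptionProcessRealTime |
         RubberBandStretcher::OptionEngineFiner |    // select the R3 engine
         RubberBandStretcher::OptionWindowShort);    // single short window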

Here’s a plot comparing R2, R3, and the new R3 single window option (“R3short”):

With this new option we get both performance comparable to R2 and the more predictable behaviour at high tempo ratios found in R3. Splendid.

What does it sound like? Not as good as R3; it loses some percussive clarity and quite a lot of low-end stability. For some material, particularly acoustic instruments and vocals without too much bass content, it can sound markedly better than R2. It’s not a universal substitute, but it’s really not bad given the CPU budget.

Here are a couple of ten-second audio clips to give you an idea. Each is stretched to 140% of its original duration using R2, R3 with the short window, and full R3. Neither clip is trivial to handle, though the second is far harder than the first.

Resamplers and FFTs

Rubber Band makes heavy use of audio resampler and fast Fourier transform (FFT) implementations. Originally it used external libraries for both, but in June 2021 a built-in FFT was added and in October 2021 a built-in resampler appeared as well.

These are both slower than the best external libraries, but they make Rubber Band simpler to build and integrate. And the built-in resampler is also designed to reduce clicky artifacts and maintain tempo integrity on ratio changes, at some further expense in performance, so if you do have the headroom it is worth defaulting to.

Here’s a performance comparison of the built-in resampler with libsamplerate in the “draft” short-window R3 mode described above.

Clearly libsamplerate is both faster and more predictable. It’s faster even when changing only the tempo, which doesn’t involve resampling, because of our previously-mentioned use of OptionPitchHighConsistency which keeps the resampler running at all ratios.

(Incidentally all of the other performance plots in this post were made using libsamplerate, unless otherwise specified. Its smoother performance profile makes other comparisons easier.)

I’ve mentioned OptionPitchHighConsistency a couple of times now. If we use OptionPitchHighSpeed instead, we get quite different behaviour:

The relation between the amount of pitch shift and the CPU effort is totally gone. All pitch shifts are roughly equal, and the time-stretching quadrants are faster. The tradeoff, unfortunately, is that there are now audible discontinuities every time the pitch ratio reaches or crosses 1.0.
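
In code terms the difference is just which of the two pitch option flags goes into the construction options. A minimal sketch of the two alternatives:

    #include <rubberband/RubberBandStretcher.h>

    using RubberBand::RubberBandStretcher;

    // Resampler always engaged: no discontinuities when the pitch ratio
    // reaches or crosses 1.0, but CPU cost grows with upward shifts
    RubberBandStretcher::Options consistent =
        RubberBandStretcher::OptionProcessRealTime |
        RubberBandStretcher::OptionPitchHighConsistency;

    // Roughly flat CPU cost whatever the shift, but audible
    // discontinuities each time the ratio reaches or crosses 1.0
    RubberBandStretcher::Options speedy =
        RubberBandStretcher::OptionProcessRealTime |
        RubberBandStretcher::OptionPitchHighSpeed;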

Traditionally the alternative to libsamplerate in Rubber Band has been a resampler implementation cribbed from the Speex audio codec and provided with Rubber Band as a compile-time option. This resampler was a bit unsatisfactory for various reasons, but a much improved version of it has for a while been available in a library called speexdsp.

As of v3.1 Rubber Band now includes support for speexdsp as well, and it works well — audio quality seems good, and so is performance on my test hardware, shown here against libsamplerate:

I don’t think this is well-exercised enough to be a standard recommendation yet, but it’s promising.

The built-in FFT fares better against its external competition than the built-in resampler does. In addition to the previously-supported external libraries (FFTW, IPP, and Apple’s vDSP), this release adds support for FFTs from SLEEF, a library which looks as if it should be competitive on platforms that have been short on good options in the past.

To summarise:

  • The R3 time-stretcher and pitch-shifter engine introduced in Rubber Band 3.0 sounds great, but is relatively CPU-intensive compared to the older R2
  • The new 3.1 release introduces a draft mode (“short-window” or single window mode) for the R3 engine, that retains some of its good qualities while running much faster and with more predictable CPU usage
  • You may be able to speed up your implementation by using an external resampler or FFT library, and the 3.1 release adds support for a couple of new ones with good performance.

See the Rubber Band Library site for more information about the library.

Thank you for your time. Perhaps we can help you make more of it.

* * *

Many thanks to Davy Wentzler for valuable feedback on the 3.1 development process.

 


Note on “Explorations in Time-Frequency Analysis” by Patrick Flandrin

Patrick Flandrin is a physicist and signal-processing researcher whose name I first encountered as co-author (with François Auger) of a 1995 IEEE Transactions on Signal Processing paper called “Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method”.

This crunchy publication (21 pages, dozens of equations and figures) took a pleasing idea — replacing the familiar grid-format time-frequency spectrogram with a field of precisely localised points calculated using both magnitude and phase of the frequency bins, rather than only magnitude as a traditional spectrogram does — and set out the mathematics of applying it to a number of different time-frequency and time-scale representations.

Illustration from Auger & Flandrin (1995)

I read this paper about 15 years ago and didn’t understand it. I have since realised this is partly because it isn’t all that clear with its notation, but there is also a big gap between the naive programmer’s view (that’s mine) of a spectrogram and the mathematical analysis used in the paper.

To explain. For a programmer, a spectrogram comes from taking short overlapping slices of a sampled signal, multiplying each by a smoothing window shape, applying a short-time Fourier transform, and taking the magnitudes of the complex output bins to get one column of the spectrogram per slice of input. The short slices are because you want a fixed, smallish number of output bins, and you have various tradeoffs — time and frequency resolution and computational efficiency — to consider in that. The smoothing window is because your Fourier transform — a thing which matches up sinusoids of different frequencies against a signal to identify which ones would add up to it — operates on an infinite signal, consisting of the input you give it repeated forever in both directions: this will have a discontinuity each time it wraps around, and the smoothing window removes some of the frequency artifacts from these discontinuities. There is nothing particularly mathematical about the implementation of this, and any intuition used by the programmer is a mixture of the visual and techniques from the world of engineering. The language used in a publication like the DAFx book is typical in this world.
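
To make that concrete, here is a deliberately naive sketch of one column of such a spectrogram: window the slice, transform it, keep the magnitudes. It uses a slow direct DFT purely so it can stand alone; a real implementation would call an FFT library, and would hop the window along the signal to produce successive columns. The function name is just for illustration.

    #include <cmath>
    #include <complex>
    #include <vector>

    // One spectrogram column from one windowed slice of the signal:
    // multiply the slice by a Hann window, take a (naive, O(n^2))
    // discrete Fourier transform, and keep the bin magnitudes.
    std::vector<double> spectrogramColumn(const std::vector<double> &slice)
    {
        const double pi = 3.141592653589793;
        const size_t n = slice.size();

        std::vector<double> windowed(n);
        for (size_t i = 0; i < n; ++i) {
            double hann = 0.5 - 0.5 * std::cos(2.0 * pi * i / n);
            windowed[i] = slice[i] * hann;
        }

        std::vector<double> magnitudes(n / 2 + 1);
        for (size_t bin = 0; bin <= n / 2; ++bin) {
            std::complex<double> sum = 0.0;
            for (size_t i = 0; i < n; ++i) {
                double phase = -2.0 * pi * double(bin) * double(i) / double(n);
                sum += windowed[i] * std::polar(1.0, phase);
            }
            magnitudes[bin] = std::abs(sum);
        }
        return magnitudes;
    }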

The Auger & Flandrin paper instead comes from a world that summarises a spectrogram as a two-dimensional Wigner-Ville distribution filtered with a smoothing window, leading to a time-frequency representation of Cohen’s class. Signals are finite-energy functions over infinite domains, and a spectrogram is a double integral over time and angular frequency. Both time-domain functions and time-frequency representations are continuous, and practical questions about overlap and window length don’t arise. I can dimly remember this world, because my undergraduate degree — who am I kidding, my only degree — started out as pure maths, but I haven’t inhabited it for any of my working life.
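
For comparison, the continuous-time picture looks something like this, where h is the analysis window, W_x and W_h are the Wigner-Ville distributions of the signal and the window, and the second form holds up to a normalisation constant that depends on the conventions in use:

    S_x^h(t,\omega)
      \;=\; \left| \int_{-\infty}^{\infty} x(s)\, h^{*}(s-t)\, e^{-i\omega s}\, \mathrm{d}s \right|^{2}
      \;\propto\; \iint W_x(s,\xi)\, W_h(t-s,\; \omega-\xi)\, \mathrm{d}s\, \mathrm{d}\xi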

So I didn’t really understand the paper, and a programmer has plenty to do, and that is one reason why Sonic Visualiser’s “Peak-Frequency Spectrogram” layer calculates instantaneous frequencies from the phase difference between successive columns, something which I found much easier to understand. (It turns out there are other good reasons one could make this choice, but I didn’t know that.1)

Returning to the paper recently, I learned that Flandrin had written a book on the subject, and I bought a copy hoping it might bridge the conceptual gap. It turned out to be a good experience.

* * *

“Explorations in Time-Frequency Analysis” is a monograph digressing on things the author has found interesting in the past 30 years, which — what luck! — happen to be about time-frequency analysis. It’s short, about 200 pages, and nicely printed. There are lots of diagrams, and although equation-heavy it doesn’t hang about proving things, sending you to the references instead. It begins with a glossary of notation (I like it when books do this) and ends with a 9-page bibliography. The writing is crisp and friendly and the scene is set by the first two chapters, a philosophical outline and a chapter of examples with the lovely title “Small Data Are Beautiful”.

Although the book provides a lot of the background to the paper that defeated me, I still spent a potentially embarrassing amount of thought on things I imagine that anyone properly within the target market finds obvious. An example is what it means for a Gaussian function to be “circular” in time and frequency. The book goes over this in far more detail, but briefly a Gaussian — the bell-shaped normal distribution curve found in probability — has the property that its Fourier transform is also a Gaussian. The “wider” the bell shape in the time domain, the “narrower” in the frequency domain: at some point it must be equal in both, and then if you plot it in a spectrogram-like heat map you will see a circle. When does this happen? It’s shown that it happens for the Gaussian corresponding to a normal distribution of variance 1. But at this point I am worrying about units. What does it mean to be circular? The figures illustrating this lack units in either axis — in fact detail-wise many of the figures are more like sketches — and the little bit of engineer in me is wondering: how can you possibly have a circle if you lack units?
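
In symbols, and ignoring normalisation factors, the claim is roughly this:

    g_{\sigma}(t) = e^{-t^{2}/(2\sigma^{2})}
    \qquad\Longleftrightarrow\qquad
    \hat{g}_{\sigma}(\omega) \propto e^{-\sigma^{2}\omega^{2}/2}

The time-domain width is σ and the frequency-domain width is 1/σ, so the two are equal, and the heat-map contours are circular, exactly when σ = 1.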

The answer I eventually recalled is that the units in one domain define those in the other. In this case, if the time axis is in seconds then angular frequency is radians per second, and a circle is a distribution whose extent in seconds is the same as that in radians per second. Other units such as samples (in time) or STFT bins (in frequency) have similar correspondences in the other domain. This is a place where going back to basics took significant thought, but I did actually appreciate being expected to think about it.

So a nice rehearsal with some interesting bumps, but for me the thrilling twist arrives in chapter 12, “Spectrogram Geometry 2”. This reframes the spectrogram as a complex plane and the reassignment operator in terms of motion in a potential field proportional to the log-spectrogram. This mathematical leap is also an intuitively visual one, and it’s exciting for me because it is a little like how I pictured the spectrogram, with no meaningful mathematical analysis, when developing a certain feature of the Rubber Band timestretcher.2 This chapter is like seeing the vaguely-realised ground beneath your feet resolve into a larger, recognisable object — the moment when you realise you are standing on the back of a giant Pokémon, if you will.

There is a lot more in this book, and I think it will repay repeated visits. I’m not sure whether you could implement anything directly from it, but you could, say, pick a random page and follow up all the references until you really feel you understand it. I think this would be a rewarding exercise that, for someone like me, would probably take around a month per page.

* * *

On that note, one of the first references given is to a book called “Visible Speech” by Potter, Kopp, and Green, 1947. I looked this up and was so intrigued that I tracked down an ex-library copy. It is a lavish presentation, perhaps with both training and PR elements, of a then-new idea called the “sound spectrograph”, i.e. a spectrogram. The title “Visible Speech”, incidentally, is borrowed with attribution from an earlier (1867) work about phonetic alphabets.

The authors of the 1947 book were writing about work done at Bell Labs to try to make the telephone accessible to the deaf. Their experimental devices used paper tape or phosphor display to show spectrographs of the speech sounds, and users were specially trained to interpret speech from them. Here’s a picture from the book of someone using one.

[Photo from the book: an operator sitting at a table in front of a large box with a tiny screen on it.]

The spectrographs were produced by automatically recording the speech to tape and playing the tape repeatedly through a filter of 300Hz bandwidth, whose centre frequency was incremented linearly between passes in 15Hz steps from 0-3500Hz. (They also had a version using 45Hz bandwidth filters, but it was found to be less legible.) The system was of course analogue.

In this image the top spectrograph is the one with 45Hz bandwidth, which is used to point out some interesting features, but the 300Hz bandwidth spectrograph below it is the form used throughout the rest of the book:

It’s striking how clear these spectrographs are, and it serves as a useful reminder that we really aren’t always looking for the most precise representation of something — 300Hz bandwidth at speech frequencies is pretty wide! — but instead the most appropriate in some human dimension.

 


1 The Sonic Visualiser peak-frequency spectrogram precisely localises stable frequencies, but for each frequency bin it draws a short horizontal line across the whole duration of the bin at the proper frequency rather than localise the bin to a point in time. A very similar output could have been produced using reassignment, because the frequency calculated from phase difference should be very close to that calculated with reassignment. But a decision to do that would have meant ignoring the other reassignment operator, localisation in time, which gives a single point rather than a horizontal line for each bin. Had I understood the reassignment paper, I would probably have felt compelled to do that part properly. For it to work well, a greater bin overlap and much more sophisticated rendering would have been needed, and the result would have been much slower and possibly less clear for real music. I think.

2 This feature, which I gave the vague name “phase lamination”, was worked out in a hurry after discovering that the “phase locking” technique of Jean Laroche and Mark Dolson which I had used in the very first release of Rubber Band was patented. Phase locking reduced audible phasiness with the nice side-effect of making the phase vocoder faster to compute, but it also lent a robotic tang to the sound which certain listeners found even more unpleasant than the phasiness. The scheme I came up with to replace it was based on picturing a gradient field and making adjustments to bins near a peak or trough in proportion to the distance from it — tuned by ear rather than worked out mathematically. Although it lost the improved speed of phase locking, it usually sounds better. The idea seems reasonably obvious, but I hadn’t seen it described anywhere else and I was delighted to find it.


Sonic Visualiser v3.2

Another release of Sonic Visualiser is out. This one, version 3.2, has some significant visible changes, in contrast to version 3.1 which was more behind-the-scenes.

The theme of this release could be said to be “oversampling” or “interpolation”.

Waveform interpolation

Ever since the Early Days, the waveform layer in Sonic Visualiser has had one major limitation: you can’t zoom in any closer (horizontally) than one pixel per sample. Here’s what that looks like — this is the closest zoom available in v3.1 or earlier:

[Screenshot: a waveform at the closest available zoom, one pixel per sample]

This isn’t such a big deal with a lower-resolution display, since you don’t usually want to interact with individual samples anyway (you can’t edit waveforms in Sonic Visualiser). It’s a bigger problem with hi-dpi and “retina” displays, on which individual pixels can’t always be made out.

Why this limitation? It allowed an integer ratio between samples and pixels to be used internally, which made it a bit easier to avoid rounding errors. It also sidestepped any awkward decisions about how, or whether, to show a signal in between the sample points.

(In a waveform editor like Audacity it is necessary to be able to interact with individual samples, so some decision has to be made about what to show between the sample points when zoomed in closely. Older versions of Audacity connected the sample points with straight lines, a decision which attracted criticism as misrepresenting how sampling works. More recent versions show sample points on separate stems without connecting lines.)

In Sonic Visualiser v3.2 it’s now possible to zoom closer than one pixel per sample, and we show the signal oversampled between the sample points using sinc interpolation. Here’s an example from the documentation, showing the case where the sample values are all zero but for a single sample with value 1:

The sample points are the little square dots, and the wiggly line passing through them is the interpolated signal. (The horizontal line is just the x axis.) The principle here is that, although there are infinitely many ways to join the dots, there is only one that is “smooth” enough to be expressible as a sum of sinusoids of no higher frequency than half the sampling rate — which is the prerequisite for reconstructing a signal sampled without aliasing. That’s what is shown here.
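
The textbook version of this interpolation (the Whittaker-Shannon formula) is a sinc-weighted sum over the sample values. The sketch below is that idealised formula, not Sonic Visualiser’s actual implementation, and a practical version would window and truncate the sinc rather than summing over every sample; the function name is just for illustration.

    #include <cmath>
    #include <vector>

    // Band-limited interpolation: the value at fractional position t
    // (measured in samples) is a sinc-weighted sum of the sample values.
    double interpolate(const std::vector<double> &samples, double t)
    {
        const double pi = 3.141592653589793;
        double value = 0.0;
        for (size_t n = 0; n < samples.size(); ++n) {
            double x = t - double(n);
            double sinc = (x == 0.0) ? 1.0 : std::sin(pi * x) / (pi * x);
            value += samples[n] * sinc;
        }
        return value;
    }

Evaluating this across a single-sample impulse, as in the example above, traces out exactly the wiggly sinc shape passing through the sample points.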

The above artificial example has a nice shape, but in most cases with real music the interpolated signal will not be very different from just joining the dots with a marker. It’s mostly relevant in extreme cases. Let’s replace the single sample of value 1 above with a pair of consecutive samples of value 0.5:

[Screenshot: two consecutive samples of value 0.5, with the interpolated signal peaking between them]

Now we see that the interpolated signal has a peak between the two samples with a greater level than either sample. The peak sample value is not a safe indication of the peak level of the analogue signal.

Incidentally, another new feature in v3.2 is the ability to import audio data from a CSV or similar data file rather than only from standard audio formats. That made it much easier to set up the examples above.

Spectrogram and spectrum oversampling

The other oversampling-related feature added in v3.2 appears in the spectrogram and spectrum layers. These layers now have an option to set an oversampling level, from the default “1x” up to “8x”.

This option increases the length of the short-time Fourier transform used to generate the spectrum, by padding the time-domain signal window with additional zero-valued samples before calculating the transform. This results in an oversampled frequency-domain output, with a higher visual resolution than would have been obtained from the original, un-zero-padded sample window. The result is a smoother spectrum in which the locations of peaks can be seen with a little more accuracy, somewhat like the waveform example above.
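
In symbols: if w is the N-point analysis window and we oversample by a factor L, zero-padding just evaluates the same N-term sum on a frequency grid L times finer,

    X(k) \;=\; \sum_{n=0}^{N-1} w(n)\, x(n)\, e^{-2\pi i\, n k/(LN)},
    \qquad k = 0, 1, \ldots, LN-1

so the LN output values are an interpolation of the original N-point spectrum, computed from exactly the same N input samples.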

This is nice in principle, but it can be deceiving.

In the case of waveform oversampling, there can be only one “matching” signal, given the sample points we have and the constraints of the sampling theorem. So we can oversample as much as we like, and all that happens is that we approximate the analogue signal more closely.

But in a short-time spectrum or spectrogram, we only use a small window of the original signal for each spectrum or spectrogram-column calculation. There is a tradeoff in the choice of window size (a longer window gives better frequency discrimination at the expense of time discrimination) but the window always exposes only a small part of the original signal, unless that signal is extremely short. Zero-padding and using a longer transform oversamples the output to make it smoother, but it obviously uses no extra information to do it — it still has no access to samples that were not in the original window. A higher-resolution output without any more information at the input can appear more effective at discriminating between frequencies than it really is.

Here’s an example. The signal consists of a mixture of two sine waves one tone apart (440 and 493.9 Hz). A log-log spectrum (i.e. log frequency on x axis, log magnitude on y) with an 8192-point short-time Fourier transform looks like this:

[Screenshot: log-log spectrum of the two-tone signal, 8192-point STFT]

A log-log spectrum with a 1024-point STFT looks like this1:

[Screenshot: log-log spectrum of the two-tone signal, 1024-point STFT]

The 1024-sample input isn’t long enough to discriminate between the two frequencies — they’re close enough that it’s necessary to “hear” a longer fragment than this in order to determine that there are two frequencies at all2.
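
To put rough numbers on that, assuming a 44.1 kHz sample rate (the actual rate of the test signal isn’t stated here), the bin spacing of the 1024-point transform is

    \Delta f \;=\; \frac{f_s}{N} \;=\; \frac{44100}{1024} \;\approx\; 43\ \mathrm{Hz}

so the two tones, about 54 Hz apart, fall barely more than one bin apart, and the main lobe of a 1024-point Hann window spans about four bins (roughly 170 Hz), far too wide to separate them. The 8192-point transform has bins about 5.4 Hz apart and a main lobe of around 22 Hz, which resolves the pair comfortably.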

Add 8x oversampling to that last example, and it looks like this:

[Screenshot: log-log spectrum of the two-tone signal, 1024-point STFT with 8x oversampling]

This is very smooth and looks super detailed, and indeed we can use it to read the peak value with more accuracy — but the peak is deceptive, because it is still merging the two frequency components. In fact most of the detail here consists of the frequency response of the 1024-point windowing function used to shape the time-domain window (it’s a Hann window in this case).

Also, in the case of peak frequencies, Sonic Visualiser might already provide a way to get the same information more accurately — its peak-frequency identification in both spectrum and spectrogram views uses phase unwrapping instead of spectrum interpolation to estimate the frequencies of stable harmonics, which gives very good results if the sound is indeed harmonic and stable.
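
In outline, the generic phase-vocoder estimate (not necessarily exactly what Sonic Visualiser computes) works like this: for bin k of an N-point transform, with a hop of H samples between columns, compare the measured phase advance Δφ_k with the advance the bin centre frequency would predict, wrap the difference into (-π, π], and use it to refine the frequency:

    \hat{\omega}_k \;=\; \frac{2\pi k}{N}
      \;+\; \frac{1}{H}\,\mathrm{princarg}\!\left( \Delta\phi_k - \frac{2\pi k H}{N} \right),
    \qquad
    \hat{f}_k \;=\; \frac{f_s}{2\pi}\,\hat{\omega}_k

where princarg wraps its argument to (-π, π] and the estimate is in radians per sample. For a stable sinusoid this pins the frequency down far more precisely than the bin spacing alone would suggest.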

Finally, there’s a limitation in Sonic Visualiser’s implementation of this oversampling feature that eliminates one potential use for it, which is to choose the length of the Fourier transform in order to align bin frequencies with known or expected frequency components of the signal. We can’t generally do that here, since Sonic Visualiser still only supports a few fixed multiples of a power-of-two window size.

In conclusion: interesting if you know what you’re looking at, but use with caution.


1 Notice that we are connecting sample points in the spectrum with straight lines here — the same thing I characterised as a bad idea in the discussion of waveforms above. I think this is more forgivable here because the short-time transform output is not a sampled version of an original signal spectrum, but it’s still a bit icky.

2 This is not exactly true, but it works for this example


Notes from the Audio Developer Conference

I’ve spent the last couple of days at the 2017 Audio Developer Conference organised by ROLI. This is a get-together and technical conference for people who work on audio software and software-driven hardware, in practice mostly people working on music applications.

I don’t go to many conferences these days, despite working in academia. I don’t co-write many papers and I’m no longer funded by a project with a conference budget. I’ve been to a couple that we hosted ourselves at the Centre for Digital Music, but I think the last one I went to anywhere else was the 2014 Linux Audio Conference in Karlsruhe. I don’t mind this situation (I don’t like to travel away from my family anyway), I just mention it to give context for why a long-time academic employee like me should bother to write up a conference at all!


Here are my notes — on things I liked and things I didn’t — in roughly chronological order.

The venue is interesting, quite fancy, and completely new to me. (It is called CodeNode.) I’m a bit boggled that there is such a big space right in the middle of the City given over to developer events. I probably shouldn’t be boggling at that any more, but I can’t help it.
Nice furniture too.

The attendees are amazingly homogeneous. I probably wouldn’t have even noticed this, back when I was tangentially involved in the commercial audio development world, as I was part of the homogeneity. But our research group is a fair bit more diverse and I’m a bit more perceptive now. From the attendance of this event, you would conclude that 98% of audio developers are male and 90% are white people from northern Europe.
When I have been involved in organising events in academia, we have found it hard to get a speaker lineup that is as diverse as the population of potential attendees (i.e. the classic all-male panel problem). I have failed badly at this, even when trying hard — I am definitely part of the problem when it comes to conference organisation. Here, though, my perception is the other way around: the speakers are a closer reflection of what I perceive as the actual population than the attendees are.

Talks I went to:

Day 2 (i.e. the first day of the talks):

  • The future is wide: SIMD, vector classes and branchless algorithms for audio synthesis by Angus Hewlett of FXpansion (now employed by ROLI). A topic I’m interested in and he has clearly done solid work on (see here), but it quickly reached the realms of tweaks I personally am probably never going to need. The most heartening lesson I learned was that compilers are getting better and better at auto-vectorisation.
  • Exploring time-frequency space with the Gaborator by Andreas Gustafsson. I loved this. It was about computing short-time constant-Q transforms of music audio and presenting the results in an interactive way. This is well-trodden territory: I have worked on more than one implementation of a constant-Q transform myself, and on visualising the results. But I really appreciated his dedication to optimising the transform (which appears to be quicker and more invertible than my best implementation) and his imagination in rendering it (reusing the Leaflet mapping API to display time-frequency “maps”). There is a demo of this here and I like it a lot.
    So I was sitting there thinking “yes! nice work!”, but when it came to the questions, it was apparent that people didn’t really get how nice it was. I wanted to pretend to ask a question, just in order to say “I like it!”. But I didn’t, and then I never managed to work up to introducing myself to Andreas afterwards. I feel bad and I wish I had.
  • The development of Ableton Live by Friedemann Schautz. This talk could only disappoint, after its title. But I had to attend anyway. It was a broad review of observations from the development of Live 10, and although I didn’t learn much, I did like Friedemann and thought I would happily work on a team he was running.
  • The amazing usefulness of band-limited impulse trains by Stefan Stenzel of Waldorf. This was a nice old-school piece. Who can resist an impulse train? Not I.
  • Some interesting phenomena in nonlinear oscillators by André Bergner of Native Instruments. André is a compelling speaker who uses hand-drawn slides (I approve) and this was a neat mathematical talk, though I wasn’t able to stay to the end of it.

Day 3 (second and final day of talks):

  • The human in the musical loop (keynote) by Elaine Chew. Elaine is a professor in my group and I know some of her work quite well, but her keynote was exactly what I needed at this time, first thing in the morning on the second day. After a day of bits-driven talks, this was a piece about performers and listeners from someone who is technologically adept herself, and curious, but talks about music first. Elaine is also very calm, which was useful when the projector hardware gave up during her talk and stopped working for a good few minutes. I think as a result she had to hurry the closing topic (about the heartbeat project) which was a pity, as it could have been fascinating to have expanded on this a bit more.
    Some of what Elaine talked about was more than a decade old, and I think this is one of the purposes of professors: to recall, and to be able to communicate, relevant stuff that happened longer ago than any current research student remembers.
  • The new C++17, and why it is good for you by Timur Doumler. The polar opposite of Elaine’s talk, but I was now well-cushioned for it. C++17 continues down the road of simplifying the “modern-language” capabilities C++ has been acquiring since C++11. Most memorable for me are destructuring bind, guaranteed copy elision on value return, variant types, and filesystem support in the standard library.
    Destructuring bind is interesting and I’ve written about it separately. (There’s a small illustrative sketch of it and std::variant after this list.)
  • The use of std::variant in realtime DSP by Ian Hobson. A 50-minute slot, for a talk about which Timur Doumler’s earlier talk had already given away the twist! (Yes you can use std::variant, it doesn’t do any heap allocation.) Ambitious. This was a most satisfying talk anyway, as it was all about performance measurements and other very concrete stuff. No mention of the Expression Problem though.
  • Reactive Extensions (Rx) in JUCE by Martin Finke. I have never used either React or JUCE so I thought this would be perfect for me. I had a question lined up: “What is JUCE?” but I didn’t dare use it. The talk was perfectly comprehensible and quite enlightening though, so my silly bit of attitude was quite misplaced. I may even end up using some of what I learned in it.
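
By way of illustration, and not taken from either talk, here is a minimal C++17 example putting a couple of those features together: a structured binding (the “destructuring bind” above) unpacking map entries, and a std::variant visited without any heap allocation. The names and values are made up.

    #include <iostream>
    #include <map>
    #include <string>
    #include <variant>

    // std::variant holds one of a closed set of alternatives in place,
    // with no heap allocation
    using Event = std::variant<int, float, std::string>;

    std::map<std::string, Event> makeEvents()
    {
        // a prvalue return: under C++17's guaranteed copy elision this map
        // is constructed directly in the caller, never copied or moved
        return std::map<std::string, Event> {
            { "gain", 0.5f }, { "steps", 16 }, { "label", std::string("intro") }
        };
    }

    int main()
    {
        // structured binding ("destructuring bind"): unpack each map entry
        for (const auto &[name, event] : makeEvents()) {
            std::visit([&](const auto &value) {
                std::cout << name << " = " << value << "\n";
            }, event);
        }
    }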

 


Sonic Visualiser 3.0, at last

Finally!

(See previous posts: Help test the Sonic Visualiser v3.0 beta, A second beta of Sonic Visualiser v3.0, A third beta of Sonic Visualiser v3.0, and Yes, there’s a fourth beta of Sonic Visualiser v3.0 now)

No doubt, now that the official release is out, some horrible problem or other will come to light. It wouldn’t be the first time: Sonic Visualiser v2.4 went through a beta programme before release and still had to be replaced with v2.4.1 after only a week. These things happen and that’s OK, but for now I’m feeling good about this one.

 


Yes, there’s a fourth beta of Sonic Visualiser v3.0 now

Previously I wrote about the third Sonic Visualiser v3.0 beta release:

“This may well be the final beta, so if you’re seeing problems with it, please do report them while there’s still time!”

Well, some very kind people did report problems, and so that was not the final beta. A fourth one is now up for download. Here are the download URLs:

Fixes since the third beta

  • Fix a nasty crash in session I/O in the 64-bit Windows build (this is the main reason for the new beta)
  • Provide more log information about audio drivers to the debug log file
  • Fix a very occasional one-sample-too-short error in resampling audio files during load
  • Fix invisible measure tool crosshairs on spectrogram
  • Fix a possible memory leak in the spectrogram

Keep the bug reports coming!

This one really could be the final beta! So please do report any troubles you have with it. Drop me a line, post a comment below this article, or use the SourceForge bug tracker. And thank you!