Rubber Band is a software library I wrote a while ago for changing audio recordings, typically of music, by altering their speed or pitch independently of one another — often known as time-stretching and pitch-shifting.
There’s a new release out, version 3.0, and I think it’s terrific and sounds great and I’m very proud of it. (Audio examples here.) But I should warn you that I find time-stretching an endlessly fascinating idea, so before I say more about the new release I’m going to digress around it for a bit.
If you speed up or slow down a recording by “naive” means, such as sample rate conversion (the computational equivalent of playing an old-school tape or record at the wrong speed), its tempo and pitch change together: as it gets slower it gets lower, and as it gets faster it gets higher. The result is mathematically precise and perfectly sensible but not always auditorily useful.
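To make the “naive” approach concrete, here is a minimal Python sketch (not taken from Rubber Band) that speeds up a sine tone by resampling it with linear interpolation, then estimates its pitch by counting zero crossings. The function names and the 440 Hz test tone are illustrative choices only.

```python
import math

def naive_speed_change(samples, speed):
    """Resample `samples` so it plays `speed` times faster at the same sample rate.

    Like running a tape faster: the output has len(samples) / speed samples,
    so duration and pitch change together. Uses linear interpolation.
    """
    out_len = int(len(samples) / speed)
    out = []
    for n in range(out_len):
        pos = n * speed                  # fractional read position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out

def estimate_frequency(samples, rate):
    """Rough pitch estimate in Hz: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * rate / len(samples)

rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]  # 1 s of 440 Hz

faster = naive_speed_change(tone, 1.5)
print(f"duration {len(faster) / rate:.3f} s, pitch ~{estimate_frequency(faster, rate):.0f} Hz")
```

With a speed factor of 1.5, the one-second 440 Hz tone comes out about two-thirds of a second long and measures at roughly 660 Hz: tempo and pitch have moved together, just as with the tape.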
Time-stretching in contrast is often useful but marvellously ill-defined. I think of it as answering the question “what would this sound like if the same musicians had played it at a different tempo?” But there isn’t enough information in the signal to answer that, and people’s expectations about it are subjective and inconsistent.
Say you’re making a recording slower. If a singer sings a note with vibrato, do you expect the vibrato also to slow down? Or should it wobble at the original speed while the note gets longer? If the drummer hits a cymbal, and you slow it down, do you expect the whole sound to be fuzzily smooshed out? Or do you expect the first percussive hit to sound like the original but the decay to be extended? Or do you expect both hit and decay to be preserved exactly as the original, because if they had been playing at a different speed they would still have been hitting the same cymbal? Whatever your opinion is, would it be the same for both a recording of a real cymbal and a synthetic cymbal-like sound from a noise generator?
We have already ruled out the straightforwardly mathematical answers to these questions, because those involved changing the pitch as well. The answers appear to be essentially aesthetic.
Time-stretching software has come to a sort of consensus on these things, but it’s still largely based on what is practical rather than what an audience might expect. Most time-stretchers slow down the vibrato, but really because it’s so much more difficult not to. They try to preserve the hit of the cymbal and extend the decay. There are many other interesting possible choices.
No doubt before too long such software will be replaced by deep learning systems that re-dream the original performance as a mere side-effect of visualising the band playing it at a different tempo or just in a different posture. But that moment does not appear to have quite arrived yet.
Back to the subject
So yes, there’s a new release of Rubber Band out. After the above, I’m sorry to admit that it doesn’t totally redefine the time-stretcher consensus, but it does do an acceptable job with that consensus and that’s good enough to delight me.
The aim with this update was to bring Rubber Band back to the same relationship to the state-of-the-art as it had when first released a shocking 15 years ago. That is: not state-of-the-art, but as close as can reasonably be expected in a nicely-licensed portable library that is fast enough for real-time use on ordinary CPUs of the day.
For the original release, that meant a phase vocoder (a frequency-domain technique) that tries to maintain horizontal phase continuity (continuity from one frame to the next within each frequency bin) for the harmonic partials in the signal, but also detects transients (noisy instants) and resets all phases when one is found, so that the transients stay sharp rather than smeared. That’s a nice approach for signals with a clear distinction between steady and transient sounds, like drum loops or a lot of electronic music. It’s problematic for more organic sounds or complex mixes, where it can have trouble deciding which bit is the transient, and where its incorrect decisions are all too obvious.
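The basic mechanism described above can be sketched in a few lines of Python. This is a textbook phase vocoder phase update plus a simple “how many bins suddenly got louder” transient test, not Rubber Band’s actual code; the function names and the `rise` and `fraction` thresholds are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def princarg(phase):
    """Wrap a phase value into the range [-pi, pi)."""
    return (phase + math.pi) % TWO_PI - math.pi

def is_transient(prev_mags, cur_mags, rise=1.41, fraction=0.35):
    """Flag a frame as transient when enough bins jump in magnitude at once.

    `rise` (about +3 dB) and `fraction` are illustrative thresholds only.
    """
    risen = sum(1 for p, c in zip(prev_mags, cur_mags) if c > p * rise and c > 1e-6)
    return risen >= fraction * len(cur_mags)

def propagate_phases(prev_analysis, cur_analysis, prev_synth,
                     fft_size, analysis_hop, synthesis_hop, transient):
    """Return the synthesis phases for one frame of a basic phase vocoder.

    Normally each bin's phase advances at its measured instantaneous
    frequency over the synthesis hop (horizontal phase continuity).
    On a transient frame, every bin is reset to its analysis phase.
    """
    if transient:
        return list(cur_analysis)
    out = []
    for k, (p0, p1, s0) in enumerate(zip(prev_analysis, cur_analysis, prev_synth)):
        omega = TWO_PI * k / fft_size                  # bin centre frequency, rad/sample
        deviation = princarg(p1 - p0 - omega * analysis_hop)
        inst_freq = omega + deviation / analysis_hop   # measured frequency, rad/sample
        out.append(princarg(s0 + inst_freq * synthesis_hop))
    return out
```

The stretch factor here is `synthesis_hop / analysis_hop`: each partial keeps its measured frequency while its phase is advanced over a longer (or shorter) output hop. Note that the reset branch applies to every bin at once, which is the whole-spectrum behaviour described above.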
That processing engine is still there in the new release. It’s good. It’s nicely fast on current hardware and has a lot of practical uses, and for reasons of compatibility it is still the default method used — so if you update the library but don’t change your code, you’ll still get the same results.
But there is also a new engine, which stands where the original one did when it first appeared. That is, it still isn’t the literal state-of-the-art, but it is once again as good as can be had in a nicely-licensed portable library that is fast enough for real-time use on ordinary CPUs.
The new engine is still a phase vocoder, but it splits the signal into multiple frequency bands, analysed with different window lengths and shapes, and looks for transients within limited regions of the frequency spectrum rather than applying its transient phase reset across the whole signal at once.
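One way to picture the per-band idea is the hypothetical Python sketch below: each band carries its own window length, and transience is judged separately inside each band, so only the bins of a band that actually contains an onset get their phases reset. The band edges, window sizes, and thresholds are invented for illustration and are not the values the new engine uses.

```python
from dataclasses import dataclass

@dataclass
class Band:
    low_hz: float
    high_hz: float
    fft_size: int  # analysis window length for this band (illustrative)

# Hypothetical layout: long windows resolve low partials finely in frequency,
# short windows keep high-frequency onsets sharp in time.
BANDS = [
    Band(0.0, 700.0, 4096),
    Band(700.0, 4800.0, 2048),
    Band(4800.0, float("inf"), 1024),
]

def window_length_for(freq_hz, bands):
    """Return the analysis window length of the band containing freq_hz."""
    for band in bands:
        if band.low_hz <= freq_hz < band.high_hz:
            return band.fft_size
    raise ValueError(f"no band covers {freq_hz} Hz")

def reset_mask(prev_mags, cur_mags, sample_rate, fft_size, bands,
               rise=1.41, fraction=0.35):
    """Return a per-bin list saying which bins should have their phase reset.

    Transience is decided per band, so an onset in the highs can reset the
    high bins while low partials keep their phase continuity.
    """
    n_bins = len(cur_mags)
    mask = [False] * n_bins
    for band in bands:
        bins = [k for k in range(n_bins)
                if band.low_hz <= k * sample_rate / fft_size < band.high_hz]
        if not bins:
            continue
        risen = sum(1 for k in bins
                    if cur_mags[k] > prev_mags[k] * rise and cur_mags[k] > 1e-6)
        if risen >= fraction * len(bins):
            for k in bins:
                mask[k] = True
    return mask
```

For example, a hi-hat hit that raises only the bins above 4.8 kHz would reset just those bins, leaving a sustained bass note underneath untouched, where a single whole-spectrum decision would have reset both.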
It does use a lot more CPU than the older one. I had aimed to keep it within twice the CPU cost of the original engine, but at the moment it’s more like three or four times. There may be improvements to come. As it stands, it’s fast enough for real-time use in a responsive application on a desktop or laptop, but probably not on mobile platforms, where the original Rubber Band engine has been, and continues to be, very suitable.
Our listening tests found that it sounded really good: it wasn’t rated the best available for every test case, but it was the best for some, close to the best for the rest, and in every case an improvement on the existing method. I hope you’ll agree, but time-stretching is both very subjective and very dependent on the source material and ratio. Despite our tests, it’s entirely possible that you’ll listen to the new version and immediately hear something that offends you. I hope you won’t, but people have amazingly different levels of tolerance for different audible artifacts. If that happens, I’d be interested to hear about it.
If you’d like to try out the new engine (or indeed the old one) we have a little desktop application called Rubber Band Audio that you can use to load an audio file and mess with the tempo and pitch as you listen. It has a free demo version for Windows, Mac, and Linux.