Programs for Music

Performance improvements in Rubber Band Library

Today marks the release of version 3.1 of the audio time-stretching and pitch-shifting library Rubber Band. This release focuses primarily on performance improvements.

In version 3.0 we introduced a totally new, higher-quality processing engine, which I’ll refer to as the R3 engine. The older one is still included, and I’ll call that R2.

Although the output of R3 typically sounds much better than R2, it uses a lot more CPU power to run. Measuring sustained throughput in frames-per-second for common fixed stretch factors, we find R2 to be typically about three times as fast as R3. Both are eminently usable in real-time on hardware from the last decade, but the headroom available for R2 can make a big difference.

It would be nice to do better, but the R3 code was already quite heavily optimised before release — it is simply a fairly CPU-intensive method. Still, as it turns out, there are a few things we can do.

Measuring performance

Sustained throughput is not the only measure. Rubber Band is often used in real-time situations where the worst-case time per processed block is what matters most.

To measure this, I set up a test case that simulates a typical sound processing callback, passing a music recording through a stretcher and emitting a fixed 512 sample frames from each processing cycle, while varying the time and pitch ratios and measuring how long each cycle takes to return. The stretcher is initialised with typical parameters for this activity (in code terms, OptionProcessRealTime | OptionPitchHighConsistency | OptionFormantPreserved) and it is primed with an initial pad before entering the cycle loop, as otherwise the first call would dominate results.
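As a rough illustration of this kind of measurement, here is a minimal Python sketch of a per-cycle timing harness. It is not the test program used for the plots in this post: dummy_process is a hypothetical stand-in for the stretcher call, and the work schedule is invented purely so that the cycle times vary. The point is only the shape of the measurement: time every fixed-size cycle, discard the warm-up cycles, and report the worst case alongside the mean.

```python
import time

BLOCK_FRAMES = 512  # frames emitted per simulated callback, as in the test described above


def dummy_process(block_frames, work_units):
    """Hypothetical stand-in for a stretcher call: burns CPU in proportion to work_units."""
    acc = 0.0
    for i in range(block_frames * work_units):
        acc += (i % 7) * 0.5
    return acc


def time_cycles(process, work_schedule, warmup_cycles=4):
    """Return wall-clock durations of each cycle, skipping the warm-up cycles.

    The warm-up plays the role of priming the stretcher, so that one-off
    start-up cost does not dominate the results.
    """
    durations = []
    for n, work_units in enumerate(work_schedule):
        start = time.perf_counter()
        process(BLOCK_FRAMES, work_units)
        elapsed = time.perf_counter() - start
        if n >= warmup_cycles:
            durations.append(elapsed)
    return durations


def summarise(durations):
    """The mean reflects sustained throughput; the maximum is the worst case per block."""
    return {
        "cycles": len(durations),
        "mean": sum(durations) / len(durations),
        "worst": max(durations),
    }


if __name__ == "__main__":
    # Invented schedule: most cycles do one unit of work, some do more.
    schedule = [1, 1, 2, 1, 3, 1, 1, 4, 1, 2] * 5
    print(summarise(time_cycles(dummy_process, schedule)))
```

In a real-time setting it is the "worst" figure that has to fit inside the callback deadline, which is why it gets its own line in the summary.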

The results for R2 and R3, as of the 3.0 release, look like this:

This is a graph of processing cycle count (x-axis) against time taken per 512-frame cycle (y-axis). The y-axis is linear in time with zero at the bottom, so lower is better. No units are shown because they are totally system-dependent. This is purely a comparative visualisation: we’re only interested in the relative heights. Obviously the relative heights may also vary from system to system, so this is still quite tentative.

The test runs in four consecutive phases with different pitch and time modifications, and so the x-axis is divided into four (uneven) quadrants: raising pitch, lowering pitch, slowing down, and speeding up.

In the first quadrant, the pitch rises smoothly and then falls again, reaching a peak at two octaves up; in the second it falls smoothly and then rises again, reaching a trough at two octaves down; in the third the pitch is unchanged but the tempo slows to just under a third of the original speed and then returns to normal; and in the fourth quadrant the tempo gradually speeds up to 8x the original speed and then returns to normal.
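To make the numbers concrete, here is a small Python sketch that converts the extremes described above into ratios. It assumes the usual conventions: a pitch scale is a frequency ratio (so one octave up is 2), and a time ratio is output duration divided by input duration (so 8x speed is 0.125). The triangle helper and the choice of a linear ramp in octaves are assumptions for illustration; the post does not say how the test interpolates between values.

```python
def pitch_scale_from_octaves(octaves):
    """Frequency ratio for a shift of the given number of octaves (negative means down)."""
    return 2.0 ** octaves


def time_ratio_from_speed(speed):
    """Output/input duration ratio when playing at `speed` times the original tempo."""
    return 1.0 / speed


def triangle(peak, steps):
    """Ramp linearly from 0 up to `peak` and back to 0, giving 2 * steps + 1 points."""
    up = [peak * i / steps for i in range(steps + 1)]
    return up + up[-2::-1]


if __name__ == "__main__":
    # First quadrant: up to two octaves above the original and back again.
    print([pitch_scale_from_octaves(o) for o in triangle(2, 4)])
    # Second quadrant: down to two octaves below and back again.
    print([pitch_scale_from_octaves(o) for o in triangle(-2, 4)])
    # Extremes of the third and fourth quadrants.
    print(time_ratio_from_speed(1.0 / 3.0), time_ratio_from_speed(8.0))
```

So the pitch scale ranges between 0.25 and 4 across the first two quadrants, and the time ratio ranges from roughly 3 (slowest) down to 0.125 (fastest) across the last two.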

The plots for R2 (orange) and R3 (purple) reveal significant differences in behaviour:

  • R2 is usually faster, sometimes much faster, especially for modest stretch factors.
  • R3’s long internal processing buffers and step size mean that it hops between “modes” depending on how many processing increments (1, 2, 3, 4 or occasionally 0) are required for each output block.
  • R2 has less widely-spaced distinct “modes”, because it uses smaller increments. It’s still faster because it does so much less work for each increment.
  • R2’s processing time becomes very variable, and relatively high, when speeding up the audio by a large factor (above about 3x). This may be because it continues to perform transient detection and adjust its input and output steps accordingly, and at those rates our test file contains a lot of transients. R3 is very predictable in this area by comparison.
  • Both stretchers use increasingly more CPU when pitch-shifting further upward, but not when shifting down.

The last point happens because we are using OptionPitchHighConsistency. This option ensures that the resampler used for the pitch-shift part of the operation is always engaged, so that there are no discontinuities when changing ratio (particularly to or from the 1x ratio). We’ll come back to that later.
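The “modes” mentioned in the list above can be pictured with a toy model. The Python sketch below assumes that each processing increment yields a fixed number of output frames and that an increment is only run when the buffered output cannot cover the next 512-frame block. The hop sizes used are invented for illustration and are not R3's or R2's actual values; the model only shows why the per-block cost comes in discrete steps.

```python
def increments_per_block(block_frames, hop_frames, n_blocks):
    """Count the processing increments needed to fill each output block.

    Assumes each increment produces exactly `hop_frames` output frames and
    that leftover frames are carried over to the next block.
    """
    available = 0
    counts = []
    for _ in range(n_blocks):
        steps = 0
        while available < block_frames:
            available += hop_frames
            steps += 1
        available -= block_frames
        counts.append(steps)
    return counts


if __name__ == "__main__":
    for hop in (128, 384, 2048):  # hypothetical output hops, in frames
        print(hop, increments_per_block(512, hop, 6))
```

With a hop much smaller than the block, every block needs the same number of increments. With a hop near or above the block size, the count alternates between values, including zero, and each value corresponds to one horizontal band in a plot of time per cycle.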

A Draft Mode for Finer Mode

The main novelty in version 3.1 is an option to deactivate R3’s multi-window processing system, dropping down to a single shorter processing window and potentially running much faster, while retaining its more advanced signal analysis and some of its output characteristics.

This is enabled using the OptionWindowShort flag when constructing a stretcher, or the --window-short argument to the command-line tool. It’s an option that already existed in R2, and conceptually it does something similar there, but the effect on performance is much greater with R3.

Here’s a plot comparing R2, R3, and the new R3 single window option (“R3short”):

With this new option we get both performance comparable to R2 and the more predictable behaviour at high tempo ratios found in R3. Splendid.

What does it sound like? Not as good as R3; it loses some percussive clarity and quite a lot of low-end stability. For some material, particularly acoustic instruments and vocals without too much bass content, it can sound markedly better than R2. It’s not a universal substitute, but it’s really not bad given the CPU budget.

Here are some ten-second audio clips to give you an idea. Both are stretched to 140% of their original duration using R2, R3 with short window, and full R3. Neither of these is trivial to handle, though the second is far harder than the first.

Resamplers and FFTs

Rubber Band makes heavy use of audio resampler and fast Fourier transform (FFT) implementations. Originally it used external libraries for both, but in June 2021 a built-in FFT was added and in October 2021 a built-in resampler appeared as well.

These are both slower than the best external libraries, but they make Rubber Band simpler to build and integrate. And the built-in resampler is also designed to reduce clicky artifacts and maintain tempo integrity on ratio changes, at some further expense in performance, so if you do have the headroom it is worth defaulting to.

Here’s a performance comparison of the built-in resampler with libsamplerate in the “draft” short-window R3 mode described above.

Clearly libsamplerate is both faster and more predictable. It’s faster even when changing only the tempo, which doesn’t involve resampling, because of our previously-mentioned use of OptionPitchHighConsistency which keeps the resampler running at all ratios.

(Incidentally all of the other performance plots in this post were made using libsamplerate, unless otherwise specified. Its smoother performance profile makes other comparisons easier.)

I’ve mentioned OptionPitchHighConsistency a couple of times now. If we use OptionPitchHighSpeed instead, we get quite different behaviour:

The relation between the amount of pitch shift and the CPU effort is totally gone. All pitch shifts are roughly equal, and the time-stretching quadrants are faster. The tradeoff, unfortunately, is that there are now audible discontinuities every time the pitch ratio reaches or crosses 1.0.

Traditionally the alternative to libsamplerate in Rubber Band has been a resampler implementation cribbed from the Speex audio codec and provided with Rubber Band as a compile-time option. This resampler was a bit unsatisfactory for various reasons, but a much improved version of it has for a while been available in a library called speexdsp.

As of v3.1 Rubber Band now includes support for speexdsp as well, and it works well — audio quality seems good, and so is performance on my test hardware, shown here against libsamplerate:

I don’t think this is well-exercised enough to be a standard recommendation yet, but it’s promising.

The built-in FFT fares better than the resampler, but in addition to the previously-supported external libraries (FFTW, IPP, and Apple’s vDSP) this release also adds support for FFTs from SLEEF, a library which looks as if it should be competitive on platforms that have been short on good options in the past.

To summarise:

  • The R3 time-stretcher and pitch-shifter engine introduced in Rubber Band 3.0 sounds great, but is relatively CPU-intensive compared to the older R2.
  • The new 3.1 release introduces a draft mode (“short-window” or single window mode) for the R3 engine that retains some of its good qualities while running much faster and with more predictable CPU usage.
  • You may be able to speed up your implementation by using an external resampler or FFT library, and the 3.1 release adds support for a couple of new ones with good performance.

See the Rubber Band Library site for more information about the library.

Thank you for your time. Perhaps we can help you make more of it.

* * *

Many thanks to Davy Wentzler for valuable feedback on the 3.1 development process.

 

Code · Programs for Music · Work

Rubber Band Library: a thrilling new release

Rubber Band is a software library I wrote a while ago for changing audio recordings, typically of music, by altering their speed or pitch independently of one another — often known as time-stretching and pitch-shifting.

There’s a new release out, version 3.0, and I think it’s terrific and sounds great and I’m very proud of it. (Audio examples here.) But I should warn you that I find time-stretching an endlessly fascinating idea, so before I say more about the new release I’m going to digress around it for a bit.

Time-stretching

If you speed up or slow down a recording by “naive” means such as by sample rate conversion (the computational equivalent of playing an old-school tape or record at the wrong speed) its tempo and pitch change together. As it gets slower it gets lower, as it gets higher it goes faster. The result is mathematically precise and perfectly sensible but not always auditorily useful.
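The coupling described here is simple arithmetic, and a few lines of Python show it. The function name is made up for this sketch; the relationships themselves (duration scales by the reciprocal of the speed, pitch shifts by 12 times the base-2 logarithm of the speed, in semitones) are standard.

```python
import math


def naive_speed_change(speed):
    """Effect of replaying a recording at `speed` times its original rate."""
    return {
        "duration_ratio": 1.0 / speed,
        "pitch_shift_semitones": 12.0 * math.log2(speed),
    }


if __name__ == "__main__":
    for speed in (0.5, 1.0, 2.0):
        print(speed, naive_speed_change(speed))
```

Half speed doubles the duration and drops the pitch by an octave; double speed halves the duration and raises it by an octave. A time-stretcher is any method that tries to break this link.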

Time-stretching in contrast is often useful but marvellously ill-defined. I think of it as answering the question “what would this sound like if the same musicians had played it at a different tempo?” But there isn’t enough information in the signal to answer that, and people’s expectations about it are subjective and inconsistent.

Say you’re making a recording slower. If a singer sings a note with vibrato, do you expect the vibrato also to slow down? Or should it wobble at the original speed while the note gets longer? If the drummer hits a cymbal, and you slow it down, do you expect the whole sound to be fuzzily smooshed out? Or do you expect the first percussive hit to sound like the original but the decay to be extended? Or do you expect both hit and decay to be preserved exactly as the original, because if they had been playing at a different speed they would still have been hitting the same cymbal? Whatever your opinion is, would it be the same for both a recording of a real cymbal and a synthetic cymbal-like sound from a noise generator?

We have already ruled out the straightforwardly mathematical answers to these questions, because those involved changing the pitch as well. The answers appear to be essentially aesthetic.

Time-stretching software has come to a sort of consensus on these things, but it’s still largely based on what is practical rather than what an audience might expect. They slow down the vibrato, but really because it’s so much more difficult not to. They try to preserve the hit of the cymbal and extend the decay. There are many other interesting possible choices.

No doubt before too long such software will be replaced by deep learning systems that re-dream the original performance as a mere side-effect of visualising the band playing it at a different tempo or just in a different posture. But that moment does not appear to have quite arrived yet.

Back to the subject

So yes, there’s a new release of Rubber Band out. After the above, I’m sorry to admit that it doesn’t totally redefine the time-stretcher consensus, but it does do an acceptable job with that consensus and that’s good enough to delight me.

The aim with this update was to bring Rubber Band back to the same relationship to the state-of-the-art as it had when first released a shocking 15 years ago. That is: not state-of-the-art, but as close as can reasonably be expected in a nicely-licensed portable library that is fast enough for real-time use on ordinary CPUs of the day.

For the original release, that meant it was a phase vocoder (a frequency domain technique) which tries to maintain horizontal phase continuity for harmonic partials within the signal, but also detects transients (noisy instants) and resets all phases when one is found, so that the transients sound good. That’s a nice approach for signals that have a clear distinction between steady and transient sounds, like drum loops or a lot of electronic music. It’s problematic for more organic sounds or complex mixes, in which it can have trouble deciding which bit is the transient and in which its incorrect decisions are all too obvious.

That processing engine is still there in the new release. It’s good. It’s nicely fast on current hardware and has a lot of practical uses, and for reasons of compatibility it is still the default method used — so if you update the library but don’t change your code, you’ll still get the same results.

But there is also a new engine that’s just like the original one was when it appeared. That is, it still isn’t the literal state-of-the-art, but it is once again as good as can be had in a nicely-licensed portable library that is fast enough for real-time use on ordinary CPUs.

The new engine is still a phase vocoder, but it splits the signal into multiple frequency bands with different window lengths and shapes, and seeks limited areas of transience within the frequency spectrum rather than applying its transient phase reset across the whole signal at once.

It does use a lot more CPU power than the older one. I had aimed to get it within twice the CPU budget, but at the moment it’s more like 3 or 4 times. There may be improvements to come — as it stands this is fast enough for real-time in a responsive application on desktop or laptop, but probably not for mobile platforms, where the original Rubber Band engine has been and continues to be very suitable.

Our listening tests found that it sounded really good: it wasn’t considered the best available for every test case, but it was the best in test for some, for the rest it was close to it, and in every case it improved on the existing method. I hope you’ll agree, but time-stretching is both very subjective and very dependent on the source material and ratio. Despite our tests, it’s totally possible you might listen to the new version and hear something that offends you straight away — I hope you won’t, but people have amazingly different levels of receptivity for different audible artifacts. It might be interesting to hear about it if that happens.

If you’d like to try out the new engine (or indeed the old one) we have a little desktop application called Rubber Band Audio that you can use to load an audio file and mess with the tempo and pitch as you listen. It has a free demo version for Windows, Mac, and Linux.

Code

A note on the paging behaviour of more(1) in util-linux 2.38

I just updated this system from util-linux 2.37 to 2.38 (util-linux is a set of small, commonly-used command line programs) to find a small but distracting change in the behaviour of more(1), the venerable text file pager utility.

For as long as I can remember, the behaviour of more when run on a text file shorter than the current height of the terminal has been to print the contents of the file and return without any interaction.

In util-linux 2.38 this changes, so that more when run on a small text file will clear the terminal, show the file at the top of the window, print END at the bottom, and wait for input before returning. This is kind of distracting: clearing the terminal is not something I want, and also it makes the file look as if it has lots of blank lines at the end.

I spent a wee while figuring out where and why this change was introduced: turns out it’s for POSIX standards compliance. The commit that introduced the change is titled “POSIX compliance patch preventing exit on EOF without -e”, and the POSIX version of the man page for more(1) indeed supports this behaviour. I don’t remember ever seeing it before. I wonder which system it originated with.

Anyway the good news is that the new option -e or --exit-on-eof restores the expected behaviour, and adding export MORE="--exit-on-eof" to .bashrc makes it the default again.

Academics · Programs for Music

Note on “Explorations in Time-Frequency Analysis” by Patrick Flandrin

Patrick Flandrin is a physicist and signal-processing researcher whose name I first encountered as co-author (with François Auger) of a 1995 IEEE Transactions on Signal Processing paper called “Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method”.

This crunchy publication (21 pages, dozens of equations and figures) took a pleasing idea — replacing the familiar grid-format time-frequency spectrogram with a field of precisely localised points calculated using both magnitude and phase of the frequency bins, rather than only magnitude as a traditional spectrogram does — and set out the mathematics of applying it to a number of different time-frequency and time-scale representations.

Illustration from Auger & Flandrin (1995)

I read this paper about 15 years ago and didn’t understand it. I have since realised this is partly because it isn’t all that clear with its notation, but there is also a big gap between the naive programmer’s view (that’s mine) of a spectrogram and the mathematical analysis used in the paper.

To explain. For a programmer, a spectrogram comes from taking short overlapping slices of a sampled signal, multiplying each by a smoothing window shape, applying a short-time Fourier transform, and taking the magnitudes of the complex output bins to get one column of the spectrogram per slice of input. The short slices are because you want a fixed, smallish number of output bins, and you have various tradeoffs — time and frequency resolution and computational efficiency — to consider in that. The smoothing window is because your Fourier transform — a thing which matches up sinusoids of different frequencies against a signal to identify which ones would add up to it — operates on an infinite signal, consisting of the input you give it repeated forever in both directions: this will have a discontinuity each time it wraps around, and the smoothing window removes some of the frequency artifacts from these discontinuities. There is nothing particularly mathematical about the implementation of this, and any intuition used by the programmer is a mixture of the visual and techniques from the world of engineering. The language used in a publication like the DAFx book is typical in this world.
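The programmer's recipe above fits in a few lines. This is a minimal sketch in plain Python, assuming a Hann window and a naive O(n²) discrete Fourier transform for clarity where real code would use an FFT library. The frame size and hop are arbitrary choices for illustration.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real frame (naive O(n^2))."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_size=64, hop=16):
    """One column of bin magnitudes per overlapping, windowed slice of the input."""
    window = hann(frame_size)
    columns = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        columns.append(dft_magnitudes(frame))
    return columns


if __name__ == "__main__":
    # A sinusoid that completes exactly 8 cycles per 64-sample frame.
    tone = [math.cos(2.0 * math.pi * 8 * t / 64) for t in range(256)]
    spec = spectrogram(tone)
    first = spec[0]
    print(len(spec), len(first), max(range(len(first)), key=first.__getitem__))
```

Note that only the magnitudes are kept. The phase of each bin is thrown away, and that discarded phase is exactly what reassignment and related methods make use of.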

The Auger & Flandrin paper instead comes from a world that summarises a spectrogram as a two-dimensional Wigner-Ville distribution filtered with a smoothing window leading to a time-frequency representation of the Cohen’s class. Signals are finite-energy functions over infinite domains, and a spectrogram is a double integral over time and angular frequency. Both time-domain functions and time-frequency representations are continuous, and practical questions about overlap and window length don’t arise. I can dimly remember this world, because my undergraduate degree — who am I kidding, my only degree — started out as pure maths, but I haven’t inhabited it for any of my working life.

So I didn’t really understand the paper, and a programmer has plenty to do, and that is one reason why Sonic Visualiser’s “Peak-Frequency Spectrogram” layer calculates instantaneous frequencies from the phase difference between successive columns, something which I found much easier to understand. (It turns out there are other good reasons one could make this choice, but I didn’t know that.1)
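For readers who want to see the phase-difference idea in action, here is a minimal Python sketch of the general textbook technique. It is not Sonic Visualiser's code, and the function names and test signal are made up for illustration. The phase of one bin is measured in two frames a known hop apart, the phase advance expected for the bin's centre frequency is subtracted, and the wrapped remainder gives the offset of the true frequency from the bin centre.

```python
import cmath
import math


def principal_angle(x):
    """Wrap an angle in radians into the range [-pi, pi]."""
    return x - 2.0 * math.pi * round(x / (2.0 * math.pi))


def bin_value(signal, start, frame_size, k):
    """Hann-windowed DFT value of bin k for the frame beginning at `start`."""
    acc = 0j
    for i in range(frame_size):
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_size)
        acc += signal[start + i] * w * cmath.exp(-2j * math.pi * k * i / frame_size)
    return acc


def instantaneous_frequency(signal, start, frame_size, hop, k, sample_rate):
    """Estimate the frequency in Hz present in bin k from two successive frames."""
    p1 = cmath.phase(bin_value(signal, start, frame_size, k))
    p2 = cmath.phase(bin_value(signal, start + hop, frame_size, k))
    expected = 2.0 * math.pi * k * hop / frame_size  # advance of the bin centre frequency
    deviation = principal_angle(p2 - p1 - expected)
    omega = 2.0 * math.pi * k / frame_size + deviation / hop  # radians per sample
    return omega * sample_rate / (2.0 * math.pi)


if __name__ == "__main__":
    rate = 8000
    # 1010 Hz falls between bin centres (bin 32 is centred on 1000 Hz here).
    tone = [math.sin(2.0 * math.pi * 1010.0 * n / rate) for n in range(512)]
    print(instantaneous_frequency(tone, 0, 256, 64, 32, rate))
```

The estimate is only unambiguous while the true frequency lies within sample_rate / (2 * hop) of the bin centre, which is one reason a generous overlap between frames helps.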

Returning to the paper recently, I learned that Flandrin had written a book on the subject, and I bought a copy hoping it might bridge the conceptual gap. It turned out to be a good experience.

* * *

“Explorations in Time-Frequency Analysis” is a monograph digressing on things the author has found interesting in the past 30 years, which — what luck! — happen to be about time-frequency analysis. It’s short, about 200 pages, and nicely printed. There are lots of diagrams, and although equation-heavy it doesn’t hang about proving things, sending you to the references instead. It begins with a glossary of notation (I like it when books do this) and ends with a 9-page bibliography. The writing is crisp and friendly and the scene is set by the first two chapters, a philosophical outline and a chapter of examples with the lovely title “Small Data Are Beautiful”.

Although the book provides a lot of the background to the paper that defeated me, I still spent a potentially embarrassing amount of thought on things I imagine that anyone properly within the target market finds obvious. An example is what it means for a Gaussian function to be “circular” in time and frequency. The book goes over this in far more detail, but briefly a Gaussian — the bell-shaped normal distribution curve found in probability — has the property that its Fourier transform is also a Gaussian. The “wider” the bell shape in the time domain, the “narrower” in the frequency domain: at some point it must be equal in both, and then if you plot it in a spectrogram-like heat map you will see a circle. When does this happen? It’s shown that it happens for the Gaussian corresponding to a normal distribution of variance 1. But at this point I am worrying about units. What does it mean to be circular? The figures illustrating this lack units in either axis — in fact detail-wise many of the figures are more like sketches — and the little bit of engineer in me is wondering: how can you possibly have a circle if you lack units?

The answer I eventually recalled is that the units in one domain define those in the other. In this case, if the time axis is in seconds then angular frequency is radians per second, and a circle is a distribution whose extent in seconds is the same as that in radians per second. Other units such as samples (in time) or STFT bins (in frequency) have similar correspondences in the other domain. This is a place where going back to basics took significant thought, but I did actually appreciate being expected to think about it.
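The claim is easy to check numerically. The Python sketch below assumes time in seconds and angular frequency in radians per second, with the transform taken as the integral of g(t) multiplied by exp(-jωt), and compares shapes only, ignoring the constant scale factor. The integral is approximated with a plain Riemann sum, and the function names are made up for illustration. For a Gaussian of standard deviation sigma in time, the spectrum comes out as a Gaussian of standard deviation 1/sigma in angular frequency, so the two widths agree only when sigma is 1.

```python
import math


def gaussian(x, sigma):
    """Unnormalised Gaussian bell with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def fourier_transform_at(omega, sigma, span=12.0, steps=4800):
    """Riemann-sum approximation of the integral of g(t) * cos(omega * t) dt.

    g is even, so the sine part of the transform vanishes and the result is real.
    The integration range is +/- span standard deviations.
    """
    dt = 2.0 * span * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        t = -span * sigma + i * dt
        total += gaussian(t, sigma) * math.cos(omega * t)
    return total * dt


def normalised_spectrum(omega, sigma):
    """Spectrum at omega divided by its value at zero, so only the shape remains."""
    return fourier_transform_at(omega, sigma) / fourier_transform_at(0.0, sigma)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        for omega in (0.5, 1.0, 2.0):
            # The spectrum should match a Gaussian of width 1 / sigma.
            print(sigma, omega, normalised_spectrum(omega, sigma), gaussian(omega, 1.0 / sigma))
```

With sigma equal to 1 the bell has the same width in seconds as in radians per second, which is the “circular” case. Change the frequency unit to hertz and the circle becomes an ellipse, which is the units question in a nutshell.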

So a nice rehearsal with some interesting bumps, but for me the thrilling twist arrives in chapter 12, “Spectrogram Geometry 2”. This reframes the spectrogram as a complex plane and the reassignment operator in terms of motion in a potential field proportional to the log-spectrogram. This mathematical leap is also an intuitively visual one, and it’s exciting for me because it is a little like how I pictured the spectrogram, with no meaningful mathematical analysis, when developing a certain feature of the Rubber Band timestretcher.2 This chapter is like seeing the vaguely-realised ground beneath your feet resolve into a larger, recognisable object — the moment when you realise you are standing on the back of a giant Pokémon, if you will.

There is a lot more in this book, and I think it will repay repeated visits. I’m not sure whether you could implement anything directly from it, but you could, say, pick a random page and follow up all the references until you really feel you understand it. I think this would be a rewarding exercise that, for someone like me, would probably take around a month per page.

* * *

On that note, one of the first references given is to a book called “Visible Speech” by Potter, Kopp, and Green, 1947. I looked this up and was so intrigued that I tracked down an ex-library copy. It is a lavish presentation, perhaps with both training and PR elements, of a then-new idea called the “sound spectrograph”, i.e. a spectrogram. The title “Visible Speech”, incidentally, is borrowed with attribution from an earlier (1867) work about phonetic alphabets.

The authors of the 1947 book were writing about work done at Bell Labs to try to make the telephone accessible to the deaf. Their experimental devices used paper tape or phosphor display to show spectrographs of the speech sounds, and users were specially trained to interpret speech from them. Here’s a picture from the book of someone using one.

Operator sitting at a table in front of a large box with a tiny screen on it

The spectrographs were produced by automatically recording the speech to tape and playing the tape repeatedly through a filter of 300Hz bandwidth, whose centre frequency was incremented linearly between passes in 15Hz steps from 0-3500Hz. (They also had a version using 45Hz bandwidth filters, but it was found to be less legible.) The system was of course analogue.

In this image the top spectrograph is the one with 45Hz bandwidth, which is used to point out some interesting features, but the 300Hz bandwidth spectrograph below it is the form used throughout the rest of the book:

It’s striking how clear these spectrographs are, and it makes a useful reminder that we really aren’t always looking for the most precise representation of something — 300Hz bandwidth at speech frequencies is pretty wide! — but instead the most appropriate in some human dimension.

 


1 The Sonic Visualiser peak-frequency spectrogram precisely localises stable frequencies, but for each frequency bin it draws a short horizontal line across the whole duration of the bin at the proper frequency rather than localise the bin to a point in time. A very similar output could have been produced using reassignment, because the frequency calculated from phase difference should be very close to that calculated with reassignment. But a decision to do that would have meant ignoring the other reassignment operator, localisation in time, which gives a single point rather than a horizontal line for each bin. Had I understood the reassignment paper, I would probably have felt compelled to do that part properly. For it to work well, a greater bin overlap and much more sophisticated rendering would have been needed, and the result would have been much slower and possibly less clear for real music. I think.

2 This feature, which I gave the vague name “phase lamination”, was worked out in a hurry after discovering that the “phase locking” technique of Jean Laroche and Mark Dolson which I had used in the very first release of Rubber Band was patented. Phase locking reduced audible phasiness with the nice side-effect of making the phase vocoder faster to compute, but it also lent a robotic tang to the sound which certain listeners found even more unpleasant than the phasiness. The scheme I came up with to replace it was based on picturing a gradient field and making adjustments to bins near a peak or trough in proportion to the distance from it — tuned by ear rather than worked out mathematically. Although it lost the improved speed of phase locking, it usually sounds better. The idea seems reasonably obvious, but I hadn’t seen it described anywhere else and I was delighted to find it.

Actual physical objects made of stuff · Computers · Flat Things

Can you swap the keyboards between a Thinkpad T40 and an IBM SK-8845?

No.

(I’m posting this because it’s something I searched for and didn’t find an answer to.)

The SK-8845 and its sibling the SK-8840 are IBM-branded keyboards with built-in trackpoint and trackpad.

They appear to be essentially the keyboards from Thinkpad T40 (or T41, T42, T43) laptops, pulled out and put in a separate case for use as a compact keyboard/mouse controller in a server room. They’re pretty good keyboards in a classic Thinkpad laptop sort of way.

I have an SK-8840, which is the (less common?) PS/2 variant. As far as I can tell from photos, without having seen one in person, the SK-8845 is the same keyboard with a USB plug on it.

Here’s the SK-8840:

IBM SK-8840 keyboard. I’m sorry, I removed the trackpoint. I always do – I never use it and it gets in the way a little. I do keep the trackpoints in case I want to sell up though. Have you noticed the trackpoint pin is rotated 45° from the normal Thinkpad keyboards? I wonder why.

Visually this is much the same as the keyboard on the T4x series of Thinkpad laptops, which are truly excellent. It doesn’t feel like a T40 keyboard though. It’s more like the later, still good but not quite so good T60 series. So I wondered whether it was possible to swap the keyboard part with that from (ahem) one of my T4x laptops.

The answer is no: it’s physically incompatible. The plate at the back is different, the connections are different, the stand-offs are different. It may be possible to adapt one into the other and that could be an interesting project for someone more ambitious than me, but it definitely isn’t a case of just pulling out the keyboard part and plugging it in. The same goes for the T60.

Here are the innards, and the back of the keyboard plate, of the SK-8840:

Here are the backs of the keyboards from a T40 (above) and T60 (below):

That’s all.

Films · Finnish Affairs · Good Things

Films by Aki Kaurismäki

We sat down, my wife and I, during a quiet period that could otherwise have been a little gloomy, to watch as many feature films directed by Aki Kaurismäki as we could lay hands on.

We watched them in chronological order, so this could have been an interesting article about the director’s development from his earliest days. But we were geeky enough to mark them out of ten, so what we’ve actually made is a list ranked from our least to most favourite.

These aren’t our first impressions, in most cases – we’ve seen many of these before, some three or four times, but watching them in order with a vaguely critical eye was new and pretty good fun. I should say that we don’t know any Finnish (other than words like raha and miksi? which come up all the time in these films) and watched with English subtitles. We got all of these from the Curzon DVD box set, except for I Hired A Contract Killer which isn’t in it for some reason.

A constant in these films is the cinematographer Timo Salminen who seems to have worked on every one of them. A few more than half are in colour, the rest black-and-white, all analogue film. They’re all beautiful to look at, and since Kaurismäki spends a lot of time just looking at things, that’s certainly fortunate.

I have placed a useful star * next to the names of the films in which the male hero is violently attacked for no reason other than to set up the rest of the film. This Kaurismäki trope is so common that one might as well mark it. I have not specially marked the films in which a cute dog plays a sympathetic role, the films in which the heroes drive around in a car much older than any of the others on the streets, the films that cut away to a complete live performance of a song by a folk or rock ’n’ roll band, or the films in which someone impulsively performs an act of great kindness without expecting or receiving any acknowledgement for it.

17. Leningrad Cowboys Meet Moses (1994)

Leningrad Cowboys, the worst band in the world, return from the Americas to Europe on the instigation of a man who strangely resembles both their manager and the Biblical Moses.

16. Juha (1999)

An adaptation of an apparently noted Finnish novel from 1911 in the form of a black-and-white silent movie (with music soundtrack) with film noir stylings and references.

I like its confusion about its era, as when the silly, soon-to-be-tragic couple (“they were as happy as children”) suddenly switch from serving and eating porridge out of a big pot to heating up a microwave meal, as a symbol of their fall.

I very much like that Marja packs a rubber duck when she moves out.

But the story, setting, and ending felt a bit too miserable for us. Its silence felt as if we’d been left on our own to digest it, which made it feel gloomier.

15. Hamlet Goes Business (1987)

A straight-ish Hamlet, considering it’s a contemporary film about the rubber-duck industry, filmed in super-crisp black-and-white with a terrific ending. Not bad at all.

14. Calamari Union (1985)

This film can’t possibly live up to the summary of its plot: Fifteen desperate men, all called Frank, try to cross Helsinki to reach the fabled Eira district. Most don’t make it.

Their journey is a terrible struggle that takes… months? years? – whereas maps of the real world suggest that Eira is maybe an hour’s walk from where they began.

This was the second Kaurismäki film I ever saw, after Take Care Of Your Scarf, Tatjana!. It’s messy and puzzling in comparison and some of the jokes are a bit too in-joke for my head, but it has a similarly glorious commitment to its idea. We’ve made a note to watch it again soon.

13. Crime and Punishment (1983)

Much more fun than might be expected from an adaptation of a grim Russian novel as the début work by a famously downbeat Finnish director.

12. La Vie de Bohème (1992)

Sparkling, then it hits the wall. This adaptation of the book that also spawned the opera La Bohème is set in Paris and acted in French, even though some of the main actors apparently didn’t speak French. Since those people are portrayed as both foreign and, more importantly, bonkers fabulists, this actually works ok for me, though I might not think so if French was my first language. This was Kaurismäki’s first film with delightful antihero André Wilms, and its first half-hour is as overtly comical as anything here.

When things start to get real, it drags a little, perhaps because by that point we are not really in the mood for realism. So although the ending should be moving, and for better people than us it probably is, we found ourselves thinking this was the first time one of his films had run longer than it should have.

11. The Match Factory Girl (1990)

Sent to divide us.

This is the only film by Kaurismäki to appear in the TSPDT Top 1000, which I understand to be a crowd-sourced list of the 1000 best films in the world. The director has apparently said this was the first film of his that he found good enough to watch. Reviews call it “devastating” and “a sobering parable”. Kati Outinen says she didn’t know she was expected to play the lead until she turned up for filming. It’s about a woman who is misused by her family, abused by a lover, and seeks revenge.

It’s a beautifully posed and filmed study of people, relationships, and work, a grotesque story about sexism and class, and also a potentially funny comedy, if you are not a child. It is very good, but it’s bleak, rather linear, a bit slight, and I don’t think I could ever be persuaded that it’s the best thing Kaurismäki has made. I wonder whether the high rankings come from people thinking it must be important because it’s grim, or that they ought to include Kaurismäki on their list and this looks like the least foolish option. I’d love to hear opposing views.

10. * Lights in the Dusk (2006)

This is an odd one. It’s in a more realistic style than the films that came before and after it (The Man Without A Past and Le Havre). Helsinki is modern, shiny, and unfriendly, and there is, unusually, no ambiguity about when the film is set. It’s a film about a man who ignores the world: he is picked up and used in a heist, and takes the fall for it with no indication that he recognises what is happening. The idea is not unfamiliar, but it’s unusually concrete and blunt. Perhaps the world is not as good as some of these other films suggest it can be. Kaurismäki’s previous film had been almost a fairytale. Was he sickened by it?

This is surely the most macho of Kaurismäki’s films. It has taciturn mob bosses and bullet-headed thugs, the female lead is a classic femme fatale, and other women get merely a line or two. It has what appears, to ignorant outsider me, to be a slightly laboured red vs white class divide (poor honest folk with Finnish names, rich gangsters with Swedish ones). There’s little warmth to the film, though there are certainly mysteries, of which the prime one is why the hero refuses to acknowledge Aila the hot-dog seller.

It is funny, sometimes, and definitely compelling to look at, and has a fine soundtrack, but it left me feeling uneasy. Is it me? Am I misunderstanding? I fear that my response to this film might be something like the response an English-speaking viewer ought to have to Kaurismäki in general.

9. I Hired A Contract Killer (1990)

This film is in English, set in London. If you happen to know London, it’s well worth watching just for the amazing adaptations of London scenery.

For the English-speaking viewer, it also offers an opportunity to test the theory that we only like these films because they’re in a foreign language, set in foreign places, and subtitled.

We just about passed that test. London looks splendidly grubby, there are some amazing tableaux and great dialogue, and I enjoyed the delivery, which is either dispassionate or dismissive. It’s charming in a satisfyingly bleak way. It does feel a little bit arbitrary and discontinuous in comparison with the best here, but (unlike most of these) we’ve only seen this film once, and would happily see it again.

8. Leningrad Cowboys Go America (1989)

Definitely our kids’ favourite Kaurismäki film: the only one they find truly acceptable. This is the tale of the worst band in the world, who are too awful for their home country and must go to America to be tolerated.

Part of the fun, besides plenty of low comedy, is that the Cowboys are portrayed as dreadful while being in fact pretty good. They may have to subsist on onions but they can definitely play, and the audiences, which are apparently real, generally seem to think so too. (The band now has an extensive performing history outside of the movies.)

A brilliant film in its stupid way.

7. * The Other Side of Hope (2017)

The story of an emigrant from wartime Syria arriving in Helsinki and then trying to find his sister. More realistic than Le Havre, the film that it is temporally and thematically following, and less hopeful. Its bleaker outlook is curiously reflected in the soundtrack, which is much sparser than usual. Although the film does have comic moments, they don’t always work for me. The Japanese restaurant sequence here is the only scene in any of Kaurismäki’s films that feels as if it’s making fun of the protagonists, although it’s redeemed a little by the gravity with which the restaurant owner responds to the humiliation.

That aside, this film is full of tender and generous acts. It feels like a handbook for behaviour in difficult times. I’m oddly reminded of Robin Sloan’s wordy but conceptually neat Proposal for a book to be adapted into a movie starring Dwayne The Rock Johnson, the thrust of which is that The Rock will inevitably be US President at some point, so we should prepare by placing him into a dramatic situation which enacts the compassion and empathy that we expect a US President to have. This film immerses us in a specific situation, like an aeroplane emergency instruction sequence, to give us an overdue preparation for a crisis we are already in.

6. * Ariel (1988)

Everything in this film is terrific. The opening, the drive across Finland, the scene where the couple meet, the breakfast with the kid and the gun, the scene with the cake, that scene at the end whose punchline has been set up from the start. It’s glorious, and by some measure it’s as brilliant as any film ever was, but by our measure it’s a little too much a series of great set-pieces, so it’s not allowed to be the best here.

5. * The Man Without A Past (2002)

I wasn’t keen on this the first time I saw it. I think that was because: it has a Hollywood plot (man wakes up with no idea who he is, learns to live among people he would have previously overlooked); it starts with an unpleasant act of violence that is hard to forget; and the lead actor is fairly hard-looking and doesn’t give us a lot to go on.

I changed my mind completely the second time around. That may be because in the meantime I’d seen its hero Markku Peltola in Drifting Clouds, and I felt warmer toward the man whose hands could no longer whip up a porridge. Or I had become a more sympathetic person myself, perhaps because of the times, and I realised how beatific The Man Without A Past is. It’s the first of a series of recent Kaurismäki films that are essentially richly coloured fairytales, illustrations of how we all could be, if we admitted our better selves. Despite the brutal act that sets up the plot, this is a peaceful film.

It also now seems to me like a wonderful showcase for Markku Peltola, the man who embraces the difficult life and looks just gently humorous at just the right moments, and for Kati Outinen in the most uncompromising and uncommunicative of her leading roles.

There are problems with it, I think. Life in difficult places is made to seem rather easy, and we have a not entirely workable dichotomy between the happy poor and everyone else. But having seen this film twice, I look forward to watching it again.

4. Take Care of Your Scarf, Tatjana! (1994)

This ludicrously-titled black-and-white road movie is for me the ur-Kaurismäki, in that it was the first of his films I ever saw. I taped it from a late-night TV broadcast in the 90s (no way I was staying up for something so random) and still have the VHS tape somewhere. It looks much better on DVD though, because the photography throughout is just gorgeous. I mean really gorgeous.

It’s sort of a loser-buddy comedy, filmed in stark monochrome as if it’s a gritty exposé. The (male) heroes are a silent idiot and a violent alcoholic braggart who have no depth and, in principle, nothing much to like about them. That makes them sound like action figures, but they have no action going on either. They are totally outshone by Tatjana and Klavdia, the friendly but unsentimental women they meet.

The film is full of lugubrious gags and comic ideas, and those are what I generally remember about it. A celebratory quarter-of-a-sandwich with tea to cement the friendship between nations, repairing a car by pulling bits out of the engine and throwing them away, Valto’s in-car coffee maker, Reino’s worryingly excitable monologue about punching someone.

But when watching it, it’s the spaces between those moments that make the film what it is. I found that I love this film much more when I’m actually watching it than I do in my memory.

And it’s only an hour long.

3. * Shadows in Paradise (1986)

Bone-dry and beautiful. This provokingly slow film spends a lot of rich colour film looking at its lead actors, Matti Pellonpää and Kati Outinen, who are everything that matters in it. It’s only 80 minutes long, but feels a little longer even if you’re enjoying it. I like that. It’s funny and just sufficiently kind.

It’s no accident that Kati Outinen has a leading role in all the films at the top of this list. I love her later, more reserved middle-aged figures, but in this earlier one her slightly shifty character, on the edge, suspicious, always ready to abscond, is a delight.

2. Le Havre (2011)

A boy escapes when a migrant family is stopped by the police in France, and the people around try to help him. A fairytale, a romance about human beings that pretends to be a police story or thriller. Such a clean and beautiful film, and I think better when you’ve seen it once already, know what happens, and can stop worrying about the plot.

This film is supposedly a sequel to La Vie de Bohème and has a lot in common with it, including being in French, but it is built the other way up. La Vie de Bohème spends a lot of time on everyday transactions, which are satisfying and entertaining, but becomes harder work when it goes ethereal later on. In Le Havre it’s the other way around: the transcendent is normal, and the rest of life is just telegraphed there to be the context for it. And this one has a slightly higher proportion of actors who actually speak French.

Kaurismäki goes to exceptional lengths to blur when this film is set. Taxis and police cars are from the 80s, some of the locations are set up to look much older, photographers use manual-winding film cameras, phones have rotary dials, nobody has a mobile phone, but the plot is precisely contemporary for the date the film was made. This film was comparatively well funded, I think, which made me wonder whether he would have gone this far every time if he could have afforded it. The result is a highly personal feeling in which the world around us is subjective and dream-like, and only the people in it are real.

1. * Drifting Clouds (1996)

A glorious love story that, in my alternate world, is a Christmas film that the family gathers to watch, a bit like It’s A Wonderful Life is supposed to be.

Drifting Clouds is a film about a middle-aged couple facing financial disaster, deep in debt, having lost their jobs, with a tragedy in their past, in the bleak economic climate of mid-90s Finland. It’s also a comedy, one that doesn’t demean its characters, who are proud, admirable, and committed to one another. Kati Outinen is excellent again, but the solidity and good humour of Kari Väänänen are essential to prop things up, and there’s a compelling supporting cast. The staging is beautiful, the photography cautious and sympathetic, and there’s a po-faced joke in every other line. Although slow-moving, it’s never slow.