Note on “Explorations in Time-Frequency Analysis” by Patrick Flandrin

Patrick Flandrin is a physicist and signal-processing researcher whose name I first encountered as co-author (with François Auger) of a 1995 IEEE Transactions on Signal Processing paper called “Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method”.

This crunchy publication (21 pages, dozens of equations and figures) took a pleasing idea — replacing the familiar grid-format time-frequency spectrogram with a field of precisely localised points calculated using both magnitude and phase of the frequency bins, rather than only magnitude as a traditional spectrogram does — and set out the mathematics of applying it to a number of different time-frequency and time-scale representations.
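
For reference, and hedging on the sign conventions (which vary between presentations), the reassignment operators at the core of the paper move each spectrogram value from its nominal position (t, ω) to

$$\hat{t}(t,\omega) = t - \Re\left\{\frac{F_{Th}(t,\omega)\,\overline{F_{h}(t,\omega)}}{|F_{h}(t,\omega)|^{2}}\right\}, \qquad \hat{\omega}(t,\omega) = \omega + \Im\left\{\frac{F_{Dh}(t,\omega)\,\overline{F_{h}(t,\omega)}}{|F_{h}(t,\omega)|^{2}}\right\}$$

where, in my notation rather than the paper's, F_h is the short-time Fourier transform taken with window h, F_Th the transform taken with the window multiplied by time, and F_Dh the one taken with its derivative. The practical appeal is that the two extra transforms need nothing beyond the machinery already used for the ordinary one.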

Illustration from Auger & Flandrin (1995)

I read this paper about 15 years ago and didn’t understand it. I have since realised this is partly because it isn’t all that clear with its notation, but there is also a big gap between the naive programmer’s view (that’s mine) of a spectrogram and the mathematical analysis used in the paper.

To explain. For a programmer, a spectrogram comes from taking short overlapping slices of a sampled signal, multiplying each by a smoothing window shape, applying a short-time Fourier transform, and taking the magnitudes of the complex output bins to get one column of the spectrogram per slice of input. The short slices are because you want a fixed, smallish number of output bins, and you have various tradeoffs — time and frequency resolution and computational efficiency — to consider in that. The smoothing window is because your Fourier transform — a thing which matches up sinusoids of different frequencies against a signal to identify which ones would add up to it — operates on an infinite signal, consisting of the input you give it repeated forever in both directions: this will have a discontinuity each time it wraps around, and the smoothing window removes some of the frequency artifacts from these discontinuities. There is nothing particularly mathematical about the implementation of this, and any intuition used by the programmer is a mixture of the visual and techniques from the world of engineering. The language used in a publication like the DAFx book is typical in this world.
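
In symbols, and using my own notation rather than anything from a textbook, that recipe gives column n and bin k of the spectrogram as

$$S(n,k) = \left|\sum_{m=0}^{N-1} x(nH+m)\,w(m)\,e^{-2\pi i k m/N}\right|$$

for input samples x, hop H, and window w of length N; some definitions square the magnitude, but the picture is the same.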

The Auger & Flandrin paper instead comes from a world that summarises a spectrogram as a two-dimensional Wigner-Ville distribution filtered with a smoothing window, leading to a time-frequency representation of Cohen's class. Signals are finite-energy functions over infinite domains, and a spectrogram is a double integral over time and angular frequency. Both time-domain functions and time-frequency representations are continuous, and practical questions about overlap and window length don't arise. I can dimly remember this world, because my undergraduate degree — who am I kidding, my only degree — started out as pure maths, but I haven't inhabited it for any of my working life.
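
Concretely, and hedging on the normalisation constant, the statement is that the spectrogram taken with window h is the Wigner-Ville distribution of the signal smoothed by the Wigner-Ville distribution of the window:

$$S_{x}^{h}(t,\omega) = \iint W_{x}(s,\xi)\,W_{h}(s-t,\xi-\omega)\,\frac{ds\,d\xi}{2\pi}, \qquad W_{x}(t,\omega) = \int x\left(t+\tfrac{\tau}{2}\right)\overline{x\left(t-\tfrac{\tau}{2}\right)}\,e^{-i\omega\tau}\,d\tau$$

which is what places it in Cohen's class. Nowhere in this formulation does a hop size or an FFT length appear.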

So I didn’t really understand the paper, and a programmer has plenty to do, and that is one reason why Sonic Visualiser’s “Peak-Frequency Spectrogram” layer calculates instantaneous frequencies from the phase difference between successive columns, something which I found much easier to understand. (It turns out there are other good reasons one could make this choice, but I didn’t know that.1)
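
The phase-difference estimate is essentially the standard phase-vocoder one (notation mine, and unwrapping details vary): if φ_n(k) is the phase of bin k in column n, H the hop in samples, N the transform size, and f_s the sample rate, then

$$\hat{f}_{n}(k) \approx \frac{f_{s}}{2\pi H}\left(\frac{2\pi k H}{N} + \mathrm{princarg}\left[\varphi_{n}(k) - \varphi_{n-1}(k) - \frac{2\pi k H}{N}\right]\right)$$

where princarg wraps its argument into (−π, π]. A bin containing a single stable sinusoid is thereby pinned to that sinusoid's frequency rather than to the bin centre.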

Returning to the paper recently, I learned that Flandrin had written a book on the subject, and I bought a copy hoping it might bridge the conceptual gap. It turned out to be a good experience.

* * *

“Explorations in Time-Frequency Analysis” is a monograph digressing on things the author has found interesting in the past 30 years, which — what luck! — happen to be about time-frequency analysis. It’s short, about 200 pages, and nicely printed. There are lots of diagrams, and although equation-heavy it doesn’t hang about proving things, sending you to the references instead. It begins with a glossary of notation (I like it when books do this) and ends with a 9-page bibliography. The writing is crisp and friendly and the scene is set by the first two chapters, a philosophical outline and a chapter of examples with the lovely title “Small Data Are Beautiful”.

Although the book provides a lot of the background to the paper that defeated me, I still spent a potentially embarrassing amount of thought on things I imagine that anyone properly within the target market finds obvious. An example is what it means for a Gaussian function to be “circular” in time and frequency. The book goes over this in far more detail, but briefly a Gaussian — the bell-shaped normal distribution curve found in probability — has the property that its Fourier transform is also a Gaussian. The “wider” the bell shape in the time domain, the “narrower” in the frequency domain: at some point it must be equal in both, and then if you plot it in a spectrogram-like heat map you will see a circle. When does this happen? It’s shown that it happens for the Gaussian corresponding to a normal distribution of variance 1. But at this point I am worrying about units. What does it mean to be circular? The figures illustrating this lack units in either axis — in fact detail-wise many of the figures are more like sketches — and the little bit of engineer in me is wondering: how can you possibly have a circle if you lack units?

The answer I eventually recalled is that the units in one domain define those in the other. In this case, if the time axis is in seconds then angular frequency is radians per second, and a circle is a distribution whose extent in seconds is the same as that in radians per second. Other units such as samples (in time) or STFT bins (in frequency) have similar correspondences in the other domain. This is a place where going back to basics took significant thought, but I did actually appreciate being expected to think about it.
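
Concretely, and hedging on the constant in front (which depends on the Fourier convention), the pair in question is

$$g_{\sigma}(t) = e^{-t^{2}/2\sigma^{2}} \;\longleftrightarrow\; \hat{g}_{\sigma}(\omega) \propto e^{-\sigma^{2}\omega^{2}/2}$$

so a spread of σ in time goes with a spread of 1/σ in angular frequency, and the contours of the time-frequency picture are circles exactly when σ = 1 in whichever pair of reciprocal units you have chosen.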

So a nice rehearsal with some interesting bumps, but for me the thrilling twist arrives in chapter 12, “Spectrogram Geometry 2”. This reframes the spectrogram as a complex plane and the reassignment operator in terms of motion in a potential field proportional to the log-spectrogram. This mathematical leap is also an intuitively visual one, and it’s exciting for me because it is a little like how I pictured the spectrogram, with no meaningful mathematical analysis, when developing a certain feature of the Rubber Band timestretcher.2 This chapter is like seeing the vaguely-realised ground beneath your feet resolve into a larger, recognisable object — the moment when you realise you are standing on the back of a giant Pokémon, if you will.
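
If I have understood the chapter correctly (I am reconstructing this from memory, so treat it as a sketch rather than a statement of the book's result), the key fact for the circular Gaussian window is that the reassignment vector field is itself a gradient field,

$$\left(\hat{t}-t,\;\hat{\omega}-\omega\right) \;\propto\; \nabla \log S(t,\omega)$$

in suitably normalised time-frequency coordinates, with a simple constant of proportionality: reassigned points flow uphill in a potential given by the log of the spectrogram, collecting on its ridges.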

There is a lot more in this book, and I think it will repay repeated visits. I’m not sure whether you could implement anything directly from it, but you could, say, pick a random page and follow up all the references until you really feel you understand it. I think this would be a rewarding exercise that, for someone like me, would probably take around a month per page.

* * *

On that note, one of the first references given is to a book called “Visible Speech” by Potter, Kopp, and Green, 1947. I looked this up and was so intrigued that I tracked down an ex-library copy. It is a lavish presentation, perhaps with both training and PR elements, of a then-new idea called the “sound spectrograph”, i.e. a spectrogram. The title “Visible Speech”, incidentally, is borrowed with attribution from an earlier (1867) work about phonetic alphabets.

The authors of the 1947 book were writing about work done at Bell Labs to try to make the telephone accessible to the deaf. Their experimental devices used paper tape or phosphor display to show spectrographs of the speech sounds, and users were specially trained to interpret speech from them. Here’s a picture from the book of someone using one.

Operator sitting at a table in front of a large box with a tiny screen on it

The spectrographs were produced by automatically recording the speech to tape and playing the tape repeatedly through a filter of 300Hz bandwidth, whose centre frequency was incremented linearly between passes in 15Hz steps from 0-3500Hz. (They also had a version using 45Hz bandwidth filters, but it was found to be less legible.) The system was of course analogue.

In this image the top spectrograph is the one with 45Hz bandwidth, which is used to point out some interesting features, but the 300Hz bandwidth spectrograph below it is the form used throughout the rest of the book:

It’s striking how clear these spectrographs are, and it makes a useful reminder that we really aren’t always looking for the most precise representation of something — 300Hz bandwidth at speech frequencies is pretty wide! — but instead the most appropriate in some human dimension.

 


1 The Sonic Visualiser peak-frequency spectrogram precisely localises stable frequencies, but for each frequency bin it draws a short horizontal line across the whole duration of the bin at the proper frequency rather than localise the bin to a point in time. A very similar output could have been produced using reassignment, because the frequency calculated from phase difference should be very close to that calculated with reassignment. But a decision to do that would have meant ignoring the other reassignment operator, localisation in time, which gives a single point rather than a horizontal line for each bin. Had I understood the reassignment paper, I would probably have felt compelled to do that part properly. For it to work well, a greater bin overlap and much more sophisticated rendering would have been needed, and the result would have been much slower and possibly less clear for real music. I think.

2 This feature, which I gave the vague name “phase lamination”, was worked out in a hurry after discovering that the “phase locking” technique of Jean Laroche and Mark Dolson which I had used in the very first release of Rubber Band was patented. Phase locking reduced audible phasiness with the nice side-effect of making the phase vocoder faster to compute, but it also lent a robotic tang to the sound which certain listeners found even more unpleasant than the phasiness. The scheme I came up with to replace it was based on picturing a gradient field and making adjustments to bins near a peak or trough in proportion to the distance from it — tuned by ear rather than worked out mathematically. Although it lost the improved speed of phase locking, it usually sounds better. The idea seems reasonably obvious, but I hadn’t seen it described anywhere else and I was delighted to find it.

On macOS, arm64, and universal binaries

A handful of notes I made while building and packaging the new Intel/ARM universal binary of Rubber Band Audio for Mac. I might add to this if other things come up. See also my earlier notes about notarization.

Context

I’m using an ARM Mac – M1 or Apple Silicon – with macOS 11 “Big Sur”, the application is in C++ using Qt, and everything is kicked off from the command line (I don’t use Xcode).

To refer to machine architectures here I will use “x86_64” for 64-bit Intel and “arm64” for 64-bit ARM, since these are the terms the Apple tools use. Elsewhere they may also be referred to as “amd64” for Intel, or “aarch64” for ARM.

Universal binaries

A universal binary is one that contains builds for more than one processor architecture in separate “slices”. They were used in the earlier architecture transitions as well. Some tools (such as the C compiler) can emit universal binaries directly when more than one architecture is requested, but this often isn’t good enough: perhaps it doesn’t fit in with the build system, or the architectures need different compiler flags or libraries. Then the answer is to run the build twice with separate output files and glue the resulting binaries together using the lipo tool which exists for the purpose.
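
As a sketch of that two-build flow, with file names invented for the example and the real build system elided:

$ cc -arch x86_64 -mmacosx-version-min=10.13 -o myprog-x86_64 main.c
$ cc -arch arm64 -o myprog-arm64 main.c
$ lipo -create myprog-x86_64 myprog-arm64 -output myprog
$ lipo -info myprog   # lists the architectures present in the result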

How does the compiler decide which architecture(s) to emit?

The C compiler is a universal binary containing both arm64 and x86_64 “slices”, and it seems to be capable of emitting either arm64 or x86_64 code regardless of which slice of its own binary you invoke.

Perhaps the clearest way to tell it which architecture to emit is to use the -arch flag. With this, cc -arch x86_64 targets x86_64, cc -arch arm64 targets arm64, and cc -arch x86_64 -arch arm64 creates a fat binary containing both architectures.

If you don’t supply an -arch option, then it targets the same architecture as the process that invoked cc. The architecture of the invoking process is not necessarily the native machine architecture, so you can’t assume that a compiler on an ARM Mac will default to arm64 output.

I imagine the mechanism for this is simply that the x86_64 slice of the compiler emits x86_64 unless told otherwise, the arm64 slice emits arm64 likewise, and when you exec the compiler you get whichever slice matches the architecture of the process you exec it from.

There’s also a command called arch that selects a specific slice from a universal binary. So you can run arch -x86_64 make to run the x86_64 binary of make, so that any compiler it forks will default to x86_64. Or you can do things like arch -arm64 cc -arch x86_64 to run the arm64 binary of the compiler but produce an x86_64-only binary.

If you invoke a compiler directly from the shell without any of the above going on, then you get the machine native architecture. I assume this is just because a login shell is itself native.

For my builds I found it helpful to provide a cross-compile file to tell Meson explicitly which options to use for the architecture I wanted to target. That avoids the defaults being just an accident of whichever architecture Meson (or its Python interpreter, or Ninja) happened to be running in, without having to litter the build file with explicit architecture selections. I then scripted the build twice from a separate deployment script, using a different cross file for each, rather than try to have a single Meson file build both at once.
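
For illustration, a minimal x86_64 cross file along those lines might look like this, assuming a reasonably recent Meson (the names and flags here are a sketch, not my exact file):

[binaries]
c = 'clang'
cpp = 'clang++'
strip = 'strip'

[built-in options]
c_args = ['-arch', 'x86_64']
c_link_args = ['-arch', 'x86_64']
cpp_args = ['-arch', 'x86_64']
cpp_link_args = ['-arch', 'x86_64']

[host_machine]
system = 'darwin'
cpu_family = 'x86_64'
cpu = 'x86_64'
endian = 'little'

Each build then gets its own directory, something like meson setup build-x86_64 --cross-file cross-x86_64.txt, with a corresponding aarch64 file for the other slice.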

How do I target a particular version of macOS?

Use a flag like -mmacosx-version-min=10.13 at both compile and link time.

For ARM binaries, the oldest version you can target is 11. But you can still build a universal binary that combines this with an Intel binary built for an older version, and the result should run on those earlier versions of macOS as well.

How does a version of macOS decide whether my binary is compatible with it?

I had this question because I had built a universal binary (as above) in which the Intel slice was, I thought, built for macOS 10.13 or newer, but when I brought it to a machine with macOS 10.15 it showed as incompatible in the Finder and could not be opened there.

The answer is that it looks at the relevant architecture slice of the universal binary, and inspects it to find a Mach-O version number. In “older” versions of the macOS SDK this version is written using the LC_VERSION_MIN_MACOSX load command; in “newer” versions (I’m not quite sure when the cutoff is) it is tagged as the minos value of the LC_BUILD_VERSION load command instead. The linker quite logically decides which load command to write based on the value of the version number itself, so if you build with -mmacosx-version-min=10.13 you get a binary with LC_VERSION_MIN_MACOSX specified.

You can display a binary’s version information with the vtool tool, and it also appears in the list of information printed by otool -l. In theory you can also change this tag using vtool, but (a) that’s a bad idea, fix it in the build instead and (b) vtool segfaulted when I tried it anyway.
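
For example (the binary name here is hypothetical), to check which load command a given slice carries using otool:

$ otool -arch x86_64 -l myprog | grep -A3 -E 'LC_VERSION_MIN_MACOSX|LC_BUILD_VERSION'

which prints the matching load command along with its version field (or, for LC_BUILD_VERSION, its minos field).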

And after all that, in my case the cause turned out to be that I’d failed to supply the -mmacosx-version-min flag at link time.

Why is my program being killed on startup?

It appears that if you build a program for one architecture and then rebuild it for the other arch to the same executable file without deleting the executable in between, sometimes it doesn’t run: it just gets “killed (9)” on startup. I failed to discover why and I failed to reproduce it just now in a test build. I guess if that happens, delete the executable between builds.

* * *

Bonus grumble about Mac trackpad and mouse options

This is not useful content. Please do not attempt to read it.

I haven’t used a Mac in such earnest for a while now, so of course I’ve been rediscovering things about macOS that I don’t get on with. One that I find particularly maddening is the way it handles scroll direction for the trackpad and an external mouse.

I switch between the two a lot, and I like to use the “natural scrolling” direction (touchscreen-like, so your fingers are “pushing” the content) with the trackpad, but the opposite with the mouse, which has a scroll wheel or wheel-like scrolling zone whose behaviour I became accustomed to before touchscreen devices started sprouting everywhere.

Fortunately, macOS provides separate trackpad and mouse sections in the system preferences, which contain separate switches for the scroll direction of the trackpad and mouse respectively.

Unfortunately, when you change one of them, the other one changes as well. They aren’t separate options at all – they’re just two different switches in different windows that happen to control the same single internal option! So every time I go from trackpad to mouse or back again, I have to also go to system preferences and switch the scroll direction by hand. That is so stupid.

(Linux and Windows both have separate options that actually work as separate options. Of course they do. Why would they not?)

On macOS “notarization”

I’ve spent altogether too long, at various moments in the past year or so, trying to understand the code-signing, runtime entitlements, and “notarization” requirements that are now involved when packaging software for Apple macOS 10.15 Catalina. (I put notarization in quotes because it doesn’t carry the word’s general meaning; it appears to be an Apple coinage.)

In particular I’ve had difficulty understanding how one should package plugins — shared libraries that are distributed separately from their host application, possibly by different authors, and that are loaded from a general library path on disc rather than from within the host application’s bundle. In my case I’m dealing mostly with Vamp plugins, and the main host for them is Sonic Visualiser, or technically, its Piper helper program.

Catalina requires that applications (outside of the App Store, which I’m not considering here) be notarized before it will allow ordinary users to run them, but a notarized host application can’t always load a non-notarized plugin, the tools typically used to notarize applications don’t work for individual plugin binaries, and documentation relating to plugins has been slow in appearing. Complicating matters is the fact that notarization requirements are suspended for binaries built or downloaded before a certain date, so a host will often load old plugins but refuse new ones. As a non-native Apple developer, I find this situation… trying.

Anyway, this week I realised I had some misconceptions about how notarization actually worked, and once those were cleared up, the rest became obvious. Or obvious-ish.

(Everything here has been covered in other places before now, e.g. Apple docs, KVRaudio, Glyphs plugin documentation. But I want to write this as a conceptual note anyway.)

What notarization does

Here’s what happens when you notarize something:

  • Your computer sends a pack of executable binaries off to Apple’s servers. This may be an application bundle, or just a zip file with binaries in it.
  • Apple’s servers unpack it and pick out all of the binaries (executables, libraries etc) it contains. They scan them individually for malware and for each one (assuming it is clean) they file a cryptographic hash of the binary alongside a flag saying “yeah, nice” in a database somewhere, before returning a success code to you.

Later, when someone else wants to run your application bundle or load your plugin or whatever:

  • The user’s computer calculates locally the same cryptographic hashes of the binaries involved, then contacts Apple’s servers to ask “are these all right?”
  • If the server’s database has a record of the hashes and says they’re clean, the server returns “aye” and everything goes ahead. If not, the user gets an error dialog (blah cannot be opened) and the action is rejected.

Simple. But I found it hard to see what was going on, partly because the documentation mostly refers to processes and tools rather than principles, and partly because there are so many other complicating factors to do with code-signing, identity, authentication, developer IDs, runtimes, and packaging — I’ll survey those in a moment.

For me, though, the moment of truth came when I realised that none of the above has anything to do with the release flow of your software.

The documentation describes it as an ordered process: sign, then notarize, then publish. There are good reasons for that. The main one is that there is an optional step (the “stapler”) that re-signs your package between notarization and publication, so that users’ computers can skip ahead and know that it’s OK without having to contact Apple at all. But the only critical requirement is that Apple’s servers know about your binary before your users ask to run it. You could, in fact, package your software, release the package, then notarize it afterwards, and (assuming it passes the notarization checks) it should work just the same.

Notarizing plugins

A plugin (in this context) is just a single shared library, a single binary file that gets copied into some folder beneath $HOME/Library and loaded by the host application from there.

None of the notarization tools can handle individual binary files directly, so for a while I thought it wasn’t possible to notarize plugins at all. But that is just a limitation of the client tools: if you can get the binary to the server, the server will handle it the same as any other binary. And the client tools do support zip files, so first sign your plugin binary, and then:

$ zip blah.zip myplugin.dylib
adding: myplugin.dylib (deflated 65%)
$ xcrun altool --notarize-app -f blah.zip --primary-bundle-id org.example.myplugin -u 'my@appleid.example.org' -p @keychain:altool
No errors uploading 'blah.zip'.

(See the Apple docs for an explanation of the authentication arguments here.)

[Edit, 2020-02-17: John Daniel chides me for using the “zip” utility, pointing out that Apple recommend against it because of its poor handling of file metadata. Use Apple’s own “ditto” utility to create zip files instead.]

Wait for notarization to complete, using the request API to check progress as appropriate, and when it’s finished,

$ spctl -a -v -t install myplugin.dylib
myplugin.dylib: accepted
source=Notarized Developer ID

The above incantation seems to be how you test the notarization status of a single file: pretend it’s an installer (-t install), because once again the client tool doesn’t support this use case even though the service does. Note, though, that it is the dylib that is notarized, not the zip file, which was just a container for transport.

A Glossary of Everything Else

Signing — guaranteeing the integrity of a binary with your identity in a cryptographically secure way. Carried out by the codesign utility. Everything about the contemporary macOS release process, including notarization, expects that your binaries have been signed first, using your Apple Developer ID key.
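
A typical invocation, with the identity and file name here being placeholders rather than anything real, looks something like this:

$ codesign --force --timestamp --options runtime \
      --sign "Developer ID Application: Jane Developer (TEAM123456)" \
      myplugin.dylib

The --timestamp flag requests a secure timestamp, which notarization expects, and --options runtime opts in to the hardened runtime described below.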

Developer ID — a code-signing key that you can obtain from Apple once you are a paid-up member of the Apple Developer Program. That costs a hundred US dollars a year. Without it you can’t package programs for other people to run, unless they first disable security measures on their computers.

Entitlements — annotations you can make when signing a thing, to indicate which permissions, exemptions, or restrictions you would like it to have. Examples include permissions such as audio recording, exemptions such as the JIT exemption for the hardened runtime, or restrictions such as sandboxing (q.v.).

Hardened runtime — an alternative runtime library that includes restrictions on various security-sensitive things. Enabled not by an entitlement, but by providing the --options runtime flag when signing the binary. Works fine for most programs. The documentation suggests that you can’t send a binary for notarization unless it uses the hardened runtime; that doesn’t appear to be true at the moment, but it seems reasonable to use it anyway. Note that a host that uses the hardened runtime needs to have the com.apple.security.cs.disable-library-validation entitlement set if it is to load third-party plugins. (That case appears to have an inelegant failure mode — the host crashes with an untrappable signal 9 following a kernel EXC_BAD_ACCESS exception.)

Stapler — a mechanism for annotating a bundle or package, after notarization, so that users’ computers can tell it has been notarized without having to contact Apple’s servers to ask. Carried out by xcrun stapler. It doesn’t appear (?) to be possible to staple a single plugin binary, only complex organisms like app bundles.

Quarantine — an extended filesystem attribute attached to files that have been downloaded from the internet. Shown by the ls command with the -l@ flags, can be removed with the xattr command. The restrictions on running packaged code (to do with signing, notarization etc) apply only when it is quarantined.
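
For example, with a hypothetical downloaded file, the attribute can be inspected and removed like this:

$ ls -l@ MyApp.dmg
$ xattr -d com.apple.quarantine MyApp.dmg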

Sandboxing — a far more intrusive change to the way your application is run, that is disabled by default and that has nothing to do with any of the above except to fill up one’s brain with conceptually similar notions. A sandboxed application is one that is prevented from making any filesystem access except as authorised explicitly by the user through certain standard UI mechanisms. Sandboxing is an entitlement, so it does require that the application is signed, but it’s independent of the hardened runtime or notarization. Sandboxing is required for distribution in the App Store.

MIREX 2019 submissions

For the 2019 edition of MIREX, the Music Information Retrieval Evaluation eXchange, we at the Centre for Digital Music once again submitted a set of Vamp audio analysis plugins for evaluation. This is the seventh year in a row in which we’ve done so, and the fourth in which no completely new plugin has been added to the lineup. Although these methods are therefore getting more and more out-of-date, they do provide a potentially useful baseline for other submissions, a sanity check on the evaluation itself, and some historical colour.

Every year I write up the outcomes in a blog post. Like last year, I’m rather late writing this one. That’s partly because the official results page is still lacking a couple of categories, and says “More results are coming” at the top — I’m beginning to think they might not be, and decided not to wait any longer. (MIREX is volunteer-run, so this is just a remark, not a complaint.)

You can find my writeups of past years here: 2018, 2017, 2016, 2015, 2014, and 2013.

Structural Segmentation

Again no results have been published for this task. Last year I speculated that ours might have been the only entry, and since we submit the same one every year, there’s no point in re-running it if nobody else enters. Pity, this ought to be an interesting category.

Multiple Fundamental Frequency Estimation and Tracking

A rebound! Two years ago there were 14 entries here, last year only three: this year we’re back up to 12, including our two (both consisting of the Silvet plugin, in “live” and standard modes).

This category is famously difficult and I think still invites interesting approaches. An impressive submission from Anton Runov (the linked abstract is worth reading) uses an approach based on visual object detection, applied to the spectrogram as an image. Treating a spectrogram as an image is typical enough, but this particular method is new to me (having little exposure to rapid object detection algorithms). The code for this has been published, in C++ under the AGPL — I tried it, it seems like good code, builds cleanly, worked for me. Nice job.

Another interesting set of submissions achieving similar performance is that from Steiner, Jalalvand, and Birkholz (abstract also well worth a read) using “echo state networks”. An ESN appears to be like a recurrent neural network in which only the output weights are trained, input and internal weights remaining random.

Our own submissions are some way behind these methods, but there’s plenty of room for improvement ahead of them as well: I think the best submissions from 2017’s bumper crop still performed a little better than any from this year, and perfection is still well out of reach. (At least among labs that submit things to MIREX. Who knows what Google are up to by now.)

Results pages are here and here.

Audio Onset Detection

No results have (yet?) been published for this task.

Audio Beat Tracking

Another quiet year, with Sebastian Böck’s repeat submission still ahead. Results are here and here.

Audio Tempo Estimation

No results are yet available for this one either.

We made a tiny change to the submission protocol for our plugin this year (as foreshadowed in my post last year, I changed the calculation of the second estimate to be double instead of half of the first, in cases where the first estimate was below an arbitrary 100bpm) and I was curious what difference it made. I’ll update this if I notice any results having been published.

Audio Key Detection

We actually submitted a “new” plugin for this category: a version of the QM Key Detector containing a fix to chromagram initialisation provided by Daniel Schürmann, working in the Mixxx project. We submitted both “old” (same as last year) and “new” (with fix) versions, and saw significantly better results from the fixed version in all five test sets. So thank you, Daniel.

The most interesting submission, from Jiang, Xia, and Carlton, actually seems to be a presentation of a new(ish?) crowd-annotated dataset, used to train a key detection CRNN. It gets good results, with the rather critical caveat that the crowd-sourced training dataset could overlap with the MIREX test data. It’s not clear from the abstract whether the dataset is publicly available — I think it may be accessible via a developer API from the company (Hooktheory) that put it together.

Results are here.

Audio Chord Estimation

Last year was busy, this year isn’t: it sees only one submission besides ours, a straightforward CNN from the MIR Lab at National Taiwan University, whose performance is roughly comparable to our own Chordino. Results here.

 

MIREX 2018 submissions

The 2018 edition of MIREX, the Music Information Retrieval Evaluation eXchange, was the sixth in a row for which we at the Centre for Digital Music submitted a set of Vamp audio analysis plugins for evaluation. For the third year in a row, the set of plugins we submitted was entirely unchanged — these are increasingly antique methods, but we have continued to submit them with the idea that they could provide a useful year-on-year baseline at least. It also gives me a good reason to take a look at the MIREX results and write this little summary post, although I’m a bit late with it this year, having missed the end of 2018 entirely!

For reference, the past five years’ posts can be found at: 2017, 2016, 2015, 2014, and 2013.

Structural Segmentation

No results appear to have been published for this task in 2018; I don’t know why. Last time around, ours was the only entry. Maybe it was the only entry again, and since it was unchanged, there was no point in running the task.

Multiple Fundamental Frequency Estimation and Tracking

After 2017’s feast with 14 entries, 2018 is a famine with only 3, two of which were ours and the third of which (which I can’t link to, because its abstract is missing) was restricted to a single subtask, in which it got reasonable results. Results pages are here and here.

Audio Onset Detection

Almost as many entries as last time, and a new convolutional network from Axel Röbel et al disrupts the tidy sweep of Sebastian Böck’s group at the top of the results table. Our simpler methods are squarely at the bottom this time around. Röbel’s submission has a nice informative abstract which casts more light on the detailed result sets and is well worth a read. Results here.

Audio Beat Tracking

Pure consolidation: all the 2018 entries are repeats from 2017, and all perform identically (with the methods from Böck et al doing better than our plugins). Every year I say that this doesn’t feel like a solved problem, and it still doesn’t — the results we’re seeing here still don’t seem all that close to human performance, but perhaps there are misleading properties to the evaluation. Results here, here, here.

Audio Tempo Estimation

This is a busier category, with a new dataset and a few new submissions. The new dataset is most intriguing: all of the submissions perform better with the new dataset than the older one, except for our QM Tempo Tracker plugin, which performs much, much worse with the new one than the old!

I believe the new dataset is of electronic dance music, so it’s likely that much of it is high tempo, perhaps tripping our plugin into half-tempo octave errors. We could probe this next time by tweaking the submission protocol a little. Submissions are asked to output two tempo estimates, and the results report whether either of them was correct. Because our plugin only produces one estimate, we lazily submit half of that estimate as our second estimate (with a much lower salience score). But if our single estimate was actually half of the “true” value, as is plausible for fast music, we would see better scores from submitting double instead of half as the second estimate.

Results are here and here.

Audio Key Detection

Some novelty here from a pair of template-based methods from the Universitat Autonoma de Barcelona, one attributed to Galin and Castells-Rufas and the other to Castells-Rufas and Galin. Their performance is not a million miles away from our own template-based key estimation plugin.

The strongest results appear to be from a neural network method from Korzeniowski et al at JKU, an updated version of one of last year’s better-performing submissions, an implementation of which can be found in the madmom library.

Results are here.

Audio Chord Estimation

A lively (or daunting) category. A team from Fudan University in Shanghai, whence came two of the previous year’s strongest submissions, is back with another new method, an even stronger set of results, and once again a very readable abstract; and the JKU team have an updated model, just as in the key detection category, which also performs extremely impressively. Meanwhile a separate submission from JKU, due to Stefan Gasser and Franz Strasser, would have been at the very top had it been submitted a year earlier, but is now a little way behind. Convolutional neural networks are involved in all of these.

Our Chordino submission can still be described as creditable. Results can be found here.

 

EasyMercurial v1.4

Today’s second post about a software release will be a bit less detailed than the first.

I’ve just coordinated a new release of EasyMercurial, a cross-platform user interface for version control software that was last updated in February 2013. It looks a bit like this.

Screenshot from 2018-12-20 18-55-36

EasyMercurial was written with a bit of academic funding from the SoundSoftware project, which ran from 2010 to 2014. The idea was to make something as simple as possible to teach and understand, and we believed that the Mercurial version-control system was the simplest and safest to learn so we should base it on that. The concurrent rise of Github, and resulting dominance of Git as the version control software that everyone must learn, took the wind out of its sails. We eventually tacitly accepted that the v1.3 release made in 2013 was “finished”, and abandoned the proposed feature roadmap. (It’s open source, so if someone else wanted to maintain it, they could.)

EasyMercurial has continued to be a nice piece of software to use, and I use it myself on many projects, so when a recent change in the protocol support at the world’s biggest public Mercurial hosting site, Bitbucket, broke the Windows version of EasyMercurial 1.3, I didn’t mind having an excuse to update it. So now we have version 1.4.

This release doesn’t change a great deal. It updates the code to use the Qt5 toolkit and improves support for hi-dpi displays. I’ve dragged the packaging process up to date and re-packaged using current Qt, Mercurial (where bundled), and KDiff3 diff-merge code.

Mercurial usage itself has moved on in most quarters since EasyMercurial was conceived. EasyMercurial assumes that you’ll be using named branches for branching development, but these days using bookmarks for lightweight branching (more akin to Git branching) is more popular — EasyMercurial shows bookmarks but can’t do anything useful with them. Other features of modern Mercurial that could have been very helpful in a simple application like this, such as phases, are not supported at all.

Anyway: EasyMercurial v1.4. Free for Windows, Linux, and macOS. Get it here.