On macOS “notarization”

I’ve spent altogether too long, at various moments in the past year or so, trying to understand the code-signing, runtime entitlements, and “notarization” requirements that are now involved when packaging software for Apple macOS 10.15 Catalina. (I put notarization in quotes because it doesn’t carry the word’s general meaning; it appears to be an Apple coinage.)

In particular I’ve had difficulty understanding how one should package plugins — shared libraries that are distributed separately from their host application, possibly by different authors, and that are loaded from a general library path on disc rather than from within the host application’s bundle. In my case I’m dealing mostly with Vamp plugins, and the main host for them is Sonic Visualiser, or technically, its Piper helper program.

Catalina requires that applications (outside of the App Store, which I’m not considering here) be notarized before it will allow ordinary users to run them, but a notarized host application can’t always load a non-notarized plugin, the tools typically used to notarize applications don’t work for individual plugin binaries, and documentation relating to plugins has been slow in appearing. Complicating matters is the fact that notarization requirements are suspended for binaries built or downloaded before a certain date, so a host will often load old plugins but refuse new ones. As a non-native Apple developer, I find this situation… trying.

Anyway, this week I realised I had some misconceptions about how notarization actually worked, and once those were cleared up, the rest became obvious. Or obvious-ish.

(Everything here has been covered in other places before now, e.g. Apple docs, KVRaudio, Glyphs plugin documentation. But I want to write this as a conceptual note anyway.)

What notarization does

Here’s what happens when you notarize something:

  • Your computer sends a pack of executable binaries off to Apple’s servers. This may be an application bundle, or just a zip file with binaries in it.
  • Apple’s servers unpack it and pick out all of the binaries (executables, libraries etc) it contains. They scan them individually for malware and for each one (assuming it is clean) they file a cryptographic hash of the binary alongside a flag saying “yeah, nice” in a database somewhere, before returning a success code to you.

Later, when someone else wants to run your application bundle or load your plugin or whatever:

  • The user’s computer calculates locally the same cryptographic hashes of the binaries involved (there’s an example of inspecting these below), then contacts Apple’s servers to ask “are these all right?”
  • If the server’s database has a record of the hashes and says they’re clean, the server returns “aye” and everything goes ahead. If not, the user gets an error dialog (blah cannot be opened) and the action is rejected.
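(The hash in question is the hash of the binary’s signed code directory, usually referred to as the cdhash. If you’re curious, you can inspect it locally with codesign — myplugin.dylib here just stands in for whatever binary you’re looking at, and the CDHash line is the relevant part of the output:

$ codesign -dvvv myplugin.dylib

That’s only to make the “database of hashes” idea a bit more concrete.)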

Simple. But I found it hard to see what was going on, partly because the documentation mostly refers to processes and tools rather than principles, and partly because there are so many other complicating factors to do with code-signing, identity, authentication, developer IDs, runtimes, and packaging — I’ll survey those in a moment.

For me, though, the moment of truth came when I realised that none of the above has anything to do with the release flow of your software.

The documentation describes it as an ordered process: sign, then notarize, then publish. There are good reasons for that. The main one is that there is an optional step (the “stapler”) that re-signs your package between notarization and publication, so that users’ computers can skip ahead and know that it’s OK without having to contact Apple at all. But the only critical requirement is that Apple’s servers know about your binary before your users ask to run it. You could, in fact, package your software, release the package, then notarize it afterwards, and (assuming it passes the notarization checks) it should work just the same.
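To make the ordering concrete, the conventional flow for an app bundle looks roughly like this — the signing identity, bundle ID, and file names are placeholders of mine, and the authentication arguments are the same sort of thing as in the plugin example further down:

$ codesign -s "Developer ID Application: Example Name (TEAMID)" --timestamp --options runtime MyApp.app
$ ditto -c -k --keepParent MyApp.app MyApp.zip
$ xcrun altool --notarize-app -f MyApp.zip --primary-bundle-id org.example.myapp -u 'my@appleid.example.org' -p @keychain:altool
$ xcrun stapler staple MyApp.app

with a pause before the stapling step while you wait for the notarization service to report success.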

Notarizing plugins

A plugin (in this context) is just a single shared library, a single binary file that gets copied into some folder beneath $HOME/Library and loaded by the host application from there.

None of the notarization tools can handle individual binary files directly, so for a while I thought it wasn’t possible to notarize plugins at all. But that is just a limitation of the client tools: if you can get the binary to the server, the server will handle it the same as any other binary. And the client tools do support zip files, so first sign your plugin binary, and then:

$ zip blah.zip myplugin.dylib
adding: myplugin.dylib (deflated 65%)
$ xcrun altool --notarize-app -f blah.zip --primary-bundle-id org.example.myplugin -u 'my@appleid.example.org' -p @keychain:altool
No errors uploading 'blah.zip'.

(See the Apple docs for an explanation of the authentication arguments here.)

[Edit, 2020-02-17: John Daniel chides me for using the “zip” utility, pointing out that Apple recommend against it because of its poor handling of file metadata. Use Apple’s own “ditto” utility to create zip files instead.]
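(For reference, the equivalent ditto incantation is along these lines, with -c and -k asking for a PKZip-format archive:

$ ditto -c -k myplugin.dylib blah.zip

I haven’t re-run the whole process this way myself, so treat it as a sketch.)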

Wait for notarization to complete, using the request API to check progress as appropriate.
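If you want to poll from the command line, the progress check is something like this — the request UUID is the one reported back when the upload completes, and the authentication arguments are as before:

$ xcrun altool --notarization-info <RequestUUID> -u 'my@appleid.example.org' -p @keychain:altool

When it’s finished,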

$ spctl -a -v -t install myplugin.dylib
myplugin.dylib: accepted
source=Notarized Developer ID

The above incantation seems to be how you test the notarization status of a single file: pretend it’s an installer (-t install), because once again the client tool doesn’t support this use case even though the service does. Note, though, that it is the dylib that is notarized, not the zip file, which was just a container for transport.

A Glossary of Everything Else

Signing — guaranteeing the integrity of a binary with your identity in a cryptographically secure way. Carried out by the codesign utility. Everything about the contemporary macOS release process, including notarization, expects that your binaries have been signed first, using your Apple Developer ID key.
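A minimal example for a plugin binary, with a placeholder identity string (you can list your own identities with security find-identity -v -p codesigning):

$ codesign -s "Developer ID Application: Example Name (TEAMID)" --timestamp myplugin.dylib
$ codesign --verify --verbose myplugin.dylib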

Developer ID — a code-signing key that you can obtain from Apple once you are a paid-up member of the Apple Developer Program. That costs a hundred US dollars a year. Without it you can’t package programs for other people to run, unless they first disable security measures on their computers.

Entitlements — annotations you can make when signing a thing, to indicate which permissions, exemptions, or restrictions you would like it to have. Examples include permissions such as audio recording, exemptions such as the JIT exemption for the hardened runtime, or restrictions such as sandboxing (q.v.).

Hardened runtime — a mode in which the runtime enforces restrictions on various security-sensitive things. Enabled not by an entitlement, but by providing the --options runtime flag when signing the binary. Works fine for most programs. The documentation suggests that you can’t send a binary for notarization unless it uses the hardened runtime; that doesn’t appear to be true at the moment, but it seems reasonable to use it anyway. Note that a host that uses the hardened runtime needs to have the com.apple.security.cs.disable-library-validation entitlement set if it is to load third-party plugins. (That case appears to have an inelegant failure mode — the host crashes with an untrappable signal 9 following a kernel EXC_BAD_ACCESS exception.)
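By way of illustration, a host signed with the hardened runtime and that entitlement might be prepared along these lines. The identity and the file names (entitlements.plist, MyHost.app) are placeholders of mine; the entitlement key is the real one:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>com.apple.security.cs.disable-library-validation</key>
  <true/>
</dict>
</plist>

$ codesign -s "Developer ID Application: Example Name (TEAMID)" --timestamp --options runtime --entitlements entitlements.plist MyHost.app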

Stapler — a mechanism for annotating a bundle or package, after notarization, so that users’ computers can tell it has been notarized without having to contact Apple’s servers to ask. Carried out by xcrun stapler. It doesn’t appear (?) to be possible to staple a single plugin binary, only complex organisms like app bundles.
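For an app bundle, that amounts to something like this (the bundle name is a placeholder):

$ xcrun stapler staple MyApp.app
$ xcrun stapler validate MyApp.app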

Quarantine — an extended filesystem attribute attached to files that have been downloaded from the internet. It is shown by the ls command with the -l@ flags and can be removed with the xattr command. The restrictions on running packaged code (to do with signing, notarization etc) apply only when it is quarantined.
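For example, with a freshly downloaded file (the path here is illustrative):

$ ls -l@ ~/Downloads/example.dmg
$ xattr -p com.apple.quarantine ~/Downloads/example.dmg
$ xattr -d com.apple.quarantine ~/Downloads/example.dmg

The first two show the attribute and its value; the third removes it.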

Sandboxing — a far more intrusive change to the way your application is run, that is disabled by default and that has nothing to do with any of the above except to fill up one’s brain with conceptually similar notions. A sandboxed application is one that is prevented from making any filesystem access except as authorised explicitly by the user through certain standard UI mechanisms. Sandboxing is an entitlement, so it does require that the application is signed, but it’s independent of the hardened runtime or notarization. Sandboxing is required for distribution in the App Store.

MIREX 2019 submissions

For the 2019 edition of MIREX, the Music Information Retrieval Evaluation eXchange, we at the Centre for Digital Music once again submitted a set of Vamp audio analysis plugins for evaluation. This is the seventh year in a row in which we’ve done so, and the fourth in which no completely new plugin has been added to the lineup. Although these methods are therefore getting more and more out-of-date, they do provide a potentially useful baseline for other submissions, a sanity check on the evaluation itself, and some historical colour.

Every year I write up the outcomes in a blog post. Like last year, I’m rather late writing this one. That’s partly because the official results page is still lacking a couple of categories, and says “More results are coming” at the top — I’m beginning to think they might not be, and decided not to wait any longer. (MIREX is volunteer-run, so this is just a remark, not a complaint.)

You can find my writeups of past years here: 2018, 2017, 2016, 2015, 2014, and 2013.

Structural Segmentation

Again no results have been published for this task. Last year I speculated that ours might have been the only entry, and since we submit the same one every year, there’s no point in re-running it if nobody else enters. Pity, this ought to be an interesting category.

Multiple Fundamental Frequency Estimation and Tracking

A rebound! Two years ago there were 14 entries here, last year only three: this year we’re back up to 12, including our two (both consisting of the Silvet plugin, in “live” and standard modes).

This category is famously difficult and I think still invites interesting approaches. An impressive submission from Anton Runov (linked abstract is worth reading) uses an approach based on visual object detection using the spectrogram as an image. Treating a spectrogram as an image is typical enough, but this particular method is new to me (having little exposure to rapid object detection algorithms). The code for this has been published, in C++ under the AGPL — I tried it, it seems like good code, builds cleanly, worked for me. Nice job.

Another interesting set of submissions achieving similar performance is that from Steiner, Jalalvand, and Birkholz (abstract also well worth a read) using “echo state networks”. An ESN appears to be like a recurrent neural network in which only the output weights are trained, input and internal weights remaining random.

Our own submissions are some way behind these methods, but there’s plenty of room for improvement ahead of them as well: I think the best submissions from 2017’s bumper crop still performed a little better than any from this year, and perfection is still well out of reach. (At least among labs that submit things to MIREX. Who knows what Google are up to by now.)

Results pages are here and here.

Audio Onset Detection

No results have (yet?) been published for this task.

Audio Beat Tracking

Another quiet year, with Sebastian Böck’s repeat submission still ahead. Results are here and here.

Audio Tempo Estimation

No results are yet available for this one either.

We made a tiny change to the submission protocol for our plugin this year (as foreshadowed in my post last year, I changed the calculation of the second estimate to be double instead of half of the first, in cases where the first estimate was below an arbitrary 100bpm) and I was curious what difference it made. I’ll update this if I notice any results having been published.

Audio Key Detection

We actually submitted a “new” plugin for this category: a version of the QM Key Detector containing a fix to chromagram initialisation provided by Daniel Schürmann, working in the Mixxx project. We submitted both “old” (same as last year) and “new” (with fix) versions, and saw significantly better results from the fixed version in all five test sets. So thank you, Daniel.

The most interesting submission, from Jiang, Xia, and Carlton, actually seems to be a presentation of a new(ish?) crowd-annotated dataset, used to train a key detection CRNN. It gets good results, with the rather critical caveat that the crowd-sourced training dataset could overlap with the MIREX test data. It’s not clear from the abstract whether the dataset is publicly available — I think it may be accessible via a developer API from the company (Hooktheory) that put it together.

Results are here.

Audio Chord Estimation

Last year was busy, this year isn’t: it sees only one submission besides ours, a straightforward CNN from the MIR Lab at National Taiwan University, whose performance is roughly comparable to our own Chordino. Results here.

 

MIREX 2018 submissions

The 2018 edition of MIREX, the Music Information Retrieval Evaluation eXchange, was the sixth in a row for which we at the Centre for Digital Music submitted a set of Vamp audio analysis plugins for evaluation. For the third year in a row, the set of plugins we submitted was entirely unchanged — these are increasingly antique methods, but we have continued to submit them with the idea that they could provide a useful year-on-year baseline at least. It also gives me a good reason to take a look at the MIREX results and write this little summary post, although I’m a bit late with it this year, having missed the end of 2018 entirely!

For reference, the past five years’ posts can be found at: 2017, 2016, 2015, 2014, and 2013.

Structural Segmentation

No results appear to have been published for this task in 2018; I don’t know why. Last time around, ours was the only entry. Maybe it was the only entry again, and since it was unchanged, there was no point in running the task.

Multiple Fundamental Frequency Estimation and Tracking

After 2017’s feast with 14 entries, 2018 is a famine with only 3, two of which were ours; the third (which I can’t link to, because its abstract is missing) was restricted to a single subtask, in which it got reasonable results. Results pages are here and here.

Audio Onset Detection

Almost as many entries as last time, and a new convolutional network from Axel Röbel et al disrupts the tidy sweep of Sebastian Böck’s group at the top of the results table. Our simpler methods are squarely at the bottom this time around. Röbel’s submission has a nice informative abstract which casts more light on the detailed result sets and is well worth a read. Results here.

Audio Beat Tracking

Pure consolidation: all the 2018 entries are repeats from 2017, and all perform identically (with the methods from Böck et al doing better than our plugins). Every year I say that this doesn’t feel like a solved problem, and it still doesn’t — the results we’re seeing here still don’t seem all that close to human performance, but perhaps there are misleading properties to the evaluation. Results here, here, here.

Audio Tempo Estimation

This is a busier category, with a new dataset and a few new submissions. The new dataset is most intriguing: all of the submissions perform better with the new dataset than the older one, except for our QM Tempo Tracker plugin, which performs much, much worse with the new one than the old!

I believe the new dataset is of electronic dance music, so it’s likely that much of it is high tempo, perhaps tripping our plugin into half-tempo octave errors. We could probe this next time by tweaking the submission protocol a little. Submissions are asked to output two tempo estimates, and the results report whether either of them was correct. Because our plugin only produces one estimate, we lazily submit half of that estimate as our second estimate (with a much lower salience score). But if our single estimate was actually half of the “true” value, as is plausible for fast music, we would see better scores from submitting double instead of half as the second estimate.

Results are here and here.

Audio Key Detection

Some novelty here from a pair of template-based methods from the Universitat Autonoma de Barcelona, one attributed to Galin and Castells-Rufas and the other to Castells-Rufas and Galin. Their performance is not a million miles away from our own template-based key estimation plugin.

The strongest results appear to be from a neural network method from Korzeniowski et al at JKU, an updated version of one of last year’s better-performing submissions, an implementation of which can be found in the madmom library.

Results are here.

Audio Chord Estimation

A lively (or daunting) category. A team from Fudan University in Shanghai, whence came two of the previous year’s strongest submissions, is back with another new method, an even stronger set of results, and once again a very readable abstract; and the JKU team have an updated model, just as in the key detection category, which also performs extremely impressively. Meanwhile a separate submission from JKU, due to Stefan Gasser and Franz Strasser, would have been at the very top had it been submitted a year earlier, but is now a little way behind. Convolutional neural networks are involved in all of these.

Our Chordino submission can still be described as creditable. Results can be found here.

 

EasyMercurial v1.4

Today’s second post about a software release will be a bit less detailed than the first.

I’ve just coordinated a new release of EasyMercurial, a cross-platform user interface for the Mercurial version-control system, whose previous release dates back to February 2013. It looks a bit like this.

Screenshot from 2018-12-20 18-55-36

EasyMercurial was written with a bit of academic funding from the SoundSoftware project, which ran from 2010 to 2014. The idea was to make something as simple as possible to teach and understand, and we believed that the Mercurial version-control system was the simplest and safest to learn so we should base it on that. The concurrent rise of Github, and resulting dominance of Git as the version control software that everyone must learn, took the wind out of its sails. We eventually tacitly accepted that the v1.3 release made in 2013 was “finished”, and abandoned the proposed feature roadmap. (It’s open source, so if someone else wanted to maintain it, they could.)

EasyMercurial has continued to be a nice piece of software to use, and I use it myself on many projects, so when a recent change in the protocol support at the world’s biggest public Mercurial hosting site, Bitbucket, broke the Windows version of EasyMercurial 1.3, I didn’t mind having an excuse to update it. So now we have version 1.4.

This release doesn’t change a great deal. It updates the code to use the Qt5 toolkit and improves support for hi-dpi displays. I’ve also dragged the packaging process up to date and re-packaged using current Qt, Mercurial (where bundled), and KDiff3 diff-merge code.

Mercurial usage itself has moved on in most quarters since EasyMercurial was conceived. EasyMercurial assumes that you’ll be using named branches for branching development, but these days using bookmarks for lightweight branching (more akin to Git branching) is more popular — EasyMercurial shows bookmarks but can’t do anything useful with them. Other features of modern Mercurial that could have been very helpful in a simple application like this, such as phases, are not supported at all.

Anyway: EasyMercurial v1.4. Free for Windows, Linux, and macOS. Get it here.

Sonic Visualiser v3.2

Another release of Sonic Visualiser is out. This one, version 3.2, has some significant visible changes, in contrast to version 3.1 which was more behind-the-scenes.

The theme of this release could be said to be “oversampling” or “interpolation”.

Waveform interpolation

Ever since the Early Days, the waveform layer in Sonic Visualiser has had one major limitation: you can’t zoom in any closer (horizontally) than one pixel per sample. Here’s what that looks like — this is the closest zoom available in v3.1 or earlier:

Screenshot from 2018-12-20 09-23-39

This isn’t such a big deal with a lower-resolution display, since you don’t usually want to interact with individual samples anyway (you can’t edit waveforms in Sonic Visualiser). It’s a bigger problem with hi-dpi and “retina” displays, on which individual pixels can’t always be made out.

Why this limitation? It allowed an integer ratio between samples and pixels to be used internally, which made it a bit easier to avoid rounding errors. It also sidestepped any awkward decisions about how, or whether, to show a signal in between the sample points.

(In a waveform editor like Audacity it is necessary to be able to interact with individual samples, so some decision has to be made about what to show between the sample points when zoomed in closely. Older versions of Audacity connected the sample points with straight lines, a decision which attracted criticism as misrepresenting how sampling works. More recent versions show sample points on separate stems without connecting lines.)

In Sonic Visualiser v3.2 it’s now possible to zoom closer than one pixel per sample, and we show the signal oversampled between the sample points using sinc interpolation. Here’s an example from the documentation, showing the case where the sample values are all zero but for a single sample with value 1:

The sample points are the little square dots, and the wiggly line passing through them is the interpolated signal. (The horizontal line is just the x axis.) The principle here is that, although there are infinitely many ways to join the dots, there is only one that is “smooth” enough to be expressible as a sum of sinusoids of no higher frequency than half the sampling rate — which is the prerequisite for reconstructing a signal sampled without aliasing. That’s what is shown here.
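For the record, the curve being drawn is the standard band-limited reconstruction: with sample period T and sample values x[n], it is

x(t) = Σ_n x[n] · sinc((t − nT) / T), where sinc(u) = sin(πu) / (πu)

so the single unit-valued sample above traces out a single sinc function. (That’s the textbook formula; any actual implementation has to use a finite, windowed approximation of the infinite sum.)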

The above artificial example has a nice shape, but in most cases with real music the interpolated signal will not look very different from what you’d get by simply joining the dots. It’s mostly relevant in extreme cases. Let’s replace the single sample of value 1 above with a pair of consecutive samples of value 0.5:

Screenshot from 2018-12-19 20-31-48

Now we see that the interpolated signal has a peak between the two samples with a greater level than either sample. The peak sample value is not a safe indication of the peak level of the analogue signal.

Incidentally, another new feature in v3.2 is the ability to import audio data from a CSV or similar data file rather than only from standard audio formats. That made it much easier to set up the examples above.

Spectrogram and spectrum oversampling

The other oversampling-related feature added in v3.2 appears in the spectrogram and spectrum layers. These layers now have an option to set an oversampling level, from the default “1x” up to “8x”.

This option increases the length of the short-time Fourier transform used to generate the spectrum, by padding the time-domain signal window with additional zero-valued samples before calculating the transform. This results in an oversampled frequency-domain output, with a higher visual resolution than would have been obtained from the original, un-zero-padded sample window. The result is a smoother spectrum in which the locations of peaks can be seen with a little more accuracy, somewhat like the waveform example above.
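To put rough numbers on this: if the sample rate is fs and the analysis window is N samples long, the plain transform has bins spaced fs / N apart, and oversampling by a factor k pads the window out to k·N samples, narrowing the spacing to fs / (k·N). The extra bins interpolate the same underlying spectrum, though: the real frequency resolution is still set by the N samples of actual signal and by the window shape, not by the transform length. (fs, N, and k are my notation here, not anything from the Sonic Visualiser interface.)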

This is nice in principle, but it can be deceiving.

In the case of waveform oversampling, there can be only one “matching” signal, given the sample points we have and the constraints of the sampling theorem. So we can oversample as much as we like, and all that happens is that we approximate the analogue signal more closely.

But in a short-time spectrum or spectrogram, we only use a small window of the original signal for each spectrum or spectrogram-column calculation. There is a tradeoff in the choice of window size (a longer window gives better frequency discrimination at the expense of time discrimination) but the window always exposes only a small part of the original signal, unless that signal is extremely short. Zero-padding and using a longer transform oversamples the output to make it smoother, but it obviously uses no extra information to do it — it still has no access to samples that were not in the original window. A higher-resolution output without any more information at the input can appear more effective at discriminating between frequencies than it really is.

Here’s an example. The signal consists of a mixture of two sine waves one tone apart (440 and 493.9 Hz). A log-log spectrum (i.e. log frequency on x axis, log magnitude on y) with an 8192-point short-time Fourier transform looks like this:

Screenshot from 2018-12-19 21-25-02

A log-log spectrum with a 1024-point STFT looks like this [1]:

Screenshot from 2018-12-19 21-25-26

The 1024-sample input isn’t long enough to discriminate between the two frequencies — they’re close enough that it’s necessary to “hear” a longer fragment than this in order to determine that there are two frequencies at all [2].

Add 8x oversampling to that last example, and it looks like this:

Screenshot from 2018-12-19 21-26-04

This is very smooth and looks super detailed, and indeed we can use it to read the peak value with more accuracy — but the peak is deceptive, because it is still merging the two frequency components. In fact most of the detail here consists of the frequency response of the 1024-point windowing function used to shape the time-domain window (it’s a Hann window in this case).

Also, in the case of peak frequencies, Sonic Visualiser might already provide a way to get the same information more accurately — its peak-frequency identification in both spectrum and spectrogram views uses phase unwrapping instead of spectrum interpolation to estimate the frequencies of stable harmonics, which gives very good results if the sound is indeed harmonic and stable.

Finally, there’s a limitation in Sonic Visualiser’s implementation of this oversampling feature that eliminates one potential use for it, which is to choose the length of the Fourier transform in order to align bin frequencies with known or expected frequency components of the signal. We can’t generally do that here, since Sonic Visualiser still only supports a few fixed multiples of a power-of-two window size.

In conclusion: interesting if you know what you’re looking at, but use with caution.


[1] Notice that we are connecting sample points in the spectrum with straight lines here — the same thing I characterised as a bad idea in the discussion of waveforms above. I think this is more forgivable here because the short-time transform output is not a sampled version of an original signal spectrum, but it’s still a bit icky.

[2] This is not exactly true, but it works for this example.

Rubber Band Library v1.8.2

I have finally managed to get together all the bits that go into a release of the Rubber Band library, and so have just released version 1.8.2.

The Rubber Band library is a software library for time-stretching and pitch-shifting of audio, particularly music audio. That means that it takes a recording of music and adjusts it so that it plays at a different speed or at a different pitch, and if desired, it can do that by changing the speed and pitch “live” as the music plays. This is impossible to do perfectly: essentially you are asking software to recreate what the music would have sounded like if the same musicians had played it faster, slower, or in a different key, and there just isn’t enough information in a recording to do that. It changes the sound and is absolutely not a reversible transformation. But Rubber Band does a pretty nice job. For anyone interested, I wrote a page (here) with a technical summary of how it does it.

I originally wrote this library between 2005 and 2007, with a v1.0 release at the end of 2007. My aim was to provide a useful tool for open source GPL-licensed audio applications on Linux, like Ardour or Rosegarden, with a commercial license as an afterthought. As so often happens, I seriously underestimated the work involved in getting the library from “working” (a few weeks of evening and weekend coding) to ready to use in production applications (two years).

It has now been almost six years since the last Rubber Band release, and since this one is just a bugfix release, we can say the library is pretty much finished. I would love to have the time and mental capacity for a version 2: there are many many things I would now do differently. (Sadly, the first thing is that I wouldn’t rely on my own ears for basic testing any more — in the intervening decade my hearing has deteriorated a lot and it amazes me to think that I used to accept it as somehow authoritative.)

In spite of all the things I would change, I think this latest release of version 1 is pretty good. It’s not the state-of-the-art, but it is very effective, and is in use right now in professional audio applications across the globe. I hope it can be useful to you somehow.