Programs for Music

Performance improvements in Rubber Band Library

Today marks the release of version 3.1 of the audio time-stretching and pitch-shifting library Rubber Band. This release focuses primarily on performance improvements.

In version 3.0 we introduced a totally new, higher-quality processing engine, which I’ll refer to as the R3 engine. The older one is still included, and I’ll call that R2.

Although the output of R3 typically sounds much better than R2’s, it uses a lot more CPU power. Measuring sustained throughput in frames per second for common fixed stretch factors, we find R2 to be typically about three times as fast as R3. Both are eminently usable in real-time on hardware from the last decade, but the extra headroom available with R2 can make a big difference.

It would be nice to do better, but the R3 code was already quite heavily optimised before release — it is simply a fairly CPU-intensive method. Still, as it turns out, there are a few things we can do.

Measuring performance

Sustained throughput is not the only measure. Rubber Band is often used in real-time situations where the worst-case time per processed block is what matters most.

To measure this, I set up a test case that simulates a typical sound-processing callback: a music recording is passed through a stretcher, a fixed 512 sample frames are emitted from each processing cycle, and the time and pitch ratios are varied while measuring how long each cycle takes to return. The stretcher is initialised with typical parameters for this activity (in code terms, OptionProcessRealTime | OptionPitchHighConsistency | OptionFormantPreserved) and primed with an initial pad before entering the cycle loop, since otherwise the first call would dominate the results.
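
For illustration, here is a rough C++ sketch of the shape of that loop. This is not the actual benchmark code: the priming pad is omitted, silence stands in for the music recording, and the ratios are fixed rather than varied.

#include <rubberband/RubberBandStretcher.h>

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

using RubberBand::RubberBandStretcher;

int main()
{
    constexpr size_t blockSize = 512;

    RubberBandStretcher stretcher
        (44100, 1,
         RubberBandStretcher::OptionProcessRealTime |
         RubberBandStretcher::OptionPitchHighConsistency |
         RubberBandStretcher::OptionFormantPreserved);

    // The real test varies these from cycle to cycle
    stretcher.setTimeRatio(1.5);
    stretcher.setPitchScale(2.0);

    std::vector<float> in(blockSize, 0.f), out(blockSize, 0.f);
    float *inp = in.data(), *outp = out.data();

    for (int cycle = 0; cycle < 1000; ++cycle) {
        auto t0 = std::chrono::steady_clock::now();

        // Feed input in the increments the stretcher asks for, until a
        // full output block is available
        while (stretcher.available() < int(blockSize)) {
            size_t needed = stretcher.getSamplesRequired();
            if (needed == 0) needed = blockSize;
            stretcher.process(&inp, std::min(needed, blockSize), false);
        }

        // Emit a fixed 512 frames per cycle, timing the whole cycle
        stretcher.retrieve(&outp, blockSize);

        auto t1 = std::chrono::steady_clock::now();
        std::cout << cycle << " "
                  << std::chrono::duration<double>(t1 - t0).count() << "\n";
    }
}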

The results for R2 and R3, as of the 3.0 release, look like this:

[Graph: processing cycle count (x-axis) against time taken per 512-frame cycle (y-axis). The y-axis is linear in time with zero at the bottom, so lower is better. No units are shown because they are entirely system-dependent: this is a purely comparative visualisation, and only the relative heights are of interest. The relative heights may themselves vary from system to system, so even that comparison is somewhat tentative.]

The test runs in four consecutive phases with different pitch and time modifications, and so the x-axis is divided into four (uneven) quadrants: raising pitch, lowering pitch, slowing down, and speeding up.

In the first quadrant, the pitch rises smoothly and then falls again, reaching a peak at two octaves up; in the second it falls smoothly and then rises again, reaching a trough at two octaves down; in the third the pitch is unchanged but the tempo slows to just under a third of the original speed and then returns to normal; and in the fourth quadrant the tempo gradually speeds up to 8x the original speed and then returns to normal.

The plots for R2 (orange) and R3 (purple) reveal significant differences in behaviour:

  • R2 is usually faster, sometimes much faster, especially for modest stretch factors.
  • R3’s long internal processing buffers and step size mean that it hops between “modes” depending on how many processing increments (1, 2, 3, 4 or occasionally 0) are required for each output block.
  • R2’s distinct “modes” are less widely spaced, because it uses smaller increments. It is still faster overall because it does much less work for each increment.
  • R2’s processing time becomes very variable, and relatively high, when speeding up the audio by a large factor (above about 3x). This may be because it continues to perform transient detection and adjust its input and output steps accordingly, and at those rates our test file contains a lot of transients. R3 is very predictable in this area by comparison.
  • Both stretchers use increasingly more CPU when pitch-shifting further upward, but not when shifting down.

The last point happens because we are using OptionPitchHighConsistency. This option ensures that the resampler used for the pitch-shift part of the operation is always engaged, so that there are no discontinuities when changing ratio (particularly to or from the 1x ratio). We’ll come back to that later.

A Draft Mode for the Finer Engine

The main novelty in version 3.1 is an option to deactivate R3’s multi-window processing system, dropping down to a single shorter processing window and potentially running much faster, while retaining its more advanced signal analysis and some of its output characteristics.

This is enabled using the OptionWindowShort flag when constructing a stretcher, or the --window-short argument to the command-line tool. It’s an option that already existed in R2, and conceptually it does something similar there, but the effect on performance is much greater with R3.
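
In code terms that looks something like the following sketch (OptionEngineFiner being the flag that selects the R3 engine in the 3.x API):

#include <rubberband/RubberBandStretcher.h>

using RubberBand::RubberBandStretcher;

int main()
{
    // R3 ("finer") engine with the new single-window draft option
    RubberBandStretcher stretcher
        (48000, 2,
         RubberBandStretcher::OptionProcessRealTime |
         RubberBandStretcher::OptionEngineFiner |
         RubberBandStretcher::OptionWindowShort);
    // ... feed with process() and retrieve() as usual ...
}

or, with the command-line tool, something like:

$ rubberband --fine --window-short --time 1.4 input.wav output.wav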

Here’s a plot comparing R2, R3, and the new R3 single window option (“R3short”):

With this new option we get both performance comparable to R2 and the more predictable behaviour at high tempo ratios found in R3. Splendid.

What does it sound like? Not as good as R3; it loses some percussive clarity and quite a lot of low-end stability. For some material, particularly acoustic instruments and vocals without too much bass content, it can sound markedly better than R2. It’s not a universal substitute, but it’s really not bad given the CPU budget.

Here are some ten-second audio clips to give you an idea. Both are stretched to 140% of their original duration using R2, R3 with short window, and full R3. Neither of these is trivial to handle, though the second is far harder than the first.

Resamplers and FFTs

Rubber Band makes heavy use of audio resampler and fast Fourier transform (FFT) implementations. Originally it used external libraries for both, but in June 2021 a built-in FFT was added and in October 2021 a built-in resampler appeared as well.

These are both slower than the best external libraries, but they make Rubber Band simpler to build and integrate. The built-in resampler is also designed to reduce clicky artifacts and to maintain tempo integrity on ratio changes, at some further expense in performance, so if you do have the headroom it is worth defaulting to the built-in one.

Here’s a performance comparison of the built-in resampler with libsamplerate in the “draft” short-window R3 mode described above.

Clearly libsamplerate is both faster and more predictable. It’s faster even when changing only the tempo, which doesn’t involve resampling, because of our previously-mentioned use of OptionPitchHighConsistency which keeps the resampler running at all ratios.

(Incidentally all of the other performance plots in this post were made using libsamplerate, unless otherwise specified. Its smoother performance profile makes other comparisons easier.)

I’ve mentioned OptionPitchHighConsistency a couple of times now. If we use OptionPitchHighSpeed instead, we get quite different behaviour:

The relation between the amount of pitch shift and the CPU effort is totally gone. All pitch shifts are roughly equal, and the time-stretching quadrants are faster. The tradeoff, unfortunately, is that there are now audible discontinuities every time the pitch ratio reaches or crosses 1.0.

Traditionally the alternative to libsamplerate in Rubber Band has been a resampler implementation cribbed from the Speex audio codec and provided with Rubber Band as a compile-time option. This resampler was a bit unsatisfactory for various reasons, but a much improved version of it has for a while been available in a library called speexdsp.

As of v3.1 Rubber Band now includes support for speexdsp as well, and it works well — audio quality seems good, and so is performance on my test hardware, shown here against libsamplerate:

I don’t think this is well-exercised enough to be a standard recommendation yet, but it’s promising.
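
(If you want to try it and you build with Meson, the resampler is a configure-time choice. Assuming the existing option name carries over to the new choice, it would be something like:

$ meson setup build -Dresampler=speexdsp

but check meson_options.txt in your copy of the source for the exact spelling.)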

The built-in FFT fares better than the built-in resampler. Even so, in addition to the previously-supported external libraries (FFTW, IPP, and Apple’s vDSP), this release adds support for FFTs from SLEEF, a library which looks as if it should be competitive on platforms that have been short of good options in the past.

To summarise:

  • The R3 time-stretcher and pitch-shifter engine introduced in Rubber Band 3.0 sounds great, but is relatively CPU-intensive compared to the older R2
  • The new 3.1 release introduces a draft mode (“short-window” or single window mode) for the R3 engine, that retains some of its good qualities while running much faster and with more predictable CPU usage
  • You may be able to speed up your implementation by using an external resampler or FFT library, and the 3.1 release adds support for a couple of new ones with good performance.

See the Rubber Band Library site for more information about the library.

Thank you for your time. Perhaps we can help you make more of it.

* * *

Many thanks to Davy Wentzler for valuable feedback on the 3.1 development process.

 


On macOS, arm64, and universal binaries

A handful of notes I made while building and packaging the new Intel/ARM universal binary of Rubber Band Audio for Mac. I might add to this if other things come up. See also my earlier notes about notarization.

Context

I’m using an ARM Mac – M1 or Apple Silicon – with macOS 11 “Big Sur”, the application is in C++ using Qt, and everything is kicked off from the command line (I don’t use Xcode).

To refer to machine architectures here I will use “x86_64” for 64-bit Intel and “arm64” for 64-bit ARM, since these are the terms the Apple tools use. Elsewhere they may also be referred to as “amd64” for Intel, or “aarch64” for ARM.

Universal binaries

A universal binary is one that contains builds for more than one processor architecture in separate “slices”. They were used in the earlier architecture transitions as well. Some tools (such as the C compiler) can emit universal binaries directly when more than one architecture is requested, but this often isn’t good enough: perhaps it doesn’t fit in with the build system, or the architectures need different compiler flags or libraries. Then the answer is to run the build twice with separate output files and glue the resulting binaries together using the lipo tool which exists for the purpose.
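
In the simplest case the dance looks like this (illustrative single-file program):

$ cc -arch x86_64 -o myprog-x86_64 myprog.c
$ cc -arch arm64 -o myprog-arm64 myprog.c
$ lipo -create -output myprog myprog-x86_64 myprog-arm64
$ lipo -archs myprog
x86_64 arm64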

How does the compiler decide which architecture(s) to emit?

The C compiler is a universal binary containing both arm64 and x86_64 “slices”, and it seems to be capable of emitting either arm64 or x86_64 code regardless of which slice of its own binary you invoke.

Perhaps the clearest way to tell it which architecture to emit is to use the -arch flag. With this, cc -arch x86_64 targets x86_64, cc -arch arm64 targets arm64, and cc -arch x86_64 -arch arm64 creates a fat binary containing both architectures.

If you don’t supply an -arch option, then it targets the same architecture as the process that invoked cc. The architecture of the invoking process is not necessarily the native machine architecture, so you can’t assume that a compiler on an ARM Mac will default to arm64 output.

I imagine the mechanism for this is simply that the x86_64 slice of the compiler emits x86_64 unless told otherwise, the arm64 slice emits arm64 likewise, and when you exec the compiler you get whichever slice matches the architecture of the process you exec it from.

There’s also a command called arch that selects a specific slice from a universal binary. So you can run arch -x86_64 make to run the x86_64 binary of make, so that any compiler it forks will default to x86_64. Or you can do things like arch -arm64 cc -arch x86_64 to run the arm64 binary of the compiler but produce an x86_64-only binary.

If you invoke a compiler directly from the shell without any of the above going on, then you get the machine native architecture. I assume this is just because a login shell is itself native.

For my builds I found it helpful to provide a cross-compile file to tell Meson explicitly which options to use for the architecture I wanted to target. That avoids the defaults being just an accident of whichever architecture Meson (or its Python interpreter, or Ninja) happened to be running in, without having to litter the build file with explicit architecture selections. I then scripted the build twice from a separate deployment script, using a different cross file for each, rather than try to have a single Meson file build both at once.
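
For illustration, a minimal cross file for the x86_64 build might look like this (the file name and exact option placement are my own; check the Meson documentation for your version):

# x86_64-macos.txt

[binaries]
c = 'clang'
cpp = 'clang++'

[built-in options]
c_args = ['-arch', 'x86_64', '-mmacosx-version-min=10.13']
c_link_args = ['-arch', 'x86_64', '-mmacosx-version-min=10.13']
cpp_args = ['-arch', 'x86_64', '-mmacosx-version-min=10.13']
cpp_link_args = ['-arch', 'x86_64', '-mmacosx-version-min=10.13']

[host_machine]
system = 'darwin'
cpu_family = 'x86_64'
cpu = 'x86_64'
endian = 'little'

used with something like meson setup build-x86_64 --cross-file x86_64-macos.txt.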

How do I target a particular version of macOS?

Use a flag like -mmacosx-version-min=10.13 at both compile and link time.
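
For example:

$ cc -mmacosx-version-min=10.13 -c myprog.c -o myprog.o
$ cc -mmacosx-version-min=10.13 myprog.o -o myprog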

For ARM binaries, the oldest version you can target is 11. But you can still build a universal binary that combines this with an Intel binary built for an older version, and the result should run on those earlier versions of macOS as well.

How does a version of macOS decide whether my binary is compatible with it?

I had this question because I had built a universal binary (as above) in which the Intel slice was, I thought, built for macOS 10.13 or newer, but when I brought it to a machine with macOS 10.15 it showed as incompatible in the Finder and could not be opened there.

The answer is that it looks at the relevant architecture slice of the universal binary, and inspects it to find a Mach-O version number. In “older” versions of the macOS SDK this version is written using the LC_VERSION_MIN_MACOSX load command; in “newer” versions (I’m not quite sure when the cutoff is) it is tagged as the minos value of the LC_BUILD_VERSION load command instead. The linker quite logically decides which load command to write based on the value of the version number itself, so if you build -mmacosx-version-min=10.13 you get a binary with LC_VERSION_MIN_MACOSX specified.

You can display a binary’s version information with the vtool tool, and it also appears in the list of information printed by otool -l. In theory you can also change this tag using vtool, but (a) that’s a bad idea, fix it in the build instead and (b) vtool segfaulted when I tried it anyway.
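
For example (flags as I understand them; check the man pages for your Xcode version):

$ vtool -show myprog
$ otool -l myprog | grep -E -A4 'LC_VERSION_MIN_MACOSX|LC_BUILD_VERSION'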

And after all that, in my case the cause turned out to be that I’d failed to supply the -mmacosx-version-min flag at link time.

Why is my program being killed on startup?

It appears that if you build a program for one architecture and then rebuild it for the other arch to the same executable file without deleting the executable in between, sometimes it doesn’t run: it just gets “killed (9)” on startup. I failed to discover why and I failed to reproduce it just now in a test build. I guess if that happens, delete the executable between builds.

* * *

Bonus grumble about Mac trackpad and mouse options

This is not useful content. Please do not attempt to read it.

I haven’t used a Mac in such earnest for a while now, so of course I’ve been rediscovering things about macOS that I don’t get on with. One that I find particularly maddening is the way it handles scroll direction for the trackpad and an external mouse.

I switch between the two a lot, and I like to use the “natural scrolling” direction (touchscreen-like, so your fingers are “pushing” the content) with the trackpad, but the opposite with the mouse, which has a scroll wheel or wheel-like scrolling zone whose behaviour I became accustomed to before touchscreen devices started sprouting everywhere.

Fortunately, macOS provides separate touchpad and mouse sections in the system preferences, which contain separate switches for the scroll direction of the trackpad and mouse respectively.

Unfortunately, when you change one of them, the other one changes as well. They aren’t separate options at all – they’re just two different switches in different windows that happen to control the same single internal option! So every time I go from trackpad to mouse or back again, I have to also go to system preferences and switch the scroll direction by hand. That is so stupid.

(Linux and Windows both have separate options that actually work as separate options. Of course they do. Why would they not?)


On macOS “notarization”

I’ve spent altogether too long, at various moments in the past year or so, trying to understand the code-signing, runtime entitlements, and “notarization” requirements that are now involved when packaging software for Apple macOS 10.15 Catalina. (I put notarization in quotes because it doesn’t carry the word’s general meaning; it appears to be an Apple coinage.)

In particular I’ve had difficulty understanding how one should package plugins — shared libraries that are distributed separately from their host application, possibly by different authors, and that are loaded from a general library path on disc rather than from within the host application’s bundle. In my case I’m dealing mostly with Vamp plugins, and the main host for them is Sonic Visualiser, or technically, its Piper helper program.

Catalina requires that applications (outside of the App Store, which I’m not considering here) be notarized before it will allow ordinary users to run them. But a notarized host application can’t always load a non-notarized plugin; the tools typically used to notarize applications don’t work for individual plugin binaries; and documentation relating to plugins has been slow to appear. Complicating matters, notarization requirements are suspended for binaries built or downloaded before a certain date, so a host will often load old plugins but refuse new ones. As a non-native Apple developer, I find this situation… trying.

Anyway, this week I realised I had some misconceptions about how notarization actually worked, and once those were cleared up, the rest became obvious. Or obvious-ish.

(Everything here has been covered in other places before now, e.g. Apple docs, KVRaudio, Glyphs plugin documentation. But I want to write this as a conceptual note anyway.)

What notarization does

Here’s what happens when you notarize something:

  • Your computer sends a pack of executable binaries off to Apple’s servers. This may be an application bundle, or just a zip file with binaries in it.
  • Apple’s servers unpack it and pick out all of the binaries (executables, libraries etc) it contains. They scan them individually for malware and for each one (assuming it is clean) they file a cryptographic hash of the binary alongside a flag saying “yeah, nice” in a database somewhere, before returning a success code to you.

Later, when someone else wants to run your application bundle or load your plugin or whatever:

  • The user’s computer calculates locally the same cryptographic hashes of the binaries involved, then contacts Apple’s servers to ask “are these all right?”
  • If the server’s database has a record of the hashes and says they’re clean, the server returns “aye” and everything goes ahead. If not, the user gets an error dialog (blah cannot be opened) and the action is rejected.

Simple. But I found it hard to see what was going on, partly because the documentation mostly refers to processes and tools rather than principles, and partly because there are so many other complicating factors to do with code-signing, identity, authentication, developer IDs, runtimes, and packaging — I’ll survey those in a moment.

For me, though, the moment of truth came when I realised that none of the above has anything to do with the release flow of your software.

The documentation describes it as an ordered process: sign, then notarize, then publish. There are good reasons for that. The main one is that there is an optional step (the “stapler”) that re-signs your package between notarization and publication, so that users’ computers can skip ahead and know that it’s OK without having to contact Apple at all. But the only critical requirement is that Apple’s servers know about your binary before your users ask to run it. You could, in fact, package your software, release the package, then notarize it afterwards, and (assuming it passes the notarization checks) it should work just the same.

Notarizing plugins

A plugin (in this context) is just a single shared library, a single binary file that gets copied into some folder beneath $HOME/Library and loaded by the host application from there.

None of the notarization tools can handle individual binary files directly, so for a while I thought it wasn’t possible to notarize plugins at all. But that is just a limitation of the client tools: if you can get the binary to the server, the server will handle it the same as any other binary. And the client tools do support zip files, so first sign your plugin binary, and then:

$ zip blah.zip myplugin.dylib
adding: myplugin.dylib (deflated 65%)
$ xcrun altool --notarize-app -f blah.zip --primary-bundle-id org.example.myplugin -u 'my@appleid.example.org' -p @keychain:altool
No errors uploading 'blah.zip'.

(See the Apple docs for an explanation of the authentication arguments here.)

[Edit, 2020-02-17: John Daniel chides me for using the “zip” utility, pointing out that Apple recommend against it because of its poor handling of file metadata. Use Apple’s own “ditto” utility to create zip files instead.]

Wait for notarization to complete, using the request API to check progress as appropriate, and when it’s finished,

$ spctl -a -v -t install myplugin.dylib
myplugin.dylib: accepted
source=Notarized Developer ID

The above incantation seems to be how you test the notarization status of a single file: pretend it’s an installer (-t install), because once again the client tool doesn’t support this use case even though the service does. Note, though, that it is the dylib that is notarized, not the zip file, which was just a container for transport.

A Glossary of Everything Else

Signing — guaranteeing the integrity of a binary with your identity in a cryptographically secure way. Carried out by the codesign utility. Everything about the contemporary macOS release process, including notarization, expects that your binaries have been signed first, using your Apple Developer ID key.
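
A typical invocation looks something like this (the identity string is a placeholder for your own Developer ID certificate; --options runtime enables the hardened runtime described below):

$ codesign --force --timestamp --options runtime \
      --sign "Developer ID Application: Jane Developer (TEAMID1234)" \
      myplugin.dylib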

Developer ID — a code-signing key that you can obtain from Apple once you are a paid-up member of the Apple Developer Program. That costs a hundred US dollars a year. Without it you can’t package programs for other people to run, unless they first disable security measures on their computers.

Entitlements — annotations you can make when signing a thing, to indicate which permissions, exemptions, or restrictions you would like it to have. Examples include permissions such as audio recording, exemptions such as the JIT exemption for the hardened runtime, or restrictions such as sandboxing (q.v.).

Hardened runtime — an alternative runtime library that includes restrictions on various security-sensitive things. Enabled not by an entitlement, but by providing the --options runtime flag when signing the binary. Works fine for most programs. The documentation suggests that you can’t send a binary for notarization unless it uses the hardened runtime; that doesn’t appear to be true at the moment, but it seems reasonable to use it anyway. Note that a host that uses the hardened runtime needs to have the com.apple.security.cs.disable-library-validation entitlement set if it is to load third-party plugins. (That case appears to have an inelegant failure mode — the host crashes with an untrappable signal 9 following a kernel EXC_BAD_ACCESS exception.)

Stapler — a mechanism for annotating a bundle or package, after notarization, so that users’ computers can tell it has been notarized without having to contact Apple’s servers to ask. Carried out by xcrun stapler. It doesn’t appear (?) to be possible to staple a single plugin binary, only complex organisms like app bundles.
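
For an app bundle, stapling and checking look like:

$ xcrun stapler staple MyApp.app
$ xcrun stapler validate MyApp.app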

Quarantine — an extended filesystem attribute attached to files that have been downloaded from the internet. Shown by the ls command with the -l@ flags, can be removed with the xattr command. The restrictions on running packaged code (to do with signing, notarization etc) apply only when it is quarantined.
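
For example:

$ ls -l@ MyApp.zip
$ xattr -d com.apple.quarantine MyApp.zip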

Sandboxing — a far more intrusive change to the way your application is run, that is disabled by default and that has nothing to do with any of the above except to fill up one’s brain with conceptually similar notions. A sandboxed application is one that is prevented from making any filesystem access except as authorised explicitly by the user through certain standard UI mechanisms. Sandboxing is an entitlement, so it does require that the application is signed, but it’s independent of the hardened runtime or notarization. Sandboxing is required for distribution in the App Store.


Notes on Idris

[Image: Idris book cover]

In a bid to expand my programming brain by learning something about “dependent types”, I recently bought the Idris book.

(Idris is a pure functional programming language that is mostly known for supporting dependent types. Not knowing what that really meant, and seeing that this recently-published book written by the author of the language was warmly reviewed on Amazon, I saw an opportunity.)

The idea behind dependent typing is to allow types to be declared as having dependencies on information that would, in most languages, only be known at runtime. For example, a function might accept two arrays as arguments, but only work correctly if the two arrays that are actually passed to it have the same length: with dependent types, this dependency can be written into the type declaration and checked at compile time. On the face of it we can’t in general know things like the number of elements in an arbitrary array at compile time, so this seems like compelling magic. Types are, in fact, first class values, and if that raises a lot of horribly recursive-seeming questions in your mind, then this book might help.
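
As a tiny illustration (this uses the standard-library Vect type, a list whose length is part of its type), here is a function that can only be applied to two vectors of the same length:

import Data.Vect

-- Both arguments share the same length n, so applying this to vectors
-- of different lengths is a type error at compile time
dotProduct : Num a => Vect n a -> Vect n a -> a
dotProduct xs ys = sum (zipWith (*) xs ys)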

A number of academic languages are exploring this area, but Idris is interesting because of its ambition to be useful for general-purpose programming. The book is pleasingly straightforward about this: it adopts an attitude that “this is how we program, and I’ll help you do it”. In fact, the Idris book might be the most inspiring programming book I’ve read since ML for the Working Programmer (although the two are very different, more so than their synopses would suggest). It’s not that the ideas in it are new — although they are, to me — but that it presents them so crisply as to project a tantalising image of a better practice of programming.

What the book suggests

The principles in the book — as I read it — are:

  • We always write down the type of a function before writing the function. Because the type system is so expressive, this gives the compiler, and the programmer, an awful lot of information — sometimes so much that there remains only one obvious way to satisfy the type requirements when writing the rest of the function. (The parallel with test-driven development, which can be a useful thinking aid when you aren’t sure how to address a problem, presumably inspired the title of the book.)
  • Sometimes, we don’t even implement the rest of the function yet. Idris supports a lovely feature called “holes”, which are just names you plonk down in place of code you haven’t got around to writing yet. The compiler will happily type-check the rest of the program as if the holes were really there, and then report the holes and their types to you to give you a checklist of the bits you still need to fill in. (There’s a small worked example of a hole just after this list.)
  • Functions are pure, as in they simply convert their input values to their return values and don’t modify any other state as they go along, and are if possible total, meaning that every input has a corresponding return value and the function won’t bail out or enter an infinite loop. The compiler will actually check that your functions are total — the Halting Problem shows that this can’t be done in general, but apparently it can be done enough of the time to be useful. The prospect of having a compiler confirm that your program cannot crash is exciting even to someone used to Standard ML, where typing is generally sound but inexhaustive cases and range errors still happen.
  • Because functions are pure, all input and output uses monads. That means that I/O effects are encapsulated into function return types. The book gives a supremely matter-of-fact introduction to monadic I/O, without using the word monad.
  • We use an IDE which supports all this with handy shortcuts — there is an impressive Idris integration for Atom which is used extensively throughout the book. For an Emacs user like me this is slightly offputting. There is an Emacs Idris mode as well, it’s just that so much of the book is expressed in terms of keystrokes in Atom.
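
Here’s a tiny invented example of a hole. The program type-checks, and asking the compiler about ?combine_rhs gives back the context and the type of the missing code:

combine : String -> Nat -> String
combine name count = ?combine_rhs

-- Reported for ?combine_rhs, roughly:
--   name : String
--   count : Nat
--   --------------------------------------
--   combine_rhs : String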

None of these principles refers to dependent types at all, but the strength of the type system is what makes it plausible that this could all work. Idris does a lot of evaluation at compile time, to support the type system and totality checking, so you always have the feeling of having run your program a few times before it has even compiled.

So, I enjoyed the book. To return to earth-based languages for a moment, it reminds me a bit of Bjarne Stroustrup’s “A Tour of C++”, which successfully sets out “modern C++” as if most of the awkward history of C++ had never happened. One could see this book as setting out a “modern Haskell” as if the actual Haskell had not happened. Idris is quite closely based on Haskell, and I think, as someone who has never been a Haskell programmer, that much of the syntax and prelude library are the same. The biggest difference is that Idris uses eager evaluation, where Haskell is lazily evaluated. (See 10 things Idris improved over Haskell for what appears to be an informed point of view.)

Then what happened?

How did I actually get on with it? After absorbing some of this, I set out to write my usual test case (see Four MLs and a Python) of reading a CSV file of numbers and adding them up. The goal is to read a text file, split each line into number columns, sum the columns across all lines, then print out a single line of comma-separated sums — and to do it in a streaming fashion without reading the whole file into memory.

Here’s my first cut at it. Note that this doesn’t actually use dependent types at all.

module Main

import Data.String

parseFields : List String -> Maybe (List Double)
parseFields strs =
  foldr (\str, acc => case (parseDouble str, acc) of
                           (Just d, Just ds) => Just (d :: ds)
                           _ => Nothing)
        (Just []) strs

parseLine : String -> Maybe (List Double)
parseLine str =
  parseFields $ Strings.split (== ',') str

sumFromFile : List Double -> File -> IO (Either String (List Double))
sumFromFile xs f =
  do False <- fEOF f
     | True => pure (Right xs)
     Right line <- fGetLine f
     | Left err => pure (Left "Failed to read line from file")
     if line == ""
     then sumFromFile xs f
     else case (xs, parseLine line) of
               ([],  Just xs2) => sumFromFile xs2 f
               (xs1, Just xs2) => if length xs1 == length xs2
                                  then sumFromFile (zipWith (+) xs1 xs2) f
                                  else pure (Left "Inconsistent-length rows")
               (_, Nothing) => pure (Left $ "Failed to parse line: " ++ line)

sumFromFileName : String -> IO (Either String (List Double))
sumFromFileName filename =
  do Right f <- openFile filename Read
     | Left err => pure (Left "Failed to open file")
     sumFromFile [] f

main : IO ()
main =
  do [_, filename] <- getArgs
     | _ => putStrLn "Exactly 1 filename must be given"
     Right result <- sumFromFileName filename
     | Left err => putStrLn err
     putStrLn (pack $ intercalate [','] $ map (unpack . show) $ result)

Not beautiful, and I had some problems getting this far. The first thing is that compile times are very, very slow. As soon as any I/O got involved, a simple program took 25 seconds or more to build. Surprisingly, almost all of the time was used by gcc, compiling the C output of the Idris compiler. The Idris type checker itself was fast enough that this at least didn’t affect the editing cycle too much.

The combination of monadic I/O and eager evaluation was also a problem for me. My test program needs to process lines from a file one at a time, without reading the whole file first. In an impure language like SML, this is easy: you can read a line in any function, without involving any I/O in the type signature of the function. In a pure language, you can only read in an I/O context. In a pure lazy language, you can read once in an I/O context and get back a lazily-evaluated list of all the lines in a file (Haskell has a prelude function called lines that does this), which makes toy examples like this simple without having to engage properly with your monad. In a pure eager language, it isn’t so simple: you have to actually “do the I/O” and permeate the I/O context through the program.

This conceptual difficulty was compounded by the book’s caginess about how to read from a file. It’s full of examples that read from the terminal, but the word “file” isn’t in the index. I have the impression this might be because the “right way” is to use a higher-level stream interface, but that interface maybe wasn’t settled enough to go into a textbook.

(As a non-Haskell programmer I find it hard to warm to the “do” syntax used as sugar for monadic I/O in Haskell and Idris. It is extremely ingenious, especially the way it allows alternation — see the lines starting with a pipe character in the above listing — as shorthand for handling error cases. But it troubles me, reminding me of working with languages that layered procedural sugar over Lisp, like the RLisp that the REDUCE algebra system was largely written in. Robert Harper once described Haskell as “the world’s best imperative programming language” and I can see where that comes from: if you spend all your time in I/O context, you don’t feel like you’re writing functional code at all.)

Anyway, let’s follow up my clumsy first attempt with a clumsy second attempt. This one makes use of what seems to be the “hello, world” of dependent typing, namely vectors whose length is part of their compile-time type. With this, I guess we can guarantee that the lowest-level functions, such as summing two vectors, are only called with the proper length arguments. That doesn’t achieve much for our program since we’re dealing directly with arbitrary-length user inputs anyway, but I can see it could be useful to bubble up error handling out of library code.

Here’s that second effort:

module Main

import Data.Vect
import Data.String

total
sumVects : Vect len Double -> Vect len Double -> Vect len Double
sumVects v1 v2 = 
  zipWith (+) v1 v2

total
parseFields : List String -> Maybe (List Double)
parseFields strs =
  foldr (\str, acc => case (parseDouble str, acc) of
                           (Just d, Just ds) => Just (d :: ds)
                           _ => Nothing)
        (Just []) strs

total
parseVect : String -> Maybe (len ** Vect len Double)
parseVect str =
  case parseFields $ Strings.split (== ',') str of
  Nothing => Nothing
  Just xs => Just (_ ** fromList xs)  

sumFromFile : Maybe (len ** Vect len Double) -> File -> 
              IO (Either String (len ** Vect len Double))
sumFromFile acc f =
  do False <- fEOF f
     | True => case acc of
               Nothing => pure (Right (_ ** []))
               Just v => pure (Right v)
     Right line <- fGetLine f
     | Left err => pure (Left "Failed to read line from file")
     if line == ""
     then sumFromFile acc f
     else case (acc, parseVect line) of
               (_, Nothing) => pure (Left $ "Failed to parse line: " ++ line)
               (Nothing, other) => sumFromFile other f
               (Just (len ** xs), Just (len' ** xs')) =>
                    case exactLength len xs' of
                         Nothing => pure (Left "Inconsistent-length rows")
                         Just xs' =>
                             sumFromFile (Just (len ** (sumVects xs xs'))) f

sumFromFileName : String -> IO (Either String (len ** Vect len Double))
sumFromFileName filename =
  do Right f <- openFile filename Read
     | Left err => pure (Left "Failed to open file")
     sumFromFile Nothing f

main : IO ()
main =
  do [_, filename] <- getArgs
     | _ => putStrLn "Exactly 1 filename must be given"
     Right (_ ** result) <- sumFromFileName filename
     | Left err => putStrLn err
     putStrLn (pack $ intercalate [','] $ map (unpack . show) $ toList result)

Horrible. This is complicated and laborious, at least twice as long as it should be, and makes me feel as if I’ve missed something essential about the nature of the problem. You can see that, as a programmer, I’m struggling a surprising amount here.

Both of these examples compile, run, and get the right answers. But when I tried them on a big input file, both versions took more than 8 minutes to process it — on a file that was processed in less than 30 seconds by each of the languages I tried out in for this post. A sampling profiler did not pick out any obvious single source of delay. Something strange is going on here.

All in all, not a wild success. That’s a pity, because:

Good Things

My failure to produce a nice program that worked efficiently didn’t entirely burst the bubble. I liked a lot in my first attempt to use Idris.

The language has a gloriously clean syntax. I know my examples show it in a poor light; compare instead this astonishingly simple typesafe implementation of a printf-like string formatting function, where format arguments are typechecked based on the content of the format string, something you can’t do at all in most languages. Idris tidies up a number of things from Haskell, and has a beautifully regular syntax for naming types.

Idris is also nice and straightforward, compared with other strongly statically-typed languages, when it comes to things like using arithmetic operators for different kinds of number, or mechanisms like “map” over different types. I assume this is down to the pervasive use of what in Haskell are known as type classes (here called interfaces).

I didn’t run into any problems using the compiler and tools. They were easy to obtain and run, and the compiler (version 1.1.1) gives very clear error messages considering how sophisticated some of the checks it does are. The compiler generally “feels” solid.

I’m not sold on the fact that these languages are whitespace-sensitive — life’s too short to indent code by hand — though this does contribute to the tidy syntax. Idris seems almost pathologically sensitive to indentation level, and a single space too many when indenting a “do” block can leave the compiler lost. But this turned out to be less of a problem in practice than I had expected.

Will I be using Idris for anything bigger? Not yet, but I’d like to get better at it. I’m going to keep an eye on it and drop in occasionally to try another introductory problem or two. Meanwhile if you’d like to improve on anything I’ve written here, please do post below.

Inadequate names for abstract affairs

Naming conventions in Standard ML

Many programming languages have a standard document that describes how to write and capitalise the names of functions, variables, and source files. It’s especially useful to have a standard for writing names made up from more than one word, where there are various options for how to join the words: “camel case”, which looks likeThis (with a capital letter “hump” in the middle), or “snake case”, which is underscore_separated.

I think Java in the mid-90s was the first really mainstream language to standardise file and variable naming conventions. The Java package mechanism requires files to be laid out in a particular way, and Sun published Java coding conventions which quickly became an effective standard for class and variable naming. Other languages followed. Python has had a standard that covers naming (PEP8) since 2001. More recent examples include Go and Swift.

Older languages tend to be less consistent. C++ is a mess: the standard library and most official example material uses snake_case for most names, but a great many developers, including those on most of the projects I’ve worked on, prefer camelCase, with capital initials for class names. File names are even more various: C++ source files are seen with .cpp, .cxx, .cc, and .C extensions; C++ header files with .h, .hpp, or no extension at all.

Standard ML (SML) is also a mess, and an interesting one because the language itself was standardised in 1990 and has been completely unchanged since the standard was revised in 1997. So although it is super-standardised, it’s a bit too old to have caught the wider shift in sentiment toward prescribing things like naming and file structure.

The SML standard is formal and very focused. It says nothing about coding style or naming, contains almost no examples using compound names, says nothing about filenames or file organisation, and specifies no way for one file to refer to another — the standard is indifferent to whether your source code is held in a file at all.

In trying to establish what naming conventions to use for my own code, I decided to look around at some existing libraries in SML to see what they had settled on.

The Basis library

SML has a standard library, the Basis library, which is a bit more recent than the language itself. Although it isn’t prescriptive, the library does use certain conventions itself and the introductory notes explain what they are. These cover only names of things within a program — not filenames, which are left up to the implementor of the standard. I’ll refer to them in the table below.

The Cornell style guide

Top search result for “SML naming conventions” for me is this online style guide for the Cornell CS312 course. It doesn’t cover file naming. Given the limited industry uptake for SML, an academic guide may be proportionately more influential than for other languages. I’ll mention this guide below as well.

Other code I looked at

I took a look at the following code:

  • The source of the MLton, MLKit, and SMLSharp compilers (excluding accompanying utility libraries)
  • The Basis library implementations shipped with MLton and SMLSharp
  • The SML/NJ extended library
  • The source of the Ur/Web language
  • The Ponyo library, an interesting fledgling effort to produce a broader base library than the Basis

In total, about 444,500 lines of code across 1790 SML source files. Some (presumably automatically-generated) source files are very long; while the mean file length is 248 lines including comments and blanks, the median is only 47.

Names within the language

The SML language has at least seven categories of things that need names: variables, type names, datatype constructors, exceptions, structures, signatures, and functors.

(By “variables” I really mean bindings, i.e. the vast majority of ordinary things with names: things that in a procedural language might include function names, variable names, and constant declarations. I’m using the word “variable” because it’s such a familiar everyday programming term.)

Source     Variable      Type name   Datatype constructor  Exception       Structure       Signature       Functor
mlton      variableName  (mixed)     DatatypeCtor          ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
mlkit      (mixed)       (mixed)     DatatypeCtor*         ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
smlsharp   variableName  typeName*   DATATYPE_CTOR*        ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
basis      variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
smlnj-lib  variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
urweb      variableName  type_name*  DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
ponyo      variableName  typeName    DatatypeCtor          ExceptionName   Structure_Name  SIGNATURE_NAME  Functor_Name
cornell    variableName  type_name   DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorName

* mostly

Here’s what I found, categorised into universal conventions, usual conventions, and “other”.

Universal

The following is the only universal convention:

Signature
SIGNATURE_NAME

The only code I found that doesn’t follow this convention is in the SML standard itself, which omits the underscore (like SIGNATURENAME).

Usual

The following conventions are not universal, but more popular than any other.

Variable      Type name  Exception      Structure      Functor
variableName  type_name  ExceptionName  StructureName  FunctorName

Camel case is clearly idiomatic for everything except type names. MLKit contains some snake-cased bindings as well, but none of the other libraries did. I like snake case in SML and I’ve written a fair bit of code using it myself; I hadn’t realised until now how uncommon it was. (It’s more common in SML’s sibling language OCaml. Ironic that, of the three very similar languages SML, OCaml, and F#, the only one not to use camel case is called OCaml.)

I spotted a handful of all-caps exception names and some camel case type names, but no library preferred those consistently.

The Ponyo library differs from the above for structures (Structure_Name) and functors (Functor_Name).

The SML/NJ library sort-of differs for functors, which are given a Fn suffix (FunctorNameFn). But you could think of this as part of the name, in which case the convention is the same.

Most type and datatype names used in public APIs are single words, or even single letters, so the convention often doesn’t matter for those.

Other

There seems to be no consensus about datatype constructors — I found DatatypeConstructor and DATATYPE_CONSTRUCTOR in roughly equal number.

Filenames

Nothing in the SML standard or Basis library cares about what source files are called, what file extension they use, or how you divide your code up among them. Some compilers might care, but most don’t. The business of telling the compiler which files a program consists of, or of expressing any relationships between files, is left up to external tools. SML has neither header files nor import directives.

This makes fertile ground for variety in naming schemes.

I’m going to consider only filenames that are associated with a primary structure, signature, or functor. Here’s the table.

Source          Structure           Signature               Functor
mlton           structure-name.sml  signature-name.sig      functor-name.fun
mlkit           StructureName.sml   SIGNATURE_NAME.sml*     FunctorName.sml
smlsharp        StructureName.sml   SIGNATURE_NAME.sig*     FunctorName.sml
mlton-basis     structure-name.sml  signature-name.sig      functor-name.fun
smlsharp-basis  StructureName.sml   SIGNATURE_NAME.sig      (none)
smlnj-lib       structure-name.sml  signature-name-sig.sml  functor-name-fn.sml
urweb           structure_name.sml  signature_name.sig      (n/a)
ponyo           Structure_Name.sml  SIGNATURE_NAME.ML       Functor_Name.sml

* mostly

Clearly very inconsistent. There are no universal or usual conventions, only “other”.

Behind this there is a wider question about code organisation in files — should each signature live in its own file? Each structure? In many cases they do, but that is also far from universal.

If you use a scheme in which filenames are clearly derived from signature and structure names, does that mean you shouldn’t put more than one structure in the same file? What do you do with code that is not in any structure? Really it’s a pity to have to think about filenames at all, in a language that is so completely indifferent to file structure.

A Reasonable Recommendation

A plausible set of rules based on the above.

For names within the language:

Variable      Type name  Datatype constructor  Exception      Structure      Signature       Functor
variableName  type_name  DATATYPE_CTOR         ExceptionName  StructureName  SIGNATURE_NAME  FunctorName

This is the style used by the Basis library. Apart from datatype constructors, everything here was in the majority within the libraries I looked at.

For datatype constructors it seems reasonable to pick the most visible option and one that is consistent with the names in Basis. (This differs from the Cornell guide, however.) There is no confusion between these and signature names, because signature names never appear anywhere except in the declaration lines for those signatures and the structures that implement them.
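
By way of illustration, here is a small sketch using these conventions together (all names invented):

signature SAMPLE_SOURCE = sig
    type frame_count = int
    datatype source_state = RUNNING | AT_END
    exception SourceFailed of string
    val framesRead : unit -> frame_count
end

structure SampleSource :> SAMPLE_SOURCE = struct
    type frame_count = int
    datatype source_state = RUNNING | AT_END
    exception SourceFailed of string
    fun framesRead () = 0
end

functor SampleSourceFn (S : SAMPLE_SOURCE) = struct
    open S
end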

For filenames:

Structure           Signature           Functor
structure-name.sml  signature-name.sig  functor-name.sml

The logic here is:

  • It’s still not a great idea to expect a case sensitive filesystem, so all-one-case is good
  • Generally use .sml extension for SML source
  • But the .sig extension for signatures seems very widely used, and it’s fair to make public signatures as easy to spot as possible
  • The .ml extension is not a great idea because it clashes with OCaml
  • The .fun extension used by MLton is a bit obscure, and you don’t always want to separate out functors (if you want to make functors more distinctive, give them names ending in Fn, as the SML/NJ library does).

 


F♯ has possibilities

A couple of months ago, Microsoft announced that they were buying a company called Xamarin, co-founded by the admirable Miguel “you can now flame me, I am full of love” de Icaza. (No sarcasm — I think Miguel is terrific, and the delightfully positive email linked above really stuck with me; if only I could have that attitude more often.)

As I understand it, Xamarin makes

  1. the Mono runtime, a portable third-party implementation of Microsoft’s .NET runtime for the C# and F# programming languages
  2. the eponymous Xamarin frameworks, which can be used with .NET to develop mobile apps for iOS and Android
  3. plugins for the Visual Studio IDE on Windows and the MonoDevelop IDE on OS/X to support mobile platform builds using Xamarin (the MonoDevelop-plus-plugins combo is known as Xamarin Studio).

Then a couple of days ago, the newly-acquired Xamarin declared

  1. that the Mono runtime was switching from LGPL/GPL licenses to MIT, allowing no-cost use in commercial applications
  2. that Microsoft were providing a patent promise (which I have not closely read) to remove concerns for commercial users of Mono
  3. that the Xamarin frameworks for iOS and Android, and the IDE plugins, were now free (of cost)
  4. that at some future point the Xamarin frameworks would be open sourced

I’m trying to unpick exactly what this could mean to me.

According to this discussion on Hacker News, the IDE plugins are remaining proprietary (which appears to mean that no IDE on Linux will be supported, since the IDE plugins are not currently available for Linux) but that “the Xamarin runtime and all the commandline tools you need to build apps” will be open sourced.

What this means

as I understand it,

  • Developers working on proprietary .NET applications will be able to build and release versions for other platforms than Windows, using Mono, at no extra cost
  • Developers working on open source .NET applications will be able to publish the ensemble with Mono under the MIT license if desired and will (apparently) be free of patent concerns
  • Developers will be able to make both proprietary and open source .NET applications for iOS and Android at no cost using Windows and OS/X
  • There is a possibility of being able to do builds of the above using Linux as well once the SDK is open, though probably without an IDE

Unrelatedly, there are separate projects afoot to provide native code and to-Javascript compilers for .NET bytecode.

What I’m interested in

I do a range of programming including a mixture of signal-processing and UI work, and am interested in exploring comprehensible, straightforward functional languages in the ML family (I wrote a little post about that here). Unlike many audio developers I have relatively limited demands on real-time response, but everything I write really wants to be cross-platform, because I’ve got specialised users on pretty much every common platform and I have limited time and funding. (I understand that cross-platform apps are often inferior to single-platform apps, but they’re better than no apps.)

Xamarin doesn’t quite meet my expectations because it’s not really a cross-platform framework in the manner of Qt (which I use) or JUCE (which is widely used by others in my field). Instead of providing a common “widget set” across all platforms, Xamarin provides a separate thin interface to the native UI logic for each platform. It’s hard to judge how much more work this is, without knowing where the abstraction boundaries lie, but it may be a more relevant and sensible distinction on mobile platforms (where the differences are often in interaction and layout) than desktops (where the differences are mostly about how large numbers of individual widgets look).

An ideal combination of language and framework for me goes something like

  • strongly-typed, mostly functional, mostly immutable data structures
  • efficient unboxed support for floating-point vector types, including SIMD support
  • simple syntax (SML is nice)
  • low-cost foreign-function interface for C integration
  • high-level approach to multithreading
  • can work with gross UI layout in HTML5 (possibly DOM-update reactive UI style?)
  • good libraries for e.g. audio file I/O, signal processing, matrix algebra
  • can develop on Linux and deploy to all of Linux, Windows, OS/X, iOS, Android
  • free (or cheap, for proprietary apps) and open source (for open source apps)
  • has indenting Emacs mode

Where F# appears to score

F#, Microsoft’s ML-derived functional language for the .NET CLR, hits several of these. It has the typing, mostly-functional style, syntax, FFI, multithreading, libraries, deployment and licensing, and potentially the development platform (if the open source Xamarin framework should lead to the ability to build mobile apps directly from Linux).

I’m not sure about floating-point and vectors or about reusable HTML-style UI. I’d like to make the time to do another comparison of some ML-family languages, focusing on DSP-style float activity and on threading. I’ve done a bit of related work in Standard ML, which I could use as a basis for comparison.

Unless and until I get to do that, I’d love to hear any thoughts about F# as a general-purpose DSP-and-UI language, for a developer whose home platform is Linux.

My impression from the feedback on my earlier post was that the F# community is both enthusiastic and polite, and I notice that F# is the third most-loved language in StackOverflow’s 2016 developer survey. Imagine a language that is useful no matter what platform you’re targeting, and whose developers love it. I can hope.