SoundSoftware tutorial at AES 53

I’ll be co-presenting the first tutorial session at the Audio Engineering Society 53rd Conference on Semantic Audio, this weekend.

(It’s the society’s 53rd Conference, and it happens to be about semantic audio. It’s not their 53rd conference about semantic audio. In fact it’s their second: that was also the theme of the AES 42nd Conference in 2011.

What is semantic audio? Good question, glad you asked. I believe it refers to extraction or estimation of any semantic material from audio, including speech recognition and music information retrieval.)

My tutorial, for the SoundSoftware project, is about developing better and more reliable software during research work. That’s a very deep subject, so at best we’ll barely hint at a few techniques during one programming exercise:

  • making readable experimental code using the IPython Notebook, and sharing code for review with colleagues and supervisors;
  • using version control software to manage revisions and escape disaster;
  • modularising and testing any code that can be used in more than one experiment;
  • packaging, publishing, and licensing code;
  • and the motivations for doing the above.

We presented a session at the Digital Audio Effects (DAFx) conference in 2012 which covered much of this material in presentation style, and a tutorial at the International Society for Music Information Retrieval (ISMIR) conference in 2012 which featured a “live” example of test-driven development in research software. You can find videos and slides from those tutorials here. The theme of this one is similar, and I’ll be reusing some code from the ISMIR tutorial, but I hope we can make it a bit more hands-on.


How Much Legacy Code Have You Written This Week?

I recently bought a copy (based on a recommendation) of Michael Feathers’ 2005 book Working Effectively with Legacy Code.

This excellent technical book is largely a compendium of refactoring strategies to help software developers insinuate unit tests into existing code.

What I found most striking, though, is a position stated right at the start of the book.

Most programmers have a broadly intuitive idea what “legacy code” means. It’s typically a euphemism for code that they resent, because it’s old and crufty and they have to maintain it unwillingly. There may be a hint of the positive meaning of “legacy”—code handed down to you that does at least work, which is why you have to maintain it. In my mind there’s also a suggestion that the code is alien in some way—written for an obsolete platform or in an obsolete language—which contributes to the difficulty of managing it.

Feathers dismisses most of these possible definitions, writing

In the industry, legacy code is often used as a slang term for difficult-to-change code that we don’t understand. But over years of working with teams, helping them get past serious code problems, I’ve arrived at a different definition.

To me, legacy code is simply code without tests.

[...] With tests, we can change the behavior of our code quickly and verifiably. Without them, we really don’t know if our code is getting better or worse.

That’s pretty uncompromising, but it certainly gives you perspective.

If you take this point of view, then every line of code you write that lacks tests—proper tests that properly test, and that other people can run—is an immediate, direct contribution to the sludgy mountain of cruddy code that nobody really understands and that probably doesn’t work.

(Of course, code with tests can still be useless. But at least it has potential.)

I’ve written a lot of legacy code over the past few years. I’ve only quite recently started unit testing routinely in new code I write, and I still need to crisp up my development practice—fewer lines of code per task, more of them demonstrably working. I might not find Feathers’ definition of legacy code entirely satisfactory as a cultural one, but it’s a highly useful stimulus.

SoundSoftware 2012 Workshop

Yesterday the SoundSoftware project, which I help to run, hosted the SoundSoftware 2012 Workshop at Queen Mary. This was a one-day workshop about working practices for researchers who develop software, and about their experiences of software work, with an eye to subjects of interest to audio and music researchers.

You can read about the workshop at the earlier link; I’d just like to mention two talks that I found particularly interesting: the one from Paul Walmsley, and the one from David Gavaghan that followed it.

Paul is a long-serving senior developer in the Sibelius team at Avid (a group I’m interested in already because of my former life working on the notation editor aspects of Rosegarden: Sibelius was always the gold standard for interactive notation editing). He’s an articulate advocate of unit testing and of what might be described as a decomposition of research work in such a way as to be able to treat every “research output” (for example, presenting a paper) as an experiment demanding reproducibility and traceable provenance.

Usefully, he was able to present ideas like these as simplifying concerns, rather than as arduous reporting requirements. At one point he remarked that he could have shaved six months off his PhD if he had known about unit testing at the time—a remark that proved a popular soundbite when we quoted it on the SoundSoftware Twitter account.

(I have an “if only I’d been doing this earlier” feeling about this as well: Rosegarden now contains hundreds of thousands of lines of essentially untested code, much of which is very fragile. Paul noted that much of the Sibelius code also predates this style of working, but that they have been making progress in building up test cases from the low-level components upward.)

David Gavaghan took this theme and expanded on it in his presentation of the biomedical project Chaste (for “cancer, heart, and soft tissue environment”). This remarkable project, from well outside the fields we usually hear about in the Centre for Digital Music, was built from scratch in a test-driven development process—which David refers to as “test-first”. It started as a very short, intensive project for a number of students, which so enthused the people involved that they voluntarily continued work on it up to its present form: half a million lines of code with almost 100% test coverage, which has avoided many of the pitfalls found in other biomedical simulation software.

Small conclusions about APIs and testing

In my previous post I explained a small but significant API change for v0.9 of the Dataquay library.

Although there was nothing very deep about this change or its causes, I found it interesting because I had used a partly test-driven process to evolve the original API, and I suspected there might be a connection between that process and the resulting problems. Here are a few thoughts prompted by the change.

Passing the tests is not enough

Test-driven development is a satisfying and welcome prop. It allows you to reframe difficult questions of algorithm design in terms of easier questions about what an algorithm should produce.

But producing the right results in every test case you can think of is not enough. It’s possible to exercise almost the whole of your implementation in terms of static coverage, yet still have the wrong API.

In other words, it may be just as easy to overfit the API to the test cases as it is to overfit the test cases to the implementation.

Unit testing may be easier than API design

So, designing a good API is harder than writing tests for it. But to rephrase that more encouragingly: writing tests is easier than designing the API.

If, like me, you’re used to thinking of unit testing as requiring more effort than “just bunging together an API”, this should be a worthwhile corrective in both directions.

API design is harder than you think, but unit testing is easier. Having unit tests doesn’t make it any harder to change the API, either: maintaining tests during redesign is seldom difficult, and having tests helps to ensure the logic doesn’t get broken.

Types are not just annoying artifacts of the programming language

An unfortunate consequence of having worked with data representation systems like RDF mostly in the context of Web backends and scripting languages is a tendency to treat everything as “just a string”.

This is fine if your string has enough syntax to be able to distinguish types properly by parsing it—for example, if you represent RDF using Turtle and query it using SPARQL.

But if you break down your data model into individual node components while continuing to represent those as untyped strings, you’re going to be in trouble. You can’t get away without understanding, and somewhere making explicit, the underlying type model.
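
To make that concrete in miniature (this is only an illustration of the principle, not Dataquay’s actual classes): a thin typed wrapper lets the compiler carry the URI-versus-literal distinction that a bare string loses.

#include <iostream>
#include <string>

// A thin typed wrapper: the type records "this string is a URI",
// something a bare std::string cannot express.
struct UriRef {
    explicit UriRef(const std::string &s) : value(s) {}
    std::string value;
};

// Overloads can now distinguish the two kinds of node at compile time.
void describeObject(const UriRef &u)      { std::cout << "URI node: " << u.value << "\n"; }
void describeObject(const std::string &s) { std::cout << "literal node: " << s << "\n"; }

int main() {
    describeObject(UriRef("http://example.com/profession/Builder"));  // unambiguously a URI
    describeObject(std::string("Builder"));                           // unambiguously a literal
    return 0;
}

Dataquay’s Uri class plays essentially this role for URIs within the graph.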

Predictability contributes to simplicity

A simpler API is not necessarily one that leads to fewer or shorter lines of code. It’s one that leads to less confusion and more certainty, and carrying around type information helps, just as precondition testing and fail-fast principles can.
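
For example, a fail-fast precondition check in a URI wrapper’s constructor (again a sketch of the principle, not the real Uri implementation) turns a subtly wrong value into an immediate, local failure rather than a puzzling one further downstream:

#include <stdexcept>
#include <string>

// Hypothetical URI wrapper that enforces its contract up front:
// only absolute URIs are accepted, so a bad value fails at the point
// of construction instead of producing confusing results later.
struct AbsoluteUri {
    explicit AbsoluteUri(const std::string &s) : value(s) {
        // Deliberately crude check, for illustration only: insist on
        // something that looks like a scheme before the first ':'.
        std::string::size_type colon = s.find(':');
        if (colon == std::string::npos || colon == 0) {
            throw std::invalid_argument("AbsoluteUri: not an absolute URI: " + s);
        }
    }
    std::string value;
};

// AbsoluteUri ok("http://example.com/my/document#bob");  // fine
// AbsoluteUri bad(":bob");                               // throws immediately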

It’s probably still wrong

I’ve effectively found and fixed a bug, one that happened to be in the API rather than the implementation. But there are probably still many remaining. I need a broader population of software using the library before I can be really confident that the API works.

Of course it’s not unusual to see significant API holes in 1.0 releases of a library, and to get them tightened up for 2.0. It’s not the end of the world. But it ought to be easier and cheaper to fix these things earlier rather than later.

Now, I wonder what else is wrong…

Details of the Dataquay v0.9 API changes

Dataquay hasn’t seen a great deal of use yet. I’ve used it in a handful of personal projects that follow the same sort of model as the application it was first designed for, and that’s all.

But I’ve recently started to adapt it to a couple of programs whose RDF usage follows more traditional Linked Data usage patterns (Sonic Visualiser and Sonic Annotator), as a replacement for their fragile ad-hoc RDF C++ interfaces. And it became clear that in these contexts, the API wasn’t really working.

I can get away with changing the API now as Dataquay is still a lightly-used pre-1.0 library. But, for the benefit of the one or two other people out there who might be using it—what has changed, and why?

The main change

The rules for constructing Dataquay Uri, Node and Triple objects have been simplified, at the expense of making some common cases a little longer.

If you want to pass an RDF URI to a Dataquay function, you must now always use a Uri object, rather than a plain Qt string. And to create a Uri object you must have an absolute URI to pass to the Uri constructor, not a relative or namespaced one.

This means in practice that you’ll be calling store->expand() for all relative URIs:

Triple t(store->expand(":bob"), Uri("a"), store->expand("profession:Builder"));

(Note the magic word “a” still gets special treatment as an honorary absolute URI.)

Meanwhile, a bare string will be treated as an RDF literal, not a URI.
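
Side by side, under the new rules (the “foaf:name” property and its prefix are assumed here purely for illustration):

// Object as a URI: wrap or expand it explicitly.
Triple job(store->expand(":bob"), Uri("a"), store->expand("profession:Builder"));

// Object as a literal: pass a bare string, with no Uri wrapper.
Triple name(store->expand(":bob"), store->expand("foaf:name"), QString("Bob"));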

Why?

Here’s how the original API evolved. (This bit will be of limited interest to most, but I’ll base a few short conclusions on it in a later post.)

An RDF triple consists of three nodes, each of which may be a URI (i.e. an “identity” node), a literal (a “data” node) or a blank node (an anonymous node with no URI of its own).

A useful RDF triple is a bit more limited. The subject must be a URI or blank node, and the predicate can only realistically be a URI.

Given these limitations, I arrived at the original API through a test-driven approach. I went through a number of common use cases and tried to determine, and write a unit test for, the simplest API that could satisfy them. Then I wrote the code to implement that, and fixed the situations in which it couldn’t work.

So I first came up with expressions like this Triple constructor:

Triple t(":bob", "a", "profession:Builder");

Here :bob and profession:Builder are relative URIs; bob is relative to the store’s local base URI and Builder is relative to a namespace prefix called “profession:”. When the triple goes into a store, the store will need to expand them (presumably, it knows what the prefixes expand to).

This constructor syntax resembles the way a statement of this kind is encoded in Turtle, and it’s fairly easy to read.
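
One of those use-case tests might have been sketched along these lines. The store operations named here (add, contains) are written loosely from memory just to show the shape of such a test, not to document the exact API:

BasicStore store;
Triple t(":bob", "a", "profession:Builder");
store.add(t);                  // the store expands the prefixed names on the way in
Q_ASSERT(store.contains(t));   // the simplest possible check of the result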

It quickly runs into trouble, though. Because the object part of a subject-predicate-object triple can be either a URI or a literal, profession:Builder is ambiguous; it could just be a literal string. So, I introduced a Uri class.[1]

Triple t(":bob", "a", Uri("profession:Builder"));

Now, the Uri object stores a URI string of some sort, but how do I know whether it’s a relative or an absolute URI? I need to make sure it’s an absolute URI already when it goes into the Uri constructor—and that means the store object must expand it, because the store is the thing that knows the URI prefixes.[2]

Triple t(":bob", "a", store->expand("profession:Builder"));

Now I can insist that the Uri class is only used for absolute URIs, not for relative ones. So the store knows which things it has to expand (strings), and which come ready-expanded (Uri objects).
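
To make the division of labour concrete, expansion might look something like this (the expanded forms are examples only, and the “profession” namespace URI is invented):

Uri bob = store->expand(":bob");
// e.g. <http://example.com/my/document#bob>, if that is the store's base URI

Uri builder = store->expand("profession:Builder");
// e.g. <http://example.com/ns/profession#Builder>, or whatever the prefix maps to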

Of course, if I want to construct a Triple whose subject is an absolute URI, I have another constructor to which I can pass a Uri instead of a string as the first argument:

Triple t(Uri("http://example.com/my/document#bob"), "a", store->expand("profession:Builder"));

This API got me through to v0.8, with a pile of unit tests and some serious use in a couple of applications, without complaint. Because it made for simple code and it clearly worked, I was happy enough with it.

What went wrong?

The logical problem is pretty obvious, but surprisingly hard to perceive when the tests pass and the code seems to work.

In the example

Triple t(":bob", "a", store->expand("profession:Builder"));

the first two arguments are just strings: they happen to contain relative URIs, but that seems OK because it’s clear from context that they aren’t allowed to be literals.

Unfortunately, just as in the Uri constructor example above, it isn’t clear that they can’t be absolute URIs. The store can’t really know whether it should expand them or not: it can only guess.

More immediately troublesome for the developer, the API is inconsistent.

Only the third argument is forced to be a Uri object. The other two can be strings that happen to get treated as URIs when they show up at the store. Of course, the user will forget about that distinction and end up passing a string as the third argument as well, which doesn’t work.
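
That is, the natural-looking

Triple t(":bob", "a", "profession:Builder");

does not do what the symmetry suggests: the third argument is not treated as a URI there.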

This sort of thing is hard to remember, and puzzling when encountered for the first time independent of any documentation.

In limited unit tests, and in applications that mostly use the higher-level object mapping APIs, problems like these are not particularly apparent. Switch to a different usage pattern involving larger numbers of low-level queries, and then they become evident.


[1] Why invent a new Uri class, instead of using the existing QUrl?

Partly because it turns out that QUrl is surprisingly slow to construct. For a Uri all I need is a typed wrapper for an existing string. Early versions of the library did use QUrl, but profiling showed it to be a significant overhead. Also, I want to treat a URI, once expanded, as a simple opaque signifier rather than an object with a significant interface of its own.

So although Dataquay does use QUrl, it does so only for locations to be resolved and retrieved (network or filesystem locations). The lightweight Uri object is used for URIs within the graph.

[2] Dataquay’s Node and Triple objects represent the content of a theoretical node or triple, not the identity of any actual instance of a node or triple within a store.

A Triple, once constructed, might be inserted into any store—it doesn’t know which store it will be associated with. (This differs from some other RDF libraries and is one reason the class is called Triple rather than Statement: it’s just three things, not a part of a real graph.) So the store must handle expansions, not the node or triple.

Unit testing: Why bother?

I’ve just published an article about unit testing on the SoundSoftware site, focusing on how to justify the apparent extra effort—particularly with an eye to the academic researcher/developer. Thanks to James, Luke, Richard and Jazz for input.

As always, if you see any mistakes or have any thoughts on the subject, please feel free to post here or drop me a line!