Academics · Programs for Music

Note on “Explorations in Time-Frequency Analysis” by Patrick Flandrin

Patrick Flandrin is a physicist and signal-processing researcher whose name I first encountered as co-author (with François Auger) of a 1995 IEEE Transactions on Signal Processing paper called “Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method”.

This crunchy publication (21 pages, dozens of equations and figures) took a pleasing idea — replacing the familiar grid-format time-frequency spectrogram with a field of precisely localised points calculated using both magnitude and phase of the frequency bins, rather than only magnitude as a traditional spectrogram does — and set out the mathematics of applying it to a number of different time-frequency and time-scale representations.
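For reference, the core of the method (up to sign conventions, which depend on exactly how the short-time Fourier transform is defined, so treat the signs below as indicative rather than quotable) computes two auxiliary transforms alongside the ordinary windowed transform $X_h$: one, $X_{\mathcal{T}h}$, using the window multiplied by time, and one, $X_{\mathcal{D}h}$, using its derivative. The energy at each point $(t,\omega)$ is then moved to

\[
\hat{t}(t,\omega) = t - \Re\!\left\{\frac{X_{\mathcal{T}h}(t,\omega)\,\overline{X_h(t,\omega)}}{|X_h(t,\omega)|^2}\right\},
\qquad
\hat{\omega}(t,\omega) = \omega + \Im\!\left\{\frac{X_{\mathcal{D}h}(t,\omega)\,\overline{X_h(t,\omega)}}{|X_h(t,\omega)|^2}\right\},
\]

which is where both magnitude and phase come in: the ratios are complex, and their real and imaginary parts carry the local phase structure.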

Illustration from Auger & Flandrin (1995)

I read this paper about 15 years ago and didn’t understand it. I have since realised this is partly because it isn’t all that clear with its notation, but there is also a big gap between the naive programmer’s view (that’s mine) of a spectrogram and the mathematical analysis used in the paper.

To explain. For a programmer, a spectrogram comes from taking short overlapping slices of a sampled signal, multiplying each by a smoothing window shape, applying a Fourier transform to each slice, and taking the magnitudes of the complex output bins to get one column of the spectrogram per slice of input. The short slices are because you want a fixed, smallish number of output bins, and you have various tradeoffs — time and frequency resolution and computational efficiency — to consider in that. The smoothing window is because your Fourier transform — a thing which matches up sinusoids of different frequencies against a signal to identify which ones would add up to it — operates on an infinite signal, consisting of the input you give it repeated forever in both directions: this will have a discontinuity each time it wraps around, and the smoothing window removes some of the frequency artifacts from these discontinuities. There is nothing particularly mathematical about the implementation of this, and any intuition used by the programmer is a mixture of the visual and techniques from the world of engineering. The language used in a publication like the DAFx book is typical of this world.
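In case that recipe is easier to read as code than as prose, here is a minimal sketch in Python with numpy. The frame size, hop, and window shape are arbitrary illustrative choices, not anything a particular program specifically uses.

```python
import numpy as np

def spectrogram(signal, n=1024, hop=256):
    """Magnitude spectrogram: one column per overlapping windowed slice."""
    window = np.hanning(n)                    # the smoothing window
    columns = []
    for start in range(0, len(signal) - n + 1, hop):
        frame = signal[start:start + n] * window
        spectrum = np.fft.rfft(frame)         # Fourier transform of one slice
        columns.append(np.abs(spectrum))      # keep magnitude, discard phase
    return np.array(columns).T                # rows = frequency bins

# usage: one second of a 440 Hz sine tone at a 44100 Hz sample rate
sr = 44100
t = np.arange(sr) / sr
S = spectrogram(np.sin(2 * np.pi * 440 * t))
```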

The Auger & Flandrin paper instead comes from a world that summarises a spectrogram as a two-dimensional Wigner-Ville distribution filtered with a smoothing window leading to a time-frequency representation of the Cohen’s class. Signals are finite-energy functions over infinite domains, and a spectrogram is a double integral over time and angular frequency. Both time-domain functions and time-frequency representations are continuous, and practical questions about overlap and window length don’t arise. I can dimly remember this world, because my undergraduate degree — who am I kidding, my only degree — started out as pure maths, but I haven’t inhabited it for any of my working life.
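For concreteness, that summary says the following (the normalisation factor varies with the Fourier convention in use, so take the $1/2\pi$ as one common choice rather than gospel). The Wigner-Ville distribution of a signal $x$ is

\[
W_x(t,\omega) = \int_{-\infty}^{\infty} x\!\left(t+\tfrac{\tau}{2}\right)\,\overline{x\!\left(t-\tfrac{\tau}{2}\right)}\,e^{-i\omega\tau}\,d\tau
\]

and the spectrogram taken with window $h$ is that distribution smoothed by the Wigner-Ville distribution of the window itself:

\[
S_x^{(h)}(t,\omega) = \frac{1}{2\pi}\iint W_x(s,\xi)\,W_h(s-t,\,\xi-\omega)\,ds\,d\xi.
\]

No window length, no hop size, no bins: everything is a continuous function over an infinite domain.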

So I didn’t really understand the paper, and a programmer has plenty to do, and that is one reason why Sonic Visualiser’s “Peak-Frequency Spectrogram” layer calculates instantaneous frequencies from the phase difference between successive columns, something which I found much easier to understand. (It turns out there are other good reasons one could make this choice, but I didn’t know that.1)
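The phase-difference calculation itself is small enough to show. A sketch, assuming successive complex columns of the kind the earlier fragment computes just before taking magnitudes; the frame and hop values are again arbitrary:

```python
import numpy as np

def bin_frequencies(prev_col, this_col, n, hop, sr):
    """Per-bin instantaneous frequency estimates, in Hz, from the phase
    advance between two successive complex STFT columns taken from
    frames of length n spaced hop samples apart."""
    k = np.arange(len(this_col))
    expected = 2 * np.pi * k * hop / n     # phase advance of each bin centre
    measured = np.angle(this_col) - np.angle(prev_col)
    deviation = measured - expected
    # wrap the deviation into (-pi, pi]
    deviation -= 2 * np.pi * np.round(deviation / (2 * np.pi))
    return (expected + deviation) * sr / (2 * np.pi * hop)

# usage: bins near the peak of a 440 Hz tone report close to 440
sr, n, hop = 44100, 1024, 256
t = np.arange(n + hop) / sr
x = np.sin(2 * np.pi * 440 * t)
w = np.hanning(n)
c0 = np.fft.rfft(x[:n] * w)
c1 = np.fft.rfft(x[hop:hop + n] * w)
freqs = bin_frequencies(c0, c1, n, hop, sr)
```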

Returning to the paper recently, I learned that Flandrin had written a book on the subject, and I bought a copy hoping it might bridge the conceptual gap. It turned out to be a good experience.

* * *

“Explorations in Time-Frequency Analysis” is a monograph digressing on things the author has found interesting in the past 30 years, which — what luck! — happen to be about time-frequency analysis. It’s short, about 200 pages, and nicely printed. There are lots of diagrams, and although equation-heavy it doesn’t hang about proving things, sending you to the references instead. It begins with a glossary of notation (I like it when books do this) and ends with a 9-page bibliography. The writing is crisp and friendly and the scene is set by the first two chapters, a philosophical outline and a chapter of examples with the lovely title “Small Data Are Beautiful”.

Although the book provides a lot of the background to the paper that defeated me, I still spent a potentially embarrassing amount of thought on things I imagine that anyone properly within the target market finds obvious. An example is what it means for a Gaussian function to be “circular” in time and frequency. The book goes over this in far more detail, but briefly a Gaussian — the bell-shaped normal distribution curve found in probability — has the property that its Fourier transform is also a Gaussian. The “wider” the bell shape in the time domain, the “narrower” in the frequency domain: at some point it must be equal in both, and then if you plot it in a spectrogram-like heat map you will see a circle. When does this happen? It’s shown that it happens for the Gaussian corresponding to a normal distribution of variance 1. But at this point I am worrying about units. What does it mean to be circular? The figures illustrating this lack units in either axis — in fact detail-wise many of the figures are more like sketches — and the little bit of engineer in me is wondering: how can you possibly have a circle if you lack units?
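For the record, the transform pair in question, using the convention $\hat{g}(\omega) = \int g(t)\,e^{-i\omega t}\,dt$, is

\[
e^{-t^2/2\sigma^2} \;\longleftrightarrow\; \sigma\sqrt{2\pi}\;e^{-\sigma^2\omega^2/2}
\]

so a spread of $\sigma$ in time goes with a spread of $1/\sigma$ in angular frequency, and the two coincide exactly when $\sigma = 1$.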

The answer I eventually recalled is that the units in one domain define those in the other. In this case, if the time axis is in seconds then angular frequency is radians per second, and a circle is a distribution whose extent in seconds is the same as that in radians per second. Other units such as samples (in time) or STFT bins (in frequency) have similar correspondences in the other domain. This is a place where going back to basics took significant thought, but I did actually appreciate being expected to think about it.

So a nice rehearsal with some interesting bumps, but for me the thrilling twist arrives in chapter 12, “Spectrogram Geometry 2”. This reframes the time-frequency plane as a complex plane, and the reassignment operator as motion in a potential field proportional to the log-spectrogram. This mathematical leap is also an intuitively visual one, and it’s exciting for me because it is a little like how I pictured the spectrogram, with no meaningful mathematical analysis, when developing a certain feature of the Rubber Band timestretcher.2 This chapter is like seeing the vaguely-realised ground beneath your feet resolve into a larger, recognisable object — the moment when you realise you are standing on the back of a giant Pokémon, if you will.
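As far as I can make out (signs and constants differ between treatments, so this is the shape of the result rather than a quotable formula), for a spectrogram taken with a circular Gaussian window the reassignment displacement field is proportional to the gradient of the log-spectrogram:

\[
\big(\hat{t}(t,\omega) - t,\;\hat{\omega}(t,\omega) - \omega\big) \;\propto\; \nabla \log S_x(t,\omega)
\]

so reassigned points drift uphill towards the ridges of the energy surface, like particles in a potential field.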

There is a lot more in this book, and I think it will repay repeated visits. I’m not sure whether you could implement anything directly from it, but you could, say, pick a random page and follow up all the references until you really feel you understand it. I think this would be a rewarding exercise that, for someone like me, would probably take around a month per page.

* * *

On that note, one of the first references given is to a book called “Visible Speech” by Potter, Kopp, and Green, 1947. I looked this up and was so intrigued that I tracked down an ex-library copy. It is a lavish presentation, perhaps with both training and PR elements, of a then-new invention called the “sound spectrograph”, the machine whose output we now call a spectrogram. The title “Visible Speech”, incidentally, is borrowed with attribution from an earlier (1867) work about phonetic alphabets.

The authors of the 1947 book were writing about work done at Bell Labs to try to make the telephone accessible to the deaf. Their experimental devices used paper tape or phosphor display to show spectrographs of the speech sounds, and users were specially trained to interpret speech from them. Here’s a picture from the book of someone using one.

Operator sitting at a table in front of a large box with a tiny screen on it

The spectrographs were produced by automatically recording the speech to tape and playing the tape repeatedly through a filter of 300Hz bandwidth, whose centre frequency was incremented linearly between passes in 15Hz steps from 0-3500Hz. (They also had a version using 45Hz bandwidth filters, but it was found to be less legible.) The system was of course analogue.
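Out of curiosity, here is roughly what that loop looks like emulated digitally. Only the 300Hz bandwidth, the 15Hz step, and the 0-3500Hz range come from the book; the filter order and the envelope smoothing are my own guesses.

```python
import numpy as np
from scipy.signal import butter, lfilter

def spectrograph(signal, sr, bandwidth=300.0, step=15.0, fmax=3500.0):
    """One bandpass filter, stepped in centre frequency between repeated
    passes over the same recording; one output row per pass."""
    rows = []
    for centre in np.arange(step, fmax + step, step):
        low = max(centre - bandwidth / 2, 1.0)
        high = min(centre + bandwidth / 2, sr / 2 - 1.0)
        b, a = butter(2, [low / (sr / 2), high / (sr / 2)], btype="band")
        passband = lfilter(b, a, signal)
        eb, ea = butter(2, 50.0 / (sr / 2))   # crude envelope follower:
        rows.append(lfilter(eb, ea, np.abs(passband)))  # rectify + lowpass
    return np.array(rows)
```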

In this image the top spectrograph is the one with 45Hz bandwidth, which is used to point out some interesting features, but the 300Hz bandwidth spectrograph below it is the form used throughout the rest of the book:

It’s striking how clear these spectrographs are, and it is a useful reminder that we aren’t always looking for the most precise representation of something — 300Hz bandwidth at speech frequencies is pretty wide! — but rather for the most appropriate one in some human dimension.

 


1 The Sonic Visualiser peak-frequency spectrogram precisely localises stable frequencies, but for each frequency bin it draws a short horizontal line across the whole duration of the bin at the proper frequency rather than localise the bin to a point in time. A very similar output could have been produced using reassignment, because the frequency calculated from phase difference should be very close to that calculated with reassignment. But a decision to do that would have meant ignoring the other reassignment operator, localisation in time, which gives a single point rather than a horizontal line for each bin. Had I understood the reassignment paper, I would probably have felt compelled to do that part properly. For it to work well, a greater bin overlap and much more sophisticated rendering would have been needed, and the result would have been much slower and possibly less clear for real music. I think.

2 This feature, which I gave the vague name “phase lamination”, was worked out in a hurry after discovering that the “phase locking” technique of Jean Laroche and Mark Dolson, which I had used in the very first release of Rubber Band, was patented. Phase locking reduced audible phasiness with the nice side-effect of making the phase vocoder faster to compute, but it also lent a robotic tang to the sound, which certain listeners found even more unpleasant than the phasiness. The scheme I came up with to replace it was based on picturing a gradient field and making adjustments to bins near a peak or trough in proportion to their distance from it — tuned by ear rather than worked out mathematically. Although it lost the improved speed of phase locking, it usually sounds better. The idea seems reasonably obvious, but I hadn’t seen it described anywhere else and I was delighted to find it.

Academics · Code · Programs for Music · Work

Chordino troubles

On September the 9th, I released a v1.0 build of the Chordino and NNLS Chroma Vamp plugin. This plugin analyses audio recordings of music and calculates some harmonic features, including an estimated chord transcription. When used with Sonic Visualiser, Chordino is potentially very useful for anyone who likes to play along with songs, as well as for research.

Chordino was written by Matthias Mauch, based on his own research. Although I made this 1.0 release, my work on it only really extended as far as fixing some bugs found in earlier releases using the Vamp Plugin Tester.

Unfortunately, with one of those fixes, I broke the plugin. The supposedly more reliable 1.0 update was substantially less accurate at identifying both chord-change boundaries and the chords themselves than any previous version.

I didn’t notice. Nor did Matthias, who had recently left our research group and was busy starting at a new job. One colleague sent me an email saying he had problems with the new release, but I jumped to the completely wrong conclusion that this had something to do with parameter settings having changed since the last release, and suggested he raise it with Matthias.

I only realised what had happened after we submitted the plugin to MIREX evaluation, something we do every year for plugins published by C4DM, when the MIREX task captain Johan Pauwels emailed to ask whether I had expected its scores to drop by 15% from the previous year’s submission (see results pages for 2014, 2015). By that time, the broken plugin had been available for over a month.

This is obviously hugely embarrassing—perhaps the most unambiguous screwup of my whole programming career so far. As the supposed professional software developer of my research group, I took someone else’s working code, broke it, published and promoted the broken version with his name all over it, and then submitted it to a public evaluation, again with his name on it, where its brokenness was finally pointed out to me, a month later, by someone else. Any regression test, even on only a single audio file, would have shown up the problem immediately. Regression-testing this sort of software can be tricky, but the simplest possible test would have worked here. And a particularly nice irony is provided by the fact that I’ve just come from a four-year project whose aims included trying to improve the way software is tested in academia.

I’ve now published a fixed version of the plugin (v1.1), available for download here. This one has been regression tested against known-good output, and the tests are in the repository for future use. The broken version is actually gone from the download page (though of course it is still tagged in the source repository), to avoid anyone getting the wrong one by accident.
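The real tests live in the plugin repository; purely to illustrate the shape of the idea, here is a minimal comparison of the kind I mean, assuming the plugin’s features have been exported to CSV (for example with Sonic Annotator) alongside a known-good reference file. The file names and the tolerance are placeholders.

```python
import csv
import sys

def compare(reference_csv, output_csv, tolerance=1e-4):
    """True if the output matches the reference: numeric fields within
    the tolerance, other fields (such as chord labels) exactly."""
    with open(reference_csv) as f1, open(output_csv) as f2:
        for lineno, (ref, out) in enumerate(zip(csv.reader(f1),
                                                csv.reader(f2)), 1):
            for a, b in zip(ref, out):
                try:
                    ok = abs(float(a) - float(b)) <= tolerance
                except ValueError:
                    ok = (a == b)
                if not ok:
                    print("mismatch at line %d: %r vs %r" % (lineno, a, b))
                    return False
    return True

if __name__ == "__main__":
    sys.exit(0 if compare(sys.argv[1], sys.argv[2]) else 1)
```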

I’m also working on a way to make simple regression tests easier to provide and run, for the other plugins I work on.

That’s all for the “public service announcement” bit of this post; read on only if you’re interested in the details.

What was the change that broke it? Well, it was a change I made after running the plugin through the Vamp Plugin Tester, a sort of automated fuzz-testing tool that helps you find problems with your code. (Again, there’s an irony here. Using this tool is undoubtedly a good practice, as it can show up all sorts of problems that might not be apparent to developers otherwise. Even so, I should have remembered how common it is to introduce bugs while fixing things like compiler warnings and static-analysis complaints.)

The problem I was trying to fix here was that intermediate floating-point divisions sometimes overflowed, resulting in infinity values in the output. This only happened for unusual inputs, so it seemed reasonable to fix it by clamping intermediate values when they appeared to be blowing up out of the expected range. But I set the threshold too low, so that many intermediate values from legitimate inputs were also being mangled. I then also made a stupid typo that made the results a bit worse still (you can see the change in question around line 500 of the file in this diff).
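The actual code is C++ and is visible in the diff linked above, but the general shape of the failure is easy to reconstruct schematically; the names and numbers here are made up for illustration.

```python
import numpy as np

def normalise(values, limit):
    # clamp anything that appears to be blowing up, then normalise
    clamped = np.clip(values, -limit, limit)
    return clamped / max(np.max(np.abs(clamped)), 1e-10)

# with a generous limit, only genuinely pathological values are touched
normalise(np.array([0.5, -0.2, np.inf]), limit=1e6)

# with the limit set too low, ordinary intermediate values from
# legitimate inputs are mangled as well
normalise(np.array([0.5, -0.2, 3.0]), limit=1.0)
```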

Note that this only broke the output from the Chordino chord estimator, not the other features calculated by NNLS Chroma.

A digression. An ongoing topic of debate in the world of the Research Software Engineer is whether software development resources in academia should be localised within research groups, or centralised.

The localised approach, which my research group has taken with my own position, employs developers directly within a research subject. The centralised approach, typified by the Research Software Development group at UCL, proposes a group of software developers who are loaned or hired out to research groups according to need and availability.

In theory, the localised approach can be simpler to manage and should increase the likelihood of developers being available to help with small pieces of work requiring subject knowledge at short notice. The centralised approach has the advantage that all developers can share the non-subject-specific parts of their workload and knowhow.

I believe that in general a localised approach is useful, and I suspect it is easier to hire developers for a specific research group than to find developers good enough to be able to parachute in to anywhere from a central team.

In a case like this, though, the localised approach makes for quite a lonely situation.

Companies that produce large software products that work do so not because they employ amazing developers but because they have systems in place to support them: code review, unit testing, regression tests, continuous integration, user acceptance tests.

But for me as a lone professional developer in a research group, it’s essentially my responsibility to provide those safety nets as well as to use them. I had some of them in place for most of the code I work on, but there was a big hole for this particular project. I broke the code, and I didn’t notice because I didn’t have the right tests ready. Neither did the researcher who wrote most of this code, but that wasn’t his job. When some software goes out from this group that I have worked on, it’s my responsibility to make sure that the code aspects of it (as opposed to the underlying methods) work correctly. Part of my job has to be to assume that nobody else will be in a position to help.

 

Academics · Code

A release! Tony v1.0

Just a few days after my last post, I did finally manage to finish packaging the release of Tony v1.0. This followed a two-week blitz of fixing, tidying, arguing, etc., with the instigator of the Tony project, my colleague Matthias Mauch. We’re pretty happy about the results.

Tony is a program for pitch and note transcription of audio, mainly intended for precise and high-quality guided annotation of monophonic recordings—such as of unaccompanied singing. It has a fairly simple user interface that we have put some care into designing for this specialised task.

Tony v1.0 on Linux

Matthias (and several co-authors, but primarily he) wrote a conference paper about Tony which can tell you more about the motivation for and design of the program. If you’re interested, find that here.

Tony is free, open-source software and is available pre-compiled for Windows, OS X, and Linux. See its project page for more.

Academics · Code · Uncategorized

Undergraduate programming languages

I read two quite different articles about programming in academia today.

I don’t know Yossi Kreinin, and when his piece Why bad scientific code beats code following “best practices” appeared on the Hacker News front page, I guessed that I probably wouldn’t agree with it. I’m a programmer working in academia who has spent some time trying to find ways to improve the code that researchers write, and I don’t want to be told I’m wasting my time.

But on the whole I agreed with him. A totally naive programmer is going to produce messy, organic, but basically linear code that will usually be easier to understand and work with than the code of someone who has learned a bit of Java and wants to put everything in a factory class. The part I don’t agree with of course, and that I suspect he put in just for the sake of the headline, is that these really are best practices.

I think that one of the hidden goals of a project like Software Carpentry is to teach that the reason software developers appear to be “too clever”, and somehow unreachable for the scientist outside the software field, is that they are being too clever. Programming gets over-complicated; you can learn to write good code (that is, readable code) more easily than you can learn to write bad code the professional programmers’ way.

I also identified with Yossi’s line that “I claim to have repented, mostly”. Most of the code I’ve written in my career is not very good, by this standard. It’s a long path to enlightenment.

The other article was by my friend Christophe Rhodes: What is a good language to teach at undergraduate level for Computing degrees?

A computing degree is an odd thing. Computer science is the theory of how computers and programs work. Computing, as a subject, is computer science plus some stuff about how we should actually program them in order to get a job done. The two are very different.

Christophe breaks down the objectives of a computing degree: think, experiment, job, career, study, society. I’m not very familiar with the study objective, which refers to postgraduate study in a computing department (something I never did). Think refers to teaching students how to think about and solve problems computationally. I think he may be missing a step: it may be useful to separate thinking about how the computer works (“understand”?) from thinking about how the programmer can work.

There is also, of course, the question of how to avoid making your programmer feel too clever.

As an undergraduate in 1990-94, I was taught, in this order, Prolog, C, ML, 68k assembly language, and C++. I also learned some Lisp, though I can’t remember being formally taught it. Prolog, C, and the assembly language were all good bases for what I referred to as the understand objective, getting something out of the history of computing and learning about how computers work. ML was a wonderful introduction to thinking, and it’s no coincidence that I’ve recently been programming in an ML variant as an engaging alternative to work.

The hard one to evaluate is C++. It was a very poor teaching language for object-oriented programming, which is a pity because at the time we learned it, I didn’t get object-oriented programming at all. And we learned nothing about any other kind of programming from it, having already been taught three different high-level languages. So it contributed very poorly to the think objective and not at all to the understand one. It’s an awful language to learn to write clear, reliable code in, therefore bad from the society perspective, and you can (as I have) spend 20 years learning to write it, which is obviously ridiculous, hence you would think also bad from the job perspective.

But it turns out that C++ has been the most valuable job language there could be, because (a) it is resiliently portable: it turns out that write-once-run-anywhere with a virtual machine comes and goes in waves, depending on the whim of the top operating system provider of the time, but compiling to machine code seems to be eternal; (b) it is just about able to ape a number of programming paradigms, so you can get away with adopting a style without adopting another language; and (c) its complexity means that it looks good on your CV. I honestly wish, as a 20+-year C++ programmer, that we could make it so that nobody ever had to learn C++ again, but even now I don’t think that is the case.

The one goal completely unaddressed by my own degree, and almost every language I’ve learned since, is experiment. I think there are two essentials for this: a responsive interactive interpreter, and visual responsiveness, like a graphical scene environment or plotting tools. Most of the things you want to do here are probably growing up in Javascript and HTML5.

I don’t really know anything about teaching, about the actual on-the-ground business of making people learn stuff. So you shouldn’t listen to me on this next bit, I’m just daydreaming. But if I were planning a 3-year computing degree course in my head now, I think I would aim to teach, in this order:

  • Python, for the basics of procedural computing. It’s the cleanest language for doing satisfying simple loops and input/output transformations, and is a sound general-purpose language.
  • Clojure. I’ve decided now that I don’t so much care for Lisp syntax day-to-day, but a Lisp gives you so much to talk about and investigate without getting lost in the specific requirements of the language. And a Lisp on the JVM gives you more depth: advanced students could learn quite a lot about Java without ever actually being taught Java.
  • Some assembly language, probably ARM and preferably with a nice visible bit of circuit board on hand.
  • Javascript, but not aiming to teach the language so much as using a specific interactive framework and with a specific small game-development project to complete. I would probably also use Typescript rather than untyped Javascript.
  • Haskell. As an ML guy it was always my enemy, but Haskell is the functional language that has endured. It’s a follow-on course from Lisp.
  • C++. Because, as far as I can see at the moment, you pretty much have to. But please, give them modern-style pointer ownership and RAII (i.e. avoiding explicit heap allocation).
  • Python again, in a closing course that taught people how best to do things in an actual working environment. Testing, not being too much of a smartarse, etc.

But the odd thing about that set of languages is that, just as with my own degree, you never get a very good dedicated object-oriented language. Maybe Objective-C could replace C++; it’s probably a clearer pedagogical object language, but C++ is everywhere while Objective-C is effectively platform-specific, even if it is a very popular platform.

And be sure to teach them version control.

Oh, and don’t forget to add a double-entry book-keeping course. It’s probably more useful than the programming stuff.

 

Academics · Code · Flat Things

Can you develop research software on an iPad?

I’ve just written up a blog article for the Software Sustainability Institute about research software development in a “post-PC” world. (Also available on my project’s own site.)

Apart from using the terms “post-PC”, “touch tablet”, “app store”, and “cloud” a disgracefully large number of times, this article sets out a problem that’s been puzzling me for a while.

We’re increasingly trying to use, for everyday computing, devices that are locked down to very limited software distribution channels. They’re locked down to a degree that would have seemed incomprehensible to many developers ten or twenty years ago. Over time, these devices are more and more going to replace PCs as the public idea of what represents a normal computer. As this happens, where will we find scientific software development and the ideals of open publication and software reuse?

I recognise that not all “post-PC” devices (there we go again) have the same terms for software distribution, and that Android in particular is more open than others. (A commenter on Twitter has already pointed out another advantage of Google’s store that I had overlooked in the article.) The “openness” of Android has been widely criticised, but I do believe that its openness in this respect matters.

Perhaps the answer, then—at least the principled answer—to the question of how to use these devices in research software development is: bin the iPad; use something more open.

But I didn’t set out to make that point, except by implication, because I’m afraid it simply won’t persuade many people. In the audio and music field I work in, Apple already provide the predominant platform across all sizes of device. If there’s one thing I do believe about this technology cycle, it’s that people choose their platform first based on appeal and evident convenience, and then afterwards wonder what else they can do with it. And that’s not wrong. The trick is how to ensure that it is possible to do something with it, and preferably something that can be shared, published, and reused. How to subvert the system, in the name of science.

Any interesting schemes out there?

Academics · Code

SoundSoftware 2012 Workshop

Yesterday the SoundSoftware project, which I help to run, hosted the SoundSoftware 2012 Workshop at Queen Mary. This was a one-day workshop about working practices for researchers developing software and experiences they have had in software work, with an eye to subjects of interest to audio and music researchers.

You can read about the workshop at the earlier link; I’d just like to mention two talks that I found particularly interesting: those given by Paul Walmsley and, following him, David Gavaghan.

Paul is a long-serving senior developer in the Sibelius team at Avid (a group I’m interested in already because of my former life working on the notation editor aspects of Rosegarden: Sibelius was always the gold standard for interactive notation editing). He’s an articulate advocate of unit testing and of what might be described as a decomposition of research work in such a way as to be able to treat every “research output” (for example, presenting a paper) as an experiment demanding reproducibility and traceable provenance.

Usefully, he was able to present ideas like these as simplifying concerns, rather than as arduous reporting requirements. At one point he remarked that he could have shaved six months off his PhD if he had known about unit testing at the time—a remark that proved a popular soundbite when we quoted it through the SoundSoftware Twitter account.

(I have an “if only I’d been doing this earlier” feeling about this as well: Rosegarden now contains hundreds of thousands of lines of essentially untested code, much of which is very fragile. Paul noted that much of the Sibelius code also predates this style of working, but that they have been making progress in building up test cases from the low-level components upward.)

David Gavaghan took this theme and expanded on it, with the presentation of his biomedical project Chaste (for “cancer, heart, and soft tissue environment”). This remarkable project, from well outside the fields we usually hear about in the Centre for Digital Music, was built from scratch in a test-driven development process—which David refers to as “test-first”. It started with a very short intensive project for a number of students, which so enthused the people involved that they voluntarily continued work on it up to its present form: half a million lines of code with almost 100% test coverage that has avoided many of the pitfalls found in other biomedical simulation software.