SoundSoftware tutorial at AES 53

I’ll be co-presenting the first tutorial session at the Audio Engineering Society 53rd Conference on Semantic Audio this weekend.

(It’s the society’s 53rd Conference, and it happens to be about semantic audio. It’s not their 53rd conference about semantic audio. In fact it’s their second: that was also the theme of the AES 42nd Conference in 2011.

What is semantic audio? Good question, glad you asked. I believe it refers to extraction or estimation of any semantic material from audio, including speech recognition and music information retrieval.)

My tutorial, for the SoundSoftware project, is about developing better and more reliable software during research work. That’s a very deep subject, so at best we’ll barely hint at a few techniques during one programming exercise:

  • making readable experimental code using the IPython Notebook, and sharing code for review with colleagues and supervisors;
  • using version control software to manage revisions and escape disaster;
  • modularising and testing any code that can be used in more than one experiment;
  • packaging, publishing, and licensing code;
  • and the motivations for doing the above.

We presented a session at the Digital Audio Effects (DAFx) conference in 2012 which covered much of this material in presentation style, and a tutorial at the International Society for Music Information Retrieval (ISMIR) in 2012 which featured a “live” example of test-driven development in research software. You can find videos and slides from those tutorials here. The theme of this one is similar, and I’ll be reusing some code from the ISMIR tutorial, but I hope we can make this one a bit more hands-on.
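
As a taster of the testing side, here’s what a test-driven fragment of research code might look like in miniature. This isn’t the actual example from the ISMIR tutorial, just a generic Python sketch: a small analysis function pulled out of an experiment script, with a unit test alongside it.

    # A small, reusable analysis function extracted from an experiment script,
    # plus a unit test for it (illustrative sketch, not the tutorial's code).
    import math
    import unittest

    def rms(samples):
        """Root-mean-square level of a sequence of audio samples."""
        if not samples:
            raise ValueError("rms() needs at least one sample")
        return math.sqrt(sum(x * x for x in samples) / len(samples))

    class TestRMS(unittest.TestCase):
        def test_silence_has_zero_rms(self):
            self.assertEqual(rms([0.0, 0.0, 0.0]), 0.0)

        def test_constant_signal(self):
            self.assertAlmostEqual(rms([0.5] * 100), 0.5)

        def test_empty_input_is_rejected(self):
            self.assertRaises(ValueError, rms, [])

    if __name__ == "__main__":
        unittest.main()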


Looking at the Sonic Visualiser user survey (part 1)

Ever since Sonic Visualiser hit version 1.7 in mid-2009, it has included a survey feature to find out what its users think of it.

It waits until you’ve used it a few times. Then it pops up a dialog, just once, asking if you’d like to fill in the survey.

If you say yes, you get the survey page in your browser. If you say no, it won’t ask again—not even after an upgrade to a new version (unless you reinstall on a different machine).
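
Sonic Visualiser itself is written in C++ with Qt, but the logic is roughly what you’d expect. Here’s a hypothetical sketch of the same idea in Python, with a made-up settings file, URL, and launch threshold; it isn’t Sonic Visualiser’s actual implementation.

    # Hypothetical sketch of the "ask once, after a few uses" logic described
    # above; not Sonic Visualiser's actual implementation.
    import json
    import os
    import webbrowser

    SETTINGS_FILE = os.path.expanduser("~/.example-app-survey.json")  # made up
    SURVEY_URL = "http://example.com/survey"                          # made up
    LAUNCHES_BEFORE_ASKING = 5                                        # made up

    def load_settings():
        try:
            with open(SETTINGS_FILE) as f:
                return json.load(f)
        except (IOError, ValueError):
            return {"launches": 0, "asked": False}

    def maybe_offer_survey(ask_user):
        """Call once per launch; ask_user() shows a yes/no dialog and
        returns True if the user agrees to take the survey."""
        settings = load_settings()
        settings["launches"] += 1
        if not settings["asked"] and settings["launches"] >= LAUNCHES_BEFORE_ASKING:
            settings["asked"] = True  # never ask again, whatever the answer
            if ask_user():
                webbrowser.open(SURVEY_URL)
        with open(SETTINGS_FILE, "w") as f:
            json.dump(settings, f)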

This survey has been running ever since, unchanged, and has been completed over 1000 times. We’ve periodically read through the submissions, but we haven’t previously published any results from them. The survey was designed rather hastily four years ago and it’s high time we updated it, so this seems a good moment to catch up on the responses before we do.

What’s in this post

The survey had both open questions (with big text fields) and simple multiple-choice ones. This post will deal with numerical results from the simple questions.

Many of these results are pretty basic, so please don’t be disappointed if the analysis doesn’t turn out to be all that exciting. If you have any suggestions or questions, please do post a comment!

I intend to follow up by summarising the open questions in another post.

Number and distribution of responses

We have 1071 responses in total, from 6 October 2009 to 25 April 2013 (as of this analysis—the survey is still open).

However, I won’t be using all of those here. Owing to “technical problems” (and/or my incompetence), some responses from mid-2010 have been lost, so to ensure the record doesn’t have any holes in it, I’ll be limiting this post to the 821 responses from 11 October 2010 onwards. Here’s the number of responses per quarter:

Note that the most recent quarter (starting April 2013) only has three weeks’ worth of responses.

(Every chart in this post is linked to the data in text format, so click through if you’re interested in the numbers.)
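
Reproducing the per-quarter tally from timestamped responses is straightforward with pandas; here’s a sketch, in which the file name and the “timestamp” column name are assumptions rather than the actual format of the linked data.

    # Sketch of the per-quarter response count, assuming a CSV of responses
    # with a "timestamp" column (file and column names are assumptions).
    import pandas as pd

    responses = pd.read_csv("survey_responses.csv", parse_dates=["timestamp"])

    # Keep only the unbroken record, from 11 October 2010 onwards
    responses = responses[responses["timestamp"] >= "2010-10-11"]

    per_quarter = responses["timestamp"].dt.to_period("Q").value_counts().sort_index()
    print(per_quarter)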

Who are these people?

We asked,

Which of the following best describes your position?

  • A student, researcher, or academic in music
  • A student, researcher, or academic in audio engineering, audio analysis, multimedia, or a related discipline
  • I am employed in some field that is related to my use of Sonic Visualiser
  • I use Sonic Visualiser solely for personal purposes
  • None of the above

Sonic Visualiser comes from an academic environment, and if you add up the slightly arbitrary academic subdivisions they’re close to an overall majority, but there are plenty of personal-use responses and quite a few professionals:


Approximate IP geolocation shows that most respondents come from the US and Europe. Here are the top ten countries:

But 66 countries are represented in total, and the top ten only make up 70% of responses.

Platform, browser, and software version

Windows users are the most numerous, while Linux users appear to be relatively on the wane. (Their numbers aren’t actually decreasing; they just haven’t increased as much.) Neither of these surprises me, but I am surprised that Windows has been going up more than OS X. Maybe Mac users don’t like being asked to fill in surveys.

As you might expect, academics, particularly in music, are relatively likely to be using OS X, while a high proportion of those using SV for personal use are doing so on Windows.
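
That kind of breakdown is just a cross-tabulation of the position and platform answers; something like the following sketch, where the column names are assumptions.

    # Position-by-platform breakdown as proportions within each position
    # (column names "position" and "platform" are assumptions).
    import pandas as pd

    responses = pd.read_csv("survey_responses.csv")
    breakdown = pd.crosstab(responses["position"], responses["platform"],
                            normalize="index")
    print(breakdown.round(2))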

Linux is overrepresented in France, which makes sense, as it is a civilised nation.

Firefox is the most common browser, but it’s been losing ground here recently, as it has everywhere. I’m a bit surprised that IE is only in third place even on Windows. I’m probably just a decade or so behind the times.

Few surprises in the breakdown by Sonic Visualiser version number. New versions take over fairly quickly after each release, but that’s to be expected because the survey only polls new installations; it doesn’t tell us anything about upgrade rates.

Linux users seem more likely to be using an old version, presumably because they often install from distribution packages.

Ease of use and general contentment

We asked,

Do you enjoy using Sonic Visualiser?

  • Yes, I do!
  • I have no strong feelings about it
  • I don’t enjoy using it, but I haven’t found any other software to replace it
  • I don’t enjoy using it, I use it because I’ve been told to (by a teacher, for example)

and

How easy do you find Sonic Visualiser to use?

  • I find it straightforward to use
  • Getting started was tricky, but I’m OK with it now
  • I can get things done, but it’s frustrating and I’m often caught out by unexpected behaviour
  • I can use a few features, but I don’t understand most of it
  • I don’t understand it at all

Most respondents are happy, but the results for ease of use are less satisfactory:

A great many respondents checked the “getting started was tricky” or “I don’t understand most of it” boxes. I think there is room for a simpler Sonic Visualiser. The open survey questions, to be covered in a subsequent post, might give us more ideas.

Features and plugins

We asked,

Which of the following features of Sonic Visualiser have you used? (Please select all that apply, or none.)

  • Saving and reloading complete sessions
  • Running Vamp plugins
  • Speeding up or slowing down playback
  • Annotation by tapping using the computer keyboard
  • Annotation by tapping using a MIDI keyboard
  • Data import or export using RDF formats
  • Audio alignment using the MATCH plugin
  • Editing note or region layers
  • Image layers

This isn’t a well-judged question. It has too many options and some of them are too ambiguous. In particular, “image layers” was intended to refer to layers to which external images can be attached (quite a niche feature), yet it appears as the third most popular option in the survey:

I assume this means people were (quite reasonably) interpreting “image layers” as meaning “any layers that look like images”, such as spectrograms.

Looking more closely at this, it seems that users who said they used the “image layer” feature were less likely to also report using common features such as session save/load or Vamp plugins, but more likely to report using uncommon features such as MIDI tapping or alignment.

This suggests these respondents could probably be clustered into a large group of novice users who use only the built-in analysis tools on a single audio file at a time (for whom “image layers” means spectrograms), and a smaller group who use many features and for whom, perhaps, “image layers” means layers of image type.
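
The co-occurrence check itself is simple enough: compare how often each other feature is reported within and outside the image-layer group. A sketch, assuming one boolean column per feature (all the column names here are made up):

    # Compare feature usage between image-layer users and everyone else,
    # assuming one boolean column per feature (column names are made up).
    import pandas as pd

    responses = pd.read_csv("survey_responses.csv")
    image_users = responses[responses["uses_image_layers"]]
    others = responses[~responses["uses_image_layers"]]

    for feature in ["uses_sessions", "uses_vamp_plugins",
                    "uses_midi_tapping", "uses_alignment"]:
        print("%-18s %3.0f%% of image-layer users, %3.0f%% of the rest" %
              (feature,
               100 * image_users[feature].mean(),
               100 * others[feature].mean()))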

Also worth noting is the generally low level of reported use for any single feature: none of the features listed here was reported by more than 50% of respondents. Yet more than 90% of respondents checked at least one box. It seems there are different sets of users starting out with quite disjoint needs.

The survey also included a record of which Vamp plugins were installed. Here are the top ten overall:

Programming

We asked whether users were familiar with any programming languages (from a fixed multiple selection list, plus “Others” box) and whether they would have any interest in developing new plugins.

I was surprised by the language familiarity question: nearly 60% of respondents checked at least one box, and over a third claimed familiarity with C or C++. That’s far more than for Python, MATLAB, Java, JavaScript or PHP, but all of those have pretty good showings even so.

Even among academics in the music field, over 40% professed familiarity with some programming language and over 20% with C.

I’m not quite sure what to make of this. Perhaps Sonic Visualiser is so hard to get started with that only very technically minded users get as far as answering a survey about it!

Some respondents mentioned further languages in the Other box; these are the ones that appeared most often:

(BASIC includes Visual Basic; Lisp variants include Scheme and Clojure.)
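
Tallying that free-text box mostly comes down to normalising variant spellings before counting; something along the lines of the sketch below, where the alias table and grouping are illustrative rather than the actual rules used.

    # Tally free-text "Other" language mentions, folding variant names
    # together first (the alias table and grouping are illustrative).
    from collections import Counter

    ALIASES = {
        "visual basic": "BASIC", "vb": "BASIC",
        "scheme": "Lisp", "clojure": "Lisp", "common lisp": "Lisp",
    }

    def normalise(name):
        name = name.strip().lower()
        return ALIASES.get(name, name.title())

    def tally(other_answers):
        """other_answers: iterable of comma-separated free-text responses."""
        counts = Counter()
        for answer in other_answers:
            for name in answer.split(","):
                if name.strip():
                    counts[normalise(name)] += 1
        return counts.most_common()

    print(tally(["Visual Basic, Fortran", "Scheme", "clojure, Ruby"]))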

Having asked about programming languages, we asked:

Have you ever considered writing Vamp plugins for use in Sonic Visualiser or any other host application?

  • Yes, I have written some plugins already
  • Yes, I’m interested in the idea
  • No, I wouldn’t be technically capable
  • No, I don’t see any reason to
  • No, I’ve looked at Vamp and found the format unsatisfactory in some way

As you can see, most respondents thought they wouldn’t be technically capable, but a pretty high number did express an interest.

Can you develop research software on an iPad?

I’ve just written up a blog article for the Software Sustainability Institute about research software development in a “post-PC” world. (Also available on my project’s own site.)

Apart from using the terms “post-PC”, “touch tablet”, “app store”, and “cloud” a disgracefully large number of times, this article sets out a problem that’s been puzzling me for a while.

We’re increasingly trying to use, for everyday computing, devices that are locked down to very limited software distribution channels. They’re locked down to a degree that would have seemed incomprehensible to many developers ten or twenty years ago. Over time, these devices will increasingly replace PCs as the public’s idea of what a normal computer is. As that happens, where will we find scientific software development and the ideals of open publication and software reuse?

I recognise that not all “post-PC” devices (there we go again) have the same terms for software distribution, and that Android in particular is more open than the others. (A commenter on Twitter has already pointed out another advantage of Google’s store that I had overlooked in the article.) The “openness” of Android has been widely criticised, but I do believe its relative openness in this respect matters.

Perhaps the answer, then—at least the principled answer—to the question of how to use these devices in research software development is: bin the iPad; use something more open.

But I didn’t set out to make that point, except by implication, because I’m afraid it simply won’t persuade many people. In the audio and music field I work in, Apple already provide the predominant platform across all sizes of device. If there’s one thing I do believe about this technology cycle, it’s that people choose their platform first based on appeal and evident convenience, and then afterwards wonder what else they can do with it. And that’s not wrong. The trick is how to ensure that it is possible to do something with it, and preferably something that can be shared, published, and reused. How to subvert the system, in the name of science.

Any interesting schemes out there?

Is music recommendation difficult?

My research department works on programming computers to analyse music.

In this field, researchers like to have some idea of whether a problem is naturally easy or difficult for humans.

For example, tapping along with the beat of a musical recording is usually easy, and it’s fairly instinctive—you don’t need much training to do it.

Identifying the instrument that is playing a solo section takes some context. (You need to learn what the instruments sound like.) But we seem well-equipped to do it once we’ve heard the possible instruments a few times.

Naming the key of a piece while listening to it is hard, or impossible, without training, but some listeners can do it easily when practised.

Tasks that a computer scientist might think of as “search problems”, such as identifying performances that are actually the same while disregarding background noise and other interference, tend to be difficult for humans no matter how much experience they have.

Ground truth

It matters to a researcher whether the problem they’re studying is easy or difficult for humans.  They need to be able to judge how successful their methods are, and to do that they need to have something to compare them with.  If a problem is straightforward for humans, then there’s no problem—they can just see how closely their results match those from normal people.

But if it’s a problem that humans find difficult too, that won’t work. Being as good as a human isn’t such a great result if you’re trying to do something humans are no good at.

Researchers use the term “ground truth” to refer to something they can evaluate their work against. The idea, of course, is that the ground truth is known to be true, and computer methods are supposed to approach it more or less closely depending on how good they are. (The term comes from satellite image sensing, where the ground truth is literally the set of objects on the ground that the satellite is trying to detect.)
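
For a task like beat tracking, for instance, evaluating against ground truth usually means checking how many human-annotated event times the algorithm’s output lands close to. Here’s a deliberately crude sketch of that idea; real evaluation measures are more careful than this.

    # Crude sketch of scoring an event-detection algorithm against human
    # ground-truth annotations: an estimate within `tolerance` seconds of an
    # annotated time counts as a hit. (Illustrative only.)
    def hit_rate(estimated, ground_truth, tolerance=0.07):
        """Fraction of ground-truth times matched by at least one estimate."""
        if not ground_truth:
            return 0.0
        hits = sum(1 for t in ground_truth
                   if any(abs(t - e) <= tolerance for e in estimated))
        return hits / float(len(ground_truth))

    # Estimated beat times vs annotated beat times, in seconds
    print(hit_rate([0.51, 1.02, 1.48, 2.03], [0.5, 1.0, 1.5, 2.0]))  # 1.0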

Music recommendation

Can there be a human “ground truth” for music recommendation?

When it comes to suggesting music that a listener might like, based on the music they’ve apparently enjoyed in the past—should computers be trying to approach “human” reliability? How else should we decide whether a recommendation method is successful or not?

What do you think?

How good are you at recommending music to the people you know best?

Can a human recommend music to another human better than a computer ever could? Under what circumstances? What does “better” mean anyway?

Or should a computer be able to do better than a human? Why?

(I’m not looking for academically rigorous replies—I’m just trying to get more of an idea about the fuzzy human and emotional factors that research methods would have to contend with in practice.)


My first conference paper

With my colleagues Luís and Mark, I’ve had a paper accepted for the ICASSP 2012 signal-processing conference:

http://soundsoftware.ac.uk/icassp-2012-accepted

I’ve previously co-written a journal paper and a couple of posters, and I’ve done demos, but this is the first conference paper I’ve ever been the primary author of.

(Though it may turn out to be a poster when presented—I gather the organisers decide after acceptance whether to allocate a poster or paper slot based on the presentation schedule.)