Mp3 decoding with the MAD library: We’ve all been doing it wrong

The MAD mp3 decoder library is widely used in open source applications that play or edit mp3 audio files.

It’s a respected library that consists of high-quality C code, has a fairly friendly API, and was evidently written with great care. It’s now getting old (last updated in 2004) but people trust it.

I discovered this week that I’ve been using this library wrong for many years in a couple of small ways. I checked the code of a few other open source applications that use it, and found that all of them (including widely-used programs like Audacity) suffered at least one of the same problems as mine did. We’ve all been doing it wrong.

Here’s what almost every user of this library seems to be doing wrong:

  1. If an mp3 file starts with a Xing/LAME information frame, they are feeding that frame to the mp3 decoder rather than filtering it out, resulting in an unnecessary 1152 samples of silence at the start of the decoded audio. (This is in addition to the variable mp3 encoder delay, and note that the metadata frame is not the same thing as an id3 tag — those are not actually mp3 frames and so don’t have the same problem.)
  2. More importantly, they aren’t providing the decoder an expected but undocumented small block of zero data at the end of the file. Without this, it loses synchronisation on the last mp3 frame, which is consequently never decoded. This causes the decoded audio to be truncated by up to 1152 samples.
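Both fixes belong in the calling code, before any data reaches the decoder. Here is a rough, language-neutral sketch in Python of what that preparation might look like; it does not use MAD's actual C API. The helper names are hypothetical, the information-frame check is a simple heuristic that looks for the "Xing" or "Info" tag near the start of the frame, and the guard size of 8 bytes is an assumption based on libmad's MAD_BUFFER_GUARD constant, so check your own mad.h.

```python
from typing import List

# Assumption: mirrors libmad's MAD_BUFFER_GUARD; verify against your mad.h.
GUARD_BYTES = 8


def looks_like_info_frame(frame: bytes) -> bool:
    """Heuristic: a Xing/LAME information frame carries the ASCII tag
    "Xing" or "Info" shortly after the 4-byte frame header."""
    window = frame[4:4 + 40]
    return b"Xing" in window or b"Info" in window


def prepare_for_decoder(frames: List[bytes]) -> bytes:
    """Drop a leading information frame, then append zero guard bytes
    so that the decoder can still synchronise on the last real frame."""
    if frames and looks_like_info_frame(frames[0]):
        frames = frames[1:]
    return b"".join(frames) + bytes(GUARD_BYTES)
```

A real implementation would locate the tag at its exact offset (which depends on the MPEG version and channel mode) rather than scanning a window, as madplay does.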

Here’s an example audio file you can use to check an application: (audio file link). This file contains two very short bursts of noise, one right at the start of the file and the other at the end, separated by a second and a half or so of silence.

After decoding with MAD, the first burst should start around 0.025 seconds in, and the second should finish just before the end of the decoded audio.

If you load this in an application that uses MAD and find the first burst starts around 0.05 sec, then you have the first of the above problems. If only one of the two bursts is there, or the second is shorter than the first, then you have the second.
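The gap between those two onset times is about one mp3 frame. As a quick check of the arithmetic, assuming a 44.1 kHz sample rate (the rate of the test file is not stated here), 1152 samples last roughly 0.026 seconds:

```python
FRAME_SAMPLES = 1152   # samples per mp3 frame, as quoted above
SAMPLE_RATE = 44100    # assumption: the test file's rate is not stated


def frame_duration(samples=FRAME_SAMPLES, rate=SAMPLE_RATE):
    """Duration in seconds of one frame's worth of samples."""
    return samples / rate


extra = frame_duration()
print(f"one frame           = {extra:.4f} s")          # one frame           = 0.0261 s
print(f"0.025 s + one frame = {0.025 + extra:.3f} s")  # 0.025 s + one frame = 0.051 s
```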

My own Sonic Visualiser v2.5 suffers from both problems:

[Screenshot: the test file in Sonic Visualiser v2.5, showing both problems]

But both are fixed in the repository, and will be fixed in the forthcoming release:

[Screenshot: the test file in Sonic Visualiser after the fix]

(If both bursts are there and they appear exactly at the start and end of the file without any padding silence at all, then your decoder not only handles these details correctly but also interprets the LAME information frame and accounts for the encoder delay and padding listed in there. Sonic Visualiser doesn’t do that even after this fix, but that could change!)

I’ve also started feeding some fixes to a few other projects (e.g. this pull request for the more serious of those problems in Audacity).

The root of the problem, I think, is that MAD is an mp3 stream decoder and not an mp3 file decoder. These two things are almost the same, as an mp3 file is just a sequence of stream frames with no file header: if you concatenate two mp3 files you get a valid mp3 file containing the concatenation of the two audio streams. But the fact that MAD doesn’t deal with files means that it doesn’t know when a file has ended, and it doesn’t know about file metadata frames, and these turn out to be things you have to handle in the calling code.

Users of the library maybe don’t realise this because the documentation is quite limited. Developers are pointed to an example program (called minimad) which itself fails to deal with either of these things. There is an official program called madplay that handles both of them properly and could serve as an example, but people don’t seem to be all that conscious of it — it isn’t widely packaged for Linux distributions for example, and until this week I had never looked at its source code.

There ought to be lessons here for both library users and library authors, but I’m not completely sure what those lessons are.

Library users should be testing their import code by comparison with expected decoded data, but I was actually already doing that and I still missed both problems. (I allowed for the mp3 encoder delay by accommodating any amount of leading silence in my tests, so I missed that there was more than there should be; and I foolishly checked whether the decoded data matched the expected data throughout its extent rather than the other way around, so missing that it had been truncated.)
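A stricter comparison might look something like the sketch below. It assumes the decoded and expected audio are plain sequences of float samples; the function name, the tolerance, and the `max_lead` bound are all hypothetical. The two differences from the test described above are that leading silence is bounded rather than unlimited, and that the whole of the expected signal must be found in the decoded data, so a truncated decode fails.

```python
def find_alignment(decoded, expected, max_lead, tol=1e-3):
    """Return the offset at which `expected` appears in `decoded`, or None.

    Only offsets up to `max_lead` are tried, so excess leading silence
    fails; every sample of `expected` must be present, so truncation fails.
    """
    n = len(expected)
    for offset in range(max_lead + 1):
        segment = decoded[offset:offset + n]
        if len(segment) < n:
            return None  # decoded data ends before the expected data does
        if all(abs(a - b) <= tol for a, b in zip(segment, expected)):
            return offset
    return None


expected = [0.5, -0.5, 0.25, 0.0, 0.75]
print(find_alignment([0.0] * 3 + expected, expected, max_lead=4))        # 3
print(find_alignment([0.0] * 10 + expected, expected, max_lead=4))       # None
print(find_alignment([0.0] * 3 + expected[:-1], expected, max_lead=4))   # None
```

For lossy formats the tolerance would need to be much looser than in this toy example, or replaced by a comparison of signal energy over short windows.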

This is probably also a case for using higher-level libraries like CoreAudio (or gstreamer, except that I think gstreamer also gets this wrong in its MAD plugin). Using format-specific open source libraries gives you consistent portability across platforms from a single codebase, but that doesn’t help much if you are deceived by the differences between different format libraries and end up not using them correctly.

For library authors the lesson really seems to be that people will copy the code you give them expecting it to be a complete example for the most obvious use case. If the two don’t match, there’ll be trouble.

I’d be interested to hear about any examples of open source software that get the MAD decoder right.

MIREX 2016 submissions

This year, for the fourth year in a row, we submitted a number of Vamp audio analysis plugins published by the Centre for Digital Music to the annual MIREX evaluation. The motivation is to give other methods a baseline to compare against, to compare one year’s evaluation metrics and datasets against the next year’s, and to give our group a bit of visibility. See my posts about this process in 2015, 2014, and 2013.

Here’s a review of how we got on this year. We entered an extra category compared to last year, a makeshift entry in the audio downbeat estimation task, making this the widest range of categories we’ve covered with these plugins in MIREX so far.

Structural Segmentation

Results for the four datasets are here, here, here, and here. I don’t find the evaluations any easier to follow than I did last year, but I can see that both of our submissions (Segmentino from Matthias Mauch and the older QM Segmenter from Mark Levy) produced the same results as expected from previous years.

Segmentino actually comes across well in this year’s results, not least because the authors of last year’s best method (Thomas Grill and Jan Schlüter) didn’t submit anything this time.

Multiple Fundamental Frequency Estimation and Tracking

Results here and here. Our Silvet plugin performed much as before: reasonably well, though as usual in such a hard task, with hugely varying results from one test case to another.

Audio Onset Detection

Results here. Many more submissions than last year, which was already a broader field than the year before. Our two old plugins score the same as they did last year, but are no longer placed last, as three of the new submissions have lower scores.

Audio Beat Tracking

Results here, here, and here. Our BeatRoot and QM Tempo Tracker are once again placed near the back. There’s little change from last year at the top, still occupied by the work of Sebastian Böck and Florian Krebs — work which the authors have, to their great credit, made available as freely-licensed, readable, and well-documented Python code in the madmom library.

Audio Tempo Estimation

Results here. Only two entries this year, our QM Tempo Tracker and Sebastian Böck’s entry from the aforementioned madmom.

Audio Downbeat Estimation

Results here. In this category we submitted the QM Bar and Beat Tracker plugin by Matthew Davies, which has been around for a few years; it’s based on the QM Tempo Tracker with an additional downbeat estimator.

The results don’t come across very well, for varying reasons according to the dataset. The QM Bar and Beat Tracker needs to be prompted with the time signature and (following a last-minute decision to enter the category this year) I submitted a script which assumed fixed 4/4 time. This meant we knowingly threw away the Ballroom category, which was all 3/4, but the plugin was also ill-suited to several of the other categories. Not a strong submission then, but interesting to see.

Audio Key Detection

Results here and here. Last year I lamented the lack of any other entries than ours, since the category had just gained a second (and more realistic) test dataset. So I’m delighted to see a couple of new submissions this year, including one from Gilberto Bernardes and Matthew Davies at INESC in Porto which appears to perform well.

Audio Chord Estimation

Results here, now up to five test datasets. Last year saw a torrid time with a bug in the Chordino plugin, but this year it’s back to normal. Chordino still performs well, but in a strong category this year it’s no longer one of the top performers.

 

Why I will be voting “in” this Thursday

Although the public debate about this week’s EU referendum in the UK has become absurdly bitter on both sides, I have had some constructive talks about the subject with people around me, even where we have disagreed. There is, or was, a reasonable debate to be had and it’s a pity we haven’t seen a sensible national discussion about it.

In the spirit of trying to be positive: here are five reasons why I would like the UK to remain in the EU, without talking about the personalities or made-up economic projections coming from the campaigns on either side.

1. The EU has a useful role in the UK in terms of long-term oversight

This country has no written constitution and has an effectively two-party parliamentary system in which each new government starts by setting out to undo whatever its predecessor did. Institutions like the European Court of Human Rights give us both longer-term continuity and a moderating influence across the various ideologies of the European states. They’re a good thing.

I might feel differently on this if I thought the Leave campaign were keen to make up for exit with better constitutional protections in the UK. Unfortunately the impression I get is the opposite.

(I think this argument holds even for lower-level things like food labelling and sourcing regulations. After all, those are also the regulations that mean a Cornish pasty is a pasty from Cornwall wherever you buy it in the EU, not just a meat pie from a factory in Denmark with Cornish Pasty printed on the pack.)

2. Our position within the EU is a great one

We have full membership of the EU without the tricky bit (the Euro) and with a membership rebate that we could never negotiate again. It’s the best of both worlds already. Any country in the world would envy that.

3. Leaving won’t give us more independence

I understand the argument that a state should strive to be self-determining as far as possible. I just don’t think that leaving the EU would have a happy outcome in that respect.

It wouldn’t change anything about who runs this country or how they run it, and it wouldn’t send a message that anybody would be equipped to act on. Our government would continue to have the same pro-business pro-international-collaboration outlook, for good or bad. We would almost certainly end up leaning more than ever on the USA, a country we would no longer have much to offer in return, while scrabbling around for other partnerships and making poorer deals with other European states.

4. Immigration is a red herring, but freedom of movement is a good thing

Immigration is clearly a subject that people feel viscerally about. But the sort of mass migration being exploited for this argument, of refugees from Syria for example, has nothing to do with the subject we’re supposed to be deciding on — we already turn those people away (Calais, remember?). I obviously have views about that (who doesn’t) but it makes no sense for it to be a pivotal subject for this referendum.

What is relevant is freedom of movement for workers within the EU. I think this is a good thing, partly because it’s how we can have world-leading research labs like (ahem) the one I work in, and partly because it cuts both ways — Britons can and do move abroad as well (permanently or temporarily) and this openness is a great part of providing opportunities and prospects for future generations.

People of my age or older may remember the 80s TV series Auf Wiedersehen, Pet, a comedy about British builders working in Germany. A central prop of that programme was that there was something ramshackle about their arrangement and that they were at the mercy of exploitative employers and tax rules as migrant workers. We’ve become unused to thinking of British migrant workers as being exploited in this way.

I know that there is also a narrative about other EU citizens coming to the UK simply to claim benefits. The great majority of people who move here do so either to work or to study, or because they are married to British citizens. Many British citizens draw benefits abroad as well. The overall balance of numbers doesn’t in any way reflect the anxiety people have about it. That anxiety is serious, but it isn’t something that this referendum can properly address with either outcome.

The question of what would happen to EU workers who are already in the UK, if we left, seems like such a massive quagmire that I don’t want to think about it. I don’t think it could be very harmonious.

5. I’d like to see positivity prevail

There’s something very British about willingly engaging in an endeavour (after a referendum!) and then whingeing about it constantly for the next 40 years.

The tone from British media and politicians for decades now has been mostly about how onerous the EU is and “what can it do for us?”, very seldom about the power it gives us or what we can do together with the other countries within it. This negative guff is forced on us by media barons who genuinely have no reason to give a damn about us in the first place, and it ends up setting a very miserable tone. Let’s resist!

 

Naming conventions in Standard ML

Many programming languages have a standard document that describes how to write and capitalise the names of functions, variables, and source files. It’s especially useful to have a standard for writing names made up from more than one word, where there are various options for how to join the words: “camel case”, which looks likeThis (with a capital letter “hump” in the middle), or “snake case”, which is underscore_separated.

I think Java in the mid-90s was the first really mainstream language to standardise file and variable naming conventions. The Java package mechanism requires files to be laid out in a particular way, and Sun published Java coding conventions which quickly became an effective standard for class and variable naming. Other languages followed. Python has had a standard that covers naming (PEP8) since 2001. More recent examples include Go and Swift.

Older languages tend to be less consistent. C++ is a mess: the standard library and most official example material uses snake_case for most names, but a great many developers, including those on most of the projects I’ve worked on, prefer camelCase, with capital initials for class names. File names are even more various: C++ source files are seen with .cpp, .cxx, .cc, and .C extensions; C++ header files with .h, .hpp, or no extension at all.

Standard ML (SML) is also a mess, and an interesting one because the language itself was standardised in 1990 and has been completely unchanged since the standard was revised in 1997. So although it is super-standardised, it’s a bit too old to have caught the wider shift in sentiment toward prescribing things like naming and file structure.

The SML standard is formal and very focused. It says nothing about coding style or naming, contains almost no examples using compound names, says nothing about filenames or file organisation, and specifies no way for one file to refer to another — the standard is indifferent to whether your source code is held in a file at all.

In trying to establish what naming conventions to use for my own code, I decided to look around at some existing libraries in SML to see what they had settled on.

The Basis library

SML has a standard library, the Basis library, which is a bit more recent than the language itself. Although it isn’t prescriptive, the library does use certain conventions itself and the introductory notes explain what they are. These cover only names of things within a program — not filenames, which are left up to the implementor of the standard. I’ll refer to them in the table below.

The Cornell style guide

Top search result for “SML naming conventions” for me is this online style guide for the Cornell CS312 course. It doesn’t cover file naming. Given the limited industry uptake for SML, an academic guide may be proportionately more influential than for other languages. I’ll mention this guide below as well.

Other code I looked at

I took a look at the following code:

  • The source of the MLton, MLKit, and SMLSharp compilers (excluding accompanying utility libraries)
  • The Basis library implementations shipped with MLton and SMLSharp
  • The SML/NJ extended library
  • The source of the Ur/Web language
  • The Ponyo library, an interesting fledgling effort to produce a broader base library than the Basis

In total, about 444,500 lines of code across 1790 SML source files. Some (presumably automatically-generated) source files are very long; while the mean file length is 248 lines including comments and blanks, the median is only 47.

Names within the language

The SML language has at least seven categories of things that need names: variables, type names, datatype constructors, exceptions, structures, signatures, and functors.

(By “variables” I really mean bindings, i.e. the vast majority of ordinary things with names: things that in a procedural language might include function names, variable names, and constant declarations. I’m using the word “variable” because it’s such a familiar everyday programming term.)

Source     Variable      Type name   Datatype constructor  Exception       Structure       Signature       Functor
mlton      variableName  (mixed)     DatatypeCtor          ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
mlkit      (mixed)       (mixed)     DatatypeCtor*         ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
smlsharp   variableName  typeName*   DATATYPE_CTOR*        ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
basis      variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
smlnj-lib  variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
urweb      variableName  type_name*  DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
ponyo      variableName  typeName    DatatypeCtor          ExceptionName   Structure_Name  SIGNATURE_NAME  Functor_Name
cornell    variableName  type_name   DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorName

* mostly

Here’s what I found, categorised into universal conventions, usual conventions, and “other”.

Universal

The following is the only universal convention:

Signature
SIGNATURE_NAME

The only code I found that doesn’t follow this convention is in the SML standard itself, which omits the underscore (like SIGNATURENAME).

Usual

The following conventions are not universal, but more popular than any other.

Variable      Type name  Exception      Structure      Functor
variableName  type_name  ExceptionName  StructureName  FunctorName

Camel case is clearly idiomatic for everything except type names. MLKit contains some snake-cased bindings as well, but none of the other libraries do. I like snake case in SML and I’ve written a fair bit of code using it myself; I hadn’t realised until now how uncommon it was. (It’s more common in SML’s sibling language OCaml. Ironic that, of the three very similar languages SML, OCaml, and F#, the only one not to use camel case is called OCaml.)

I spotted a handful of all-caps exception names and some camel case type names, but no library preferred those consistently.

The Ponyo library differs from the above for structures (Structure_Name) and functors (Functor_Name).

The SML/NJ library sort-of differs for functors, which are given a Fn suffix (FunctorNameFn). But you could think of this as part of the name, in which case the convention is the same.

Most type and datatype names used in public APIs are single words, or even single letters, so the convention often doesn’t matter for those.

Other

There seems to be no consensus about datatype constructors — I found DatatypeConstructor and DATATYPE_CONSTRUCTOR in roughly equal number.

Filenames

Nothing in the SML standard or Basis library cares about what source files are called, what file extension they use, or how you divide your code up among them. Some compilers might care, but most don’t. The business of telling the compiler which files a program consists of, or of expressing any relationships between files, is left up to external tools. SML has neither header files nor import directives.

This makes fertile ground for variety in naming schemes.

I’m going to consider only filenames that are associated with a primary structure, signature, or functor. Here’s the table.

Source          Structure           Signature               Functor
mlton           structure-name.sml  signature-name.sig      functor-name.fun
mlkit           StructureName.sml   SIGNATURE_NAME.sml*     FunctorName.sml
smlsharp        StructureName.sml   SIGNATURE_NAME.sig*     FunctorName.sml
mlton-basis     structure-name.sml  signature-name.sig      functor-name.fun
smlsharp-basis  StructureName.sml   SIGNATURE_NAME.sig      (none)
smlnj-lib       structure-name.sml  signature-name-sig.sml  functor-name-fn.sml
urweb           structure_name.sml  signature_name.sig      (n/a)
ponyo           Structure_Name.sml  SIGNATURE_NAME.ML       Functor_Name.sml

* mostly

Clearly very inconsistent. There are no universal or usual conventions, only “other”.

Behind this there is a wider question about code organisation in files — should each signature live in its own file? Each structure? In many cases they do, but that is also far from universal.

If you use a scheme in which filenames are clearly derived from signature and structure names, does that mean you shouldn’t put more than one structure in the same file? What do you do with code that is not in any structure? Really it’s a pity to have to think about filenames at all, in a language that is so completely indifferent to file structure.

A Reasonable Recommendation

A plausible set of rules based on the above.

For names within the language:

Variable      Type name  Datatype constructor  Exception      Structure      Signature       Functor
variableName  type_name  DATATYPE_CTOR         ExceptionName  StructureName  SIGNATURE_NAME  FunctorName

This is the style used by the Basis library. Apart from datatype constructors, everything here was in the majority within the libraries I looked at.

For datatype constructors it seems reasonable to pick the most visible option and one that is consistent with the names in Basis. (This differs from the Cornell guide, however.) There is no confusion between these and signature names, because signature names never appear anywhere except in the declaration lines for those signatures and the structures that implement them.
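To make the recommendation concrete, here is a small sketch of a checker for it, written in Python rather than SML. The category keys, pattern names, and function name are made up for illustration, and the patterns are deliberately simplified: they ignore legal SML identifier characters such as primes, and symbolic identifiers.

```python
import re

CAMEL = r"[a-z][A-Za-z0-9]*"              # variableName
PASCAL = r"[A-Z][A-Za-z0-9]*"             # StructureName
SNAKE = r"[a-z][a-z0-9]*(_[a-z0-9]+)*"    # type_name
UPPER = r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*"    # SIGNATURE_NAME

# One pattern per category, following the recommendation table above.
RECOMMENDED = {
    "variable": CAMEL,
    "type": SNAKE,
    "constructor": UPPER,
    "exception": PASCAL,
    "structure": PASCAL,
    "signature": UPPER,
    "functor": PASCAL,
}


def follows_recommendation(category, name):
    """True if `name` matches the recommended style for `category`."""
    return re.fullmatch(RECOMMENDED[category], name) is not None


print(follows_recommendation("variable", "foldLeft"))      # True
print(follows_recommendation("constructor", "LeafNode"))   # False
print(follows_recommendation("signature", "ORD_MAP"))      # True
```

Single-word names such as list or NONE satisfy more than one pattern, which fits the earlier observation that the convention often doesn't matter for short names.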

For filenames:

Structure           Signature           Functor
structure-name.sml  signature-name.sig  functor-name.sml

The logic here is:

  • It’s still not a great idea to expect a case sensitive filesystem, so all-one-case is good
  • Generally use .sml extension for SML source
  • But the .sig extension for signatures seems very widely used, and it’s fair to make public signatures as easy to spot as possible
  • The .ml extension is not a great idea because it clashes with OCaml
  • The .fun extension used by MLton is a bit obscure, and you don’t always want to separate out functors (if you want to make functors more distinctive, give them names ending in Fn, as the SML/NJ library does).
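The filename rule is mechanical enough to sketch in code. The following Python is only an illustration of the mapping described above; the function names and the `kind` argument are hypothetical, and the word-splitting is a best-effort guess for names that mix acronyms and ordinary words.

```python
import re


def split_words(name):
    """Split StructureName, Structure_Name or SIGNATURE_NAME into
    lower-case words."""
    words = []
    for part in name.split("_"):
        if part.isupper():
            # An all-caps part (ORD, MAP, IO) is treated as one word.
            words.append(part)
        else:
            # Otherwise split at capital letters, keeping acronyms whole.
            words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words if w]


def filename_for(kind, name):
    """Lower-case, hyphen-separated filename: .sig for signatures,
    .sml for everything else."""
    ext = ".sig" if kind == "signature" else ".sml"
    return "-".join(split_words(name)) + ext


print(filename_for("structure", "StructureName"))    # structure-name.sml
print(filename_for("signature", "SIGNATURE_NAME"))   # signature-name.sig
print(filename_for("functor", "FunctorName"))        # functor-name.sml
```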

 

Console games and local multiplayer

We just got a Playstation 4, and have been a bit disappointed by the lack of good local multiplayer support in the games we’ve tried so far. I reckon every console game should support local multiplayer if it can: after all that’s the main thing that makes a console different from a PC.

(To be honest it pains me even having to write “local” multiplayer. I think of it as just multiplayer. The idea that players could exist elsewhere on the internet is flying in the face of nature.)

We got two games bundled with our PS4, Overwatch and Ratchet & Clank, both of which are neat games but neither of which supports local multiplayer at all. In both cases this is naively a bit of a letdown. Overwatch is a team game, exactly the sort of thing you want to play while sitting around with your friends (but you can’t! sorry), and Ratchet & Clank is a game with two protagonists that you might hope to be able to control independently in the style of the Lego adventures (but oops! sorry again).

It’s possible we just made a bad choice of bundled games, as there were other options. I don’t think any of the others were any better, but it’s not easy to tell from reading around, because summaries of the games online often don’t talk about this. (Does Uncharted 4 have local multiplayer? Does the PS4 version of DOOM? Though that’d be a bit wrong, DOOM is supposed to be played alone, jumpy and sweating and with the lights off.)

We’re on the lookout for possibilities. I’m sure there must be plenty; they’re just perhaps not the most promoted titles, maybe because their prime audience is not game journalists. So far we have Rocket League and the evergreen FIFA, both of which are pretty nice. I’m fairly clueless about the Playstation landscape, having a general affinity for Nintendo. Let me know if you have any more suggestions.

 

F♯ has possibilities

A couple of months ago, Microsoft announced that they were buying a company called Xamarin, co-founded by the admirable Miguel “you can now flame me, I am full of love” de Icaza. (No sarcasm — I think Miguel is terrific, and the delightfully positive email linked above really stuck with me; if only I could have that attitude more often.)

As I understand it, Xamarin makes

  1. the Mono runtime, a portable third-party implementation of Microsoft’s .NET runtime for the C# and F# programming languages
  2. the eponymous Xamarin frameworks, which can be used with .NET to develop mobile apps for iOS and Android
  3. plugins for the Visual Studio IDE on Windows and the MonoDevelop IDE on OS X to support mobile platform builds using Xamarin (the MonoDevelop-plus-plugins combo is known as Xamarin Studio).

Then a couple of days ago, the newly-acquired Xamarin declared

  1. that the Mono runtime was switching from LGPL/GPL licenses to MIT, allowing no-cost use in commercial applications
  2. that Microsoft were providing a patent promise (which I have not closely read) to remove concerns for commercial users of Mono
  3. that the Xamarin frameworks for iOS and Android, and the IDE plugins, were now free (of cost)
  4. that at some future point the Xamarin frameworks would be open sourced

I’m trying to unpick exactly what this could mean to me.

According to this discussion on Hacker News, the IDE plugins are remaining proprietary (which appears to mean that no IDE on Linux will be supported, since the IDE plugins are not currently available for Linux) but that “the Xamarin runtime and all the commandline tools you need to build apps” will be open sourced.

What this means

As I understand it:

  • Developers working on proprietary .NET applications will be able to build and release versions for other platforms than Windows, using Mono, at no extra cost
  • Developers working on open source .NET applications will be able to publish the ensemble with Mono under the MIT license if desired and will (apparently) be free of patent concerns
  • Developers will be able to make both proprietary and open source .NET applications for iOS and Android at no cost using Windows and OS X
  • There is a possibility of being able to do builds of the above using Linux as well once the SDK is open, though probably without an IDE

Unrelatedly, there are separate projects afoot to provide native code and to-Javascript compilers for .NET bytecode.

What I’m interested in

I do a range of programming including a mixture of signal-processing and UI work, and am interested in exploring comprehensible, straightforward functional languages in the ML family (I wrote a little post about that here). Unlike many audio developers I have relatively limited demands on real-time response, but everything I write really wants to be cross-platform, because I’ve got specialised users on pretty much every common platform and I have limited time and funding. (I understand that cross-platform apps are often inferior to single-platform apps, but they’re better than no apps.)

Xamarin doesn’t quite meet my expectations because it’s not really a cross-platform framework in the manner of Qt (which I use) or JUCE (which is widely used by others in my field). Instead of providing a common “widget set” across all platforms, Xamarin provides a separate thin interface to the native UI logic for each platform. It’s hard to judge how much more work this is, without knowing where the abstraction boundaries lie, but it may be a more relevant and sensible distinction on mobile platforms (where the differences are often in interaction and layout) than desktops (where the differences are mostly about how large numbers of individual widgets look).

An ideal combination of language and framework for me goes something like

  • strongly-typed, mostly functional, mostly immutable data structures
  • efficient unboxed support for floating-point vector types, including SIMD support
  • simple syntax (SML is nice)
  • low-cost foreign-function interface for C integration
  • high-level approach to multithreading
  • can work with gross UI layout in HTML5 (possibly DOM-update reactive UI style?)
  • good libraries for e.g. audio file I/O, signal processing, matrix algebra
  • can develop on Linux and deploy to all of Linux, Windows, OS X, iOS, Android
  • free (or cheap, for proprietary apps) and open source (for open source apps)
  • has indenting Emacs mode

Where F# appears to score

F#, Microsoft’s ML-derived functional language for the .NET CLR, hits several of these. It has the typing, mostly-functional style, syntax, FFI, multithreading, libraries, deployment and licensing, and potentially the development platform (if the open source Xamarin framework should lead to the ability to build mobile apps directly from Linux).

I’m not sure about floating-point and vectors or about reusable HTML-style UI. I’d like to make the time to do another comparison of some ML-family languages, focusing on DSP-style float activity and on threading. I’ve done a bit of related work in Standard ML, which I could use as a basis for comparison.

Unless and until I get to do that, I’d love to hear any thoughts about F# as a general-purpose DSP-and-UI language, for a developer whose home platform is Linux.

My impression from the feedback on my earlier post was that the F# community is both enthusiastic and polite, and I notice that F# is the third most-loved language in Stack Overflow’s 2016 survey. Imagine a language that is useful no matter what platform you’re targeting, and whose developers love it. I can hope.