Why I will be voting “in” this Thursday

Although the public debate about this week’s EU referendum in the UK has become absurdly bitter on both sides, I have had some constructive talks about the subject with people around me, even where we have disagreed. There is, or was, a reasonable debate to be had and it’s a pity we haven’t seen a sensible national discussion about it.

In the spirit of trying to be positive: here are five reasons why I would like the UK to remain in the EU, without talking about the personalities or made-up economic projections coming from the campaigns on either side.

1. The EU has a useful role in the UK in terms of long-term oversight

This country has no written constitution and has an effectively two-party parliamentary system in which each new government starts by setting out to undo whatever its predecessor did. Institutions like the European Court of Justice give us both longer-term continuity and a moderating influence across the various ideologies of the European states. They’re a good thing.

I might feel differently on this if I thought the Leave campaign were keen to make up for exit with better constitutional protections in the UK. Unfortunately the impression I get is the opposite.

(I think this argument holds even for lower-level things like food labelling and sourcing regulations. After all, those are also the regulations that mean a Cornish pasty is a pasty from Cornwall wherever you buy it in the EU, not just a meat pie from a factory in Denmark with Cornish Pasty printed on the pack.)

2. Our position within the EU is a great one

We have full membership of the EU without the tricky bit (the Euro) and with a membership rebate that we could never negotiate again. It’s the best of both worlds already. Any country in the world would envy that.

3. Leaving won’t give us more independence

I understand the argument that a state should strive to be self-determining as far as possible. I just don’t think that leaving the EU would have a happy outcome in that respect.

It wouldn’t change anything about who runs this country or how they run it, and it wouldn’t send a message that anybody would be equipped to act on. Our government would continue to have the same pro-business, pro-international-collaboration outlook, for good or bad. We would almost certainly end up leaning more than ever on the USA, a country to which we would no longer have much to offer in return, while scrabbling around for other partnerships and making poorer deals with other European states.

4. Immigration is a red herring, but freedom of movement is a good thing

Immigration is clearly a subject that people feel viscerally about. But the sort of mass migration being exploited for this argument, of refugees from Syria for example, has nothing to do with the subject we’re supposed to be deciding on — we already turn those people away (Calais, remember?). I obviously have views about that (who doesn’t) but it makes no sense for it to be a pivotal subject for this referendum.

What is relevant is freedom of movement for workers within the EU. I think this is a good thing, partly because it’s how we can have world-leading research labs like (ahem) the one I work in, and partly because it cuts both ways — Britons can and do move abroad as well (permanently or temporarily) and this openness is a great part of providing opportunities and prospects for future generations.

People of my age or older may remember the 80s TV series Auf Wiedersehen, Pet, a comedy about British builders working in Germany. A central premise of that programme was that there was something ramshackle about their arrangement and that, as migrant workers, they were at the mercy of exploitative employers and tax rules. We’ve become unused to thinking of British migrant workers as being exploited in this way.

I know that there is also a narrative about other EU citizens coming to the UK simply to claim benefits. The great majority of people who move here do so either to work or to study, or because they are married to British citizens. Many British citizens draw benefits abroad as well. The overall balance of numbers doesn’t in any way reflect the anxiety people have about it. That anxiety is serious, but it isn’t something that this referendum can properly address with either outcome.

The question of what would happen to EU workers who are already in the UK, if we left, seems like such a massive quagmire that I don’t want to think about it. I don’t think it could be very harmonious.

5. I’d like to see positivity prevail

There’s something very British about willingly engaging in an endeavour (after a referendum!) and then whingeing about it constantly for the next 40 years.

The tone from British media and politicians for decades now has been mostly about how onerous the EU is and “what can it do for us?”, very seldom about the power it gives us or what we can do together with the other countries within it. This negative guff is forced on us by media barons who genuinely have no reason to give a damn about us in the first place, and it ends up setting a very miserable tone. Let’s resist!

 

Naming conventions in Standard ML

Many programming languages have a standard document that describes how to write and capitalise the names of functions, variables, and source files. It’s especially useful to have a standard for writing names made up from more than one word, where there are various options for how to join the words: “camel case”, which looks likeThis (with a capital letter “hump” in the middle), or “snake case”, which is underscore_separated.

I think Java in the mid-90s was the first really mainstream language to standardise file and variable naming conventions. The Java package mechanism requires files to be laid out in a particular way, and Sun published Java coding conventions which quickly became an effective standard for class and variable naming. Other languages followed. Python has had a standard that covers naming (PEP 8) since 2001. More recent examples include Go and Swift.

Older languages tend to be less consistent. C++ is a mess: the standard library and most official example material use snake_case for most names, but a great many developers, including those on most of the projects I’ve worked on, prefer camelCase, with capital initials for class names. File names are even more varied: C++ source files are seen with .cpp, .cxx, .cc, and .C extensions; C++ header files with .h, .hpp, or no extension at all.

Standard ML (SML) is also a mess, and an interesting one because the language itself was standardised in 1990 and has been completely unchanged since the standard was revised in 1997. So although it is super-standardised, it’s a bit too old to have caught the wider shift in sentiment toward prescribing things like naming and file structure.

The SML standard is formal and very focused. It says nothing about coding style or naming, contains almost no examples using compound names, says nothing about filenames or file organisation, and specifies no way for one file to refer to another — the standard is indifferent to whether your source code is held in a file at all.

In trying to establish what naming conventions to use for my own code, I decided to look around at some existing libraries in SML to see what they had settled on.

The Basis library

SML has a standard library, the Basis library, which is a bit more recent than the language itself. Although it isn’t prescriptive, the library does use certain conventions itself and the introductory notes explain what they are. These cover only names of things within a program — not filenames, which are left up to the implementor of the standard. I’ll refer to them in the table below.

The Cornell style guide

The top search result for “SML naming conventions”, for me, is this online style guide for the Cornell CS312 course. It doesn’t cover file naming. Given the limited industry uptake for SML, an academic guide may be proportionately more influential than it would be for other languages. I’ll mention this guide below as well.

Other code I looked at

I took a look at the following code:

  • The source of the MLton, MLKit, and SMLSharp compilers (excluding accompanying utility libraries)
  • The Basis library implementations shipped with MLton and SMLSharp
  • The SML/NJ extended library
  • The source of the Ur/Web language
  • The Ponyo library, an interesting fledgling effort to produce a broader base library than the Basis

In total, about 444,500 lines of code across 1790 SML source files. Some (presumably automatically generated) source files are very long; while the mean file length is 248 lines including comments and blanks, the median is only 47.

Names within the language

The SML language has at least seven categories of things that need names: variables, type names, datatype constructors, exceptions, structures, signatures, and functors.

(By “variables” I really mean bindings, i.e. the vast majority of ordinary things with names: things that in a procedural language might include function names, variable names, and constant declarations. I’m using the word “variable” because it’s such a familiar everyday programming term.)

Source     Variable      Type name   Datatype constructor  Exception       Structure       Signature       Functor
mlton      variableName  (mixed)     DatatypeCtor          ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
mlkit      (mixed)       (mixed)     DatatypeCtor*         ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
smlsharp   variableName  typeName*   DATATYPE_CTOR*        ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
basis      variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
smlnj-lib  variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
urweb      variableName  type_name*  DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
ponyo      variableName  typeName    DatatypeCtor          ExceptionName   Structure_Name  SIGNATURE_NAME  Functor_Name
cornell    variableName  type_name   DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorName

* mostly

Here’s what I found, categorised into universal conventions, usual conventions, and “other”.

Universal

The following is the only universal convention:

Signature
SIGNATURE_NAME

The only code I found that doesn’t follow this convention is in the SML standard itself, which omits the underscore (like SIGNATURENAME).

Usual

The following conventions are not universal, but more popular than any other.

Variable      Type name   Exception       Structure       Functor
variableName  type_name   ExceptionName   StructureName   FunctorName

Camel case is clearly idiomatic for everything except type names. MLKit contains some snake-cased bindings as well, but none of the other libraries did. I like snake case in SML and I’ve written a fair bit of code using it myself; I hadn’t realised until now how uncommon it was. (It’s more common in SML’s sibling language OCaml. Ironic that, of the three very similar languages SML, OCaml, and F#, the only one not to use camel case is called OCaml.)

I spotted a handful of all-caps exception names and some camel case type names, but no library preferred those consistently.

The Ponyo library differs from the above for structures (Structure_Name) and functors (Functor_Name).

The SML/NJ library and Ur/Web sort-of differ for functors, which are given an Fn suffix (FunctorNameFn). But you could think of this as part of the name, in which case the convention is the same.

Most type and datatype names used in public APIs are single words, or even single letters, so the convention often doesn’t matter for those.

Other

There seems to be no consensus about datatype constructors: I found DatatypeConstructor and DATATYPE_CONSTRUCTOR in roughly equal numbers.

Filenames

Nothing in the SML standard or Basis library cares about what source files are called, what file extension they use, or how you divide your code up among them. Some compilers might care, but most don’t. The business of telling the compiler which files a program consists of, or of expressing any relationships between files, is left up to external tools. SML has neither header files nor import directives.

This makes fertile ground for variety in naming schemes.

I’m going to consider only filenames that are associated with a primary structure, signature, or functor. Here’s the table.

Source          Structure           Signature               Functor
mlton           structure-name.sml  signature-name.sig      functor-name.fun
mlkit           StructureName.sml   SIGNATURE_NAME.sml*     FunctorName.sml
smlsharp        StructureName.sml   SIGNATURE_NAME.sig*     FunctorName.sml
mlton-basis     structure-name.sml  signature-name.sig      functor-name.fun
smlsharp-basis  StructureName.sml   SIGNATURE_NAME.sig      (none)
smlnj-lib       structure-name.sml  signature-name-sig.sml  functor-name-fn.sml
urweb           structure_name.sml  signature_name.sig      (n/a)
ponyo           Structure_Name.sml  SIGNATURE_NAME.ML       Functor_Name.sml

* mostly

Clearly very inconsistent. There are no universal or usual conventions, only “other”.

Behind this there is a wider question about code organisation in files — should each signature live in its own file? Each structure? In many cases they do, but that is also far from universal.

If you use a scheme in which filenames are clearly derived from signature and structure names, does that mean you shouldn’t put more than one structure in the same file? What do you do with code that is not in any structure? Really it’s a pity to have to think about filenames at all, in a language that is so completely indifferent to file structure.

A reasonable recommendation

A plausible set of rules based on the above.

For names within the language:

Variable      Type name   Datatype constructor  Exception       Structure       Signature       Functor
variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorName

This is the style used by the Basis library. Apart from datatype constructors, everything here was in the majority within the libraries I looked at.

For datatype constructors it seems reasonable to pick the most visible option and one that is consistent with the names in the Basis (such as LESS, EQUAL, and GREATER in its order type). This differs from the Cornell guide, however. There is no confusion between these and signature names, because signature names only appear in signature-level positions, such as signature declarations, the ascriptions of structures that implement them, and functor parameter specifications, never in the expressions and patterns where constructors are used.
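To make the recommendation concrete, here is a small sketch of a module written to these conventions. Every name in it (TRAFFIC_SIGNAL, TrafficSignal, SignalCycler, light_state, and so on) is hypothetical, invented for illustration rather than taken from any of the libraries above. It also shows the places where a signature name typically turns up.

```sml
(* Hypothetical example; all names are invented to illustrate the
   recommended conventions. *)

(* Signature: SIGNATURE_NAME *)
signature TRAFFIC_SIGNAL = sig
    (* Type name: type_name; datatype constructors: DATATYPE_CTOR *)
    datatype light_state = RED | RED_AMBER | GREEN | AMBER

    (* Exception: ExceptionName *)
    exception InvalidState of string

    (* Variables (value bindings): variableName *)
    val nextState : light_state -> light_state
    val fromString : string -> light_state
end

(* Structure: StructureName. The signature name appears here,
   in the ascription. *)
structure TrafficSignal :> TRAFFIC_SIGNAL = struct
    datatype light_state = RED | RED_AMBER | GREEN | AMBER

    exception InvalidState of string

    (* UK-style cycle: red, red and amber, green, amber, red *)
    fun nextState RED = RED_AMBER
      | nextState RED_AMBER = GREEN
      | nextState GREEN = AMBER
      | nextState AMBER = RED

    fun fromString "red" = RED
      | fromString "red-amber" = RED_AMBER
      | fromString "green" = GREEN
      | fromString "amber" = AMBER
      | fromString other = raise InvalidState other
end

(* Functor: FunctorName. The signature name also appears in the
   functor's parameter specification. *)
functor SignalCycler (S : TRAFFIC_SIGNAL) = struct
    (* Advance n steps from a starting state; assumes n >= 0 *)
    fun advance (state, 0) = state
      | advance (state, n) = advance (S.nextState state, n - 1)
end

structure TrafficSignalCycler = SignalCycler (TrafficSignal)
```

Note that RED, GREEN and the rest are only ever used in expressions and patterns, while TRAFFIC_SIGNAL only appears where a signature is expected, so the two all-caps styles don't collide.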

For filenames:

Structure           Signature           Functor
structure-name.sml  signature-name.sig  functor-name.sml

The logic here is:

  • It’s still not a great idea to rely on a case-sensitive filesystem, so all-one-case is good
  • Generally use .sml extension for SML source
  • But the .sig extension for signatures seems very widely used, and it’s fair to make public signatures as easy to spot as possible
  • The .ml extension is not a great idea because it clashes with OCaml
  • The .fun extension used by MLton is a bit obscure, and you don’t always want to separate out functors (if you want to make functors more distinctive, give them names ending in Fn, as the SML/NJ library does).

 

Console games and local multiplayer

We just got a PlayStation 4, and have been a bit disappointed by the lack of good local multiplayer support in the games we’ve tried so far. I reckon every console game should support local multiplayer if it can: after all, that’s the main thing that makes a console different from a PC.

(To be honest it pains me even having to write “local” multiplayer. I think of it as just multiplayer. The idea that players could exist elsewhere on the internet is flying in the face of nature.)

We got two games bundled with our PS4, Overwatch and Ratchet & Clank, both of which are neat games but neither of which supports local multiplayer at all. In both cases this is naively a bit of a letdown. Overwatch is a team game, exactly the sort of thing you want to play while sitting around with your friends (but you can’t! sorry), and Ratchet & Clank is a game with two protagonists that you might hope to be able to control independently in the style of the Lego adventures (but oops! sorry again).

It’s possible we just made a bad choice of bundled games, as there were other options. I don’t think any of the others were any better, but it’s not easy to tell from reading around, because summaries of the games online often don’t talk about this. (Does Uncharted 4 have local multiplayer? Does the PS4 version of DOOM? Though that’d be a bit wrong: DOOM is supposed to be played alone, jumpy and sweating and with the lights off.)

We’re on the lookout for possibilities. I’m sure there must be plenty; they’re just perhaps not the most promoted titles, maybe because their prime audience is not game journalists. So far we have Rocket League and the evergreen FIFA, both of which are pretty nice. I’m fairly clueless about the PlayStation landscape, having a general affinity for Nintendo. Let me know if you have any more suggestions.

 

F♯ has possibilities

A couple of months ago, Microsoft announced that they were buying a company called Xamarin, co-founded by the admirable Miguel “you can now flame me, I am full of love” de Icaza. (No sarcasm — I think Miguel is terrific, and the delightfully positive email linked above really stuck with me; if only I could have that attitude more often.)

As I understand it, Xamarin makes

  1. the Mono runtime, a portable third-party implementation of Microsoft’s .NET runtime for the C# and F# programming languages
  2. the eponymous Xamarin frameworks, which can be used with .NET to develop mobile apps for iOS and Android
  3. plugins for the Visual Studio IDE on Windows and the MonoDevelop IDE on OS X to support mobile platform builds using Xamarin (the MonoDevelop-plus-plugins combo is known as Xamarin Studio).

Then a couple of days ago, the newly-acquired Xamarin declared

  1. that the Mono runtime was switching from LGPL/GPL licenses to MIT, allowing no-cost use in commercial applications
  2. that Microsoft were providing a patent promise (which I have not closely read) to remove concerns for commercial users of Mono
  3. that the Xamarin frameworks for iOS and Android, and the IDE plugins, were now free (of cost)
  4. that at some future point the Xamarin frameworks would be open sourced

I’m trying to unpick exactly what this could mean to me.

According to this discussion on Hacker News, the IDE plugins are remaining proprietary (which appears to mean that no IDE on Linux will be supported, since the IDE plugins are not currently available for Linux) but that “the Xamarin runtime and all the commandline tools you need to build apps” will be open sourced.

What this means

As I understand it:

  • Developers working on proprietary .NET applications will be able to build and release versions for platforms other than Windows, using Mono, at no extra cost
  • Developers working on open source .NET applications will be able to publish the ensemble with Mono under the MIT license if desired and will (apparently) be free of patent concerns
  • Developers will be able to make both proprietary and open source .NET applications for iOS and Android at no cost using Windows and OS X
  • There is a possibility of being able to do builds of the above using Linux as well once the SDK is open, though probably without an IDE

Unrelatedly, there are separate projects afoot to provide native-code and to-JavaScript compilers for .NET bytecode.

What I’m interested in

I do a range of programming including a mixture of signal-processing and UI work, and am interested in exploring comprehensible, straightforward functional languages in the ML family (I wrote a little post about that here). Unlike many audio developers I have relatively limited demands on real-time response, but everything I write really wants to be cross-platform, because I’ve got specialised users on pretty much every common platform and I have limited time and funding. (I understand that cross-platform apps are often inferior to single-platform apps, but they’re better than no apps.)

Xamarin doesn’t quite meet my expectations because it’s not really a cross-platform framework in the manner of Qt (which I use) or JUCE (which is widely used by others in my field). Instead of providing a common “widget set” across all platforms, Xamarin provides a separate thin interface to the native UI logic for each platform. It’s hard to judge how much more work this is, without knowing where the abstraction boundaries lie, but it may be a more relevant and sensible distinction on mobile platforms (where the differences are often in interaction and layout) than desktops (where the differences are mostly about how large numbers of individual widgets look).

An ideal combination of language and framework for me goes something like

  • strongly-typed, mostly functional, mostly immutable data structures
  • efficient unboxed support for floating-point vector types, including SIMD support
  • simple syntax (SML is nice)
  • low-cost foreign-function interface for C integration
  • high-level approach to multithreading
  • can work with gross UI layout in HTML5 (possibly DOM-update reactive UI style?)
  • good libraries for e.g. audio file I/O, signal processing, matrix algebra
  • can develop on Linux and deploy to all of Linux, Windows, OS X, iOS, Android
  • free (or cheap, for proprietary apps) and open source (for open source apps)
  • has an indenting Emacs mode

Where F# appears to score

F#, Microsoft’s ML-derived functional language for the .NET CLR, hits several of these. It has the typing, mostly-functional style, syntax, FFI, multithreading, libraries, deployment and licensing, and potentially the development platform (if the open source Xamarin framework should lead to the ability to build mobile apps directly from Linux).

I’m not sure about floating-point and vectors or about reusable HTML-style UI. I’d like to make the time to do another comparison of some ML-family languages, focusing on DSP-style float activity and on threading. I’ve done a bit of related work in Standard ML, which I could use as a basis for comparison.

Unless and until I get to do that, I’d love to hear any thoughts about F# as a general-purpose DSP-and-UI language, for a developer whose home platform is Linux.

My impression from the feedback on my earlier post was that the F# community is both enthusiastic and polite, and I notice that F# is the third most-loved language in Stack Overflow’s 2016 developer survey. Imagine a language that is useful no matter what platform you’re targeting, and whose developers love it. I can hope.

 

Fold: at the limit of comprehension

“Fold” is a programming concept, a common name for a particular higher-order function that is widely used in functional programming languages. It’s a fairly simple thing, but in practice I think of it as representing the outer limit of concepts a normal programmer can reasonably be expected to grasp in day-to-day work.

What is fold? Fold is an elementary function for situations where you need to keep a tally of things. If you have a list of numbers and you want to tally them up in some way, for example to add them together, fold will do that.

Fold is also good at transforming sequences of things, and it can be used to reverse a list or modify each element of a sequence.
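As a rough illustration, assuming the Basis library’s List.foldl and List.foldr (whose callbacks take the element first and the accumulator second), reversing a list and adding a number to every element might look like this. The names reverse and addToEach are made up for the example; real code would just call List.rev and List.map.

```sml
(* Reverse a list with foldl: each element is pushed onto the front
   of the accumulator, so the first element ends up last. *)
fun reverse xs = List.foldl (fn (x, acc) => x :: acc) [] xs

(* Add n to each element with foldr: folding from the right and
   consing onto the front keeps the original order. *)
fun addToEach n xs = List.foldr (fn (x, acc) => (x + n) :: acc) [] xs

val reversed = reverse ["a", "b", "c"]     (* ["c", "b", "a"] *)
val increased = addToEach 10 [1, 2, 3]     (* [11, 12, 13] *)
```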

Fold is a useful fundamental function, and it’s widely used. I like using it. I just scanned about 440,000 lines of code (my own and other people’s) in ML-family languages and found about 14,000 that either called or defined a fold function.

Let me try to describe fold more precisely in English: It acts upon some sort of iterable object or container. It takes another function as an argument, one that the caller provides, and it calls that function repeatedly, providing it with one of the elements of the container each time, in order, as well as some sort of accumulator value. That function is expected to return an updated version of the accumulator each time it’s called, and that updated version gets passed in to the next call. Having called that function for every element, fold then returns the final value of the accumulator.

I tried, but I think that’s quite hard to follow. Examples are easier. Let’s add a list of numbers in Standard ML, by folding with the “+” function and an accumulator that starts at zero.

> val numbers = [1,2,3,4,5];
val numbers = [1, 2, 3, 4, 5]: int list
> foldl (op+) 0 numbers;
val it = 15: int
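To connect that example with the English description above, here is a sketch of how left and right folds over lists could be written out by hand. The names myFoldl and myFoldr are hypothetical, chosen to avoid shadowing the Basis versions; the argument order copies the Basis, and real library implementations may differ in detail.

```sml
(* Left fold: call f on each element in order, threading the
   accumulator through, and return the final accumulator. *)
fun myFoldl (f : 'a * 'b -> 'b) (acc : 'b) (xs : 'a list) : 'b =
    case xs of
        [] => acc                                 (* no more elements: return the tally *)
      | x :: rest => myFoldl f (f (x, acc)) rest  (* update the tally, then carry on *)

(* Right fold: fold the rest of the list first, then combine the
   result with the current element. *)
fun myFoldr (f : 'a * 'b -> 'b) (acc : 'b) (xs : 'a list) : 'b =
    case xs of
        [] => acc
      | x :: rest => f (x, myFoldr f acc rest)

val total = myFoldl (op+) 0 [1, 2, 3, 4, 5]     (* 15, as in the REPL above *)
```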

What’s difficult about fold?

  1. Fold is conceptually tricky because it’s such a general higher-order function. It captures a simple procedure that is common to a lot of actions that we are used to thinking of as distinct. For example, it can be used to add up a list of numbers, reverse a list of strings, increase all of the numbers in a sequence, calculate a ranking score for the set of webpages containing a search term, etc. These aren’t things that we habitually think of as similar actions, other than that they happen to involve a list or set of something. Especially, we aren’t used to giving a name to the general procedure involved and then treating individual activities of that type as specialisations of it. This is often a problem with higher-order functions (and let’s not go into monads).
  2. Fold is syntactically tricky, and its function type is confusing because there is no obvious logic determining either the order of arguments given to fold or the order of arguments accepted by the function you pass to it. I must have written hundreds of calls to fold, but I still hesitate each time to recall which order the arguments go in. Not surprising, since the argument order for the callback function differs between different languages’ libraries: some take the accumulator first and value second, others the other way around.
  3. Fold has several different names (some languages and libraries call it reduce, or inject) and none of them suggests any common English word for any of the actions it is actually used for. I suppose that’s because of point 1: we don’t name the general procedure. Fold is perhaps a marginally worse name than reduce or inject, but it’s still probably the most common.
  4. There’s more than one version of fold. Verity Stob cheekily asks “Do you fold to left or to the right? Do not provide too much information.” Left and right fold differ in the order in which they iterate through the container, so they can produce different results, but there can also be profound differences between them in terms of performance and computability, especially when using lazy evaluation. This means you probably do have to know which is which. (See footnote below.)
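A small sketch of points 2 and 4, using the Basis List.foldl and List.foldr. In the Basis, both pass the callback a tuple of (element, accumulator), whereas OCaml’s List.fold_left and Haskell’s foldl pass the accumulator first, which is exactly the kind of variation that makes the order hard to remember. Strict SML can’t show the laziness-related differences, only the difference in results.

```sml
(* Spelling out the tuple in a lambda makes the callback's
   argument order explicit. *)
val leftConcat  = List.foldl (fn (element, acc) => element ^ acc) "" ["a", "b", "c"]
val rightConcat = List.foldr (fn (element, acc) => element ^ acc) "" ["a", "b", "c"]
(* foldl combines "a" first, so it ends up at the end: "cba".
   foldr combines "c" first, so the result is "abc". *)

(* With an associative, commutative operation like + the
   direction makes no difference to the result. *)
val leftSum  = List.foldl (op+) 0 [1, 2, 3]
val rightSum = List.foldr (op+) 0 [1, 2, 3]
```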

A post about fold by James Hague a few years ago asked, “Is the difficulty many programmers have in grasping functional programming inherent in the basic concept of non-destructively operating on values, or is it in the popular abstractions that have been built-up to describe functional programming?” In this case I think it’s both. Fold is a good example of syntax failing us, and I think it’s also inherently a difficult abstraction to recognise (i.e. to spot the function application common to each activity). Fold is a fundamental operation in much of functional programming, but it doesn’t really feel like one because the abstraction is not comfortable. But besides that, many of the things fold is useful for are things that we would usually visualise in destructive terms: update the tally, push something onto the front of the list.

In Python, the fold function (which Python calls reduce) was dropped from the built-in functions and moved into the functools module for Python 3. Guido van Rossum wrote, “apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what’s actually being fed into that function before I understand what the reduce() is supposed to do.” Instead the Python style for these activities usually involves destructively updating the accumulator.
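For comparison, here is a sketch in SML of the destructive style next to the fold style, computing a total of string lengths both ways. The function names are hypothetical, and the imperative version uses an SML ref cell to stand in for a Python-style mutable variable.

```sml
(* Destructive style: a mutable cell updated once per element. *)
fun totalLengthImperative (words : string list) : int =
    let val total = ref 0
    in
        List.app (fn w => total := !total + String.size w) words;
        !total
    end

(* Fold style: the tally is threaded through as the accumulator
   instead of being overwritten in place. *)
fun totalLengthFold (words : string list) : int =
    List.foldl (fn (w, total) => total + String.size w) 0 words
```

The first version reads as "update the tally"; the second asks the reader to see the same operation as a fold.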

Functional programming will surely never be really mainstream so long as fold appears in basic tutorials for it. Though in practice at least, because it’s such a general function, it can often be usefully hidden behind a more discoverable domain-specific API.
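A sketch of what that hiding might look like: a few hypothetical domain-specific helpers (sum, maximum, and countWhere, none of them from the Basis) that use fold internally, so callers never have to remember its argument order.

```sml
(* Total of a list of integers *)
fun sum (xs : int list) : int = List.foldl (op+) 0 xs

(* NONE for an empty list, otherwise SOME of the largest element *)
fun maximum ([] : int list) : int option = NONE
  | maximum (x :: xs) = SOME (List.foldl Int.max x xs)

(* Number of elements satisfying a predicate *)
fun countWhere (pred : 'a -> bool) (xs : 'a list) : int =
    List.foldl (fn (x, n) => if pred x then n + 1 else n) 0 xs
```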

***

(Footnote. You can tell whether an implementation of fold is a left or right fold by applying it to the list “cons” function, which is often called “::”, with an empty list as the starting accumulator; if the fold passes the accumulator first, flip the arguments to cons. If this reverses the list passed to it, you have a left fold. For example, the language Yeti has a function simply called fold. Which is it?

> fold (flip (::)) [] [1,2,3,4];
[4,3,2,1] is list<number>

So it’s a left fold.)
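The same test in Standard ML, assuming the Basis List.foldl and List.foldr: because their callbacks already take (element, accumulator), which matches the argument order of op::, no flip is needed.

```sml
val viaFoldl = List.foldl (op::) [] [1, 2, 3, 4]   (* [4, 3, 2, 1]: a left fold *)
val viaFoldr = List.foldr (op::) [] [1, 2, 3, 4]   (* [1, 2, 3, 4]: a right fold *)
```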

 

Zero-based indexing

The excellent Greg Wilson, founder of Software Carpentry, tweeted a link to a 2013 blog post by Mike Hoye the other day.

I didn’t comment on this article when it first appeared because I didn’t have the nerve to confront its author, who was shouting down everyone who tried to discuss it in the comments. But I can’t bear to see this article promoted again, and by a good authority.

The article claims that the reason most of today’s programming languages use zero-based indexing (i.e. they count array indexes from 0, so that arr[0] is the first element of array arr, rather than arr[1]) is that it saved a tiny amount of compile time (not run time), and that this mattered because on a specific IBM mainframe hosted by MIT in the 70s there was a danger that a job taking too long to compile might be bumped in order to make way for a program to calculate handicap points for yacht racing.

This is a pretty implausible suggestion, so it needs some pretty good evidence. That isn’t there. The article has some very nice sources, but the quotes from them just don’t support the proposition they’re being asked to support. The main quotes, from Martin Richards and Tom Van Vleck, both appear to say nearly the opposite of the things they’re described as saying. There’s plenty of room for nuance in interpreting what people say, but the author accepts no nuance in anyone’s responses to the article, choosing instead to mock and ridicule anyone who doesn’t agree with him. There’s no citation for the one thing that is necessary to make the argument hold together (that indexes were calculated at compile time rather than run time). Reading this article carefully, the only conclusion I can draw is that the choice of 0-based indexing almost certainly has nothing to do with yachts.

I don’t mind a great deal whether a programming language uses 0-based or 1-based indexing. The reason this matters to me is that the article is not just a screed with a funny story in it, but a call for rigour in understanding the history of programming languages, something I do care about and that its author appears to take very seriously indeed. Its general principle is really sound: we get used to a lot of arbitrary aspects of languages and then explain them as the mythology of the elders, rather than finding the actual reasons. But this article only added to the mythology, and people who know better are now citing it as if it had been established to be true, which it almost certainly isn’t.

(I feel really bad just writing this. It’s quite possible the author is regretting ever getting involved in this stupid topic but has too much integrity to take down or edit the post. I wish I had never been reminded of how maddening I found it.)