Naming conventions in Standard ML

Many programming languages have a standard document that describes how to write and capitalise the names of functions, variables, and source files. It’s especially useful to have a standard for writing names made up of more than one word, where there are various options for how to join the words: “camel case”, which looks likeThis (with a capital letter “hump” in the middle), or “snake case”, which is underscore_separated.

I think Java in the mid-90s was the first really mainstream language to standardise file and variable naming conventions. The Java package mechanism requires files to be laid out in a particular way, and Sun published Java coding conventions that quickly became an effective standard for class and variable naming. Other languages followed. Python has had a standard that covers naming (PEP 8) since 2001. More recent examples include Go and Swift.

Older languages tend to be less consistent. C++ is a mess: the standard library and most official example material use snake_case for most names, but a great many developers, including those on most of the projects I’ve worked on, prefer camelCase, with capital initials for class names. File names are even more varied: C++ source files are seen with .cpp, .cxx, .cc, and .C extensions; C++ header files with .h, .hpp, or no extension at all.

Standard ML (SML) is also a mess, and an interesting one because the language itself was standardised in 1990 and has been completely unchanged since the standard was revised in 1997. So although it is super-standardised, it’s a bit too old to have caught the wider shift in sentiment toward prescribing things like naming and file structure.

The SML standard is formal and very focused. It says nothing about coding style or naming, contains almost no examples using compound names, says nothing about filenames or file organisation, and specifies no way for one file to refer to another — the standard is indifferent to whether your source code is held in a file at all.

In trying to establish what naming conventions to use for my own code, I decided to look around at some existing libraries in SML to see what they had settled on.

The Basis library

SML has a standard library, the Basis library, which is a bit more recent than the language itself. Although it isn’t prescriptive, the library does use certain conventions itself and the introductory notes explain what they are. These cover only names of things within a program — not filenames, which are left up to the implementor of the standard. I’ll refer to them in the table below.

The Cornell style guide

For me, the top search result for “SML naming conventions” is an online style guide for the Cornell CS312 course. It doesn’t cover file naming. Given the limited industry uptake of SML, an academic guide may be proportionately more influential than it would be for other languages. I’ll mention this guide below as well.

Other code I looked at

I took a look at the following code:

The source of the MLton, MLKit, and SMLSharp compilers (excluding accompanying utility libraries)
The Basis library implementations shipped with MLton and SMLSharp
The SML/NJ extended library
The source of the Ur/Web language
The Ponyo library, an interesting fledgling effort to produce a broader base library than the Basis

In total, that’s about 444,500 lines of code across 1,790 SML source files. Some (presumably automatically generated) source files are very long: the mean file length is 248 lines including comments and blank lines, but the median is only 47.

Names within the language

The SML language has at least seven categories of things that need names: variables, type names, datatype constructors, exceptions, structures, signatures, and functors.

(By “variables” I really mean bindings, i.e. the vast majority of ordinary things with names: things that in a procedural language might include function names, variable names, and constant declarations. I’m using the word “variable” because it’s such a familiar everyday programming term.)
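To make that concrete, here is a small sketch using hypothetical names: a constant, a function, and a value computed from both are all ordinary bindings introduced by `val` or `fun`, so all three count as “variables” in the sense used here.

```sml
(* Hypothetical names, for illustration only. *)

(* A constant: an ordinary value binding *)
val maxRetries = 3

(* A function: "fun" binds a name to a function value *)
fun retryDelay attempt = attempt * 100

(* A value computed from the two bindings above;
   List.tabulate (n, f) builds [f 0, ..., f (n - 1)] *)
val retryDelays = List.tabulate (maxRetries, fn i => retryDelay (i + 1))
```

At the point of use nothing distinguishes `maxRetries` from `retryDelay` syntactically, which is one reason a single convention for both seems natural.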

Source	Variable	Type name	Datatype constructor	Exception	Structure	Signature	Functor
mlton	`variableName`	(mixed)	`DatatypeCtor`	`ExceptionName*`	`StructureName`	`SIGNATURE_NAME`	`FunctorName`
mlkit	(mixed)	(mixed)	`DatatypeCtor*`	`ExceptionName*`	`StructureName`	`SIGNATURE_NAME`	`FunctorName`
smlsharp	`variableName`	`typeName*`	`DATATYPE_CTOR*`	`ExceptionName`	`StructureName`	`SIGNATURE_NAME`	`FunctorName`
basis	`variableName`	`type_name`	`DATATYPE_CTOR`	`ExceptionName`	`StructureName`	`SIGNATURE_NAME`	`FunctorName`
smlnj-lib	`variableName`	`type_name`	`DATATYPE_CTOR`	`ExceptionName`	`StructureName`	`SIGNATURE_NAME`	`FunctorNameFn`
urweb	`variableName`	`type_name*`	`DatatypeCtor`	`ExceptionName`	`StructureName`	`SIGNATURE_NAME`	`FunctorNameFn`
ponyo	`variableName`	`typeName`	`DatatypeCtor`	`ExceptionName`	`Structure_Name`	`SIGNATURE_NAME`	`Functor_Name`
cornell	`variableName`	`type_name`	`DatatypeCtor`	`ExceptionName`	`StructureName`	`SIGNATURE_NAME`	`FunctorName`

* mostly

Here’s what I found, categorised into universal conventions, usual conventions, and “other”.

Universal

The following is the only universal convention:

Signature
`SIGNATURE_NAME`

The only code I found that doesn’t follow this convention is in the SML standard itself, which omits the underscore (like SIGNATURENAME).
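A minimal sketch of the convention, with hypothetical names made up for illustration: the signature name is all capitals with underscores between words, and the structure that implements it uses a capitalised camel-case name.

```sml
(* Hypothetical names, for illustration only. *)

(* Signature name: all capitals, words separated by underscores *)
signature EVENT_COUNTER =
sig
  type counter
  val make : unit -> counter
  val increment : counter -> unit
  val value : counter -> int
end

(* Structure name: capitalised camel case, opaquely ascribing the signature *)
structure RefEventCounter :> EVENT_COUNTER =
struct
  type counter = int ref
  fun make () = ref 0
  fun increment c = c := !c + 1
  fun value c = !c
end
```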

Usual

The following conventions are not universal, but more popular than any other.

Variable	Type name	Exception	Structure	Functor
`variableName`	`type_name`	`ExceptionName`	`StructureName`	`FunctorName`

Camel case (with a capital initial for exceptions, structures, and functors) is clearly idiomatic for everything except type names. MLKit contains some snake-cased bindings as well, but none of the other libraries did. I like snake case in SML and I’ve written a fair bit of code using it myself; I hadn’t realised until now how uncommon it was. (It’s more common in SML’s sibling language OCaml. It’s ironic that, of the three very similar languages SML, OCaml, and F#, the only one not to use camel case is called OCaml.)

I spotted a handful of all-caps exception names and some camel case type names, but no library preferred those consistently.

The Ponyo library differs from the above for structures (Structure_Name) and functors (Functor_Name).

The SML/NJ library sort of differs for functors, which are given a Fn suffix (FunctorNameFn). But you could think of this as part of the name, in which case the convention is the same.
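A hypothetical sketch of the Fn suffix in use. The names SHOW, ListShowFn, and IntListShow are made up for illustration and are not part of the SML/NJ library, although the library’s own functors (RedBlackMapFn, for example) follow the same pattern. The suffix makes it obvious at the application site which name is the functor and which is the structure it produces.

```sml
(* Hypothetical names, for illustration only. *)

(* Signature for anything that can be turned into a string *)
signature SHOW =
sig
  type t
  val show : t -> string
end

(* Functor named with the SML/NJ-library-style "Fn" suffix.
   It lifts a SHOW structure to lists of that type. *)
functor ListShowFn (S : SHOW) =
struct
  fun show xs = "[" ^ String.concatWith ", " (List.map S.show xs) ^ "]"
end

(* The structure produced by applying the functor drops the suffix *)
structure IntListShow =
  ListShowFn (struct type t = int val show = Int.toString end)
```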

Most type and datatype names used in public APIs are single words, or even single letters, so the convention often doesn’t matter for those.

Other

There seems to be no consensus about datatype constructors: I found DatatypeConstructor and DATATYPE_CONSTRUCTOR in roughly equal numbers.
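For comparison, here is one hypothetical datatype written both ways. Either capitalised style keeps constructors visually distinct from lower-camel-case variables, which matters in patterns: in SML, an identifier in a pattern that is not a constructor in scope simply binds a new variable.

```sml
(* Hypothetical example: the same datatype written in both styles. *)

structure CamelStyle =
struct
  (* DatatypeConstructor style *)
  datatype shape = Circle of real | Rectangle of real * real

  fun area (Circle r) = Math.pi * r * r
    | area (Rectangle (w, h)) = w * h
end

structure CapsStyle =
struct
  (* DATATYPE_CONSTRUCTOR style *)
  datatype shape = CIRCLE of real | RECTANGLE of real * real

  fun area (CIRCLE r) = Math.pi * r * r
    | area (RECTANGLE (w, h)) = w * h
end
```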

Filenames

Nothing in the SML standard or Basis library cares about what source files are called, what file extension they use, or how you divide your code up among them. Some compilers might care, but most don’t. The business of telling the compiler which files a program consists of, or of expressing any relationships between files, is left up to external tools. SML has neither header files nor import directives.

This makes fertile ground for variety in naming schemes.

I’m going to consider only filenames that are associated with a primary structure, signature, or functor. Here’s the table.

Source	Structure	Signature	Functor
mlton	`structure-name.sml`	`signature-name.sig`	`functor-name.fun`
mlkit	`StructureName.sml`	`SIGNATURE_NAME.sml*`	`FunctorName.sml`
smlsharp	`StructureName.sml`	`SIGNATURE_NAME.sig*`	`FunctorName.sml`
mlton-basis	`structure-name.sml`	`signature-name.sig`	`functor-name.fun`
smlsharp-basis	`StructureName.sml`	`SIGNATURE_NAME.sig`	(none)
smlnj-lib	`structure-name.sml`	`signature-name-sig.sml`	`functor-name-fn.sml`
urweb	`structure_name.sml`	`signature_name.sig`	(n/a)
ponyo	`Structure_Name.sml`	`SIGNATURE_NAME.ML`	`Functor_Name.sml`

* mostly

Clearly very inconsistent. There are no universal or usual conventions, only “other”.

Behind this there is a wider question about code organisation in files — should each signature live in its own file? Each structure? In many cases they do, but that is also far from universal.

If you use a scheme in which filenames are clearly derived from signature and structure names, does that mean you shouldn’t put more than one structure in the same file? What do you do with code that is not in any structure? Really it’s a pity to have to think about filenames at all, in a language that is so completely indifferent to file structure.

A reasonable recommendation

Here is a plausible set of rules based on the above.

For names within the language:

Variable	Type name	Datatype constructor	Exception	Structure	Signature	Functor
`variableName`	`type_name`	`DATATYPE_CTOR`	`ExceptionName`	`StructureName`	`SIGNATURE_NAME`	`FunctorName`

This is the style used by the Basis library. Apart from datatype constructors, each of these was the majority choice among the libraries I looked at.
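Some real Basis names in this style: the type IEEEReal.rounding_mode with its constructors TO_NEAREST, TO_NEGINF, TO_POSINF, and TO_ZERO; the function String.isPrefix; and the exception List.Empty. Basis signatures such as LIST and TEXT_IO follow the SIGNATURE_NAME pattern. The small wrappers below (roundingName, hasPrefix, safeHead) are hypothetical and exist only to show those names in use.

```sml
(* Basis names: IEEEReal.rounding_mode, IEEEReal.TO_*, String.isPrefix,
   List.hd, List.Empty. roundingName, hasPrefix and safeHead are hypothetical. *)

(* Type name in type_name style, constructors in DATATYPE_CTOR style *)
fun roundingName (m : IEEEReal.rounding_mode) : string =
  case m of
    IEEEReal.TO_NEAREST => "to nearest"
  | IEEEReal.TO_NEGINF => "towards -inf"
  | IEEEReal.TO_POSINF => "towards +inf"
  | IEEEReal.TO_ZERO => "towards zero"

(* Variable (function) name in camelCase style *)
val hasPrefix = String.isPrefix "sig" "signature"

(* Exception name in ExceptionName style: List.hd raises Empty on [] *)
fun safeHead (xs : int list) : int option =
  SOME (List.hd xs) handle List.Empty => NONE
```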

For datatype constructors it seems reasonable to pick the most visible option, and one that is consistent with the names in the Basis. (This differs from the Cornell guide, however.) There is no risk of confusing these with signature names, because signature names only ever appear in module-level positions, such as signature declarations, structure and functor ascriptions, and functor parameters, whereas datatype constructors appear in expressions and patterns.
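To see the recommended rules working together, here is a minimal sketch of a hypothetical priority-queue module. All of its names are invented for illustration, and the sorted-list representation is chosen for brevity rather than efficiency. The signature names ORDERED and PRIORITY_QUEUE appear only where modules are declared, ascribed, or parameterised, while the constructors EMPTY and TOP are used in ordinary expressions and patterns.

```sml
(* Hypothetical module, for illustration only. *)

(* Signature names: SIGNATURE_NAME *)
signature ORDERED =
sig
  type elem
  val compare : elem * elem -> order
end

signature PRIORITY_QUEUE =
sig
  type elem
  type priority_queue                          (* type_name *)
  datatype peek_result = EMPTY | TOP of elem   (* DATATYPE_CTOR *)
  exception QueueEmpty                         (* ExceptionName *)
  val empty : priority_queue                   (* variableName *)
  val insert : elem * priority_queue -> priority_queue
  val peek : priority_queue -> peek_result
  val popMin : priority_queue -> elem * priority_queue
end

(* Functor name: FunctorName. ORDERED and PRIORITY_QUEUE appear only in
   module-level positions: the parameter and the result signature. *)
functor ListPriorityQueue (E : ORDERED) : PRIORITY_QUEUE =
struct
  type elem = E.elem
  type priority_queue = elem list   (* kept sorted, smallest first *)
  datatype peek_result = EMPTY | TOP of elem
  exception QueueEmpty

  val empty = []

  fun insert (x, []) = [x]
    | insert (x, y :: ys) =
        (case E.compare (x, y) of
           GREATER => y :: insert (x, ys)
         | _ => x :: y :: ys)

  fun peek [] = EMPTY
    | peek (x :: _) = TOP x

  fun popMin [] = raise QueueEmpty
    | popMin (x :: xs) = (x, xs)
end

(* Structure name: StructureName *)
structure IntPriorityQueue =
  ListPriorityQueue (struct type elem = int val compare = Int.compare end)
```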

For filenames:

Structure	Signature	Functor
`structure-name.sml`	`signature-name.sig`	`functor-name.sml`

The logic here is:

It’s still not a great idea to expect a case-sensitive filesystem, so all-one-case is good.
Generally use the .sml extension for SML source.
But the .sig extension for signatures seems very widely used, and it’s fair to make public signatures as easy to spot as possible.
The .ml extension is not a great idea because it clashes with OCaml.
The .fun extension used by MLton is a bit obscure, and you don’t always want to separate out functors (if you want to make functors more distinctive, give them names ending in Fn, as the SML/NJ library does).