Inadequate names for abstract affairs

Naming conventions in Standard ML

Many programming languages have a standard document that describes how to write and capitalise the names of functions, variables, and source files. It’s especially useful to have a standard for writing names made up from more than one word, where there are various options for how to join the words: “camel case”, which looks likeThis (with a capital letter “hump” in the middle), or “snake case”, which is underscore_separated.

I think Java in the mid-90s was the first really mainstream language to standardise file and variable naming conventions. The Java package mechanism requires files to be laid out in a particular way, and Sun published Java coding conventions which quickly became an effective standard for class and variable naming. Other languages followed. Python has had a standard that covers naming (PEP8) since 2001. More recent examples include Go and Swift.

Older languages tend to be less consistent. C++ is a mess: the standard library and most official example material uses snake_case for most names, but a great many developers, including those on most of the projects I’ve worked on, prefer camelCase, with capital initials for class names. File names are even more various: C++ source files are seen with .cpp, .cxx, .cc, and .C extensions; C++ header files with .h, .hpp, or no extension at all.

Standard ML (SML) is also a mess, and an interesting one because the language itself was standardised in 1990 and has been completely unchanged since the standard was revised in 1997. So although it is super-standardised, it’s a bit too old to have caught the wider shift in sentiment toward prescribing things like naming and file structure.

The SML standard is formal and very focused. It says nothing about coding style or naming, contains almost no examples using compound names, says nothing about filenames or file organisation, and specifies no way for one file to refer to another — the standard is indifferent to whether your source code is held in a file at all.

In trying to establish what naming conventions to use for my own code, I decided to look around at some existing libraries in SML to see what they had settled on.

The Basis library

SML has a standard library, the Basis library, which is a bit more recent than the language itself. Although it isn’t prescriptive, the library does use certain conventions itself and the introductory notes explain what they are. These cover only names of things within a program — not filenames, which are left up to the implementor of the standard. I’ll refer to them in the table below.

The Cornell style guide

Top search result for “SML naming conventions” for me is this online style guide for the Cornell CS312 course. It doesn’t cover file naming. Given the limited industry uptake for SML, an academic guide may be proportionately more influential than for other languages. I’ll mention this guide below as well.

Other code I looked at

I took a look at the following code:

  • The source of the MLton, MLKit, and SMLSharp compilers (excluding accompanying utility libraries)
  • The Basis library implementations shipped with MLton and SMLSharp
  • The SML/NJ extended library
  • The source of the Ur/Web language
  • The Ponyo library, an interesting fledgling effort to produce a broader base library than the Basis

In total, about 444,500 lines of code across 1790 SML source files. Some (presumably automatically-generated) source files are very long; while the mean file length is 248 lines including comments and blanks, the median is only 47.

Names within the language

The SML language has at least seven categories of things that need names: variables, type names, datatype constructors, exceptions, structures, signatures, and functors.

(By “variables” I really mean bindings, i.e. the vast majority of ordinary things with names: things that in a procedural language might include function names, variable names, and constant declarations. I’m using the word “variable” because it’s such a familiar everyday programming term.)

Source Variable Type name Datatype constructor Exception Structure Signature Functor
mlton variableName (mixed) DatatypeCtor ExceptionName* StructureName SIGNATURE_NAME FunctorName
mlkit (mixed) (mixed) DatatypeCtor* ExceptionName* StructureName SIGNATURE_NAME FunctorName
smlsharp variableName typeName* DATATYPE_CTOR* ExceptionName StructureName SIGNATURE_NAME FunctorName
basis variableName type_name DATATYPE_CTOR ExceptionName StructureName SIGNATURE_NAME FunctorName
smlnj-lib variableName type_name DATATYPE_CTOR ExceptionName StructureName SIGNATURE_NAME FunctorNameFn
urweb variableName type_name* DatatypeCtor ExceptionName StructureName SIGNATURE_NAME FunctorNameFn
ponyo variableName typeName DatatypeCtor ExceptionName Structure_Name SIGNATURE_NAME Functor_Name
cornell variableName type_name DatatypeCtor ExceptionName StructureName SIGNATURE_NAME FunctorName

* mostly

Here’s what I found, categorised into universal conventions, usual conventions, and “other”.

Universal

The following is the only universal convention:

Signature
SIGNATURE_NAME

The only code I found that doesn’t follow this convention is in the SML standard itself, which omits the underscore (like SIGNATURENAME).

Usual

The following conventions are not universal, but more popular than any other.

Variable Type name Exception Structure Functor
variableName type_name ExceptionName StructureName FunctorName

Camel case is clearly idiomatic for everything except type names. MLKit contains some snake-cased bindings as well, but none of the other libraries did. I like snake case in SML and I’ve written a fair bit of code using it myself; I hadn’t realised until now how uncommon it was. (It’s more common in SML’s sibling language OCaml. Ironic that, of the three very similar languages SML, OCaml, and F#, the only one not to use camel case is called OCaml.)

I spotted a handful of all-caps exception names and some camel case type names, but no library preferred those consistently.

The Ponyo library differs from the above for structures (Structure_Name) and functors (Functor_Name).

The SML/NJ library sort-of differs for functors, which are given a Fn suffix (FunctorNameFn). But you could think of this as part of the name, in which case the convention is the same.

Most type and datatype names used in public APIs are single words, or even single letters, so the convention often doesn’t matter for those.

Other

There seems to be no consensus about datatype constructors — I found DatatypeConstructor and DATATYPE_CONSTRUCTOR in roughly equal number.

Filenames

Nothing in the SML standard or Basis library cares about what source files are called, what file extension they use, or how you divide your code up among them. Some compilers might care, but most don’t. The business of telling the compiler which files a program consists of, or of expressing any relationships between files, is left up to external tools. SML has neither header files nor import directives.

This makes fertile ground for variety in naming schemes.

I’m going to consider only filenames that are associated with a primary structure, signature, or functor. Here’s the table.

Source Structure Signature Functor
mlton structure-name.sml signature-name.sig functor-name.fun
mlkit StructureName.sml SIGNATURE_NAME.sml* FunctorName.sml
smlsharp StructureName.sml SIGNATURE_NAME.sig* FunctorName.sml
mlton-basis structure-name.sml signature-name.sig functor-name.fun
smlsharp-basis StructureName.sml SIGNATURE_NAME.sig (none)
snlnj-lib structure-name.sml signature-name-sig.sml functor-name-fn.sml
urweb structure_name.sml signature_name.sig (n/a)
ponyo Structure_Name.sml SIGNATURE_NAME.ML Functor_Name.sml

* mostly

Clearly very inconsistent. There are no universal or usual conventions, only “other”.

Behind this there is a wider question about code organisation in files — should each signature live in its own file? Each structure? In many cases they do, but that is also far from universal.

If you use a scheme in which filenames are clearly derived from signature and structure names, does that mean you shouldn’t put more than one structure in the same file? What do you do with code that is not in any structure? Really it’s a pity to have to think about filenames at all, in a language that is so completely indifferent to file structure.

A Reasonable Recommendation

A plausible set of rules based on the above.

For names within the language:

Variable Type name Datatype constructor Exception Structure Signature Functor
variableName type_name DATATYPE_CTOR ExceptionName StructureName SIGNATURE_NAME FunctorName

This is the style used by the Basis library. Apart from datatype constructors, everything here was in the majority within the libraries I looked at.

For datatype constructors it seems reasonable to pick the most visible option and one that is consistent with the names in Basis. (This differs from the Cornell guide, however.) There is no confusion between these and signature names, because signature names never appear anywhere except in the declaration lines for those signatures and the structures that implement them.

For filenames:

Structure Signature Functor
structure-name.sml signature-name.sig functor-name.sml

The logic here is:

  • It’s still not a great idea to expect a case sensitive filesystem, so all-one-case is good
  • Generally use .sml extension for SML source
  • But the .sig extension for signatures seems very widely used, and it’s fair to make public signatures as easy to spot as possible
  • The .ml extension is not a great idea because it clashes with OCaml
  • The .fun extension used by MLton is a bit obscure, and you don’t always want to separate out functors (if you want to make functors more distinctive, give them names ending in Fn, as the SML/NJ library does).