SML and OCaml: So, why was the OCaml faster?

My earlier post Four MLs (and a Python) listed a toy example program written in four languages in the ML family: Standard ML, OCaml, Yeti, and F♯.

Perhaps unwisely, I measured and reported the runtimes for each version of this program when processing a test file on my Linux laptop. The figures I got were, in seconds:

SML OCaml Yeti F♯
16.4 7.8 14.4 13.9

At this point I have to repeat what I said in the earlier post—this is not a meaningful benchmark; these are all pretty good; they’re not so far apart that you can’t imagine a slightly different test case reversing the order.

Even so, it puzzled me why this particular SML code (compiled using the MLton compiler) ran slower than the OCaml. They’re similar languages, well-established, both compiled to native code using optimising compilers. What’s the difference?

I took a rambling, not-entirely-conclusive look at this, with some detours along the way.

First, the OCaml

Several commentators noted that the OCaml implementation could have been sped up by hoisting regexp construction out of the inner function. I tried out a couple of the suggested alternatives:

Original from Marek from Dora
7.8 7.4 7.3

So it makes a bit of a difference, but not as much as I’d expected. (Both alternatives are nicer to read than my own code was, though.) Another comment here proposes using the Core library from Jane Street instead of the supplied Str module, but I haven’t yet tried that.

Now, the Standard ML

The first thing I thought I’d do, since Standard ML is a standard that has many implementations, was to try the program in more than one of them.

SML environments

Here are all the SML implementations I could find that showed any signs of life—I’d be interested to hear of any I missed.

Note that, unlike the previous survey, these are not different languages. They all implement the same language, even if in practice many of them also have extensions reflecting whatever research projects engaged their authors. All are open source.

  • MLton is the compiler I used in my original post. It’s an optimising native-code compiler that aims to produce fast code complying with the SML standard without any novel extensions. Unlike the other implementations here, it is only a compiler and does not include an interactive environment (or REPL, read-eval-print-loop).
  • Standard ML of New Jersey (SML/NJ) is the most venerable complete implementation and, I think, the only one of these that existed the last time I wrote any ML. It provides an interactive environment but doesn’t have a straightforward native-code compiler. It includes a number of language and library extensions, and I avoided it for my earlier test because I was keen to stick to the standard parts of the language and I’d heard it was hard to generate a native binary with.
  • Poly/ML is a compact distribution including an interactive environment and a native compiler. I’m not sure when it first appeared, but it’s been around long enough to have been mentioned in the 1996 edition of Larry C Paulson’s excellent ML for the Working Programmer. (As an aside, it’s a pity to see that book available only either second-hand or in hugely expensive academic pricing, even after almost 20 years. Almost everything in it is still relevant and enlightening to a reader with an interest in functional programming. I’m sure an online version would be well-loved, though that’s probably too big a job for too little reward to hope that it would ever appear. Anyway I hope it sells enough to be useful to the author still, because it’s a brilliant piece of work.)
  • Moscow ML is another neat distribution with both interactive environment and native compiler. A nice quality of Standard ML, at least until you start writing code that depends on non-standard libraries, is that you can try out your code in more than one compiler and compare their error messages and the like: Moscow ML is fast to run, and has (for my taste) the most readable error messages of these.
  • MLKit is a modular distribution that also dates back to the 90s. The name made me think abstractly of Apple platform libraries, but it has nothing to do with those. Like MLton it has a compiler but not an interactive environment.
  • Alice ML is an implementation that extends the language with concurrency primitives and various other interesting bits and bobs. It consists of an interactive environment and bytecode compiler, without a native code compiler.
  • SML is an SML implementation from Tohoku university in Japan. Just as MLKit has nothing to do with Apple’s *Kit frameworks, so SML has nothing to do with Microsoft’s C and F. I was unable to get it to build on my usual system (64-bit Arch Linux with LLVM 3.6) so I ran it in a 32-bit Ubuntu 14.04 container instead. Rather shockingly, when compiling with the SML v1.2 distro package I got completely wrong answers—it didn’t parse negative exponents correctly when reading from file. I installed the official SML v2.0 instead, and that gave the right answers.

All of these were able to run the test program. They all took an essentially negligible amount of time to compile it, except for MLton and SML which took a second or two. Here’s how long the resulting code took to process my test file (in seconds).

SML/NJ v110.77 Poly/ML 5.5.2 Moscow ML 2.10 MLKit 4.3.7 Alice ML 1.4+ SML♯ 2.0
16.4 9.3 * 25.6 99.1 33.5 579.5 ** 173.9


* I know that it’s possible to make SML/NJ produce a native binary, but I still haven’t taken the time to figure out how to do it. This time comes from adding a timing wrapper in the program itself and then importing it into the interactive environment.

** Alice ML has three JIT compiler settings; this was mode 1, which proved the fastest.)

So SML/NJ is not far off OCaml; it’s clearly possible. We can see from the other columns why I said the timings in the original article were all pretty good—Moscow ML is a nice environment to develop with but it’s much slower here, as is SML, and Alice ML is clearly not well suited to this particular job.

So where did the time go?

MLton comes with a simple-to-use instrumenting profiler; running it on this task gives:

             function               cur 
---------------------------------- -----
to_number                          51.1%
values_of_line                     10.7%
<gc>                                6.3%
fold_stream                         5.9%
IEEEReal.scan.digitStar             4.7%
IEEEReal.scan.digitStar.done        4.4%
Sequence.Slice.concat               3.8%
IEEEReal.scan.done                  3.5%
Integer.fmt                         3.3%
values_of_line.fn                   2.0%
IEEEReal.scan.normal'               1.8%
Sequence.concat                     1.7%
PosixError.SysCall.simpleResultAux  0.3%
add_to                              0.3%
IEEEReal.scan.afterE.neg            0.1%

It’s spending about 65% of its runtime converting strings to floating-point numbers. (Most of that is reported as spent in my own function, which does nothing except call out to the Basis library: I guess this is because MLton has inlined the conversion code, since it’s only called in one place.)

For a quick sanity check, I replace the call to Real.fromString with a function that only pretends to convert the string:

fun fake_real_from_string str =
    if str = "" then NONE
    else SOME ((Real.fromInt (length (explode str))) / 10000.0)

… and the runtime is reduced from 16.4 seconds to 4.5. We now get the wrong answers, but it’s clear that this is where most of the time went.

What does Real.fromString consist of? The implementation supplied with MLton appears to be this section of real.sml, calling out to this code in IEEE-real.sml. It looks as if the IEEE-real structure does some cleaning and preprocessing and parses into a standard format, which is then converted to a floating-point value by the code in real.sml.

For comparison, here’s the primitive as defined in the OCaml standard library; it calls out to this C function in the OCaml runtime, which simply calls strtod from the C library. A quick scan of the OCaml version with the sampling profiler sysprof shows that it spends around 40% of its runtime in strtod. That’s about 3 seconds, compared to more than 10 for MLton.

It’s no surprise that simply calling the C library might be faster. I wonder why they take such different approaches. My first thought was that the MLton version, which has a look of careful code about it, was a pure SML solution, but then I noticed that ultimately it also calls out to a C library function to do the eventual conversion (strtor, from the gdtoa library).

Still, all I really know about parsing floats is that it’s tricky and full of pitfalls and I certainly wouldn’t trust myself to do it right. Searching for discussion about MLton’s Real.fromString implementation, I run into this post, which says, “Real.fromString is difficult to implement quickly in SML because of the requirements for getting good-to-the last bit conversions and getting all the corner cases right” and notes that it’s likely only a problem in pathological cases like this one.

So my question isn’t quite answered, although that’s enough to satisfy my own curiosity for the moment. But there is one thing I’m going to try before I close:

Hacking it with the FFI

MLton has a foreign function interface (FFI) for calling C functions; why not just call strtod ourselves? Or its simpler sibling atof?

fun to_number str =
    let val atof = _import "atof": char array -> real;
        (* Pass a C-style null-terminated string *)
        atof (Array.fromList (explode (str ^ "00")))

Compiling with

$ mlton -default-ann 'allowFFI true' sum.sml

this actually runs fine, taking about 10.5 seconds, four less than the original. The results for this particular test run are the same, but, well… you’d have to be a bit desperate.


  • In both SML and OCaml, the largest slice of runtime of my test program went to string-to-floating-point conversions; OCaml performs these by calling out to the C library directly, while MLton has a more laborious implementation (I don’t entirely understand why and would very much appreciate hearing from anyone who does)
  • MLton seems to perform well as a compiler, given the more elaborate library code
  • MLton’s foreign-function interface seems straightforward, at least for simple examples (though of course this is specific to the compiler and doesn’t work in other ML implementations)
  • The OCaml program seems to be still the fastest even after accounting for this, and I hazard that that may be because other library functions are similarly minimal
  • This was indeed a pretty terrible benchmark



Feedback on “Four MLs (and a Python)”

My last post got an unprecedented amount of attention after appearing on the popular site Hacker News. It spent about 14 hours on the front page there and got almost 11,000 unique visitors that day — a fair way above my usual daily average, for this whole blog, of about a hundred.

I combed through the comments on the Hacker News post, as well as entries on and Reddit. Counting only comments directly about the post (rather than to other commenters), I reckoned there were:

  • Nine comments responding only, or primarily, to performance differences in the language implementations;
  • Six suggestions to hoist regexp construction out of the split function in the OCaml version, to make it faster: four of them included code, and they were all quite polite;
  • Six objections that my Python code was un-idiomatic, un-Pythonic, or generally strange. Five of those proposed rewriting it to use the CSV parser module or pandas — good advice in general, though not really in line with the purpose of the post;
  • Six code rewrites in, enthusiasms about, or defences of, F♯;
  • Six comments suggesting other languages to look at (4 x Haskell; Scala; Mythryl);
  • Three observations that the Python code would run faster with PyPy;
  • Three people sharing my general feeling that it would be a good thing if Standard ML had more practical things going for it;
  • Two people having difficulty with my use of the term “Big Data” to describe a 300MB CSV file. (Sorry. I really only ever use the term ironically, but you weren’t to know that.)

I must say, OCaml developers seem like a keen bunch. Actually the vibe I get about both OCaml and F♯, from these few comments, is a happy one. About SML it’s more a sense of distant yearning. The Python comments were a bit grumpier, but that’s reasonable given that my post didn’t exactly engage with the way things actually get done in Python.

It’s quite obviously hard to resist a speed contest. I wasn’t initially going to include the runtime measurements in the post, but I couldn’t help myself either. Still, none of the coding decisions I made when writing the test programs had anything to do with speed. (Scalability, in a simplistic way, yes: I made sure not to read the whole file into memory, and to avoid non-tail-recursive code.) Then of course more people responded about performance than anything else. And even though I can see that that way lies madness, I’d still like to find time to follow up on the question of why the SML code didn’t go as quickly as the OCaml did.

Since I wrote that post, I’ve had a piece of real work come up for which these languages are quite well suited, involving parsing and manipulating a modest amount of symbolic data with a very specialised structure. I decided to do it using Standard ML this time, and hope to find out whether there’s a point at which it becomes more impractical than delightful.


Four MLs (and a Python)

I wrote a small command-line text processing program in four different ML-derived languages, to try to get a feel for how they compare in terms of syntax, library, and build-run cycles.

ML is a family of functional programming languages that have grown up during the past 40 years and more, with strong static typing, type inference, and eager evaluation. I tried out Standard ML, OCaml, Yeti, and F#, all compiling and running from a shell prompt on Linux.

The job was to write a utility that:

  • accepts the name of a CSV (comma-separated values) file as a command-line argument
  • reads all the lines from that file, each consisting of the same number of numeric columns
  • sums each column and prints out a single CSV line with the results
  • handles large inputs
  • fails if it finds a non-numeric column value or an inconsistent number of columns across lines (an uncaught exception is acceptable)

A toy exercise, but one that touches on file I/O, library support, string processing and numeric type conversion, error handling, and the build-invocation cycle.

I tested on a random Big Data CSV file that I had to hand; running the wc (word count) utility on it gives the size and a plausible lower bound for our program’s runtime:

$ time wc big-data.csv 
 337024 337024 315322496 big-data.csv

real 0m3.086s
user 0m3.050s
sys 0m0.037s

I’ve included timings throughout because I thought a couple of them were interesting, but they don’t tell us much except that none of the languages performed badly (with the slowest taking about 16 seconds on this file).

Finally I wrote the same thing in Python as well for comparison.

Practical disclaimer: If you actually have a CSV file you want to do things like this with, don’t use any of these languages. Do it with R instead, where this exercise takes three lines including file I/O. Or at least use an existing CSV-mangling library.

Here are the programs I ended up with, and my impressions.

Standard ML

Standard ML, or SML, is the oldest and “smallest” of the four and the only one to have a formal standard, fixed since 1997. Its standard library (the Basis library) is a more recent addition.

fun fold_stream f acc stream =
    case TextIO.inputLine stream of
	SOME line => fold_stream f (f (line, acc)) stream
      | NONE => acc
fun to_number str =
    case Real.fromString str of
	SOME r => r
      | NONE => raise Fail ("Invalid real: " ^ str)
fun values_of_line line =
    let val fields = String.fields (fn c => c = #",") line in
	map to_number fields

fun add_to [] values = values
  | add_to totals values =
    if length totals = length values then (Real.+) (totals, values)
    else raise Fail "Inconsistent-length rows"

fun sum_stream stream =
    fold_stream (fn (line, tot) => add_to tot (values_of_line line)) [] stream
fun sum_and_print_file filename =
    let val stream = TextIO.openIn filename in
	let val result = sum_stream stream in
	    print ((String.concatWith "," (map Real.toString result)) ^ "\n")
	TextIO.closeIn stream
fun main () =
    case CommandLine.arguments () of [filename] => sum_and_print_file filename
      | _ => raise Fail "Exactly 1 filename must be given"

val () = main ()

(Note that although I haven’t included any type annotations, like all ML variants this is statically typed and the compiler enforces type consistency. There are no runtime type errors.)

This is the first SML program I’ve written since 23 years ago. I enjoyed writing it, even though it’s longer than I’d hoped. The Basis library doesn’t offer a whole lot, but it’s nicely structured and easy to understand. To my eye the syntax is fairly clear. I had some minor problems getting the syntax right first time—I kept itching to add end or semicolons in unnecessary places—but once written, it worked, and my first attempt was fine with very large input files.

I had fun messing around with a few different function compositions before settling on the one above, which takes the view that, since summing up a list is habitually expressed in functional languages as an application of fold, we could start with a function to apply a fold over the sequence of lines in a file.

More abstractly, there’s something delightful about writing a language with a small syntax that was fixed and standardised 18 years ago and that has more than one conforming implementation to choose from. C++ programmers (like me) have spent much of those 18 years worrying about which bits of which sprawling standards are available in which compiler. And let’s not talk about the lifespans of web development frameworks.

To build and run it I used the MLton native-code compiler:

$ time mlton -output sum-sml sum.sml

real 0m2.295s
user 0m2.160s
sys 0m0.103s
$ time ./sum-sml big-data.csv 

real 0m16.383s
user 0m16.370s
sys 0m0.027s

The executable was a 336K native binary with dependencies on libm, libgmp, and libc. Although the compiler has a good reputation, this was (spoiler alert!) the slowest of these language examples both to build and to run. I also tried the PolyML compiler, with which it took less than a tenth of a second to compile but 26 seconds to run, and Moscow ML, which was also fast to compile but much slower to run.


OCaml is a more recent language, from the same root but with a more freewheeling style. It seems to have more library support than SML and, almost certainly, more users. I started taking an interest in it recently because of its use in the Mirage OS unikernel project—but of these examples it’s the one in which I’m least confident in my ability to write idiomatic code.

(Edit: at least two commenters below have posted improved versions of this—thanks!)

open Str

let read_line chan =
  try Some (input_line chan)
  with End_of_file -> None

let rec fold_channel f acc chan =
  match read_line chan with
  | Some line -> fold_channel f (f line acc) chan
  | None -> acc

let values_of_line line =
  let fields = Str.split (Str.regexp ",") line in float_of_string fields
let add_to totals values =
  match totals with
  | [] -> values
  | _  ->
     if List.length totals = List.length values then
       List.map2 (+.) totals values
     else failwith "Inconsistent-length rows"

let sum_channel chan =
  let folder line tot = add_to tot (values_of_line line) in
  fold_channel folder [] chan
let sum_and_print_file filename =
  let chan = open_in filename in
  (let result = sum_channel chan in
   print_string ((String.concat "," ( string_of_float result)) ^ "\n");
   close_in chan)

let main () =
  match Sys.argv with
  | [| _; filename |] -> sum_and_print_file filename
  | _ -> failwith "Exactly 1 filename must be given"
let () = main ()

I’m in two minds about this code. I don’t much like the way it looks and reads. Syntax-wise there are an awful lot of lets; I prefer the way SML uses fun for top-level function declarations and saves let for scoped bindings. OCaml has a more extensive but scruffier library than SML and although there’s lots of documentation, I didn’t find it all that simple to navigate—as a result I’m not sure I’m using the most suitable tools here. There is probably a shorter simpler way. And my first attempt didn’t work for long files: caught out by the fact that input_line throws an exception at end of file (ugh), I broke tail-recursion optimisation by adding an exception handler.

On the other hand, writing this after the SML and Yeti versions, I found it very easy to write syntax that worked, even when I wasn’t quite clear in my head what the syntax was supposed to look like. (At one point I started to worry that the compiler wasn’t working, because it took no time to run and printed no errors.)

I didn’t spot at first that OCaml ships with separate bytecode and optimising native-code compilers, so my first tests seemed a bit slow. In fact it was very fast indeed:

$ time ocamlopt -o sum-ocaml str.cmxa

real 0m0.073s
user 0m0.063s
sys 0m0.003s
$ time ./sum-ocaml big-data.csv 

real 0m7.761s
user 0m7.740s
sys 0m0.027s

The OCaml native binary was 339K and depended only on libm, libdl, and libc.


Yeti is an ML-derived language for the Java virtual machine. I’ve written about it a couple of times before.

valuesOfLine line =
    map number (strSplit "," line);
addTo totals row =
    if empty? totals then array row
    elif length totals == length row then array (map2 (+) totals row)
    else failWith "Inconsistent-length rows"

rowsOfFile filename =
    readFile filename "UTF-8"
        do handle: map valuesOfLine (handle.lines ()) done;

sumFile filename =
    fold addTo (array []) (rowsOfFile filename);

sumAndPrintFile filename =
    println (strJoin "," (map string (sumFile filename)));

case (list _argv) of
     [filename]: sumAndPrintFile filename;
     _: failWith "Exactly 1 filename must be given";

I love Yeti’s dismissive approach to function and binding declaration syntax—no let or fun keywords at all. Psychologically, this is great when you’re staring at an empty REPL prompt trying to decide where to start: no syntax to forget, the first thing you need to type is whatever it is that you want your function to produce.

The disadvantage of losing let and fun is that Yeti needs semicolons to separate bindings. It also makes for a visually rather irregular source file.

As OCaml is like a pragmatic SML, so Yeti seems like a pragmatic OCaml. It provides some useful tools for a task like this one. Although the language is eagerly evaluated, lazy lists have language support and are interchangeable with standard lists, so the standard library can expose the lines of a text file as a lazy list making a fold over it very straightforward. The default map and map2 functions produce lazy lists.

Unfortunately, this nice feature then bit me on the bottom in my first draft, as the use of a lazy map2 in line 6 blew the stack with large inputs (why? not completely sure yet). The standard library has an eager map as well as a lazy one but lacks an eager map2, so I fixed this by converting the number row to an array (arguably the more natural type for it).

The Yeti compiler runs very quickly and compiles to Java .class files. With a small program like this, I would usually just invoke it and have the compiler build and run it in one go:

$ time yc ./sum.yeti big-data.csv 

real 0m14.440s
user 0m26.867s
sys 0m0.423s

Those timings are interesting, because this is the only example to use more than one processor—the JVM uses a second thread for garbage collection. So it took more time than the MLton binary, but finished quicker…


F♯ is an ML-style language developed at Microsoft and subsequently open-sourced, with a substantial level of integration with the .NET platform and libraries.

(Edit: as with the OCaml example, you’ll find suggestions for alternative ways to write this in the comments below.)

let addTo totals row =
    match totals with
    | [||] -> row
    | _ ->
       if Array.length totals = Array.length row then
         Array.map2 (+) totals row
       else failwith "Inconsistent-length rows"

let sumOfFields fields =
    let rows = ( float) fields in
    Seq.fold addTo [||] rows

let fieldsOfFile filename = 
    seq { use s = System.IO.File.OpenText(filename)
          while not s.EndOfStream do yield s.ReadLine().Split ',' }

let sumAndPrintFile filename =
    let result = fieldsOfFile filename |> sumOfFields in
    printfn "%s" (String.concat "," ( string result))

let main argv = 
    match argv with
    | [|filename|] -> (sumAndPrintFile filename; 0)
    | _ -> failwith "Exactly 1 filename must be given"

F♯ also has language support for lazy lists, but with different syntax (they’re called sequences) and providing a Python-style yield keyword to generate them via continuations. The sequence generator here came from one of the example tutorials.

A lot of real F♯ code looks like it’s mostly plugging together .NET calls, and there are a lot of capital letters going around, but the basic functional syntax is almost exactly OCaml. It’s interesting that the fundamental unit of text output seems to be the formatted print (printfn). I gather F♯ programmers are fond of their |> operator, so I threw in one of those.

I’m running Linux so I used the open source edition of the F♯ compiler:

$ time fsharpc -o sum-fs.exe sum.fs
F# Compiler for F# 3.1 (Open Source Edition)
Freely distributed under the Apache 2.0 Open Source License

real 0m2.115s
user 0m2.037s
sys 0m0.063s
$ time ./sum-fs.exe big-data.csv 

real 0m13.944s
user 0m13.863s
sys 0m0.070s

The compiler produced a mere 7680-byte .NET assembly, that of course (like Yeti) requires a substantial managed runtime. Performance seems pretty good.


Python is not an ML-like language; I include it just for comparison.

import sys

def add_to(totals, values):
    n = len(totals)
    if n == 0:
        return values
    elif n == len(values):
        return [totals[i] + values[i] for i in range(n)]
        raise RuntimeError("Inconsistent-length rows")
def add_line_to(totals, line):
    values = [float(s) for s in line.strip().split(',')]
    return add_to(totals, values)

def sum_file(filename):
    f = open(filename, 'r')
    totals = []
    for line in f:
        totals = add_line_to(totals, line)
    return totals

if __name__ == '__main__':
    if len(sys.argv) != 2:
        raise RuntimeError("Exactly 1 filename must be given")
    result = sum_file(sys.argv[1])
    print(','.join([str(v) for v in result]))

Feels odd having to use the return keyword again, after using languages in which one just leaves the result at the end of the function.

This is compact and readable. A big difference from the above languages is invisible—it’s dynamically typed, without any compile-time type checking.

To build and run this, I just invoked Python on it:

$ time python ./ ./big-data.csv 

real 0m10.939s
user 0m10.853s
sys 0m0.060s

That’s Python 3. Python 2 was about a second faster. I was quite impressed by this result, having expected to suffer from my decision to always return new lists of totals rather than updating the values in them.

Any conclusions?

Well, it was a fun exercise. Although I’ve written more in these languages than appears here, and read quite a bit about all of them, I’m still pretty ignorant about the library possibilities for most of them, as well as about the object support in OCaml and F♯.

I am naively impressed by the OCaml compiler. For language “feel”, it gave me the least favourable first impression but I can imagine it being pleasant to use daily.

F♯ on Linux proved unexpectedly straightforward (and fast). Could be a nice choice for web and server applications.

I have made small web and server applications using Yeti and enjoyed the experience. Being able to integrate with existing Java code is good, though of course doubly so when the available libraries in the language itself are so limited.

Standard ML has a clarity and simplicity I really like, and I’d still love to try to use it for something serious. It’s just, well, nobody else seems to—I bet quite a lot of people have learned the language as undergrads (as I did) but it doesn’t seem to be the popular choice outside it. Hardly anyone uses Yeti either, but the Java interoperability means you aren’t so dependent on other developers.

Practically speaking, for jobs like this, and where you want to run something yourself or give someone the source, there’s not much here to recommend anything other than Python. Of course I do appreciate both compile-time typechecking and (for some problems) a more functional style, which is why I’m writing this at all.

But the fact that compilers for both SML and OCaml can generate compact and quick native binaries is interesting, and Yeti and F♯ are notable for their engagement with other existing frameworks.

If you’ve any thoughts or suggestions, do leave a comment.