Compiling a program is a strange thing to do

Following my previous post about functional languages, a suspicious reader asked about the list of prerequisites I gave for a language: purely functional, Hindley-Milner typing, compiling to JVM bytecode, blah blah blah.

Was that list genuine—or was I by any chance just listing the properties of a language I’d stumbled over at random and decided I liked?

The list was in fact real, if tidied up a bit after the fact. I was looking for something like ML that I could use in a Java-based environment, for fun, and the things I listed roughly describe that combination. (Looking for a specific language reworked “for the JVM” is not such a strange thing to do: there are quite a lot of them.)

There was an outlier in my list of priorities, though, something that I might not have cared about until recently: the REPL.

Read, evaluate, print

A REPL is a fancy name for a very simple thing: the interactive prompt that comes up when you start a language interpreter without giving it the name of a program to run. You type something, and it types the result back at you. If you keep typing, you might build up a whole interpreted program.
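To show how little is going on, here is a minimal sketch of a read-evaluate-print loop in Python. It is an illustration only, not how any real interpreter implements its prompt; the names repl and run_session are made up for this example, and it leans on Python's own built-in compile, eval and exec to do the evaluating.

```python
import io


def repl(input_stream, output_stream):
    """Read a line, evaluate it, print the result, and repeat until input ends."""
    env = {}  # names defined at the prompt persist from one line to the next
    for line in input_stream:
        source = line.strip()
        if not source:
            continue
        try:
            try:
                # Expressions produce a value that gets echoed back.
                result = eval(compile(source, "<repl>", "eval"), env)
            except SyntaxError:
                # Statements (such as assignments) are run for their effect only.
                exec(compile(source, "<repl>", "exec"), env)
                result = None
            if result is not None:
                output_stream.write(repr(result) + "\n")
        except Exception as error:
            # Report the problem and carry on, as an interactive prompt would.
            output_stream.write(f"{type(error).__name__}: {error}\n")


def run_session(lines):
    """Hypothetical helper: feed a list of lines to the loop and collect the output."""
    out = io.StringIO()
    repl(io.StringIO("\n".join(lines) + "\n"), out)
    return out.getvalue()


if __name__ == "__main__":
    print(run_session(["x = 6", "x * 7"]), end="")
```

Passing sys.stdin and sys.stdout to repl instead would give a (promptless) interactive session. The shared env dictionary is what lets a sequence of typed lines accumulate into a whole program.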

ML does have an interactive environment, but I hardly remember using it. It’s more recent experience with Python and Ruby that reminded me just what a nice thing it is. Interactivity makes a big difference when exploring and understanding a new language. These days I wouldn’t want to start learning a language without it.

The funny thing is that until I first went to university, most of my programming experience involved some sort of interactive environment. I’d never used a compiler. All my programming (such as it was) had been in interpreted BASIC with occasional bits of raw machine code, mostly on the ZX Spectrum home computer. Spectrum BASIC (you can try it here) was slow and limited, but it had an interactive prompt and a helpful editor that would prevent you even entering a line if it spotted a syntax error.

So what a magical day it was, at university, when first we learned what real programming looks like:

$ cc my_first_program.c
$ ./a.out
Segmentation fault (core dumped)
$

Things got even better as C++ took off; it’s always been slow to compile, and I spent a decade or so working largely on programs that took over an hour to build. That’s a tricky feedback loop, which encourages you to try to work on multiple independent things for each compile cycle—probably not to the benefit of any of them.

Compiling your code is a pretty strange thing to do, really. Interpreted languages and languages with automatic bytecode compiling and runtime loading have been around for decades. Even if they’re a bit slower, surely the first priority of a language should be to make things easier for the programmer and increase the chances of getting a program that actually works correctly.

Why compile?

So, why do we still compile code so much of the time?

Here are a few guesses.

Separating low-level from high-level concerns is hard

Interpreted and byte-compiled languages can potentially be nearly as fast as compiled ones. Bytecode evaluation optimisers can be very good, and would presumably be better still if more work had gone into them rather than into optimising compilation to machine code. Alternatively, domains that benefit from low-level optimisation (such as signal processing) might be served by domain-specific languages, in which common high-level operations are interpreted using blocks of optimised low-level code.

But to get these right—particularly at the domain-specific level—you have to do a very good job of understanding the field, the requirements, and the programmers who will be working in the environment. If it’s not good enough, developers will fall back on something they can trust rather than wait for it to improve.
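As a rough illustration of the domain-specific idea, here is a sketch in Python of a tiny interpreted pipeline language for audio samples. Everything in it is invented for the example: the operation names (gain, clip, rectify), the "a | b" syntax, and the run_pipeline function. The primitives here are plain Python; the assumption is that in a real system each one would be a block of carefully optimised low-level code, with only the cheap dispatch step left to the interpreter.

```python
# Hypothetical primitives. In a real system these would be the optimised
# low-level blocks; plain Python stands in for them here.
def gain(samples, factor):
    """Scale every sample by a constant factor."""
    return [s * factor for s in samples]


def clip(samples, limit):
    """Clamp every sample to the range [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


def rectify(samples):
    """Replace every sample with its absolute value."""
    return [abs(s) for s in samples]


PRIMITIVES = {"gain": gain, "clip": clip, "rectify": rectify}


def run_pipeline(program, samples):
    """Interpret a program such as 'gain 2 | clip 1' over a list of samples."""
    for stage in program.split("|"):
        name, *args = stage.split()
        operation = PRIMITIVES[name]  # the only work done at the high level
        samples = operation(samples, *(float(a) for a in args))
    return samples


if __name__ == "__main__":
    print(run_pipeline("gain 2 | clip 1", [0.25, -0.75, 0.5]))
```

The interpreter itself does almost nothing per stage, so its slowness matters little so long as each primitive handles a whole buffer at a time. Choosing the right set of primitives is the hard part, which is the point about understanding the field.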

It’s a known quantity

And developers can trust compiled languages, on the whole. Not every task can use an interpreted or domain-specific language: those environments need to be implemented in something, and they need to run on some platform. I guess it’s simpler in the long run to make stable compilers than to implement a multitude of interpreters directly using low-level machine code.

Compiling “finishes” the program

A psychological trick. When you compile something, it’s done. If it seems to work, you can ship it. If you haven’t yet discovered version control, you can copy the binary somewhere and get a treacherous feeling of safety while you mangle, or lose, the source code. In contrast, with an interpreted language, every edit you make seems to risk breaking the whole program.

Humans can’t read machine code

Finally, when you have a compiled binary you can ship it without the source. Thus the whole edifice of proprietary software sale and distribution.

Any more ideas?