Zero-based indexing

The excellent Greg Wilson, founder of Software Carpentry, tweeted the above link to a 2013 blog post by Mike Hoye the other day.

I didn’t comment on this article when it first appeared because I didn’t have the nerve to confront its author, who was shouting down everyone who tried to discuss it in the comments. But I can’t bear to see this article promoted again, and by a good authority.

The article claims that most of today’s programming languages use zero-based indexing (i.e. they count array indexes from 0, so that arr[0] is the first element of array arr, rather than arr[1]) because it saved a tiny amount of compile time (not run time). This supposedly mattered because, on a specific IBM mainframe hosted by MIT in the 70s, a job that took too long to compile might be bumped to make way for a program that calculated handicap points for yacht racing.
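For anyone unsure what the difference amounts to in practice, here is a minimal sketch in Python. The function names are mine, invented for illustration, and the model (an array as a base address plus fixed-size elements) is a simplification, not a description of any particular compiler.

```python
# Hypothetical helpers, for illustration only: they model an array as a
# base address followed by elements of a fixed size in bytes.

def offset_zero_based(index, element_size):
    """Byte offset of arr[index] when the first element is arr[0]."""
    return index * element_size


def offset_one_based(index, element_size):
    """Byte offset of arr[index] when the first element is arr[1]."""
    return (index - 1) * element_size


# Python itself is zero-based, so the first element is at index 0.
letters = ["a", "b", "c"]
first = letters[0]
```

Both conventions reach the same element; the one-based version just carries an extra subtraction. Whether that subtraction costs anything, and at what stage, depends on how a given compiler handles it.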

This is a pretty implausible suggestion, so it needs some pretty good evidence. That isn’t there. The article has some very nice sources, but the quotes from them just don’t support the proposition they’re being asked to support. The main quotes, from Martin Richards and Tom Van Vleck, both appear to say nearly the opposite of the things they’re described as saying. There’s plenty of room for nuance in interpreting what people say, but the author accepts no nuance in anyone’s responses to the article, choosing instead to mock and ridicule anyone who doesn’t agree with him. There’s no citation for the one thing that is necessary to make the argument hold together (that indexes were calculated at compile time rather than run time). Reading this article carefully, the only conclusion I can draw is that the choice of 0-based indexing almost certainly has nothing to do with yachts.

I don’t mind a great deal whether a programming language uses 0-based or 1-based indexing. The reason this matters to me is that the article is not just a screed with a funny story in it, but a call for rigour in understanding the history of programming languages, something I do care about and that its author appears to take very seriously indeed. Its general principle is really sound: we get used to a lot of arbitrary aspects of languages and then explain them as the mythology of the elders, rather than finding the actual reasons. But this article only added to the mythology, and people who know better are now citing it as if it had been established to be true, which it almost certainly isn’t.

(I feel really bad just writing this. It’s quite possible the author is regretting ever getting involved in this stupid topic but has too much integrity to take down or edit the post. I wish I had never been reminded of how maddening I found it.)