Learning to read Arabic writing: one of my better ideas

I live in London not far from Paddington, where Arabic writing is often seen:

[Image: street-view photo of the road]

I spent my first few years in the area a bit oblivious to this (shops are shops), but eventually I started to wonder about simple things like: are these all the same language and script, or do they just look similar? And of course: what do they say? Then two years ago I took a gamble on the notion that this might be Arabic, and signed up for Arabic evening classes.

On the first day of the class, we were all asked why we had chosen to study Arabic. Everyone else had a proper explanation – planning to study in an Arabic-speaking country, dispatched to an Arabic-speaking country for business, have a parent who speaks Arabic and want to catch up, etc. I’d like to report that I said “I want to be able to read the shop signs on Edgware Road”, but I wasn’t bold enough, so I just cited curiosity.

I kept up the classes (one evening a week) for a year. Arabic is a difficult language and I didn’t excel. I learned simple introductions, some directions, some colours, a bit of grammar, and that I can’t pronounce the letter ع any better than any other native English speaker can. I learned enough that I can now recognise the odd word when I hear people speaking Arabic, but not enough to join in, and anyway I’ve always been very self-conscious about speaking other languages. But I am now able to slowly read (and write) the alphabet.

Predictably enough, it turns out the signage in Arabic around here usually says the same thing as the Roman lettering next to it. That’s the case for most of the text in the street-view photo above, for example. That could be disappointing, but I find it rather liberating. When people put Arabic text on a sign in this country, they aren’t trying to make things weird for native-English-speaking locals, they’re trying to make it easier for everyone else.

Arabic, the language, has 400-odd million speakers worldwide. Arabic the alphabet serves up to a billion users. Besides the Arabic language, it’s used for Persian and Urdu¹, both of which are quite dissimilar to Arabic. As it turns out, most of the places near me that I was interested in are in fact Arabic-speaking, but there are quite a few Persian places as well, and Urdu, being the primary language of Pakistan, is widely used in the UK too.

(I have since had it pointed out to me that, for an English speaker whose main aim is to learn to read the script, going to Persian classes would have been easier than Arabic. Persian is an Indo-European language, it’s grammatically simpler, and the language you learn in classes is a form that people actually speak, whereas the standard Arabic taught to learners here I gather is different from anything spoken on the street anywhere. I have since bought a Persian grammar book, just in case I feel inspired.)

Learning the basics of how to read Arabic gives me a feeling of delight and reassurance, as if I am poking a hole for my brain to look out and find that a previously unfamiliar slice of the world’s population is doing the same stuff as those of us who happen to be users of the Roman alphabet. I recommend it.

Notes for the clueless about the Arabic alphabet

  • It’s written and read right-to-left. This is probably the only thing I did know before I started actively learning about it.
  • It is an alphabet, not a syllabary like Japanese kana or a logographic system like Chinese writing.
  • It is very much structured as a script. Each letter could have up to four shapes (initial, middle, final, standalone) depending on how it joins to the letters around it, so that the whole word flows smoothly. I think this contributes a lot to the sense of mystery “we” have about Arabic. The Cyrillic, Hebrew, and Greek alphabets are not intrinsically any more mysterious, but they are a lot more obviously composed of letters that can be individually mapped to Roman ones.
  • Short vowel sounds are not normally written down. This is unfortunate for the learner, as it means you often can’t pronounce a word unless you already know it. There is a system for annotating them, but it’s not generally used except in the Koran and sometimes in textbooks or Wikipedia where avoiding ambiguity is paramount.
  • There are 28-odd letters, but the number depends on what you’re reading – Persian adds a few over Arabic, but I think it also has some duplicates.
  • Some letters are very distinctive; for example the only letter with two dots below it is the common ي “ya”, which generally maps to an “ee” sound. Others are quite hard to spot because you have to know the joining rules to distinguish them in the middle of a word.
  • You could transliterate any language to Arabic, just as you can transliterate anything to the Roman alphabet. The result might be awkward, but there’s no reason you can’t write English in Arabic letters and have it be just about comprehensible. I imagine there must be people who routinely do this.
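
The "up to four shapes" point in the notes above can be seen in Unicode itself, which keeps legacy code points for each contextual shape of a letter. Here is a minimal Python sketch that looks them up by name. The choice of the letter beh (ب) as the example and the helper name contextual_forms are assumptions made for illustration. In normal text you type only the base letter and the font does the joining.

```python
import unicodedata

# Unicode's names for the four shapes: isolated (standalone), initial,
# medial (middle) and final.
FORMS = ["ISOLATED", "INITIAL", "MEDIAL", "FINAL"]


def contextual_forms(letter_name):
    """Return {form: character} for the presentation forms a letter has."""
    forms = {}
    for form in FORMS:
        try:
            forms[form] = unicodedata.lookup(f"{letter_name} {form} FORM")
        except KeyError:
            # Letters that join on one side only (alef, for instance)
            # have no initial or medial form.
            pass
    return forms


# Assumption: beh is used as the example because it has all four shapes.
beh = contextual_forms("ARABIC LETTER BEH")
for form, char in beh.items():
    # Compatibility normalisation maps every shape back to the base letter.
    base = unicodedata.normalize("NFKC", char)
    print(f"{form:8} U+{ord(char):04X} {char} -> U+{ord(base):04X} {base}")
```

Calling contextual_forms("ARABIC LETTER ALEF") returns only the isolated and final shapes, which is the "up to" in "up to four".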

 

¹ I know no Urdu, but I understand it’s typically written in the Arabic alphabet but with a more flowing script (Nastaliq, نستعلیق) than is typically used for modern Arabic or Persian. An interesting calligraphic distinction between languages. I first heard of Nastaliq through a fascinating article by Ali Eteraz in 2013, The Death of the Urdu Script, which lamented that it was too hard to display it on current devices. The situation has apparently improved since then.

 

Why I will be voting “in” this Thursday

Although the public debate about this week’s EU referendum in the UK has become absurdly bitter on both sides, I have had some constructive talks about the subject with people around me, even where we have disagreed. There is, or was, a reasonable debate to be had and it’s a pity we haven’t seen a sensible national discussion about it.

In the spirit of trying to be positive: here are five reasons why I would like the UK to remain in the EU, without talking about the personalities or made-up economic projections coming from the campaigns on either side.

1. The EU has a useful role in the UK in terms of long-term oversight

This country has no written constitution and has an effectively two-party parliamentary system in which each new government starts by setting out to undo whatever its predecessor did. Institutions like the European Court of Human Rights give us both longer-term continuity and a moderating influence across the various ideologies of the European states. They’re a good thing.

I might feel differently on this if I thought the Leave campaign were keen to make up for exit with better constitutional protections in the UK. Unfortunately the impression I get is the opposite.

(I think this argument holds even for lower-level things like food labelling and sourcing regulations. After all, those are also the regulations that mean a Cornish pasty is a pasty from Cornwall wherever you buy it in the EU, not just a meat pie from a factory in Denmark with Cornish Pasty printed on the pack.)

2. Our position within the EU is a great one

We have full membership of the EU without the tricky bit (the Euro) and with a membership rebate that we could never negotiate again. It’s the best of both worlds already. Any country in the world would envy that.

3. Leaving won’t give us more independence

I understand the argument that a state should strive to be self-determining as far as possible. I just don’t think that leaving the EU would have a happy outcome in that respect.

It wouldn’t change anything about who runs this country or how they run it, and it wouldn’t send a message that anybody would be equipped to act on. Our government would continue to have the same pro-business pro-international-collaboration outlook, for good or bad. We would almost certainly end up leaning more than ever on the USA, a country we would no longer have much to offer in return, while scrabbling around for other partnerships and making poorer deals with other European states.

4. Immigration is a red herring, but freedom of movement is a good thing

Immigration is clearly a subject that people feel viscerally about. But the sort of mass migration being exploited for this argument, of refugees from Syria for example, has nothing to do with the subject we’re supposed to be deciding on — we already turn those people away (Calais, remember?). I obviously have views about that (who doesn’t) but it makes no sense for it to be a pivotal subject for this referendum.

What is relevant is freedom of movement for workers within the EU. I think this is a good thing, partly because it’s how we can have world-leading research labs like (ahem) the one I work in, and partly because it cuts both ways — Britons can and do move abroad as well (permanently or temporarily) and this openness is a great part of providing opportunities and prospects for future generations.

People of my age or older may remember the 80s TV series Auf Wiedersehen, Pet, a comedy about British builders working in Germany. A central prop of that programme was that there was something ramshackle about their arrangement and that they were at the mercy of exploitative employers and tax rules as migrant workers. We’ve become unused to thinking of British migrant workers as being exploited in this way.

I know that there is also a narrative about other EU citizens coming to the UK simply to claim benefits. The great majority of people who move here do so either to work or to study, or because they are married to British citizens. Many British citizens draw benefits abroad as well. The overall balance of numbers doesn’t in any way reflect the anxiety people have about it. That anxiety is serious, but it isn’t something that this referendum can properly address with either outcome.

The question of what would happen to EU workers who are already in the UK, if we left, seems like such a massive quagmire that I don’t want to think about it. I don’t think it could be very harmonious.

5. I’d like to see positivity prevail

There’s something very British about willingly engaging in an endeavour (after a referendum!) and then whingeing about it constantly for the next 40 years.

The tone from British media and politicians for decades now has been mostly about how onerous the EU is and “what can it do for us?”, very seldom about the power it gives us or what we can do together with the other countries within it. This negative guff is forced on us by media barons who genuinely have no reason to give a damn about us in the first place, and it ends up setting a very miserable tone. Let’s resist!

 

Naming conventions in Standard ML

Many programming languages have a standard document that describes how to write and capitalise the names of functions, variables, and source files. It’s especially useful to have a standard for writing names made up from more than one word, where there are various options for how to join the words: “camel case”, which looks likeThis (with a capital letter “hump” in the middle), or “snake case”, which is underscore_separated.
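
To make the two joining styles concrete, here is a small Python sketch that builds each form from a list of words. The function names are made up for the illustration.

```python
def camel_case(words):
    """Join words likeThis: first word lower-case, later words capitalised."""
    first, *rest = [w.lower() for w in words]
    return first + "".join(w.capitalize() for w in rest)


def snake_case(words):
    """Join words like_this: all lower-case, separated by underscores."""
    return "_".join(w.lower() for w in words)


words = ["looks", "like", "this"]
print(camel_case(words))  # looksLikeThis
print(snake_case(words))  # looks_like_this
```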

I think Java in the mid-90s was the first really mainstream language to standardise file and variable naming conventions. The Java package mechanism requires files to be laid out in a particular way, and Sun published Java coding conventions which quickly became an effective standard for class and variable naming. Other languages followed. Python has had a standard that covers naming (PEP8) since 2001. More recent examples include Go and Swift.

Older languages tend to be less consistent. C++ is a mess: the standard library and most official example material use snake_case for most names, but a great many developers, including those on most of the projects I’ve worked on, prefer camelCase, with capital initials for class names. File names are even more various: C++ source files are seen with .cpp, .cxx, .cc, and .C extensions; C++ header files with .h, .hpp, or no extension at all.

Standard ML (SML) is also a mess, and an interesting one because the language itself was standardised in 1990 and has been completely unchanged since the standard was revised in 1997. So although it is super-standardised, it’s a bit too old to have caught the wider shift in sentiment toward prescribing things like naming and file structure.

The SML standard is formal and very focused. It says nothing about coding style or naming, contains almost no examples using compound names, says nothing about filenames or file organisation, and specifies no way for one file to refer to another — the standard is indifferent to whether your source code is held in a file at all.

In trying to establish what naming conventions to use for my own code, I decided to look around at some existing libraries in SML to see what they had settled on.

The Basis library

SML has a standard library, the Basis library, which is a bit more recent than the language itself. Although it isn’t prescriptive, the library does use certain conventions itself and the introductory notes explain what they are. These cover only names of things within a program — not filenames, which are left up to the implementor of the standard. I’ll refer to them in the table below.

The Cornell style guide

Top search result for “SML naming conventions” for me is this online style guide for the Cornell CS312 course. It doesn’t cover file naming. Given the limited industry uptake for SML, an academic guide may be proportionately more influential than for other languages. I’ll mention this guide below as well.

Other code I looked at

I took a look at the following code:

  • The source of the MLton, MLKit, and SMLSharp compilers (excluding accompanying utility libraries)
  • The Basis library implementations shipped with MLton and SMLSharp
  • The SML/NJ extended library
  • The source of the Ur/Web language
  • The Ponyo library, an interesting fledgling effort to produce a broader base library than the Basis

In total, about 444,500 lines of code across 1790 SML source files. Some (presumably automatically-generated) source files are very long; while the mean file length is 248 lines including comments and blanks, the median is only 47.

Names within the language

The SML language has at least seven categories of things that need names: variables, type names, datatype constructors, exceptions, structures, signatures, and functors.

(By “variables” I really mean bindings, i.e. the vast majority of ordinary things with names: things that in a procedural language might include function names, variable names, and constant declarations. I’m using the word “variable” because it’s such a familiar everyday programming term.)

Source     Variable      Type name   Datatype constructor  Exception       Structure       Signature       Functor
mlton      variableName  (mixed)     DatatypeCtor          ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
mlkit      (mixed)       (mixed)     DatatypeCtor*         ExceptionName*  StructureName   SIGNATURE_NAME  FunctorName
smlsharp   variableName  typeName*   DATATYPE_CTOR*        ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
basis      variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorName
smlnj-lib  variableName  type_name   DATATYPE_CTOR         ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
urweb      variableName  type_name*  DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorNameFn
ponyo      variableName  typeName    DatatypeCtor          ExceptionName   Structure_Name  SIGNATURE_NAME  Functor_Name
cornell    variableName  type_name   DatatypeCtor          ExceptionName   StructureName   SIGNATURE_NAME  FunctorName

* mostly

Here’s what I found, categorised into universal conventions, usual conventions, and “other”.

Universal

The following is the only universal convention:

Signature
SIGNATURE_NAME

The only code I found that doesn’t follow this convention is in the SML standard itself, which omits the underscore (like SIGNATURENAME).

Usual

The following conventions are not universal, but more popular than any other.

Variable      Type name  Exception      Structure      Functor
variableName  type_name  ExceptionName  StructureName  FunctorName

Camel case is clearly idiomatic for everything except type names. MLKit contains some snake-cased bindings as well, but none of the other libraries do. I like snake case in SML and I’ve written a fair bit of code using it myself; I hadn’t realised until now how uncommon it was. (It’s more common in SML’s sibling language OCaml. Ironic that, of the three very similar languages SML, OCaml, and F#, the only one not to use camel case is called OCaml.)

I spotted a handful of all-caps exception names and some camel case type names, but no library preferred those consistently.

The Ponyo library differs from the above for structures (Structure_Name) and functors (Functor_Name).

The SML/NJ library sort-of differs for functors, which are given a Fn suffix (FunctorNameFn). But you could think of this as part of the name, in which case the convention is the same.

Most type and datatype names used in public APIs are single words, or even single letters, so the convention often doesn’t matter for those.

Other

There seems to be no consensus about datatype constructors — I found DatatypeConstructor and DATATYPE_CONSTRUCTOR in roughly equal number.
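
For anyone wanting to repeat this sort of tally on their own code, here is a rough Python sketch of a classifier for the styles that appear in the tables above. It is not the method used for the survey. The style labels and regular expressions are assumptions, and single-word names are deliberately left unclassified, since the convention doesn't matter for them.

```python
import re
from collections import Counter

# Hypothetical labels for the multi-word styles seen in the tables.
STYLES = [
    ("UPPER_SNAKE", re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)+")),          # SIGNATURE_NAME
    ("Capitalised_Snake", re.compile(r"[A-Z][a-z0-9]*(_[A-Z][a-z0-9]*)+")),  # Structure_Name
    ("lower_snake", re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)+")),          # type_name
    ("UpperCamel", re.compile(r"[A-Z][a-z0-9]+([A-Z][a-z0-9]*)+")),       # StructureName
    ("lowerCamel", re.compile(r"[a-z][a-z0-9]*([A-Z][a-z0-9]*)+")),       # variableName
]


def classify(name):
    """Return the first style whose pattern matches the whole name."""
    for label, pattern in STYLES:
        if pattern.fullmatch(name):
            return label
    return "(single word or mixed)"


sample = ["variableName", "type_name", "DATATYPE_CTOR", "ExceptionName",
          "Structure_Name", "FunctorNameFn", "list"]
counts = Counter(classify(name) for name in sample)
for label, count in counts.most_common():
    print(f"{label:24} {count}")
```

A real survey would also need to pull the identifiers out of the source by category (binding, type, constructor and so on), which this sketch does not attempt.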

Filenames

Nothing in the SML standard or Basis library cares about what source files are called, what file extension they use, or how you divide your code up among them. Some compilers might care, but most don’t. The business of telling the compiler which files a program consists of, or of expressing any relationships between files, is left up to external tools. SML has neither header files nor import directives.

This makes fertile ground for variety in naming schemes.

I’m going to consider only filenames that are associated with a primary structure, signature, or functor. Here’s the table.

Source          Structure           Signature               Functor
mlton           structure-name.sml  signature-name.sig      functor-name.fun
mlkit           StructureName.sml   SIGNATURE_NAME.sml*     FunctorName.sml
smlsharp        StructureName.sml   SIGNATURE_NAME.sig*     FunctorName.sml
mlton-basis     structure-name.sml  signature-name.sig      functor-name.fun
smlsharp-basis  StructureName.sml   SIGNATURE_NAME.sig      (none)
smlnj-lib       structure-name.sml  signature-name-sig.sml  functor-name-fn.sml
urweb           structure_name.sml  signature_name.sig      (n/a)
ponyo           Structure_Name.sml  SIGNATURE_NAME.ML       Functor_Name.sml

* mostly

Clearly very inconsistent. There are no universal or usual conventions, only “other”.

Behind this there is a wider question about code organisation in files — should each signature live in its own file? Each structure? In many cases they do, but that is also far from universal.

If you use a scheme in which filenames are clearly derived from signature and structure names, does that mean you shouldn’t put more than one structure in the same file? What do you do with code that is not in any structure? Really it’s a pity to have to think about filenames at all, in a language that is so completely indifferent to file structure.

A Reasonable Recommendation

A plausible set of rules based on the above.

For names within the language:

Variable      Type name  Datatype constructor  Exception      Structure      Signature       Functor
variableName  type_name  DATATYPE_CTOR         ExceptionName  StructureName  SIGNATURE_NAME  FunctorName

This is the style used by the Basis library. Apart from datatype constructors, everything here was in the majority within the libraries I looked at.

For datatype constructors it seems reasonable to pick the most visible option and one that is consistent with the names in Basis. (This differs from the Cornell guide, however.) There is no confusion between these and signature names, because signature names never appear anywhere except in the declaration lines for those signatures and the structures that implement them.

For filenames:

Structure           Signature           Functor
structure-name.sml  signature-name.sig  functor-name.sml

The logic here is:

  • It’s still not a great idea to expect a case-sensitive filesystem, so all-one-case is good
  • Generally use .sml extension for SML source
  • But the .sig extension for signatures seems very widely used, and it’s fair to make public signatures as easy to spot as possible
  • The .ml extension is not a great idea because it clashes with OCaml
  • The .fun extension used by MLton is a bit obscure, and you don’t always want to separate out functors (if you want to make functors more distinctive, give them names ending in Fn, as the SML/NJ library does).
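
The filename rules above are mechanical enough to sketch in a few lines of Python. The helper names split_words and filename are invented for the illustration, and the word splitting is deliberately naive: it breaks at underscores and at a capital letter that follows a lower-case letter or digit, so a run of capitals such as an acronym stays as one word.

```python
import re


def split_words(name):
    """Split an SML identifier into lower-case words."""
    words = []
    for part in name.split("_"):
        # Insert a space wherever a capital follows a lower-case letter or digit.
        words.extend(re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part).split())
    return [w.lower() for w in words]


def filename(name, kind):
    """Suggest a filename; kind is 'structure', 'signature' or 'functor'."""
    extension = ".sig" if kind == "signature" else ".sml"
    return "-".join(split_words(name)) + extension


print(filename("StructureName", "structure"))   # structure-name.sml
print(filename("SIGNATURE_NAME", "signature"))  # signature-name.sig
print(filename("FunctorNameFn", "functor"))     # functor-name-fn.sml
```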

 

Bowie

Here’s a playlist of good David Bowie songs that I had never heard until after he died last week.

Spotify playlist
YouTube links:
Dead Against It (1993)
Up The Hill Backwards (1980)
Move On (1979)
Dancing With The Big Boys (1984) (with Iggy Pop)
I Would Be Your Slave (2002)
Girl Loves Me (2016)
You’ve Been Around [Dangers 12″] (1993) (Jack Dangers remix)
Nite Flights (1993) (Scott Walker cover)
No Control (1995)
Bring Me The Disco King (2003)
I’m Deranged (1995)
5:15 The Angels Have Gone (2002)

Most of these came out after the peak of his popularity, but they aren’t obscure at all — I was just never a fan.

The first Bowie songs I remember hearing on the radio were Modern Love and Let’s Dance, both released in 1983 when I was eleven. I thought those two were fine, though they weren’t the sort of thing I liked to think I was into at the time. (I had a big Motörhead war-pig patch on my little denim jacket. Lemmy’s death was also the end of an era.)

A few years later, a cousin introduced me to some of the Spiders from Mars period songs like Rebel Rebel and Suffragette City. I was a bit puzzled because I thought I knew Bowie as a smooth, modern 80s-sounding chap. But I didn’t get the appeal either: too much like everything else from the early 70s. Rebel Rebel even sounded like the Stones, which was definitely my dad’s music.

Back in the real timeline of the 80s, Bowie was releasing Never Let Me Down, an album seen everywhere (one of several awful record covers) but seldom played, then launching the drearily adult Tin Machine.

His next album, Black Tie White Noise, didn’t come out until 1993, when I was briefly in Berlin as a student and mostly listening to industrial music and obscure things I read about in Usenet groups. If I had been aware that David Bowie had an album out, I would certainly have ignored it. By the time of 1997’s Earthling, a jungle-influenced album released a whole two years after peak jungle with a dreadful Cool Britannia cover, it felt socially impossible to admit to liking a Bowie song ever again. And that was pretty much the end of that.

There’s been a David Bowie album, collaboration, tour, or retrospective for almost every year of my life, and I’ve never taken more than a passing interest in any of them.

I was taken by surprise, then, by how emotional I felt about his death.

***

What did eventually make me notice David Bowie as a figure was the connection with Iggy Pop. I think Iggy is brilliant, and I’d been a bit of a fan for a while before I eventually twigged what it was that his most interesting stuff had in common. That made me aware of the famously dramatic and productive spell for those two in Berlin in the late 70s (the only albums of Bowie’s that I ever actually bought are from this period) and also an opening to a bit of a web of interesting collaborations and influences.

(Going back during the past week and filling in a lot of the songs of Bowie’s that I’ve missed during the last few decades, it’s been particularly fun to hear Iggy Pop numbers, er, pop up all over the place. China Girl — always an Iggy song to me — is well known, but there are at least three other albums that recycle songs previously recorded by him, including a straight cover of the flop lead single from Iggy’s most foolish album. A sustained friendship.)

***

So something of the emotion for me has to do with all that Berlin stuff. There are two aspects to that. One is the grubbily romantic idea of “pressure-cooker” West Berlin, seen from a distance as a place of hideouts, drugs, spying, longing, separation, and any other melodrama that “we” could project onto it. I’m sure this version of the city was overstated for lyrical purposes, but it probably did exist to a degree. The Berlin that fascinated and frightened me in 1993 was already a very different city, and both versions are hardly visible in today’s shiny metropolis.

The other aspect is the notion that moving to a different town in a different country could give you a new life and make your past disappear, even for someone already so celebrated — that it could really be so simple. What makes that idea available here is that Bowie didn’t just go, but then produced such different work after going that it really could appear as if his past had not gone with him.

This impression of self-effacement alongside all the self-promotion, the ability to erase the past, is a very attractive one for a pop star, and it fits also with the amount of collaborative work Bowie did. From some of the videos you can imagine that he was never happier than when playing keyboards or doing tour production for Iggy, singing backing vocals in a one-off with Pink Floyd, or playing second fiddle to Freddie Mercury or Mick Jagger.

Perl 6

I see the official release of the Perl 6 language specification happened on Christmas day.

The first piece of commercial web development I did was in Perl 5. A lot of people can probably say the same thing. This one was a content-management system led by James Elson in 1999 at PSWeb Ltd, a small agency in Farringdon that renamed itself to Citria and expanded rapidly during 1999-2001 before deflating even more rapidly when the dotcom bust arrived.

My recollection was that this particular CMS only ever had one customer, the BBC, who used it only for their very small Digital Radio site. But I still have a copy of the code and on inspection it turns out to have some comments that must have been added during a later project, so perhaps it did get deployed elsewhere. It was a neat, unambitious system (that’s a good thing, James is a tasteful guy) that presented a dynamic inline-editing blocks-based admin interface on a backend URL while generating static pages at the front end.

I remember there was an open question, for a time, about whether the company should pursue a product strategy and make this first CMS, or something like it, the basis of its business, or else take up a project strategy and use whatever technology from whichever provider seemed most appropriate for each client. The latter approach won out. It’s interesting to speculate about the other option.

(I like to imagine that the release of Perl 6 is sparking tiresome reminiscences like this from ageing programmers across the world.)

Perl 6 looks like an interesting language. (It’s a different language from Perl 5, not compatible with it.) The great strength of Perl was of course its text-processing capacity, and for all the fun/cute/radically-unmaintainable syntax showcased on the Perl 6 site, it’s clear that that’s still a big focus. Perl 6 appears to be the first language to attempt to do the right thing with Unicode from the ground up: that is, to represent text as Unicode graphemes, rather than byte strings (like C/C++), awkward UTF-16 sequences (Java), or ISO-10646 codepoint sequences (Python 3). This could, in principle, mean your ad-hoc botched-together text processing script in Perl 6 has a better chance of working reliably across multilingual inputs than it would in any other language. Since plenty of ad-hoc botched-together text processing still goes on in companies around the world, that has the feel of a good thing.
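
The difference between those representations is easy to show from the Python 3 side. The sketch below measures the same text as UTF-8 bytes, UTF-16 code units and code points, and then makes a crude stab at counting graphemes by skipping combining marks. That last function is an assumption-laden approximation, not the full Unicode segmentation rules that a grapheme-based string type would follow.

```python
import unicodedata


def lengths(text):
    """Measure one string in three of the units mentioned above."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "codepoints": len(text),
    }


def rough_grapheme_count(text):
    # Crude approximation: count code points that are not combining marks.
    # It ignores joiners, emoji sequences, Hangul and much else.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


e_acute = "e\u0301"   # 'e' plus COMBINING ACUTE ACCENT: one visible character
emoji = "\U0001F600"  # a single code point outside the Basic Multilingual Plane

for text in (e_acute, emoji):
    print(lengths(text), "graphemes (rough):", rough_grapheme_count(text))
```

Both strings are one character to a reader, but only the rough grapheme count says 1 for each of them. The accented "e" is two code points, and the emoji is two UTF-16 units.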

Replacing the GNOME Shell font in GNOME 3.16

[Edit: see the comment by Hugo Roy after the article, describing a much simpler, more “official” way to achieve this]

When using Linux on a touchscreen laptop, I generally use the GNOME 3 desktop — logical design, big touch targets, good support for high-resolution displays, nice to look at.

While I mostly approve of the design decisions made for GNOME 3, there is one thing about it that I don’t get on with, and that’s its use of the Cantarell font. Cantarell is clear and readable, and a fine default for most UI text, but at the middle of the top of the screen there lives a digital clock:

[Image: clock in Cantarell]

and I find this strangely annoying to look at. I think it has a lot to do with the excessively round zero. Since it’s always there, it annoys me a surprising amount.

Until GNOME 3.15, it was easy to change the font used throughout GNOME Shell by editing one property in a CSS file. Unfortunately GNOME 3.16 moved that CSS file into an opaque resource bundle and made it accessible only through some awkwardly-designed tools. I can’t fathom how that appeared to be a good idea, but there it is.

Anyway, with help from this forum post I knocked out a script to update this resource file so as to make it prefer the Fira Sans font from FirefoxOS. It makes a copy of the existing resource file with a .dist suffix.

This may be specific to Arch Linux (the distro I’m using), so caution advised if you refer to this for any reason. It’s necessary to have the glib2 and otf-fira-sans packages installed for this to work.

#!/bin/bash

# Rebuild the GNOME Shell theme resource with Fira Sans as the preferred font.

set -eu

rname=gnome-shell-theme.gresource
resource="/usr/share/gnome-shell/$rname"

# Work in a uniquely named scratch directory and remove it on exit.
ext="$(date +%s)$$"
tmpdir="./fix_$ext"
mkdir "$tmpdir"
trap 'rm -rf "$tmpdir"' EXIT

# Start the XML manifest that glib-compile-resources reads.
manifest="$rname.xml"
cat > "$tmpdir/$manifest" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<gresources>
<gresource prefix="/org/gnome/shell/theme">
EOF

# Extract every file from the existing bundle and list it in the manifest.
for file in $(gresource list "$resource"); do
    base=$(basename "$file")
    out="$tmpdir/$base"
    gresource extract "$resource" "$file" > "$out"
    echo "<file>$base</file>" >> "$tmpdir/$manifest"
done

cat >> "$tmpdir/$manifest" <<EOF
</gresource>
</gresources>
EOF

# Rewrite each font-family declaration, then compile a new bundle.
(
    cd "$tmpdir"
    perl -i -p -e 's/font-family:.*;/font-family: "Fira Sans", Cantarell, Sans-Serif;/' gnome-shell.css
    glib-compile-resources "$manifest"
)

# Keep a timestamped copy of the original, then install the new bundle.
sudo cp "$resource" "$resource.dist.$ext"
sudo cp "$tmpdir/$rname" "$resource"

Of course every time an update comes along, it overwrites the resource file and I have to run this again. Which is one reason I’m posting this as a reminder to myself.
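
To check what the substitution in the script does before letting it loose on the real file, the same pattern can be tried on a scrap of CSS. This Python sketch uses a made-up sample rule; the actual contents of gnome-shell.css will differ.

```python
import re

REPLACEMENT = 'font-family: "Fira Sans", Cantarell, Sans-Serif;'


def replace_font(css):
    # Same pattern as the perl one-liner. "." does not match a newline, so
    # each match stays within one line, as it does with perl -p.
    return re.sub(r"font-family:.*;", REPLACEMENT, css)


# Hypothetical sample, not copied from the real theme.
sample = "stage {\n  font-family: cantarell, sans-serif;\n  font-size: 9pt;\n}\n"
print(replace_font(sample))
```

Note that the pattern is greedy, so on a line holding several declarations it would swallow everything up to the last semicolon.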