
Spare us humans from XML

XML appeared in 1996, was refined during 1997, and was standardised in 1998.

I remember a lot of excitement about it at the time, from managers who imagined it would solve all their data portability problems. I was conscious of some of this enthusiasm before I really looked at the format.

When I did, I thought

ugh

But it wasn’t as bad as all that. The idea of XML was to standardise the lexical layer for a file format, helping to cut down on the programmer’s natural tendency to just bodge something up when saving and worry about loading later.

It worked, and it made and still makes a tolerable format for all kinds of things—within limits. Of course it’s horribly verbose and slow to parse, but hey, it compresses well. And you still don’t get reliable interchange unless you have a known storage structure on both sides, something a series of increasingly complex helper standards evolved atop XML to help you with.

One thing XML never was, though, was nice for humans to read.

At the time this seemed OK, because it obviously wasn’t intended to be for humans. We humans would never be editing it. We’d only ever be looking at it through filters and lenses and programs that knew what it really meant. We’d never actually have to see the format.

Fifteen years later, here I am sitting looking at

<if>
    <condition>
        <isset property="tested.project.dir" />
    </condition>
    <then>
        <property name="tested.project.absolute.dir" location="${tested.project.dir}" />
        <xpath input="${tested.project.absolute.dir}/AndroidManifest.xml"
            expression="/manifest/@package" output="tested.manifest.package" />
        <if>
            <condition>
                <isset property="tested.manifest.package" />

Oof! Enough of that. But that’s Android development: of course that’s for robots.

Windows Phone is for people, though. How about:

<phone:PhoneApplicationPage.ApplicationBar>
    <shell:ApplicationBar Opacity="0">
        <shell:ApplicationBarIconButton Text="previous" IsEnabled="False"
            IconUri="/Shared/Images/appbar.left.png" Click="PreviousButton_Click"/>
        <shell:ApplicationBarIconButton Text="page jump"
            IconUri="/Images/appbar.book.png" Click="JumpButton_Click"/>
        <shell:ApplicationBarIconButton Text="settings" IconUri=

Aargh.

These aren’t pathological examples that I had to grub around in the internals of some graphical environment to find. That last one, for example, is copied verbatim from a beginners’ programming book for Windows Phone 7.

I’d far, far rather see developers use XML than go back to rolling completely unique file formats for every application. But surely by now there are enough widely-supported data representation languages—formats that simplify the presentation by standardising object relationships as well as lexical details, such as JSON or RDF/Turtle—to cover the majority of situations in which humans may end up having to edit something?
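
To make that last point a little more concrete, here is a minimal Python sketch that writes the first two buttons from the Windows Phone example as JSON and as plain element-per-field XML. The key names (`opacity`, `buttons`, `text`, `enabled`, `icon`, `click`) and the `appBar` element are assumptions invented for this illustration, not part of any real Windows Phone or XAML schema; only the values come from the snippet above.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data model for the first two app bar buttons shown earlier.
# The key names are assumptions made for this sketch only.
app_bar = {
    "opacity": 0,
    "buttons": [
        {
            "text": "previous",
            "enabled": False,
            "icon": "/Shared/Images/appbar.left.png",
            "click": "PreviousButton_Click",
        },
        {
            "text": "page jump",
            "icon": "/Images/appbar.book.png",
            "click": "JumpButton_Click",
        },
    ],
}


def to_json(data):
    """Serialise the dictionary as indented JSON."""
    return json.dumps(data, indent=2)


def to_xml(data):
    """Serialise the same dictionary as element-per-field XML (one possible mapping)."""
    root = ET.Element("appBar")
    ET.SubElement(root, "opacity").text = str(data["opacity"])
    buttons = ET.SubElement(root, "buttons")
    for button in data["buttons"]:
        node = ET.SubElement(buttons, "button")
        for key, value in button.items():
            # Every value has to be flattened to text on the XML side.
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(app_bar))
    print(to_xml(app_bar))
```

Reading the JSON back gives the nested structure and the boolean directly, while a reader of the XML gets strings and has to know the layout in advance. That is roughly the difference between standardising object relationships and standardising only the lexical layer.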

7 thoughts on “Spare us humans from XML”

  1. Bang on.

    Initially, I never saw the point of XML.
    Then thought it was quite a good idea.
    Then realised that it didn’t really make life that much easier for serialising out your data.
    Then tools and languages made that better and it works quite nicely for stuff like SOAP from C# objects.

    But then XAML… I guess they expect you to use Expression Blend to create it, rather than to be editing it, but I always find that editing the XAML gets better results. It’s hardly a nice experience.

  2. XAML is no worse than HTML. It’s a markup language for user interfaces, and does a better job of defining application layout than the document-focused HTML. I don’t think it’s much uglier, and while it’s not beautiful, it is at least human WRITEABLE, unlike Flash.

    I think the real problem with XML is not the concept of XML itself, but the over-complication of the surrounding standards caused by a misunderstanding of the difference between data and semantics. For instance, the basics of XML itself can pretty much be defined in a few sentences. The use of DTDs to check that documents are valid can be handy. But the standard for XML Schemas is massively more complicated and doesn’t actually do anything to help with XML communications in the real world (it’s easier and more efficient to parse the document and throw errors, performing validation in the client application).

    The reason for the complexity is that XML Schemas are trying to define semantics within a data format. Leaving it as a pure data format and letting the application(s) handle the semantics is always more sensible because those applications are written in languages whose job is to define semantics, just as XML’s job is to structure data.

    Sure, XML has some unnecessary complications – attributes are redundant and get heavily misused, for instance. It is verbose, but compresses nicely. The fact that it is (at a pinch) human readable can be useful. On one project recently I have been working with .NET/MVC on the server and the knockoutjs library on the client. All the data is serialized as JSON, and even with relatively small datasets it gets unreadable way more quickly than an XML serialization.

    1. To find JSON less readable than XML is surprising. Why would that be? Something to do with all the redundant gubbins in XML giving you a contextual cushion?

      Admittedly my frustration with this assumption that XML “solves” your format problem partly springs from my encounters with RDF/XML, which is a pretty bad example of an XML format.

      Take something as simple (in Turtle syntax) as this fragment:

      @prefix : <#> .
      :tiddles a :cat .
      

      Convert it to RDF/XML:

      <?xml version="1.0" encoding="utf-8"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="file://x.rdf#" xml:base="file://x.rdf">
        <rdf:Description rdf:about="file://x.rdf#tiddles">
          <rdf:type rdf:resource="file://x.rdf#cat"/>
        </rdf:Description>
      </rdf:RDF>
      

      But it isn’t really a question of semantics—RDF doesn’t define any semantics either. It’s just a fairly simple data structure, but one that happens to have many possible mappings to XML (as any data structure will) of which the “standard” one happens to be particularly difficult to read.
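
A minimal Python sketch of the point that both spellings carry the same single statement: it reads the RDF/XML above with the standard library and pulls out the one triple. The `triples` helper is hypothetical and only understands this simple shape (an `rdf:Description` with `rdf:about` and resource-valued properties); it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The RDF/XML document quoted above, unchanged.
RDF_XML = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="file://x.rdf#" xml:base="file://x.rdf">
  <rdf:Description rdf:about="file://x.rdf#tiddles">
    <rdf:type rdf:resource="file://x.rdf#cat"/>
  </rdf:Description>
</rdf:RDF>
"""


def triples(xml_bytes):
    """Return (subject, predicate, object) tuples for simple rdf:Description blocks."""
    root = ET.fromstring(xml_bytes)
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells tags as "{namespace}local"; join them into one IRI.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource")
            found.append((subject, predicate, obj))
    return found


if __name__ == "__main__":
    for triple in triples(RDF_XML.encode("utf-8")):
        print(triple)
```

The single tuple that comes out is the whole content of the Turtle line `:tiddles a :cat .`, where `a` is Turtle shorthand for `rdf:type`.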

  3. As you say, that’s more a function of RDF/XML than XML per se. Here’s a fragment of serialized data from a Fakoli site:

    2
    1
    Global Menu
    global

    0
    0

    1
    Classifications
    1
    0

    0
    /management/classification
    0
    admin

    1

    Seems pretty readable to me, although I am undoubtedly biased. Definitely verbose, but it compresses well. To be fair JSON isn’t that bad to read if it is pretty printed, but it is almost always sent on the wire as a big glob of code with no spaces.

    1. Yes, it’s hazardous trying to cite XML in an HTML post!

      “To be fair JSON isn’t that bad to read if it is pretty printed, but it is almost always sent on the wire as a big glob of code with no spaces.”

      But you’re comparing it with pretty-printed XML? You wouldn’t quote the wire format for an XML example — apart from anything else, it’s quite likely to be binary, compressed.

  4. And of course it ate all my tags.

    <?xml version="1.0" encoding="iso-8859-1"?>
    <Fakoli>
      <MenuMap>
        <MenuList>
          <Menu>
            <menu_id>2</menu_id>
            <site_id>1</site_id>
            <name>Global Menu</name>
            <identifier>global</identifier>
            <description/>
            <css_class/>
            <highlight_current_item>0</highlight_current_item>
            <highlight_current_section>0</highlight_current_section>
          </Menu>
        </MenuList>
        <MenuItemList>
          <MenuItem>
            <menu_item_id>1</menu_item_id>
            <title>Classifications</title>
            <menu_id>1</menu_id>
            <parent_id>0</parent_id>
            <identifier/>
            <page_id>0</page_id>
            <url>/management/classification</url>
            <sort_order>0</sort_order>
            <role>admin</role>
            <permissions/>
            <published>1</published>
          </MenuItem>
        </MenuItemList>
      </MenuMap>
    </Fakoli>
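
For anyone who wants to compare like with like, here is a rough Python sketch that takes the `MenuItem` record from the Fakoli fragment above and emits it as JSON twice: once compact, as it would typically go on the wire, and once pretty-printed. The helper name `element_to_dict` and the flat tag-to-text mapping are assumptions made for this example; Fakoli's own serialisation code is not shown in the thread and may work quite differently.

```python
import json
import xml.etree.ElementTree as ET

# The <MenuItem> record from the Fakoli fragment above.
MENU_ITEM_XML = """<MenuItem>
<menu_item_id>1</menu_item_id>
<title>Classifications</title>
<menu_id>1</menu_id>
<parent_id>0</parent_id>
<identifier/>
<page_id>0</page_id>
<url>/management/classification</url>
<sort_order>0</sort_order>
<role>admin</role>
<permissions/>
<published>1</published>
</MenuItem>"""


def element_to_dict(element):
    """Map each child tag to its text; empty elements become None."""
    return {child.tag: child.text for child in element}


def wire_json(record):
    """Compact form with no optional whitespace."""
    return json.dumps(record, separators=(",", ":"))


def pretty_json(record):
    """Indented form, one field per line."""
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    record = element_to_dict(ET.fromstring(MENU_ITEM_XML))
    print(wire_json(record))
    print(pretty_json(record))
```

Both outputs decode to the same record, so the readability gap is a property of the printing, not of the format. Note that every value arrives as a string, because the XML carries no type information of its own.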
