In my previous post I explained a small but significant API change for v0.9 of the Dataquay library.
Although there was nothing very deep about this change or its causes, I found it interesting, not least because I had evolved the original API through a partly test-driven process, and I suspected there might be a connection between that process and the problems that resulted. Here are a few thoughts prompted by this change.
Passing the tests is not enough
Test-driven development is a satisfying and welcome prop. It allows you to reframe difficult questions of algorithm design in terms of easier questions about what an algorithm should produce.
But producing the right results in every test case you can think of is not enough. It’s possible to exercise almost the whole of your implementation in terms of static coverage, yet still have the wrong API.
In other words, it may be just as easy to overfit the API to the test cases as it is to overfit the test cases to the implementation.
Unit testing may be easier than API design
So, designing a good API is harder than writing tests for it. But to rephrase that more encouragingly: writing tests is easier than designing the API.
If, like me, you’re used to thinking of unit testing as requiring more effort than “just bunging together an API”, this should be a worthwhile corrective in both directions.
API design is harder than you think, but unit testing is easier. Having unit tests doesn’t make it any harder to change the API, either: maintaining tests during redesign is seldom difficult, and having tests helps to ensure the logic doesn’t get broken.
Types are not just annoying artifacts of the programming language
An unfortunate consequence of having worked with data representation systems like RDF mostly in the context of Web backends and scripting languages is a tendency to treat everything as “just a string”.
This is fine if your string has enough syntax to be able to distinguish types properly by parsing it—for example, if you represent RDF using Turtle and query it using SPARQL.
But if you break down your data model into individual node components while continuing to represent those as untyped strings, you’re going to be in trouble. You can’t get away without understanding, and somewhere making explicit, the underlying type model.
Predictability contributes to simplicity
A simpler API is not necessarily one that leads to fewer or shorter lines of code. It’s one that leads to less confusion and more certainty, and carrying around type information helps, just as precondition testing and fail-fast principles can.
It’s probably still wrong
I’ve effectively found and fixed a bug, one that happened to be in the API rather than the implementation. But there are probably still many remaining. I need a broader population of software using the library before I can be really confident that the API works.
Of course it’s not unusual to see significant API holes in 1.0 releases of a library, and to get them tightened up for 2.0. It’s not the end of the world. But it ought to be easier and cheaper to fix these things earlier rather than later.
Now, I wonder what else is wrong…
4 thoughts on “Small conclusions about APIs and testing”
> But there are probably still many remaining. I need a broader population of software using the library before I can be really confident that the API works.
Most of the last 10 years of my work has been around API design and implementation and this is always the problem.
It is really, really hard to sit back and think about all of the ways in which a stranger might try to use your API when they look at what you have given them, and it is similarly difficult to imagine all the things that they want to do in the field that you are trying to cover with your API.
This is what makes a Software Architect's job so hard if they have to do it in isolation. Much easier is a generalised, clean implementation that you are able to refactor a few times before you have to commit to it as a published API.
But although it may be easier to say (effectively) that you’d like to pin down an implementation through test cases and then evolve the API based on user experience, it seems a bit tough on the users. You’d feel there ought to be more of a science to it.
Of course there is — a bit — and there are some guidelines out there. Here’s a recent example in a somewhat different technical field, from Matt Gemmell. These look pretty sound, though there are 25 of them, at least half of which are phrased in ways that are specific to the technology in use — it’s quite a complicated business in other words. Most of these suggestions are things I recognise and follow, though possibly no. 15 might cover the particular case I’m describing here.
I’ve always thought of myself as a programmer or developer, never a software architect (which sounds like an impossible ideal in a way), and API design is, I think, one of my weaker points. I lack a general set of instinctive rules for telling me whether an interface is going to seem right when a different programmer sees it; I’m not sure how far these things are generalisable among languages and environments. The best I can generally do is to document it, and see whether the documentation is as simple as it should be.
It probably says something, though, that some of the basic consistency principles used in something like X11 from 25 years ago (consistent prefix/suffix function naming, common arguments first, etc.) are still quite often ignored. I think I’m reasonably OK at naming and argument structuring; it’s the semantic side I need to pay more attention to.
I think it isn’t hard to see, even in the case of companies like Microsoft that have people who have been doing this stuff forever, that if you become a user of a young API then you can expect some disruption along the way in the early days. Either that, or the implementers quickly lose productivity as they try to maintain API calls that no longer really fit the current internals.
As an implementer, I don’t mind a bit of this backward support; it kind of feels like you are doing some real engineering.
I guess the real killer is if you suddenly realise that some of your published API can’t possibly work in a complete and stable manner with the arguments it currently takes, and you are going to have to kill it. These are the sort of things I always struggle with, because I never do deep design. My design is usually: realise what needs to be done at a medium sort of level and start coding it, then as soon as you see some sensible structure arriving, refactor it out into a cleaner shape.
Thinking about it, designing an API seems like a similar process to designing a UI: you start by getting something that works, then you stand back, try to look at it as though it isn’t your own work, and really try to feel what it would be like to use. It seems like a dark art to me, though. I think I’ve got better at both of these lately, having spent more time writing WPF code and learning MVC architecture.
You are definitely right that some APIs you see aren’t even named and structured in a helpful way. I picked up a lot about structuring them because many of the APIs I’ve done have had an eye towards language interop, so have taken the sensible route of following many of the COM suggestions for argument passing etc., but I’m always shocked when I see an API that doesn’t even consider that there might be a different memory allocator on the other side of the fence.