In The Toolbox – Dictionary & Thesaurus

One of the reasons I reckon I got into computer programming was to avoid having to write – not code, but actual English prose. I hated it. At school, when I was 16, I had to take exams in both English Language and English Literature and I did phenomenally badly – I got a 'U' in both. For those unfamiliar with the English education system circa 1986, a 'U' means Unclassified. It means I did so badly that they couldn't even give me a grade – even an 'F', the lowest grade, would have been too good for me. Of course I had to take it again as a 'C' grade was mandatory to get anywhere, such as University, and I got it, eventually.

Computer programming on the other hand was great. Back in the '80s I was a bedroom coder bashing out nuggets of assembler on various 8-bit micros. The best thing about assembler was that there weren't many mnemonics to learn and they weren't spelled properly anyway (e.g. MOV, SUB, JP, etc.) or they were acronyms (e.g. LEA – Load Effective Address). If you did misspell anything the assembler would probably pick you up on it right away as the code wouldn't compile (should that be "assemble"?). Transposition-style errors were possible but it was more likely that you got the logic wrong than the "spelling" of the code.

The Writing's On the Wall

Wind the clock forward another 10 years and we're into the realms of Object-Oriented languages and silly limits like 8-character long identifiers have become a relic of the past. The idea of Design Patterns has sprung up to give us a common vocabulary with which to discuss recurring design problems. Also we're now talking about domain modelling so that we can represent both abstract and concrete forms of business concepts in our code as classes and functions to solve much bigger problems. The world I was residing in at that moment in time was still in the territory of "office automation", but the problems we were solving were more abstract as we tried to make sense of how the Internet was adding a technical spin around historically mechanical processes. Suddenly I discover that real language is everywhere in my job.

The wake-up call for me that my weak natural language skills were a genuine problem was seeing how others had started to work around my limitations. In a number of cases one developer had created typedefs (aliases, for non-C++ programmers) as a way of correcting the spelling of some of my class names. Back then there was no decent IntelliSense to guide you, and on a large codebase hitting the compiler only to find that you'd not misspelled a type correctly cost you a non-trivial amount of time. And these classes were in the core library. Another stand-out moment was around my confused use of the words "license" and "licence" such that the noun and the verb were misused badly enough to create configuration and code hell for those who knew the difference.
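A minimal sketch of that kind of workaround might look like the following. The class name RecieveBuffer and its member are hypothetical, invented for this illustration rather than taken from the real library.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical core-library class whose name was misspelled by its author.
class RecieveBuffer
{
public:
    explicit RecieveBuffer(std::size_t capacity) : m_capacity(capacity) {}

    std::size_t capacity() const { return m_capacity; }

private:
    std::size_t m_capacity;
};

// The workaround: an alias with the correct spelling, so that client code
// compiles whichever way the name is typed.
typedef RecieveBuffer ReceiveBuffer;

// Both names refer to exactly the same type.
static_assert(std::is_same<RecieveBuffer, ReceiveBuffer>::value,
              "the alias should name the original class");
```

The alias costs nothing at run time, but it does leave two names for one concept in the codebase, which is its own kind of confusion.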

If that wasn't bad enough, working with a colleague who was using Whole Tomato's Visual Assist product, which not only spell checks your comments and string literals, but also your class and method names (including compound words!), was a real eye opener. What I selfishly thought was me just being a bit quirky, perhaps even slightly endearing, was fast becoming an embarrassment. Everywhere I looked now there were little red squiggly lines under what I was writing – be it code or prose.

Turning the Tables

In what could probably be viewed as a poacher-turned-gamekeeper reversal of roles I now find myself on the other side of that fence. And the situation is more unpleasant than I ever suspected. In an even more ironic twist of fate I find myself writing more documentation now that I'm "doing agile" than I ever did before.

It was over 10 years before I even saw a technical or functional spec and I only added a single paragraph to the one I did come across. No, the way I focused my efforts (initially) on writing was to put together lengthy diatribes about the state of the codebase and the lack of good engineering practices being used, i.e. go all passive-aggressive. Whilst this ended up having little-to-no impact on changing the attitudes within the team, it did give me an outlet with which to practise writing proper prose. I coupled this with putting together a developer-focused (rather than support-focused) wiki to document some of the more common gotchas, such as how to merge integration branches efficiently in ClearCase, or diff for all changes between two labels. I also started to follow the advice of Record Your Rationale from the book 97 Things Every Software Architect Should Know [1]. My desire to write short, but more importantly clear, documentation was also largely prompted by working in a multi-cultural team for the first time.

Naturally when you start paying closer attention to such matters in your own code you also begin to notice those deficiencies in other people's output too. I've had to fix "eclipsing" issues in ClearCase due to random capitalization, e.g. Counterparty vs. CounterParty, missed diagnostic clues due to misspelled words not matching log file greps, and spent longer hunting in source code repos when doing some software archaeology [2] due to badly written commit messages. Having non-native English speakers working on the codebase was also an eye opener as I struggled to explain why some of the code they wrote just didn't read particularly well. Up to that point I hadn't realised it was even possible to write "grammatically incorrect" code and I felt more than a little churlish about bringing it up.

And so finally we come to two very old fashioned "tools" that I've since grown to depend on...

The Dictionary

Once upon a time a dictionary was a cutting-edge feature, even for a word processor or desktop publisher, and a stand-alone dictionary application would cost serious money. These days they are often built into FOSS tools like Notepad++, and the commit dialog for the TortoiseXxx VCS extensions has one too so there is no excuse for misspelling a commit message unless your language is unsupported. Chrome was the first browser I used which had dictionary support for text boxes, which made writing blobs of text like blog post comments easier as you no longer had to paste the text into Notepad++ or Word just to run the spell checker over it.

I mentioned earlier that spell checking wasn't restricted to just prose either; it could also be performed on code. And not just comments and string literals, but also on class, method and variable names too by tools like Visual Assist. In the intervening years I've seen a few free plug-ins to the popular IDEs that will handle comments and string literals, but sadly they won't have a stab at identifiers, which I think is a shame. Perhaps too many developers still write code with overly terse names? For code running on the .NET platform the FxCop tool is one of the few exceptions; it also allows you to add your own domain-specific acronyms and terms to a custom dictionary to minimise the false positives.

As for spell checking files using a command-line based tool there is Ispell, Aspell and more recently Hunspell. Whilst these are suitable for checking prose, I'm only aware of Aspell as having support for checking code (and even then only C/C++ comments and literals).

When it comes to checking the spelling of a word on my smartphone, which has a reasonable but not extensive dictionary, I end up reaching for Google. By default just searching for a single word will suggest an alternative if spelled incorrectly and the first hit is usually the definition, although you can Google "define <word>" to ensure the first hit is the definition. However that always feels like the proverbial "sledgehammer to crack a nut" but I'm too much of a cheapskate to cough up for a decent dictionary app (assuming I can find room on the device for it but that's another story).

The Thesaurus

My wife's English teacher once told her never to use the words get, put or nice. In his opinion there are so many better synonyms in the English language that a writer should never feel the need to use them. From a software development perspective I can't say the word "nice" has been much of a problem but the other two seem to crop up with alarming regularity.

When it comes to naming properties of a class (in programming languages where properties are not a first class concept) you could be forgiven for thinking that the getValue/setValue pair is mandatory. As a consequence of this there is also a school of thought that any method which is prefixed with "get" is therefore a property and so likely comes with a similar performance guarantee too.

In the online thesaurus I looked at there were pages and pages of synonyms and related words for "get". Off the top of my head I can think of a number of really common alternatives that I use regularly, such as: create, make, build, acquire, fetch, locate, find, retrieve, request, derive and calculate. You can probably already see a few patterns here as the initial ones revolve around the creation of objects, the middle few with looking up things and the final couple for doing some form of processing.

Although there are no formal guidelines about what semantics any of these words might suggest (despite at least one attempt by yours truly [3]) I would hope that they provide at least a little more insight than the weaker "get". For example the words fetch, request and retrieve all hint at some form of more complex operation than simply returning the value of a member variable. Hopefully the level of complexity hinted at also suggests that failure is probably on the cards and so may need to be factored in.
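As a rough illustration of how the choice of verb can set expectations, here is a small sketch. The Basket and CustomerDirectory classes and all their members are hypothetical, and the meaning attached to each verb is just one possible reading, not an agreed convention.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical types, invented purely for this illustration.
class Basket
{
public:
    void add(double price) { m_prices.push_back(price); }

    // A plain accessor: cheap and cannot fail.
    std::size_t itemCount() const { return m_prices.size(); }

    // "calculate" hints that work is done on every call.
    double calculateTotal() const
    {
        return std::accumulate(m_prices.begin(), m_prices.end(), 0.0);
    }

private:
    std::vector<double> m_prices;
};

class CustomerDirectory
{
public:
    void add(int id, const std::string& name) { m_names[id] = name; }

    // "find" hints at a lookup that may come back empty-handed, so the
    // caller receives a null pointer when there is no match.
    const std::string* findName(int id) const
    {
        std::map<int, std::string>::const_iterator it = m_names.find(id);
        return (it != m_names.end()) ? &it->second : nullptr;
    }

private:
    std::map<int, std::string> m_names;
};
```

Had these been called getTotal() and getName(), a caller would have had no clue from the name alone that one involves a loop over the items and the other can fail.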

Whilst code may not be prose, little variation in the verbs used in method names makes code read very monotonously. Test names that follow a classic given/when/then format will often also end up being very mechanical in nature. You may have a bit more leeway with test names, but that doesn't absolve you of the need to be economical with the language to succinctly convey the behaviour under test.

These days a thesaurus is always within easy reach if you have an internet connection and it only takes a few moments to search and find something appropriate for the task in hand. However, sadly they are far less accessible than a dictionary within text editors and browsers. Hence I often have a copy of Word lying around in the background so that I can switch to it, type the word I'm looking to replace and hit Shift+F7 to get a bunch of alternatives within seconds. Being a native application it also takes up far fewer resources than another Chrome tab.

Epilogue

My journey as a programmer took an unexpected turn just over a decade ago as I finally found myself unable to avoid the similarities between writing readable code and prose. Instead of being a chore I have actually found the experience of "raising my game" quite liberating. In fact where once I found the idea of learning about natural languages quite unappealing, I now find the subject far more attractive exactly because there are many parallels with programming. And whilst I may not write code, tests or documentation to rival a bestselling novel, I hope that the extra attention to detail adds precision and clarity that ultimately benefits the reader.

References

[1] http://97things.oreilly.com/wiki/index.php/Record_your_rationale
[2] In The Toolbox - Software Archaeology, C Vu 26-1
[3] http://chrisoldwood.blogspot.co.uk/2009/11/standard-method-name-verb-semantics.html

Chris Oldwood
10 June 2015

Biography

Chris is a freelance developer who started out as a bedroom coder in the ’80s writing assembler on 8-bit micros; these days it’s C++ and C#. He also commentates on the Godmanchester duck race and can be contacted via gort@cix.co.uk or @chrisoldwood.