If you’ve ever worked with SQL Server from the command line you’ll have come across the bundled SQLCMD.EXE tool. This, along with its forerunner OSQL.EXE, is the traditional tool for executing SQL statements (and script files) against a SQL Server instance. Like many mature command line tools it’s a bit of a Swiss Army knife and has sprouted a myriad of options and switches to control how you feed SQL text into it and how the results and errors are handled after execution. For example, the following command will fetch the current time from a local instance using the current user’s credentials:-

SQLCMD -E -S .\SQLEXPRESS -d master -b -m 10 -Q "select getdate();"
The -b and -m switches are technically unnecessary when running interactively, but the moment you start running SQL batches from scripts you’ll likely add them if you want anything out of the ordinary to cause execution to stop and return an error code. Then there is the annoyance factor of just getting the command line slightly wrong. If you forget the -E you’ll get a weird login failure, and if you use -q instead of -Q it won’t terminate after executing the SQL. Case-sensitive command line tools do nothing to help ensure a calm, quiet working environment either.
One tried and trusted solution to these “problems” is to turn to the venerable wiki and document lots of command lines as snippets that you can cut-and-paste directly into your shell. Anyone who has ever tried that from a Word document, where Word has been “smart” and converted the simple dashes to “smart” dashes, will know how fraught with danger this idea is. That’s before you consider what happens when the wiki becomes unavailable (and I can guarantee a development server will become unavailable at precisely the moment you need it most).
Another, more personal, alternative is to use your shell or some third-party tool to create little macros for your favourite commands. However, I feel this is a bit like writing your own tests – if it’s good enough for you, then why not the rest of the team? After all, when a new team member joins they’ll probably have to go through the same process. So you can give them a leg-up by storing a set of project-specific pre-canned scripts that help with the most common scenarios.
In a sense it’s a bit like Partial Function Application: whereas a tool like SQLCMD has to allow for many different modes of operation in general, your development environment will almost certainly be far more specific. This means you can exploit that knowledge, first by cutting down on any superfluous arguments. The command below, for example, creates a new database on a local instance:-
SQLCMD -E -S .\SQLEXPRESS -d master -Q "create database MyTestDb;"
The only two real variables in the command are the instance (.\SQLEXPRESS) and the database name (MyTestDb); the rest is boilerplate. So, let’s create a batch file that accepts just those two parameters and, more importantly, has a memorable name (CreateDatabase.cmd):-

SQLCMD -E -S %1 -d master -Q "create database %2;"

Now we can just use this simple script to create a test database in future:-

CreateDatabase .\SQLEXPRESS MyTestDb
Of course, just like any other code we write, we have to consider the handling of errors, so we should add a sprinkling of that too – first to check we have the required number of arguments, then to pass back any error returned by the actual tool:-

if /i "%1" == "" call :usage & exit /b 1
if /i "%2" == "" call :usage & exit /b 1

SQLCMD -E -S %1 -d master -Q "create database %2;"
if errorlevel 1 exit /b %errorlevel%
exit /b 0

:usage
echo Usage: %~n0 ^<instance^> ^<database^>
If you have bothered to write a usage message, then you could also choose to add a couple of extra lines to provide a consistent and modern way to query it (although this somewhat starts to defeat the original purpose of the exercise – simplification!):-

if /i "%1" == "-?" call :usage & exit /b 0
if /i "%1" == "--help" call :usage & exit /b 0
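For readers outside Windows, the same shape – argument checks, a usage routine, help switches and error propagation – translates directly to a POSIX shell. This is a sketch only: the sqlcmd invocation assumes the cross-platform sqlcmd tool is on the PATH, and the function name is invented.

```shell
# A sketch of the batch file's structure in POSIX shell: validate
# arguments, support -?/--help, and propagate the tool's exit code.
usage() {
    echo "Usage: create_database <instance> <database>"
}

create_database() {
    case "$1" in
        -\?|--help) usage; return 0 ;;
    esac
    if [ -z "$1" ] || [ -z "$2" ]; then
        usage >&2
        return 1
    fi
    # Pass back whatever exit code the underlying tool returns.
    sqlcmd -E -S "$1" -d master -Q "create database $2;"
}
```

It would be invoked just like the batch file, e.g. `create_database localhost MyTestDb`.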
Now, at this point I’m sure you’re beginning to question whether the script is getting so complicated that you’ll spend more time writing it than you (and hopefully your colleagues) will eventually save by using it. To decide that, I suggest you consult the recent XKCD titled “Is It Worth the Time?”.
However, it should also be fairly obvious that this is just boilerplate code that can be copied from a template. One such template (for batch files) can be found in my blog post “Windows Batch File Template”. Consequently I’ve found putting these kinds of scripts together quite trivial, and it also allows some other common simple scenarios to be easily accommodated.
As a rule of thumb, the point of these sorts of scripts is to enable a more unified development experience through the use of some common tools, with the development environment allowing the redundancy to be hidden or removed, as we saw above. But there are times when each developer (and the build machine counts as another developer) needs to provide some personal configuration data.
Once again the database example fits the bill nicely. In many organisations you can install a database locally, but not everywhere allows this (you can thank an overreaction to the 2003 SQL Slammer worm for that). Although not ideal, you may also have to deal with minor differences in development machine configuration, such as where the instance is installed (a fresh install and an upgrade can differ). For example, nearly every developer may have a local copy of SQL Server Express and be quite happy calling their unit test database “UnitTest”, in which case you might as well save them a bit more typing and just default to those values:-
if /i "%instance%" == "" set instance=.\SQLEXPRESS
if /i "%database%" == "" set database=UnitTest
An alternative approach is to use optional environment variables to specify the default values, so that you never need to pass any arguments if you decide to toe the party line:-
if /i "%instance%" == "" set instance=%1
if /i "%database%" == "" set database=%2

if /i "%instance%" == "" call :usage & exit /b 1
if /i "%database%" == "" call :usage & exit /b 1
Ultimate flexibility comes from combining the two approaches so that you can keep things really simple for the vast majority of use cases, but you still have the ability to override things at any point by supplying different values on the command line. In my experience database development is the one place where this has actually proved to be quite useful.
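That combined precedence – a command line argument beats an environment variable, which beats the hard-coded default – can be sketched in POSIX shell terms like this (the DB_INSTANCE and DB_DATABASE names are invented for illustration):

```shell
# Resolve settings with the precedence described above:
# command line argument > environment variable > hard-coded default.
# DB_INSTANCE and DB_DATABASE are illustrative variable names.
resolve_settings() {
    default_instance='.\SQLEXPRESS'
    default_database='UnitTest'
    instance=${1:-${DB_INSTANCE:-$default_instance}}
    database=${2:-${DB_DATABASE:-$default_database}}
    printf '%s %s\n' "$instance" "$database"
}
```

With nothing set you get the project defaults; set the environment variables once and every invocation picks them up; pass arguments and they win regardless.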
If you’re going to allow (or need to accommodate) some element of personalised configuration then make sure it is optional. There is nothing more soul-destroying on your first day of a new job than being presented with a 30-page manual on what steps you need to take to configure your development machine so that you can actually do your job. And if you think I’m exaggerating by recounting a 30-page document – I’m not!
For me, the steps required to get up and running on a new project should be few, with the final one simply being to build the code and run the tests. That last step might look like it should be split into two, but it’s not, because that’s exactly what the build machine will be doing, and whatever it does I should be able to do too. At that point I know I can replicate what the build machine does, and so I’m good to go.
After you’ve created a few simple scripts it becomes easier to see how you can combine them with other simple scripts to accomplish ever bigger tasks. Although it was never envisaged they would end up that way, many of the build processes I’ve been involved with in the past few years have taken an imperative approach, mainly due to the incremental layering of many simple scripts to create a more complex process. The same goes for the deployment side as well.
For example, the deployment process started out as two simple scripts that wrapped invoking MSIEXEC.EXE – one to install the package and one to uninstall it. The wrappers allowed me to handle the common error codes (e.g. 1605) that occur when the package is already installed or uninstalled. Once the NT services were written, another simple wrapper, this time around SC.EXE, was created to start and stop them. These were then combined into a larger script that ensured the services were stopped before the packages were uninstalled and started again after they were installed. Add in another simple script to ensure any errant processes are terminated, another to copy the files from a known share to the local machine, and finally a top-level script to act as the root for a Scheduled Task, and you suddenly have an automated deployment process that even Heath Robinson would be proud of.
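The trick in those MSIEXEC wrappers – treating a “benign” exit code such as 1605 (product not installed) as success – is a generic pattern. Here is an illustrative POSIX shell sketch of it; note that a real 1605 would be truncated to 8 bits by a POSIX shell, so it’s the shape that matters, not the number:

```shell
# Run a command and treat one "benign" exit code as success, the way
# the MSIEXEC wrapper treated 1605 ("product not installed") when
# asked to uninstall something that wasn't there.
run_tolerating() {
    benign=$1
    shift
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ] || [ "$status" -eq "$benign" ]; then
        return 0
    fi
    return "$status"
}

# e.g. run_tolerating 105 some-uninstall-command --quiet
```

Any other failure code still propagates, so the larger orchestrating script can bail out as usual.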
On Windows there is a slight variation on this theme of driving command line tools using a very basic language; which is to drive “objects” instead. If you consider the batch file language as the glue that binds together disparate console-style processes, then VBScript is the same simple glue that binds together objects exposed via COM. It might seem an expensive way to do business, but if your architecture is that classic pairing of a Visual Basic front-end and a C++ back-end then you’ve already done most of the heavy lifting. It might sound perverse but I’ve worked on a project where the entire build process was a library of VBScript files that were stitched together by using Windows Script Host (.WSF) files. It’s not something a sane person would consider doing today but 10-15 years ago it was Microsoft’s answer to the lack of scripting on Windows. That said, in a locked down production environment with legacy servers it might still be your only choice.
In a way that ideology still exists today in the guise of PowerShell. COM has been replaced by its modern heir, .Net, and the PowerShell language provides a much cleaner binding because .Net itself underpins it. Of course the pipeline model still exists too, although it’s been “enhanced” to flow objects instead of simple lines of text. Once again, if your core technology is already .Net you’ve done the heavy lifting, and consuming your components via scripts is pretty easy. PowerShell may be Microsoft’s first choice, but the model works equally well with both F# and IronPython in the driving seat, although the latter seems to be sorely neglected these days.
Of course none of this is going to replace the venerable UNIX pipeline which, despite its simple text based nature, lives on exactly because it’s simple and an army of programmers have created a wealth of small, well-focused tools that are designed to be composed to create something bigger than the sum of its parts.
Oftentimes I’ll need to do a little text processing and it ends up being a disposable one-liner. But at other times I realise it might actually be useful to my teammates. There is a certain level of personal gratification in publishing your Byzantine one-liners to your fellow developers, and if they only see the light of day once in a blue moon then the XKCD chart rules. But if you think one will get a frequent workout then you might want to consider encapsulating it within a script for ease of use.
A few years ago I started work on a new project that sprouted the need to process line-based text files at every turn. This caused me to reacquaint myself with SED and AWK. In one particular case, after the system had gone live, I started needing to compare some published output files. SED is one way to replace variable string patterns with fixed (and therefore comparable) values:-

SED "s/Timed-Out/ERROR/g" | SED "s/OutOfMemory/ERROR/g"
Due to the non-deterministic nature of a SQL SELECT without an ORDER BY clause (which would have been an unnecessary burden on SQL Server), the file needed to be sorted. The best key to sort on was not the leftmost in the file, which meant treating the file as having delimited fields:-

SORT -t "|" -k 3,4
Finally, every time the file had new fields appended in a release, they needed to be ignored when doing regression testing:-

CUT -d "|"
The output was always fed directly to a GUI-based diff tool in case there were differences to investigate, and so structurally the script looked like this:-

SED %1 … | SED … | SED … | …
SED %2 … | SED … | SED … | …
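A runnable sketch of that shape using the standard UNIX tools: normalise the known-variable strings with sed, sort on an inner field, then cut away the trailing fields before diffing. The sample patterns, delimiter and field positions here are invented for illustration, not the originals.

```shell
# Normalise known-variable strings, sort on the third pipe-delimited
# field, then keep only the first three fields; the patterns and
# field numbers are illustrative.
normalise() {
    sed -e 's/Timed-Out/ERROR/g' -e 's/OutOfMemory/ERROR/g' |
        sort -t '|' -k 3,3 |
        cut -d '|' -f 1-3
}
```

Each input file would be piped through something like `normalise` into a temporary file, and the two results handed to the diff tool.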
There was no reason (apart from the usual lack of time and a suitably privileged account) why that last step couldn’t have used the normal DIFF tool and been automated to capture the differences in a report every night.
Gall’s Law  says that a complex system that works is invariably found to have evolved from a simple system that worked. Build, test and deployment processes in particular seem to have a habit of growing organically and judicious use of little scripts can be one way of slowly piecing together functionality by building on the existing set of tried and trusted tools.
 C Vu Journal Vol 23 #1 (March 2011) & http://chrisoldwood.blogspot.co.uk/2010/11/reacquainting-myself-with-sed-awk.html
13 June 2013
Chris is a freelance developer who started out as a bedroom coder in the 80’s writing assembler on 8-bit micros. These days it’s C++ and C# on Windows in big plush corporate offices. He is also the commentator for the Godmanchester Gala Day Duck Race and can be contacted via firstname.lastname@example.org or @chrisoldwood.