Reacquainting Myself with Sed & Awk

(ACCU – Inspirational Particles)

 

So, I’m sitting there slouched in my chair staring at the Visual Studio text editor whilst it performs a rather lengthy, albeit simple, transformation of a text file, and I’m coming to the realisation that this is no longer just a one-off event. In fact, as I think harder, I realise that during the last few weeks I’ve had to generate a number of long but simple SQL scripts by transforming the CSV-format result set from SSMS[1] back into a bunch of INSERT statements. It has also suddenly become noticeable that the tool I have historically used for these simple tasks is no longer the svelte and nimble text editor that was VC6 (nay VS98), but has instead morphed into the bloated and sluggish VC9 (VS2008). I’ve read many complaints in the past about its increasing slothfulness but brushed them aside as they didn’t really resonate with me; but when a tool asks you if it can disable its own Undo feature to improve performance, you know you’re in trouble…

 

I haven’t been paid by the hour since before the millennium, so just sitting and waiting for it to finish was no longer an option. And then I started to hear the sniggers in the background as legions of smug Unix programmers chortled to themselves and flicked through their tool chests whispering “Vi?”, “Emacs?”, “Sed?”, “Awk?”… And then I heard a chorus of disapproval as every other programmer joined in with a round of “Use the right tool for the job – stupid”. The problem was that I hadn’t used any of those tools since leaving university nearly 20 years earlier, and my previous experience of Unix ports under DOS and 16-bit Windows had not been favourable…

 

Still, I put my metaphorical spade down (actually I left it working away in the background, as I thought I might as well make use of my multi-tasking OS) and went in search of a proper Windows port of Sed and/or Awk. By that I mean one that respects Windows conventions, such as spaces in filenames and backslashes in paths, and understands that it has to expand any wildcards passed on the command line itself. I also didn’t want to have to install and run any kind of emulator; as Raymond Chen once said, “If the solution starts with ‘First Install Product X’ then now you have two problems”. Fortunately I found what I think is a suitable candidate in the shape of the UnxUtils project on good old SourceForge[2]. Of course Visual Studio still finished before I had unpacked the .zip file, copied the files to a folder and updated my PATH, but that’s not the point – I now had the tools ready for action.

 

I didn’t have to wait too long either, as a few weeks later I needed to work out how much disk space a certain type of file was consuming in one of our data stores. This was an awkward one, as I kind of knew how to do it in PowerShell (which I’m also learning and which has far more relevance for sysadmin work on Windows), but I was keen to spend a short time with Awk as I remembered that it could ‘do sums’ as well as transform text. A quick Google later and I had a one-liner that did exactly what I wanted (albeit after some gnashing of teeth because, it seems, ‘END’ has to be in upper case).
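The original one-liner isn’t reproduced here, but an Awk program of this kind typically has the shape `$NF ~ /\.bak$/ { total += $1 } END { print total }`: a pattern/action pair that runs for every input line and accumulates a running total, plus an END block that runs once after the last line has been read. The Python sketch below mirrors that model. It assumes a hypothetical input format in which each line holds a file size in bytes followed by the file name, and `.bak` merely stands in for “a certain type of file”.

```python
import sys
from typing import Iterable


def total_size(lines: Iterable[str], suffix: str) -> int:
    # Hypothetical input format: "<size-in-bytes> <file-name>" on each line.
    total = 0
    for line in lines:
        fields = line.split()  # like Awk's default whitespace field splitting
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[-1].endswith(suffix):  # pattern: the last field ($NF) names a matching file
            total += int(fields[0])  # action: total += $1
    return total  # the value Awk's END block would print


if __name__ == "__main__":
    # e.g. python total_size.py < listing.txt
    print(total_size(sys.stdin, ".bak"))
```

Unlike Awk, which quietly treats a non-numeric `$1` as zero, `int()` raises an error on a malformed size field, which is arguably the safer choice for a quick audit.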

 

Somewhat chuffed with myself at that point, I decided that if I was going to make a real go of it then I should seek out a paper-based tutorial so that I could harness (some of) the power of these two little gems. Mere seconds on Amazon unearthed “Sed & Awk” (2nd Edition) by Dougherty & Robbins, with a second-hand copy listed for just a handful of British pounds. Nor could I resist the companion O’Reilly Pocket Reference for a couple of quid. A few clicks later and I was done. The Dougherty & Robbins book was the perfect introduction for me (OK, I’ll stop there and take the opportunity to write up a formal review to keep Mr Higgins off my back…) and many happy memories of my time at university came flooding back as I remembered some of the things these tools do well.

 

A few months later I had one of those glorious moments where I had a problem to solve and knew exactly which would be the best tool for the job. I needed to extract some values from a file that was formatted somewhat like an XML data document, with one element per line and a key/value pair as an attribute. A simple grep wouldn’t do it, as the key/value pair occurred in a number of different contexts – what I needed to do was restrict the grep to certain repeating portions of the file. And before you could say “Address Range” I had already typed the command line and was exhibiting a grin so large it would make the Cheshire Cat look miserable. It seems this old dog can still be taught not only new tricks but very old ones too…
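For readers who haven’t met them, a Sed address range such as `sed -n '/<Wanted>/,/<\/Wanted>/p' file.xml` selects every line from one matching the first pattern up to and including the next line matching the second, and can start again further down the file. The Python sketch below reproduces those semantics and then pulls the key/value pair out of each selected line. The `<Wanted>` section and `<add key="..." value="..."/>` element are hypothetical stand-ins, since the original file format isn’t shown.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical element format: <add key="..." value="..."/>
PAIR = re.compile(r'key="([^"]*)"\s+value="([^"]*)"')


def lines_in_ranges(lines: Iterable[str], start: str, end: str) -> Iterator[str]:
    """Yield only the lines inside start..end ranges, Sed-style."""
    start_re, end_re = re.compile(start), re.compile(end)
    active = False
    for line in lines:
        if not active:
            if start_re.search(line):
                active = True  # as in Sed, the end pattern is first tried on the NEXT line
                yield line
        else:
            if end_re.search(line):
                active = False  # the closing line is still part of the range
            yield line


def pairs_in_ranges(lines: Iterable[str], start: str, end: str) -> Iterator[Tuple[str, str]]:
    """Extract key/value pairs, ignoring any that appear outside the ranges."""
    for line in lines_in_ranges(lines, start, end):
        match = PAIR.search(line)
        if match:
            yield match.group(1), match.group(2)


if __name__ == "__main__":
    sample = [
        '<Other>', '<add key="a" value="1"/>', '</Other>',
        '<Wanted>', '<add key="b" value="2"/>', '</Wanted>',
        '<Wanted>', '<add key="d" value="4"/>', '</Wanted>',
    ]
    print(list(pairs_in_ranges(sample, r"<Wanted>", r"</Wanted>")))
    # prints [('b', '2'), ('d', '4')]
```

The pair with key `a` is skipped because it sits in the `<Other>` section. That is exactly the “different contexts” problem that a plain grep can’t solve on its own.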

 

[1] SQL Server Management Studio

[2] http://unxutils.sourceforge.net

 

Chris Oldwood

31/01/2011

 

Bio

Chris started out as a bedroom coder in the 80s, writing assembler on 8-bit micros. These days it’s C++ and C# on Windows in big plush corporate offices. He is also the commentator for the Godmanchester Gala Day Duck Race and can be contacted via gort@cix.co.uk.