In The Toolbox – The Developer’s Sandbox


We often talk about developers working in “isolation” or use the term “sandbox” to describe an environment that cuts us off from the outside world. But what exactly do we mean by these terms – what is on the inside and what is on the outside? There are often many different sandboxes in the development process too, for example the build pipeline and the various test environments. Some of these are “larger” than others, where size could be measured in terms of the number of processes, machines and network paths collaborating.


The aim of this article is to look at various sizes of development-level sandboxes and explain the problems I’ve commonly encountered when using them. Along the way it will become apparent what I personally consider to be the ideal sandbox for day-to-day, user-story development work. That last qualification is important because when I’m doing support or localised system-level testing I probably need to loosen the sandbox constraints to bring in some external dependencies, but ideally under tightly controlled conditions [1].

Component/Integration/Acceptance Tests

To be doubly clear this is not about unit tests – it’s about developing and running component, integration and acceptance level tests. Unit tests naturally have no external dependencies, but once we have run those and our confidence starts to build it’s nice to bring in fast-running tests that talk to real dependencies to gain further confidence that all the units have been assembled correctly and are still working together as intended.


The kinds of services I’ve been developing in recent years have been developed outside-in, starting with a failing acceptance test and then moving inside the service, sometimes leading to writing a combination of integration, component and/or unit tests before “unwinding the stack”. Whilst they often have a real (e.g. out-of-process) database and messaging service in play for the acceptance tests they all still run to completion within a couple of minutes. Even so they exercise the majority of code paths, which leads to a high degree of confidence in the functional aspects of any change before it is committed by the developer.


Hence this article is about where those “heavier” 3rd party dependencies live and how we can cope with the potential disruptions they have a habit of producing. Once we leave the simplicity and beauty of the in-memory sandbox that unit testing provides we enter a realm where side-effects can become persistent. If we’re not careful we start chasing our own tails due to test failures outside our control, or worse, start to ignore test failures we believe our changes could never have caused; that is the start of a very slippery slope.

Network-Level Sandbox

At the extreme our degree of isolation might only be to avoid us affecting the production system. More typically we create named environments, such as DEV and UAT, which are usually partitioned on a (virtual) machine-wide basis.


From a developer’s perspective this kind of environment means there is some form of shared infrastructure in use, such as a database, file share or message bus. Once we have any form of shared resource we start to bring in the possibility of noise to the development process, which, as mentioned earlier, can manifest itself as test failures outside our control. Nothing kills a state of “flow” more readily than an unexpected test failure and shared resources increase the likelihood of that happening.


When any test fails my immediate reaction is that it’s my fault – I’m always guilty until proven innocent. Not every line of code (production or test) I come across is easy to reason about and so I have to assume the worst. Tolerating transient test failures just leads to distrust and an eventual “blindness” whereby the test provides no value because its failure is ignored. Eventually someone will get fed up and just comment the test out altogether or add the “ignore” attribute so that it never runs. Now the test is just an illusion that buys us false hope.


Years ago the need to share infrastructure was born out of cost – not every developer could afford to have an instance of SQL Server or Oracle on their machine due to the high licensing costs. In today’s world there are “developer” editions of the big iron products which help to alleviate this cost. The NOSQL databases are usually free and only come with the kind of limitations a developer would never breach as part of their normal development cycle anyway. The same goes for web servers and message queuing products too. What is more likely to make this set-up unusable is either a draconian usage policy [2], where you have no rights to install anything, or a woefully underpowered machine that couldn’t take the extra strain of additional services running in tandem.


Putting aside the reasons why you might have to suffer shared services, another problem they create is that remote working becomes even more painful. If the organisation does not provide a VPN or some form of remote desktop then you cannot easily work outside the confines of the office. Even with a decent broadband connection I’ve seen test suites take an order of magnitude longer to run because of the latency that starts to dominate on all the underlying remote service connections created during the test runs. In some cases the firewall covering the VPN may only be configured for the “standard” network traffic (think SharePoint) and so you might be blocked from accessing your modern NOSQL database due to its use of unusual port numbers. As for working on the train during your daily commute or business trip, that would just be a non-starter.

Partitioning Data

The usual technique for dealing with shared infrastructure is to partition either at the schema-level or data-level. For databases you can have your own named database within an instance and for message queues the queue name could encompass some derivable prefix or suffix, such as your login or developer machine. These values work nicely for an out-of-the-box configuration option but as we shall see later being able to easily override them, such as through a defaulted environment variable, is often desirable anyway [1].
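As a minimal sketch of that idea, the helper below derives a per-developer name from the current login and lets an environment variable override it. The function name `sandbox_name` and the variable `SANDBOX_SUFFIX` are assumptions made purely for illustration; they do not come from any particular product.

```python
import getpass
import os


def sandbox_name(base, env_var="SANDBOX_SUFFIX", environ=None):
    """Return a per-developer resource name such as 'orders_chris'.

    The suffix defaults to the current login, but can be overridden through
    an environment variable (SANDBOX_SUFFIX is a hypothetical name).
    """
    environ = os.environ if environ is None else environ
    # Only fall back to the login when no override has been supplied.
    suffix = environ.get(env_var) or getpass.getuser()
    return f"{base}_{suffix}".lower()
```

With no override a test would ask for `sandbox_name("orders")` and get a name based on the login; a build agent could set the variable to pick a different database or queue without any code change.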


At least with schema-level partitioning you are only really sharing the service itself with other developers. Hopefully the product is reliable, which is presumably why you picked it, the data volumes are low, and so the chances of failure are mostly down to hardware issues of some description. The use of virtual servers would make this kind of problem largely a thing of the past if it weren’t for all the bureaucracy that can be required to get the VM up and running again on another host. The security patching cycle that goes on every month has also been known to take down a crucial development server or two in the past so it’s not all plain sailing.


The second option is to partition at the data level. This usually involves adding prefixes and/or suffixes to the data in such a way that you can uniquely identify your own test data and distinguish it from your colleagues’. This can be a useful technique but it starts to have an impact on both your production and test code as your design inherently has to acquire the ability to pre- and post-process data down in the stack. If you’re lucky this will already be a required part of the design and the application of some classic design patterns, such as Decorator, will minimise the impact. If not you’re adding complexity to the code which might be avoidable via other means. Granted it may not be a huge leap in complexity, but it is still something which differs (code-wise) between what you run in development and what is actually running in production.
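To make the Decorator idea concrete, here is a small sketch in which a wrapper adds an owner prefix to every key on the way in and strips it on the way out. Both classes are hypothetical: `InMemoryStore` merely stands in for whatever shared key/value service is in use, and `PrefixedStore` is one possible shape for the decorator.

```python
class InMemoryStore:
    """Hypothetical stand-in for a shared key/value service."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]

    def keys(self):
        return list(self._items)


class PrefixedStore:
    """Decorator that partitions a shared store by prefixing every key."""

    def __init__(self, inner, owner):
        self._inner = inner
        self._prefix = f"{owner}:"

    def put(self, key, value):
        self._inner.put(self._prefix + key, value)

    def get(self, key):
        return self._inner.get(self._prefix + key)

    def keys(self):
        # Only report this owner's keys, with the prefix removed again.
        return [k[len(self._prefix):] for k in self._inner.keys()
                if k.startswith(self._prefix)]


shared = InMemoryStore()
alice = PrefixedStore(shared, "alice")
bob = PrefixedStore(shared, "bob")
alice.put("order-1", 10)
bob.put("order-1", 20)
```

The calling code sees plain keys, yet the two developers never collide in the shared store. The cost is exactly the one described above: the wrapper exists only for development and so differs from what runs in production.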


Due to the potentially low-level nature of trying to work this way, it also affects the way you write your tests. Now you have to be mindful of other developers and so can’t just truncate a data table or purge a message queue. Instead you have to delete only your own data from a table or carefully drain the message queue without disturbing the order of other messages. I’m not even sure the latter is entirely possible (I’ve only seen teams take the hit on the disruption when tests are run concurrently, i.e. just keep running them until they do pass).
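A sketch of the “delete only your own data” clean-up might look like the following. It uses an in-memory SQLite database purely so that the example is self-contained, and it assumes a hypothetical `orders` table with an `owner` column that tags each row with the developer who created it.

```python
import sqlite3


def delete_own_rows(conn, owner):
    """Remove only the rows tagged with this owner; return how many went."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("DELETE FROM orders WHERE owner = ?", (owner,))
    return cursor.rowcount


# Hypothetical shared table holding rows from two developers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, owner TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders (owner, item) VALUES (?, ?)",
    [("alice", "widget"), ("bob", "gadget"), ("alice", "sprocket")])

removed = delete_own_rows(conn, "alice")
remaining = conn.execute("SELECT owner, item FROM orders").fetchall()
```

Compared with a simple truncate, every test fixture now has to know who it is running as, which is the extra burden this style of sharing places on the test code.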


Sometimes you are not in complete control of all the data which is generated – database identity columns are a case in point. If you rely on them to generate the IDs for your entities you either have to write a different query to identify your data or make sure you capture the identity values in some way so you can refer to them later. Identity columns might not partake in transactions so you can’t assume that N inserts will result in rows with N consecutive identity values when run concurrently with other tests.
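One way to capture the identity values, sketched here with SQLite standing in for the real database, is to record the generated key after each insert and query by those keys later. The `orders` table and the `insert_order` helper are assumptions for illustration; other databases expose the generated key through their own mechanisms.

```python
import sqlite3


def insert_order(conn, item):
    """Insert a row and return the identity value the database assigned."""
    cursor = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)")

# Simulate another developer's test getting in first.
insert_order(conn, "someone else's row")

# Remember our own identity values rather than assuming what they will be.
my_ids = [insert_order(conn, item) for item in ("widget", "sprocket")]

placeholders = ",".join("?" * len(my_ids))
my_rows = conn.execute(
    f"SELECT item FROM orders WHERE id IN ({placeholders}) ORDER BY id",
    my_ids).fetchall()
```

The test then asserts against `my_rows` and never needs to care whether the captured values happen to be consecutive.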


Aside from server failures the other big disruption to using shared relational databases with data-level partitioning is schema changes. If another developer needs to change the schema or some shared code, such as a stored procedure, it affects everyone. This also implies that the database must always be running the latest code – reverting the database back to match the current production schema is out of the question without interfering with the rest of the team. Whilst NOSQL databases are inherently schema-less this does not mean schema problems do not happen. On a fast moving codebase with heavy refactoring the schema may be changing rapidly without formally bumping any internal “schema version number” such that breaks in serialization occur. They should be infrequent, but it’s important to be aware that it’s still possible when sharing infrastructure.

Machine-Level Sandbox

Being able to work isolated from the rest of the team (infrastructure wise) is a useful step up. Once you remove what’s going on around you from the potential sources of noise you then only have yourself to blame when something goes wrong; although maybe a group policy update or security patch will still catch you unawares every now and then.


Being master of your own castle gives you the power to play and tinker to your heart’s content without the fear of disrupting your entire team. Want to restart your database, IIS, or the messaging service to blow away the cobwebs? No problem, just do it as there’s no need to coordinate this kind of activity with everyone else.


This doesn’t mean that your setup can be entirely ad-hoc though. Although you can assume that the service name will be “localhost” in any configuration it helps if developers stick to a consistent set of port numbers, etc. so that the default configuration stored in the version control system should just work on any developer’s machine. This is especially useful for getting a new joiner up and running quickly.

Shared Local Services

Exactly because everything is running locally on every developer’s machine you don’t have to play games with the names of databases or message queues as you can all use the same name. The ability to easily configure such settings is useful for other test environments or scenarios, but for the common case – developing user stories on the default integration branch (e.g. trunk) – it should just work as is.


Whilst this set-up is more stable than having to share infrastructure it does not come without a few problems of its own. Sometimes you might feel like a part-time system administrator, which is A Good Thing from a DevOps perspective, but can be a distraction when you really just want to get your story finished. For example IIS and IIS Express play games in the background that can leave your test failing, despite you putting in the working implementation, only for you to find it was still using the old, cached failing implementation. User rights are another area of contention as services tend to run under privileged accounts which makes debugging harder. Being forced to develop as an administrator is not a good habit and is probably what makes some companies nervous enough to disallow any local administrative rights altogether [2].


Not sharing services also implies that there are many more copies of that service around, which means that there will likely be different variations as each developer upgrades his or her machine at different times. Whilst “it works on my machine and the build server” is great for your confidence, you might still end up helping out the one colleague who can’t get it working on their machine because of some weird issue related to them having a different version of a product or driver. Of course this kind of problem can be easily mitigated by having some notes on a wiki or a set of scripts that ensure everyone gets to install the same version at each upgrade. This also ensures that those who may be inclined to always get the latest and greatest beta directly from the vendor are also in sync.

Internal Services

3rd party products are, or should be, fairly stable. Aside from bug fixes one doesn’t tend to switch major versions of a database on a whim so the problems cited above are probably a little overdramatic. Where this problem can start to creep in is with internal (i.e. the business’s) services which you’d hope would evolve more rapidly as the company’s priorities change. If you work on a team that is responsible for developing a number of small, independently deployable services, then you will have to decide how much coupling you want to take on to balance detecting API breaks early versus reducing disruption to others through API breaks unrelated to what they are working on.


When the team develops a number of services that are dependent on one another, even if they’re independently deployed, there is a temptation to either include them all in the same solution, or try and reuse them for integration and acceptance testing. Doing this in the build pipeline makes perfect sense as you are usually interested in building layers of trust by integrating more services to ensure they still work together. However, I’m not convinced that this set-up is quite so desirable on a developer’s workstation.


If you’re working on the interface between the two services then you’re probably interested in those problems showing up. The majority of your time, however, is unlikely to be spent that way, and other features should not be directly dependent on those services either, as that would be a sign of unnecessarily tight coupling. If those services are not mocked locally within the solution, then, any time you integrate upstream changes for the solution you’re working on, you need to update the dependent solutions too, lest you run the chance of an “impedance mismatch” occurring when running unrelated tests.


Part of the motivation for moving to much smaller services with more well defined contracts is to decouple their evolutions and that has a knock-on effect into the amount of isolation you can afford to take on. This is especially true when you have a build pipeline that can easily verify each service in isolation and then deploy the services together and verify their respective contracts are being honoured by their consumers.

Source Folder Sandbox

One of the key concepts that the rise in Functional Programming is bringing to the forefront is the notion of side-effects, and that is exactly what makes testing noisier than it should be. Most of the problems mentioned above are the result of side-effects – the output of binaries and test run data that leaks outside the (supposedly transient) test environment. This leaked state then pollutes further test runs for ourselves and potentially others until we can revert back to a known good state.


Whilst isolation at the machine-level generally provides us with our biggest bang-for-buck it’s possible to strive to eliminate even the use of shared local services so that each source folder becomes an independent test environment. Some of the systems I’ve worked on have been able to have two source folders of two different revisions of the same product be built and tested concurrently. This then easily leads to an ability to run them side-by-side too which has been very useful for investigating unintended differences not picked up by the automated tests.


What makes this finer-grained sandbox achievable is the ability to host more of the dependent services in-process or, where a service has to be out-of-process, to have it configured, started and managed by the test suite itself rather than running as a classic machine-wide daemon. For databases this could be done using an in-process version like SQL Server Compact Edition or SQLite [3], or an out-of-process mock such as CouchbaseMock [4]. With .NET-based web APIs the newer OWIN stack is designed to support hosting multiple APIs in the same process, albeit under different app domains. Even the traditional TIBCO messaging service can be started on-the-fly.
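The general shape of a dependency that is started and stopped by the test suite can be sketched with nothing more than the Python standard library. Everything here is an assumption for illustration: `PingHandler` is a made-up stand-in for a dependent web API and `hosted_service` is a hypothetical fixture, not the OWIN or TIBCO mechanisms mentioned above.

```python
import contextlib
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a dependent web API."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


@contextlib.contextmanager
def hosted_service(handler=PingHandler):
    """Host the service in-process for the duration of one test."""
    # Port 0 asks the OS for any free port, so two working copies can
    # run their tests at the same time without clashing.
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield server.server_address  # (host, port) actually bound
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def check_service():
    """Example test body: start the service, call it, tear it down."""
    with hosted_service() as (host, port):
        connection = http.client.HTTPConnection(host, port, timeout=5)
        try:
            connection.request("GET", "/ping")
            return json.loads(connection.getresponse().read())
        finally:
            connection.close()
```

Because the fixture owns the service’s whole lifetime, nothing is left running between test runs and no machine-wide installation is needed.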


Naturally this implies that for a distributed system you are not running exactly the same versions of those services, but the point of the exercise is not to replicate the production environment, but to trade off ease of development against the ability to pick up fundamental bugs caused by simple integration mistakes. It’s always easier to debug on your local machine as you have your entire toolbox at your disposal.


Running two codelines [5] side-by-side where there are services that listen on network ports likely won’t work out-of-the-box as it’s often easier to fix the port numbers to create a more deterministic configuration. However, as was mentioned earlier, it helps if key settings such as these can be easily tweaked, perhaps through the use of environment variables, to make it painless to expand the sandbox as needed.
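As a sketch of how such tweaking might look, the function below keeps a fixed set of default ports but allows a second working copy to shift them all by an offset, or to override a single port, through environment variables. The service names, port numbers and variable names (`SANDBOX_PORT_OFFSET`, `WEB_API_PORT`, and so on) are all hypothetical.

```python
import os

# Hypothetical team-wide defaults, as stored in version control.
DEFAULT_PORTS = {"web_api": 8080, "message_bus": 7222}


def resolve_ports(environ=None):
    """Return the ports to use for this working copy.

    SANDBOX_PORT_OFFSET shifts every default port; <NAME>_PORT overrides
    one service outright. Both variable names are illustrative only.
    """
    environ = os.environ if environ is None else environ
    offset = int(environ.get("SANDBOX_PORT_OFFSET", "0"))
    ports = {}
    for name, default in DEFAULT_PORTS.items():
        override = environ.get(f"{name.upper()}_PORT")
        ports[name] = int(override) if override else default + offset
    return ports
```

With nothing set every developer gets the deterministic defaults; setting the offset to 100 in the shell for a second source folder moves that copy’s services out of the way of the first.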


I once started a new job and was presented with a 50-page document describing what I needed to do to set up my development machine. Admittedly it covered a few other things as well, but even so the process seemed far too convoluted just to get into the code.


Nowadays I’d expect to install the core language toolset and the version control client and pretty much be done, at least, to get started that is. From this meagre configuration I should then be able to create a working copy of the latest source code from the repo, build it and run the core test suites to verify that I’m all set up. At this point I should now be in a position to start making most normal code changes that could be pushed to production via the standard build pipeline. The only scenarios initially out of my reach would require me to know more about a production-like set-up, but that will come in due course.


It’s not always possible to create and develop in the sandbox of your own choosing but where possible I’ve found it well worth the effort to strive to remove as much noise as possible from sources outside my control. The more constrained you make your sandbox the more confident you feel about exploring the codebase and the system without constantly looking over your shoulder to see if any of your meddling has disturbed your team mates.


[1] Testing Drives the Need for Flexible Configuration, Chris Oldwood, Overload 124

[2] Developer Freedom, Chris Oldwood, C Vu 26-1



[5] Software Configuration Management Patterns, Stephen P. Berczuk with Brad Appleton


Chris Oldwood

12 March 2015



Chris is a freelance developer who started out as a bedroom coder in the 80’s writing assembler on 8-bit micros; these days it’s C++ and C#. He also commentates on the Godmanchester duck race and can be contacted via @chrisoldwood.