Simplicity through Immutability

Introduction

Some languages, most notably C++, provide support for marking instances of objects as immutable, i.e. with the const keyword. Whilst C# supports a notion of const, it only applies to primitive values and is essentially an alias for a literal value. The best you can do with objects is to apply the Single Assignment Pattern [1] to member variables using the readonly keyword. Java’s final keyword has a wider scope than C#’s readonly for enforcing single assignment, but it still falls short of enforcing immutability of the referenced object.
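
To illustrate where readonly stops helping, here is a minimal sketch. The Basket type and its members are invented for this example and are not from the class discussed later. The field can only be assigned once, yet the list it refers to can still be changed at will:

```csharp
using System.Collections.Generic;

// Hypothetical type, for illustration only.
public class Basket
{
  // The list that _items refers to is still freely mutable,
  // even though the field itself is readonly.
  public void Add(string item)
  {
    _items.Add(item);
  }

  public int Count
  {
    get { return _items.Count; }
  }

  // Uncommenting this method produces a compile-time error because
  // a readonly field cannot be reassigned outside a constructor:
  // public void Reset() { _items = new List<string>(); }

  // readonly: the field can only be assigned here or in a constructor.
  private readonly List<string> _items = new List<string>();
}
```

In other words readonly protects the reference, not the object on the other end of it.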

 

Consequently in many languages immutability must be implemented through the design of the type, and as such every instance of that type will be immutable (use of reflection and other Machiavellian techniques notwithstanding). Generally speaking to create immutable objects you only pass any state at creation time through the constructor. Any properties that are exposed must only be readable and immutability must be deep, meaning that any types used in any exposed collections must themselves be immutable too. As it is with turtles [2] it should be immutability all the way down.
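
As a rough sketch of those rules, consider the following type. Order and its members are hypothetical names chosen for this example. All state arrives through the constructor, the properties have no setters, and the exposed collection holds strings, which are themselves immutable:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, for illustration only.
public sealed class Order
{
  // All state arrives through the constructor.
  public Order(string customer, IEnumerable<string> items)
  {
    _customer = customer;
    // Take a private copy so that later changes to the caller's
    // collection cannot leak into this object.
    _items = items.ToArray();
  }

  // Properties are read-only; there are no setters.
  public string Customer
  {
    get { return _customer; }
  }

  // Expose the items as a sequence rather than handing out the array,
  // so that callers cannot cast the result back and modify it.
  public IEnumerable<string> Items
  {
    get { return _items.Select(item => item); }
  }

  private readonly string _customer;
  private readonly string[] _items;
}
```

The defensive copy in the constructor is the step most easily forgotten; without it the caller still holds a reference to the object's internal state.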

 

The most common examples of immutable types are the primitive types, such as integers and strings (at least in C#). But more complex types can easily be made immutable too with just a little thought. The benefits, as you shall hopefully see, are a significant reduction in accidental complexity [3]. Whilst the implementation of the type itself may not contain significantly fewer lines of code, far fewer tests will be required and significantly less grey matter exercised to drive out the behaviours that matter.

 

The basis for this article is a real class I recently decided to refactor to make it simpler by making it immutable. Hence what follows below was my rationale for doing the refactoring in the first place.

The Mutable Example

Imagine you are writing a service and you have a need to provide a simple lookup to map one value to another. The data changes pretty infrequently, but not so infrequently that you’re happy to hardcode it. You decide to store the mapping data in a simple .CSV file and write a class to load the data and provide the mapping at runtime. This is one possible implementation:

 

using System.Collections.Generic;
using System.IO;

public class ThingToWotsitMap
{
  public ThingToWotsitMap()
  {
    _map = new Dictionary<string, string>();
  }

  public void LoadMap(string filename)
  {
    _map.Clear();

    using (var stream = new StreamReader(filename))
    {
        // Parse .CSV file data and build map
        . . .
    }
  }

  public string LookupWotsit(string thing)
  {
    // Map thing to wotsit
  }

  private Dictionary<string, string> _map;
}

 

If you are already screaming at the page because you spotted that the code touches the file-system and is therefore difficult to unit test, then have a gold star. But you’ll have to put that to the back of your mind as that is not what this article is about.

So Many Questions

The class is very simple, surely there’s not much to say. Is there? Looking over the unit tests (which were written after) one pattern immediately leaps out – the use of two-phase construction:

 

[Test]
public void one_of_the_tests_for_the_map()
{
  var filename = . . .;

  var map = new ThingToWotsitMap();
  map.LoadMap(filename);

  Assert.That(. . .);
}

 

The solution is simple. All we need to do is to add a constructor that takes a filename and then delegates to LoadMap() for the heavy lifting, right?

 

That’s not the aspect of two-phase construction that bothered me; it was the fact that there was two-phase construction even to begin with. Adding another constructor does not take away the ability to mutate the class and it’s that mutability that starts to raise a number of questions about the behaviour of the class.

 

The first question I have is: what happens if the client doesn’t even call LoadMap()? Does the class behave correctly if no data is loaded? This immediately leads to the question about LoadMap() throwing an exception. Is the internal state consistent if that occurs, and what happens if the client discards the error? We are back to the first question again.

 

So off we go and merrily find out the answers to these questions and write a bunch more tests to make sure that the class is exception-safe and also behaves “safely” should something happen during its (two-phase) construction.

Thread Safety

Time passes and our semi-static data becomes not quite so static. We decide that restarting the service every time we need to change the data is untenable and would like to detect when the file changes and load it again automatically at some convenient moment.

 

So we add a bit of code externally to the class to watch for when the file changes and then just call LoadMap() to load the new data file.

 

But wait. Have we got any tests for calling LoadMap() on the class a second time? What happens if this load fails – is the internal structure still consistent, can we limp along with the old data or should it behave as if freshly constructed and therefore no data was loaded?

 

And all before we even get to questions about thread safety. This is a multi-threaded service, is this class thread safe? For example, what happens if one thread tries to look up data whilst another is calling LoadMap() to refresh the data? Should existing callers be blocked and wait for the new data or continue with the old data until the new set is fully loaded?

 

Assuming you can answer all those questions, how confident are you in your ability to write thread-safe, exception-safe code? How confident are you that whatever techniques you’ve chosen to use will continue to be thread-safe as you move to new platforms in the future? Do you have adequate test coverage and/or documentation to show that you’ve even considered all these scenarios?

Refactoring to Immutability

Let’s wind the clock back to the beginning where we noticed the two-phase construction and see if we can take a different path, one that doesn’t involve writing so much production and test code or lead to so many tricky questions.

 

My personal preference is to create immutable types where possible. Essentially a type should start immutable by default and prove that adding mutability will provide some significant advantage – performance perhaps – that could not be obtained via immutability. There are some very obvious cases, such as the Builder pattern [5], where mutability is required up front, but that can often be as a stepping stone to an immutable form of the same type.
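
A familiar instance of that stepping stone in the .NET class library is StringBuilder: a mutable object used to assemble a value which is then handed over as an immutable string. The sketch below shows the shape; BuilderSketch and JoinWithCommas are names made up for this example:

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class BuilderSketch
{
  public static string JoinWithCommas(string[] values)
  {
    // Mutable while the value is being assembled...
    var builder = new StringBuilder();

    for (var i = 0; i < values.Length; ++i)
    {
      if (i > 0)
        builder.Append(',');

      builder.Append(values[i]);
    }

    // ...immutable once it is handed to the caller.
    return builder.ToString();
  }
}
```

The mutability is confined to a single method and never escapes it.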

 

The changes I made to the example code above were fairly minor. Naturally there are many other changes we could make, so the result below is not intended to be the final word on the matter; it is only intended to show the minor transformations that were needed to achieve immutability.

 

The first change I made was to make the constructor private and to take the underlying container as a constructor argument:

 

private ThingToWotsitMap(Dictionary<string, string> map)
{
  _map = map;
}

 

The second step was to make the LoadMap() method static and change it to return an instance of the class; essentially turning it into a Factory Method [6]:

 

public static ThingToWotsitMap LoadMap(string filename)
{
  Dictionary<string, string> map = new Dictionary<string, string>();

  using (var stream = new StreamReader(filename))
  {
      // Parse .CSV file data and build map
      . . .
  }

  return new ThingToWotsitMap(map);
}

 

And that’s it really. One final step was to fix up the tests and production code to use this new factory method instead of the previous two-phase construction approach (I could lean heavily on the compiler here due to the nature of the refactoring):

 

[Test]
public void one_of_the_tests_for_the_map()
{
  var filename = . . .;

  var map = ThingToWotsitMap.LoadMap(filename);

  Assert.That(. . .);
}

Revisiting Those Questions

With our new design in place let’s revisit those earlier concerns and see how it stacks up. The first question around the behaviour of a default initialised object is moot because you cannot create one – you have to provide a filename. Likewise the second question around the exception safety of the LoadMap() method is also moot because if an exception is thrown you will not have a fully constructed object to worry about. Essentially the whole issue of state corruption is moot because the point of immutable types is that you can’t change them.

 

So far so good, but what about our later change in requirements where we need to reload the data on-the-fly when it changes whilst multiple threads might be accessing it? This is still largely a moot point because the type is immutable – you cannot change it, you can only create a different object with the new data in it.

 

From the perspective of the type itself there are no thread-safety issues, but that does not mean there are no thread-safety issues at all. Instead of putting all the effort into making the type internally thread-safe we have pushed the problem up to the owner, but their problem is almost trivial by comparison; the owner just needs to switch ownership of the object in a thread-safe manner, e.g.

 

var newMap = ThingToWotsitMap.LoadMap(filename);

 

// This assignment needs to be thread-safe

_currentMap = newMap;

 

In certain environments a write of a reference-sized value is already an atomic operation and so out-of-the-box this could already be enough, depending on how it is used. It is more likely that you’ll mark the member as “volatile” to introduce the relevant memory barrier and to ensure that the value is not aggressively cached. There shouldn’t be a need to use a heavyweight synchronization object like a mutex in this example as it’s just a single reference, but if you have to switch multiple references atomically it might be required.
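
One way the owner might look is sketched below. MapHolder, Current and Replace are hypothetical names, and the holder is generic only so that the example is self-contained; in the service it would simply hold a ThingToWotsitMap. It assumes a single reference is all that needs switching:

```csharp
// Hypothetical owner of the current map, for illustration only.
public class MapHolder<T> where T : class
{
  public MapHolder(T initial)
  {
    _current = initial;
  }

  // Readers take a snapshot of the reference and carry on using that
  // object, even if a replacement is published while they work.
  public T Current
  {
    get { return _current; }
  }

  // The writer builds the new object fully first and only then
  // publishes it with a single reference assignment.
  public void Replace(T replacement)
  {
    _current = replacement;
  }

  // volatile ensures each read sees the most recently published reference.
  private volatile T _current;
}
```

A reader that needs several lookups to be consistent with each other should read Current once and reuse the result, rather than going back to the holder for each lookup.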

 

Performance-wise the two approaches should be fairly similar, with the potential for the immutable version to win on the basis of needing less synchronization. Memory-wise reloading the data should be similar in both cases too, i.e. having two copies in memory at the point just before the switchover. You could choose to empty the internal container first in the mutable case before loading the file, but then you’d have to sacrifice exception safety, which feels like a decision that needs serious consideration. On the face of it we don’t appear to have lost anything with our move to immutability.

Interlude: C# Initializer Syntax

Sadly there are some programming constructs that make the design of immutable types less alluring; one is the Initializer Syntax for objects and collections in C#. Both of these rely on the type being mutable, with state being mutated through properties for the former case and an Add() method for the latter.

 

Here is an example of the object initializer syntax:

 

var point = new Point { X = 1, Y = 2 };

 

Under the covers this is the same as writing:

 

var _point = new Point();
_point.X = 1;
_point.Y = 2;
var point = _point;

 

And this is the collection initializer syntax:

 

var points = new List<Point> { new Point(1, 2) };

 

This is the same as writing:

 

var _points = new List<Point>();
_points.Add(new Point(1, 2));
var points = _points;

 

Whilst we should not blame the language authors for providing us with a construct that allows more readable construction of objects, its overuse possibly means mutability has become the de facto choice by accident. The introduction of named arguments in C# 4.0 means that invoking a constructor can now be just as succinct, and so immutability no longer has to be sacrificed for readability:

 

var point = new Point( x: 1, y: 2 );

 

On the collection front the addition of interfaces such as IReadOnlyList and the new Immutable Collections library [4] means that the runtime madness of ReadOnlyCollection can be laid to rest.
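
The difference between the two is worth a brief sketch. The CollectionSketch class and its method names are invented for this example, and it assumes the System.Collections.Immutable assembly is available, which on the .NET Framework means installing a NuGet package:

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical helper, for illustration only.
public static class CollectionSketch
{
  // IReadOnlyList<T> is a read-only view: the caller gets no Add()
  // method, but whoever still holds the List<T> can change it.
  public static IReadOnlyList<int> AsReadOnlyView(List<int> numbers)
  {
    return numbers;
  }

  // ImmutableList<T> never changes: Add() leaves the original
  // untouched and returns a new list containing the extra item.
  public static ImmutableList<int> Append(ImmutableList<int> numbers, int value)
  {
    return numbers.Add(value);
  }
}
```

So IReadOnlyList restricts what a caller can do at compile time, whereas the immutable collections guarantee that nobody can alter the contents.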

Summary

Immutability is clearly not a panacea – one size never fits all – but hopefully following me through this thought exercise shows that it can vastly reduce the amount of time you spend thinking up scenarios to test, and therefore the amount of code you write to verify them. Given how tricky writing multi-threaded code already is, being able to reduce the amount you have to write will fill you with greater confidence that you haven’t missed some subtle race condition.

Acknowledgements

Thanks to Jez Higgins and The Lazy Web (aka Twitter) for the brief discussion of Java’s final keyword and its effects.

References

[1] http://www.bigbeeconsultants.co.uk/blog/single-assignment-pattern

[2] http://en.wikipedia.org/wiki/Turtles_all_the_way_down

[3] http://en.wikipedia.org/wiki/No_Silver_Bullet

[4] https://www.nuget.org/packages/Microsoft.Bcl.Immutable

[5] http://en.wikipedia.org/wiki/Builder_pattern

[6] http://c2.com/cgi/wiki?FactoryMethod

 

Chris Oldwood

09 March 2015

 

Bio

Chris is a freelance developer who started out as a bedroom coder in the 80’s writing assembler on 8-bit micros; these days it’s C++ and C#. He also commentates on the Godmanchester duck race and can be contacted via gort@cix.co.uk or @chrisoldwood.