The Gilded Rose Kata and the Approval Tests Library

Note: It may be helpful to read my previous post on the Golden Master Technique before seeing an example of its application here.

The Gilded Rose Kata

I encountered the Gilded Rose Kata after reading this blog post on Why Most Solutions to the Gilded Rose Miss The Bigger Picture. The Gilded Rose code kata is particularly appealing because it requires modifying a poorly written (simulated) code base. It is excellent practice for our typical daily work, yet it is still small enough to learn quickly and jump right in.

I encourage you to try the Gilded Rose Kata before reading further. At a minimum, please read the overview and consider how you would approach the problem.

Please give it a try and come back…

Ok, you’re back. How would you start?

The Golden Master Technique

This type of problem is a great example of when the Golden Master technique can be a better starting point than going straight into unit tests.

The overview of Gilded Rose states that the existing application “works”. This describes a “real application”, working in production, that is useful enough to enhance with new features. Thus, it is as important, if not more so, to preserve existing behavior of the application as it is to add new features.

In order to unit test the code properly, we would need to modify the code somewhat significantly (in proportion to the existing code base) to be able to create the seams required for good unit tests. And to get complete coverage of the requirements would require a fair number of tests, some of which may not be obvious or explicit.

Applying the Golden Master Technique

So, let’s get started in creating our “golden master”. We need to execute a sufficient number of iterations of the code to generate enough output to create a meaningful baseline. For this example, 50-100 iterations would provide more than sufficient coverage. In a larger code base, we would need to create a more diverse range of input values. In this case, however, the initial setup appears sufficient to cover the necessary code paths.

To generate the Golden Master, we need to:

  1. Open/Create a file for saving the output
  2. Modify code to output state to the file, e.g. write the Item properties (Name/SellIn/Quality) out for each execution of UpdateQuality()
  3. Modify the code to iterate through a sufficient number of days, 100 days for example
  4. Close/Save the file
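Performed by hand, the four steps above might be sketched like this. This is a rough sketch only: the Item and App classes below are minimal stand-ins I have invented to keep the example self-contained, not the kata's actual code, and the placeholder UpdateQuality() is nothing like the kata's real logic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Minimal stand-ins for the kata's types, just to make the sketch self-contained.
public class Item
{
    public string Name;
    public int SellIn;
    public int Quality;
}

public class App
{
    public List<Item> Items = new List<Item>();

    // Placeholder behaviour; the kata's real UpdateQuality() is far more involved.
    public void UpdateQuality()
    {
        foreach (var item in Items) { item.SellIn--; item.Quality--; }
    }
}

public static class GoldenMaster
{
    // Steps 2 and 3: iterate through the days, writing out the Item
    // properties after each call to UpdateQuality().
    public static string CaptureRun(App app, int days)
    {
        var output = new StringBuilder();
        for (int day = 0; day < days; day++)
        {
            app.UpdateQuality();
            foreach (var item in app.Items)
                output.AppendLine(string.Format("Day {0} - Name: {1}, SellIn: {2}, Quality: {3}",
                    day, item.Name, item.SellIn, item.Quality));
        }
        return output.ToString();
    }

    // Steps 1 and 4: open a file, write the captured output, and close it.
    public static void Save(App app, int days, string path)
    {
        File.WriteAllText(path, CaptureRun(app, days));
    }
}
```

Once saved, the file becomes the baseline; future runs are diffed against it.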

Performing the above steps in code would not be difficult. However, this is even easier using the Approval Tests Library. The framework does exactly what we need:

  1. Runs a specified “test”
  2. At the end of the test, asserts/verifies a given state
  3. Compares the resulting execution with an accepted master. If an accepted master doesn’t already exist, you accept and save it (or fix it until you are happy with it). If the master does exist but is different, the test fails, and the difference can again either be fixed or accepted as the new master.

To get started:

  1. Open the GildedRose solution
  2. Add/Install ApprovalTests into our solution. For .NET, the easiest way, of course, is via NuGet. Otherwise, it can be downloaded from here.
  3. Enhance the application to be able to capture its state, for example using a string representation of the items & their current values
  4. Create a test that will verify the state from the previous step
  5. Run and fix the tests until you are happy with the accepted golden master

How to enhance the application to capture state (step 3)

public string GetSnapshot()
{
    var snapshot = new StringBuilder();

    foreach (var item in Items)
    {
        snapshot.AppendLine(string.Format("Name: {0}, SellIn: {1}, Quality: {2}",
                                          item.Name, item.SellIn, item.Quality));
    }

    return snapshot.ToString();
}

How to create a test that will verify the state from the previous step (step 4)

1.  Let’s create a basic approval test, simply to validate the state from the initial setup (before any iterations):

public void TestThis()
{
    var app = Program.Initialize();

    var initialSetup = app.GetSnapshot();

    Approvals.Verify(initialSetup);
}

To make our test compile and run…

2. Change the Program class to public…

public class Program

3. … and extract the initial setup into an Initialize() method:

public static Program Initialize()
{
    return new Program
    {
        Items = new List<Item>
        {
            new Item {Name = "+5 Dexterity Vest", SellIn = 10, Quality = 20},
            new Item {Name = "Aged Brie", SellIn = 2, Quality = 0},
            new Item {Name = "Elixir of the Mongoose", SellIn = 5, Quality = 7},
            new Item {Name = "Sulfuras, Hand of Ragnaros", SellIn = 0, Quality = 80},
            new Item {Name = "Backstage passes to a TAFKAL80ETC concert", SellIn = 15, Quality = 20},
            new Item {Name = "Conjured Mana Cake", SellIn = 3, Quality = 6}
        }
    };
}

Now, our Main() looks like this:

static void Main(string[] args)
{
    var app = Initialize();

    app.UpdateQuality();
}

4.  Finally, we can add another test that executes 100 iterations, which creates our “golden master”:

public void TestThisTimes100()
{
    var app = Program.Initialize();

    var snapshotForHundredIterations = new StringBuilder();

    var initialSnapshot = app.GetSnapshot();
    snapshotForHundredIterations.AppendLine(initialSnapshot);

    for (int i = 0; i < 100; i++)
    {
        app.UpdateQuality();

        var currentSnapshot = app.GetSnapshot();
        snapshotForHundredIterations.AppendLine(currentSnapshot);
    }

    Approvals.Verify(snapshotForHundredIterations.ToString());
}

When approval tests execute with the [UseReporter(typeof(DiffReporter))] attribute, the framework launches an installed diff tool. If you save the resulting file, named SomeTestName.approved.txt, it becomes the accepted “golden master”.

That’s it! If you find it slightly confusing the first time, try a couple of examples yourself. Once you understand the concept of the ApprovalTests framework, it is a simple and effective way to create a “golden master” test quickly.

At this point, you could proceed with the Gilded Rose Kata, making enhancements or, if desired, creating a set of explicit unit tests to describe the given requirements more accurately and to improve future readability and maintainability.

Using the Golden Master technique to test legacy code

Working with legacy code is a scary proposition. Generally, we lack an understanding of the application and its codebase, and we don’t have automated test coverage. In fact, in his book Working Effectively with Legacy Code, Michael Feathers defines legacy code as “code without tests”.

Therefore, before making changes to legacy code, it is important to guard against unintended changes. These days, developers are often too quick to assume unit tests are the (only) way to do this. However, in a large code base, where requirements are missing or unclear, this may not be a viable option. We could even introduce bugs by “fixing” behavior, if downstream systems assume an existing (incorrect) behavior. For that reason, it may be important to first capture and lock down existing behavior before writing unit tests or modifying the existing code. Characterization tests are a means of capturing this existing behavior.

To create the characterization tests, we can generate a large set of diverse inputs and run them against the existing codebase. By recording and saving the outputs, we capture the existing behavior. These outputs from the original code base are called the “Golden Master”. Later, when we need to modify the code, we can replay the same set of inputs and compare the new outputs against the original “master” outputs. Any differences help to identify unintended behavior changes (or can be accepted if the change was intentional).
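In code, the idea might look something like the following sketch. The LegacyPriceCalculation method here is a made-up stand-in for whatever legacy behavior you need to lock down; the harness around it is the part that matters.

```csharp
using System;
using System.IO;
using System.Text;

public static class CharacterizationHarness
{
    // A stand-in for the legacy behaviour we want to lock down.
    // (Invented for illustration; substitute your real legacy routine.)
    public static int LegacyPriceCalculation(int quantity, int daysLate)
    {
        return quantity * 10 + (daysLate > 5 ? daysLate * 2 : 0);
    }

    // Run a diverse range of inputs and record every output.
    public static string CaptureBehaviour()
    {
        var output = new StringBuilder();
        for (int quantity = 0; quantity < 20; quantity++)
            for (int daysLate = 0; daysLate < 10; daysLate++)
                output.AppendLine(string.Format("q={0}, d={1} => {2}",
                    quantity, daysLate, LegacyPriceCalculation(quantity, daysLate)));
        return output.ToString();
    }

    // First run: save the golden master. Later runs: compare against it.
    public static bool MatchesGoldenMaster(string path)
    {
        var current = CaptureBehaviour();
        if (!File.Exists(path))
        {
            File.WriteAllText(path, current);
            return true;
        }
        return File.ReadAllText(path) == current;
    }
}
```

Any mismatch flags a behavior change to investigate; if the change was intentional, you overwrite the master file and move on.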

I have used this technique in real-life scenarios that had previously been difficult to cover sufficiently with tests. The 80/20 rule applies here: we spent 80% of our time trying to cover 20% (less, really) of the fringe cases. In the end, the golden master technique proved more effective. Once in place, this technique can be combined with unit testing and other test methods.

Thanks to jbrains’s posts on Legacy Code Retreat for introducing me to this technique. In particular, I recommend this for more information.

I will provide more details, examples, and tools for this technique in future posts.

Update: See an example of using this technique with my follow up post, The Gilded Rose Kata and The Approval Tests Library

Primitive Obsession Example # 1 – KeyValuePair

My last post described how removing cases of “Primitive Obsession” is one of the best ways to improve the readability of your code. Replacing primitives with domain specific types helps reveal the intention of the code, which helps others to read, maintain, and use the code we write.

This post will be the first in a series of examples illustrating the primitive obsession code smell. In each post, I will provide a “before” code sample where primitives are used, and then an “after” sample, where the usage of a primitive type is replaced with a domain specific type.

The first example will illustrate using a .NET built-in type, the KeyValuePair.

Disclaimer # 1: It is perfectly acceptable and appropriate to use this type in some cases. However, the type is often overused and can make code less readable. My intention with this post is for you to compare the “before” and “after” versions to see which approach is more readable for your situation.

Now, on to our example…

“Before”- code using the built-in generic type, KeyValuePair<TKey, TValue>:

class Program
{
    static void Main()
    {
        var list = new List<KeyValuePair<string, string>>();

        list.Add(new KeyValuePair<string, string>("Chris Melinn", "FAKE123"));
        list.Add(new KeyValuePair<string, string>("Lisa Simpson", "ABC456"));
        list.Add(new KeyValuePair<string, string>("Waylan Smithers", "XYZ789"));

        // Outputs (as strings):
        // Chris Melinn, 'FAKE123'
        // Lisa Simpson, 'ABC456'
        // Waylan Smithers, 'XYZ789'
        foreach (var item in list)
        {
            var message = string.Format("{0}, '{1}'",
                                        item.Key, item.Value);
            Console.WriteLine(message);
        }
    }
}

Even in this simple example, it can be hard to tell what data is captured by the List of KeyValuePairs. Each KeyValuePair contains a Key of type string, and a Value also of type string. In a real application, hopefully, we would also have better variable names to help us determine what these objects contain. However, in this case, I have intentionally left generic variable names (such as item and list) to highlight the loss of meaning by using the primitive type.

After inspecting the data values in the sample, we could probably guess that our Keys are people’s names. However, we might not be able to guess what our Values hold.

This example assumes that our code intends to print out the company’s computer inventory, with a list of machines issued to various staff members. In other words, in our example, the Key is the employee name and the Value is the machine serial number. Unfortunately, the primitive type does not capture that information. However, we will see that this information is easy to convey when we refactor to a domain specific type.

Also, notice that, in this example, our type is created and used within the same block of code. In a real application, it is much more likely that one class would create the list (e.g. by reading from a database) and then return the values to a separate class that consumes the data. In such a case, we would only see an interface or method such as:

public List<KeyValuePair<string, string>> GetEmployeeMachines()

Even with a well-named method, the developer will probably need to navigate to the class and method and view the source code (if possible) to confirm that the keys and values are what she expects them to be (not to mention, code signatures like these are fairly ugly and unpleasant).

Now, let’s refactor our original sample to remove the primitive KeyValuePair type and replace it with a domain-specific type. It would look something like the following:

“After”- KeyValuePair<TKey, TValue> replaced with a domain specific type, EmployeeMachine:

class Program
{
    static void Main()
    {
        var list = new List<EmployeeMachine>();

        list.Add(new EmployeeMachine("Chris Melinn", "FAKE123"));
        list.Add(new EmployeeMachine("Lisa Simpson", "ABC456"));
        list.Add(new EmployeeMachine("Waylan Smithers", "XYZ789"));

        // Outputs (as strings):
        // Chris Melinn, 'FAKE123'
        // Lisa Simpson, 'ABC456'
        // Waylan Smithers, 'XYZ789'
        foreach (var item in list)
        {
            var message = string.Format("{0}, '{1}'",
                                        item.EmployeeName, item.MachineSerialNumber);
            Console.WriteLine(message);
        }
    }
}

class EmployeeMachine
{
    public string EmployeeName { get; private set; }
    public string MachineSerialNumber { get; private set; }

    public EmployeeMachine(string employeeName,
                           string machineSerialNumber)
    {
        EmployeeName = employeeName;
        MachineSerialNumber = machineSerialNumber;
    }
}
The final result is more intention revealing and better conveys what data is captured in the EmployeeMachine class. Even with other generic, poorly named variables still left in the code (such as item and list), the reader can still understand what the code is doing without navigating and inspecting other classes.
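The same improvement shows up in the GetEmployeeMachines() signature from earlier. A hypothetical repository class (my own illustration, not part of the original sample) could expose both shapes side by side, and the difference in readability lives in the return type alone:

```csharp
using System;
using System.Collections.Generic;

public class EmployeeMachine
{
    public string EmployeeName { get; private set; }
    public string MachineSerialNumber { get; private set; }

    public EmployeeMachine(string employeeName, string machineSerialNumber)
    {
        EmployeeName = employeeName;
        MachineSerialNumber = machineSerialNumber;
    }
}

public class InventoryRepository
{
    // Before: the caller must inspect the implementation to learn
    // that the key is a name and the value is a serial number.
    public List<KeyValuePair<string, string>> GetEmployeeMachinesAsPairs()
    {
        return new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Lisa Simpson", "ABC456")
        };
    }

    // After: the return type documents the data by itself.
    public List<EmployeeMachine> GetEmployeeMachines()
    {
        return new List<EmployeeMachine>
        {
            new EmployeeMachine("Lisa Simpson", "ABC456")
        };
    }
}
```

The "after" signature can be understood from an interface alone, without navigating into the implementation.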

Most importantly, I believe this refactoring is one of the easy “little things” you can do to increase the satisfaction of others who read, maintain, and use your code.

As always, I would love to hear from readers — especially if you have any personal experiences (or even code examples!) to share.

Improving Code – Remove “Primitive Obsession”

One of the best ways to improve code is to remove cases of Primitive Obsession. This particular code smell arises when we use primitive types (such as strings or integers) to represent what should be explicit types in our domain.

In some cases, primitive types are quite appropriate. For example, we may only need an instance of some string to display somewhere in our UI. However, in other cases, our “simple string” should be a better abstraction within our domain.

For example, using a string to store my name (“Chris Melinn”) would likely be a loss of abstraction. This probably needs to be modeled as something more explicit, such as a User or a Person, having a FirstName and a LastName. This is particularly true when other places in the application do things such as parse this string to derive a first name and a last name. These behaviors are telling us that our “name” is trying to be more than just a string.

At first, it may seem silly to create a whole new class that possibly contains just one or two members (like our Person with a Name or FirstName and LastName). However, by creating this “placeholder”, we are defining its place in our domain. More importantly, you may find that other behavior begins to find its way into your “simple” class (such as the conversion and calculation logic that would otherwise be spread around your application). When this happens, you know you did the right thing.
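As a sketch of this idea (my own illustration, using a hypothetical PersonName type), the “simple string” and the parsing logic that was scattered around the application might collapse into one small class:

```csharp
using System;

// The "simple string" name promoted to a small domain type.
public class PersonName
{
    public string FirstName { get; private set; }
    public string LastName { get; private set; }

    public PersonName(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    // The conversion logic that callers used to duplicate now lives here.
    public static PersonName Parse(string fullName)
    {
        var parts = fullName.Split(new[] { ' ' }, 2);
        if (parts.Length < 2)
            throw new ArgumentException("Expected 'First Last'", "fullName");
        return new PersonName(parts[0], parts[1]);
    }

    public override string ToString()
    {
        return FirstName + " " + LastName;
    }
}
```

The type starts as little more than a placeholder, but it gives the parsing and formatting behavior a home to migrate into.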

For a much better description and discussion, please take the time to watch this video by Corey Haines and J.B. Rainsberger. They are both excellent craftsmen, and, if you haven’t already, I recommend you take the time to explore their blogs as well.

JB Rainsberger – Primitive Obsession from Corey Haines on Vimeo.

Questions Before Refactoring

I want to discuss a question I found quite some time ago on StackOverflow, which posted a code sample (to follow) along with these questions and comments (bold highlights are mine):

(clipped)  “… this is the perfect time to refactor this puppy and make it more portable, reusable, reliable, easier to configure etc. Now that’s fine and dandy, but then I started feeling a tad overwhelmed regarding to where to start. Should it be a separate library? How should it be configured? Should it use IoC? DI?

So my [admittedly subjective] question is this — given a relatively small, but quite useful class like the one below, what is a good approach to refactoring it? What questions do you ask and how do you decide what to implement or not implement? Where do you draw the line regarding configuration flexibility?

[Note: Please don’t bash this code too much okay? It was written a long time ago and has functioned perfectly well baked into an in house application.]”

Before we get into the code sample, let’s address some of the key questions…

“What is a good approach to refactoring it?  What questions do you ask…?”

The first question I would ask is:

1.  Do I need to refactor this code at all?

In this case, the author commented at the end of his question that the code “was written a long time ago and has functioned perfectly well baked into an in house application”.  This comment suggests there may not be a need to refactor at all!  If the code has survived and hasn’t been modified in a long time, it probably doesn’t make sense to change it now.  However, let’s assume we have good reasons to start (e.g. the need to start making changes)…

2.  What is my objective in refactoring the code?

The term “refactoring” is becoming quite commonplace, and it is often used interchangeably with any type of code modification.  However, in Refactoring: Improving the Design of Existing Code, Fowler defined it as

“Refactoring (noun):  a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.”

With this definition, Fowler highlights that refactoring is not any general modification to the code, but specifically a change with the intention to “make it easier to understand”.

In this episode of Hanselminutes, “Uncle Bob” Martin makes the comment:

“You are constantly mercilessly refactoring systems,
not necessarily because they need to be refactored
but because they need to remain flexible. So it’s
rather like moving the gears on a bicycle or driving
your car for no other reason than to keep it well
lubricated, and in order to make those kinds of
changes you need tests, it all boils down to tests in
the end.”

(Before we move on, I want to highlight another important fact.  You cannot be sure you haven’t changed behavior unless the code is covered by good unit tests.  So, without good unit tests, you are not really refactoring.)

With this context in mind, we can determine how to make the code more flexible, easier to understand, and cheaper to modify.

3.  What should I do?  What should I refactor?

The reason I was attracted to this particular code sample is that it is actually a fairly decent piece of code.  I can see, as the author points out, that it could have been good enough to support an existing application without many problems.  I also think this sample is reflective of the type of code I generally see in the wild:  functional, but not expressive.

Additionally, even for a small code sample, it contains quite a bit of code duplication. I feel this amount of duplication is typical (unfortunately) of the average application.

The biggest step you can make to improve your code is learning (1) how to remove duplication, and (2) how to make your code more expressive.

Extract Method – How Much is Too Much?

Extract Method is one of the most basic and common refactorings. In Refactoring: Improving the Design of Existing Code, Martin Fowler gives the following motivation for using Extract Method:

“Extract Method is one of the most common refactorings I do.  I look at a method that is too long or look at code that needs a comment to understand its purpose.  I then turn that fragment of code into its own method.

I prefer short, well-named methods for several reasons.  First, it increases the chances that other methods can use a method when the method is finely grained.  Second, it allows the higher-level methods to read more like a series of comments.  Overriding also is easier when the methods are finely grained.

It does take a little getting used to if you are used to seeing larger methods.  And small methods really work only when you have good names, so you need to pay attention to naming.  People sometimes ask me what length I look for in a method.  To me length is not the issue.  The key is the semantic distance between the method name and the method body.  If extracting improves clarity, do it, even
if the name is longer than the code you have extracted.”

Few would argue with the benefits of this refactoring.
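As a quick toy illustration of the motivation Fowler describes (my own example, not one from the book), a fragment that needs a comment to explain its purpose becomes a well-named method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class InvoicePrinter
{
    // Before: the intent of the filtering needs a comment to explain it.
    public decimal TotalBefore(List<decimal> amounts)
    {
        // sum only the positive amounts (ignore refunds)
        return amounts.Where(a => a > 0).Sum();
    }

    // After Extract Method: the method name carries the comment's meaning,
    // and the higher-level method reads like a series of comments.
    public decimal Total(List<decimal> amounts)
    {
        return SumOfCharges(amounts);
    }

    private decimal SumOfCharges(List<decimal> amounts)
    {
        return amounts.Where(a => a > 0).Sum();
    }
}
```

The extracted method is longer than the line it replaced, but per Fowler's criterion, the extraction still pays off because it closes the semantic distance between name and body.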

But, what happens if we perform Extract Method to the extreme?  What happens if we follow Uncle Bob‘s advice and extract till we drop?

‘For years authors and consultants (like me) have been telling us that functions should do one thing. They should do it well. They should do it only.

The question is: What the hell does “one thing” mean?

After all, one man’s “one thing” might be someone else’s “two things”.’

Uncle Bob’s post provides an example of extracting until nothing else can be done, and then he ends with this comment:

“Perhaps you think this is taking things too far. I used to think so too. But after programming for over 40+ years, I’m beginning to come to the conclusion that this level of extraction is not taking things too far at all. In fact, to me, it looks just about right.

So, my advice: Extract till you just can’t extract any more. Extract till you drop.”

As he predicts, many people do think he’s going too far.  Here are a couple of excerpts from the post’s comments:

“Following the flow of the data through the fully extracted version becomes difficult, since the developer will need to jump around constantly throughout the body of the class.

If the goal is to make development and maintenance easier and fully extracting the class makes it more difficult for a developer to follow the flow of the data is it better to fully extract just for the sake of following a rule?

My point is that patterns in code are easier to see when things are not broken down into such small chunks. At the fully decomposed state it isn’t obvious that an Adapter on the Matcher would simply fit into place. By decomposing the methods so fine you lose context, so much so it isn’t evident how the method relates to the accomplishing the goal of the class.”

and, another:

‘A function by definition, returns 1 result from 1 input. If there’s no reuse, there is no “should”. Decomposition is for reuse, not just to decompose. Depending on the language/compiler there may be additional decision weights.

What I see from the example is you’ve gone and polluted your namespace with increasingly complex,longer,more obscure, function name mangling which could have been achieved (quick and readable) with whitespace and comments. To mirror a previous poster, I rather see a javadoc with proper commenting than to trace what’s going on for such a simplistic case. I’m afraid to ask what you plan to do when the class is more complex and symbolExpression(..) isn’t descriptive enough!’

These arguments make a good point.  However, these arguments also apply to object-oriented code in general.  Reading and navigating object-oriented code can often be more difficult than its procedural counterparts.  However, we hope to overcome these disadvantages by creating a structure that is more readable and reusable overall.

In Uncle Bob’s example, the newer, more granular methods provide a more complete and accurate view of the behaviors and capabilities of the SymbolReplacer class.  In isolation, it might appear as overkill and “polluted namespaces”.  However, if you were maintaining a large codebase and needed to understand how to use (or reuse) SymbolReplacer, I believe Uncle Bob’s approach would make your task much easier.  You don’t need to read through javadoc (as one commenter prefers).  Instead, the method names are more clear, the size is smaller and easier to override, and the class itself almost becomes readable English.  In my opinion, these advantages outweigh the loss of navigability.

But, perhaps, as Martin Fowler mentions “it does take a little getting used to”.  Uncle Bob said almost the same thing: “Perhaps you think this is taking things too far. I used to think so too. But after programming for over 40+ years, I’m beginning to come to the conclusion that this level of extraction is not taking things too far at all. In fact, to me, it looks just about right.”

With the wisdom of those two, I think we owe it to ourselves to set aside our skepticism and give it a real try.  We can come back later, compare results, and make a decision then.  I have found that those who are willing to try their advice, in the end, never go back.  Perhaps you will find that your code gets cleaner and opportunities for reuse start showing themselves in surprising ways.