The Gilded Rose Kata and the Approval Tests Library

Note: It may be helpful to read my previous post on the Golden Master Technique before seeing an example of its application here.

The Gilded Rose Kata

I encountered the Gilded Rose Kata after reading this blog post on Why Most Solutions to the Gilded Rose Miss The Bigger Picture. The Gilded Rose code kata is particularly appealing because it requires modification of a poorly written (simulated) code base. It is excellent practice for our typical daily work, yet it is still small enough to learn and jump in quickly.

I encourage you to try the Gilded Rose Kata before reading further. At a minimum, please read the overview and consider how you would approach the problem.

Please give it a try and come back…

Ok, you’re back. How would you start?

The Golden Master Technique

This type of problem is a great example of when the Golden Master technique can be a better starting point than going straight into unit tests.

The overview of Gilded Rose states that the existing application “works”. This describes a “real application”, working in production, that is useful enough to enhance with new features. Thus, it is as important, if not more so, to preserve existing behavior of the application as it is to add new features.

To unit test the code properly, we would first need to modify a significant share of it (relative to the size of the code base) to create the seams that good unit tests require. Complete coverage of the requirements would also take a fair number of tests, some of which may not be obvious or explicit.

Applying the Golden Master Technique

So, let’s get started in creating our “golden master”. We need to execute a sufficient number of iterations of the code to generate enough output to create a meaningful baseline. For this example, 50-100 iterations are more than enough. In a larger code base, we would need to create a more diverse range of input values. In this case, however, the initial setup appears sufficient to cover the necessary code paths.

To generate the Golden Master, we need to:

  1. Open/Create a file for saving the output
  2. Modify code to output state to the file, e.g. write the Item properties (Name/SellIn/Quality) out for each execution of UpdateQuality()
  3. Modify the code to iterate through a sufficient number of days, 100 days for example
  4. Close/Save the file
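
The four steps above can be sketched by hand as follows. This is only an illustration: GoldenMasterRecorder, its Record and Save methods, and the file name in the usage note are hypothetical names, not part of the kata or of any library.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: records a text description of the state after each simulated day.
public static class GoldenMasterRecorder
{
    // step:     advances the system by one day (for the kata, a call to UpdateQuality)
    // describe: returns the current state as text (Name/SellIn/Quality per item)
    public static string Record(int days, Action step, Func<string> describe)
    {
        var output = new StringBuilder();

        // Capture the initial state before any iteration
        output.AppendLine("Day 0");
        output.Append(describe());

        for (var day = 1; day <= days; day++)
        {
            step();
            output.AppendLine("Day " + day);
            output.Append(describe());
        }

        return output.ToString();
    }

    // Writes the recorded output to disk; File.WriteAllText opens, writes, and closes the file.
    public static void Save(string path, string content)
    {
        File.WriteAllText(path, content);
    }
}
```

Assuming the application exposes some way to describe its state as a string, usage would look roughly like GoldenMasterRecorder.Save("golden-master.txt", GoldenMasterRecorder.Record(100, app.UpdateQuality, describeState)), where describeState is whatever function produces that string.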

Performing the above steps in code would not be difficult. However, this is even easier using the Approval Tests Library. The framework does exactly what we need:

  1. Runs a specified “test”
  2. At the end of the test, asserts/verifies a given state
  3. Compares the result of the execution with an accepted master. If an accepted master doesn’t already exist, you accept and save it (or fix the code until you are happy with the output). If the master does exist but differs, the test fails; you can then either fix the code or accept the new output as the master.

To get started:

  1. Open the GildedRose solution
  2. Add/Install ApprovalTests into our solution.  For .NET, the easiest way, of course, is via NuGet. Otherwise, it can be downloaded from here.
  3. Enhance the application to be able to capture its state, for example using a string representation of the items & their current values
  4. Create a test that will verify the state from the previous step
  5. Run and fix the tests until you are happy with the accepted golden master

How to enhance the application to capture state (step 3)

public string GetSnapshot()
{
    var snapshot = new StringBuilder();

    // One line per item, so any change in SellIn or Quality shows up in a diff
    foreach (var item in Items)
    {
        snapshot.AppendLine(string.Format("Name: {0}, SellIn: {1}, Quality: {2}", item.Name, item.SellIn, item.Quality));
    }

    return snapshot.ToString();
}

How to Create a test that will verify the state in the previous step (step 4)

1.  Let’s create a basic approval test, simply to validate the state from the initial setup (before any iterations):

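// Requires: using ApprovalTests; using ApprovalTests.Reporters; plus your test framework's test attribute
[UseReporter(typeof(DiffReporter))]
public void TestThis()
{
    var app = Program.Initialise();

    var initialSetup = app.GetSnapshot();

    Approvals.Verify(initialSetup);
}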

To make our test compile and run…

2. Change the Program class to public…

public class Program

3. … and extract the initial setup into an Initialise() method:

public static Program Initialise()
{
    return new Program
    {
        Items = new List<Item>
        {
            new Item {Name = "+5 Dexterity Vest", SellIn = 10, Quality = 20},
            new Item {Name = "Aged Brie", SellIn = 2, Quality = 0},
            new Item {Name = "Elixir of the Mongoose", SellIn = 5, Quality = 7},
            new Item {Name = "Sulfuras, Hand of Ragnaros", SellIn = 0, Quality = 80},
            new Item {Name = "Backstage passes to a TAFKAL80ETC concert", SellIn = 15, Quality = 20},
            new Item {Name = "Conjured Mana Cake", SellIn = 3, Quality = 6}
        }
    };
}

Now, our Main() looks like this:

static void Main(string[] args)
{
    var app = Initialise();

    app.UpdateQuality();
}


4.  Finally, we can add another test that runs 100 iterations and creates our “golden master”:

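[UseReporter(typeof(DiffReporter))]
public void TestThisTimes100()
{
    var app = Program.Initialise();

    var snapshotForHundredIterations = new StringBuilder();

    var initialSnapshot = app.GetSnapshot();
    snapshotForHundredIterations.Append(initialSnapshot);

    for (int i = 0; i < 100; i++)
    {
        // Advance one day, then record the resulting state
        app.UpdateQuality();
        var currentSnapshot = app.GetSnapshot();
        snapshotForHundredIterations.Append(currentSnapshot);
    }

    Approvals.Verify(snapshotForHundredIterations.ToString());
}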

When approval tests execute with the [UseReporter(typeof(DiffReporter))] attribute, the framework launches an installed diff tool. If you save the resulting file, named in the form SomeTestName.approved.txt, it becomes the accepted “golden master”.

That’s it! If you find it slightly confusing the first time, try a couple examples yourself. Once you understand the concept of the ApprovalTest framework, it is a simple and effective way to create a “golden master” test quickly.

At this point, you could proceed with the Gilded Rose Kata, making enhancements, or if desired, creating a set of explicit unit tests to more accurately describe the given requirements & for future readability & maintainability.

Other Resources

Using the Golden Master technique to test legacy code

Working with legacy code is a scary proposition. Generally, we lack an understanding of the application and its codebase, and we don’t have automated test coverage. In fact, in his book Working Effectively with Legacy Code, Michael Feathers defines legacy code as “code without tests”.

Therefore, before making changes to legacy code, it is important to guard against unintended changes. These days, developers are often too quick to assume unit tests are the (only) way to do this. However, in a large code base, where requirements are missing or unclear, this may not be a viable option. We could even introduce bugs by “fixing” behavior if downstream systems depend on the existing (incorrect) behavior. Therefore, it may be important to first capture and lock down existing behavior before writing unit tests or modifying the existing code. Characterization tests are a means of capturing that existing behavior.

To create the characterization tests, we can generate a large set of diverse inputs and run them against the existing codebase. By recording and saving these outputs, we capture the existing behavior. These outputs from the original code base are called the “Golden Master”. Later, when we need to modify the code, we can replay the same set of inputs and compare them against the original “master” outputs. Any differences between the original and new outputs help to identify unintended behaviour changes (or can be accepted if intentionally changed).
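
The record-then-replay cycle described above can be sketched in a few lines. This is a minimal illustration, not the API of any particular library: GoldenMasterCheck, its Result values, and the ".received" suffix are all hypothetical choices.

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating the record/compare cycle of a golden master test.
public static class GoldenMasterCheck
{
    public enum Result { Recorded, Matched, Differed }

    public static Result Verify(string masterPath, string actualOutput)
    {
        // First run: no master exists yet, so the current output becomes the master.
        if (!File.Exists(masterPath))
        {
            File.WriteAllText(masterPath, actualOutput);
            return Result.Recorded;
        }

        // Later runs: replaying the same inputs must reproduce the master exactly.
        var master = File.ReadAllText(masterPath);
        if (master == actualOutput)
        {
            return Result.Matched;
        }

        // Keep the new output next to the master so the two can be diffed,
        // and either fixed or accepted as the new master.
        File.WriteAllText(masterPath + ".received", actualOutput);
        return Result.Differed;
    }
}
```

In this sketch, accepting an intentional change simply means replacing the master file with the ".received" file by hand.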

I have used this technique in real-life scenarios that had previously been difficult to cover sufficiently with tests. The 80/20 rule applies here; we spent 80% of our time trying to cover 20% (less, really) of the fringe cases. In the end, the golden master technique has been more effective. Once in place, this technique can be combined with unit testing and other test methods.

Thanks to jbrains’s posts on Legacy Code Retreat for introducing me to this technique. In particular, I recommend this for more information.

I will provide more details, examples, and tools for this technique in future posts.

Update: See an example of using this technique with my follow up post, The Gilded Rose Kata and The Approval Tests Library

Extract Method – How Much is Too Much?

Extract Method is one of the most basic and common refactorings. In Refactoring: Improving the Design of Existing Code, Martin Fowler gives the following motivation for using Extract Method:

“Extract Method is one of the most common refactorings I do.  I look at a method that is too long or look at code that needs a comment to understand its purpose.  I then turn that fragment of code into its own method.

I prefer short, well-named methods for several reasons.  First, it increases the chances that other methods can use a method when the method is finely grained.  Second, it allows the higher-level methods to read more like a series of comments.  Overriding also is easier when the methods are finely grained.

It does take a little getting used to if you are used to seeing larger methods.  And small methods really work only when you have good names, so you need to pay attention to naming.  People sometimes ask me what length I look for in a method.  To me length is not the issue.  The key is the semantic distance between the method name and the method body.  If extracting improves clarity, do it, even if the name is longer than the code you have extracted.”

Few would argue with the benefits of this refactoring.
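
As a small illustration of the refactoring (my own example, not taken from Fowler’s book; the Invoice class and its method names are hypothetical), compare a method that needs comments to explain its fragments with a version where each fragment has become a well-named method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Invoice
{
    // Before: the comments mark the fragments that want to be methods.
    public static string PrintBefore(string customer, IEnumerable<decimal> amounts)
    {
        // calculate outstanding
        var outstanding = 0m;
        foreach (var amount in amounts)
        {
            outstanding += amount;
        }

        // format details
        return "Customer: " + customer + ", Owed: " + outstanding;
    }

    // After: the top-level method reads like the comments it replaced.
    public static string PrintAfter(string customer, IEnumerable<decimal> amounts)
    {
        return FormatDetails(customer, CalculateOutstanding(amounts));
    }

    private static decimal CalculateOutstanding(IEnumerable<decimal> amounts)
    {
        return amounts.Sum();
    }

    private static string FormatDetails(string customer, decimal outstanding)
    {
        return "Customer: " + customer + ", Owed: " + outstanding;
    }
}
```

Both versions return the same string for the same input; only the structure changes, which is what makes Extract Method a refactoring rather than a behavior change.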

But what happens if we perform Extract Method to the extreme?  What happens if we follow Uncle Bob’s advice and extract till we drop?

‘For years authors and consultants (like me) have been telling us that functions should do one thing. They should do it well. They should do it only.

The question is: What the hell does “one thing” mean?

After all, one man’s “one thing” might be someone else’s “two things”.’

Uncle Bob’s post provides an example of extracting until nothing else can be done, and then he ends with this comment:

“Perhaps you think this is taking things too far. I used to think so too. But after programming for over 40+ years, I’m beginning to come to the conclusion that this level of extraction is not taking things too far at all. In fact, to me, it looks just about right.

So, my advice: Extract till you just can’t extract any more. Extract till you drop.”

As he predicts, many people do think he’s going too far.  Here are a couple of excerpts from the post’s comments:

“Following the flow of the data through the fully extracted version becomes difficult, since the developer will need to jump around constantly throughout the body of the class.

If the goal is to make development and maintenance easier and fully extracting the class makes it more difficult for a developer to follow the flow of the data is it better to fully extract just for the sake of following a rule?

My point is that patterns in code are easier to see when things are not broken down into such small chunks. At the fully decomposed state it isn’t obvious that an Adapter on the Matcher would simply fit into place. By decomposing the methods so fine you lose context, so much so it isn’t evident how the method relates to the accomplishing the goal of the class.”

and, another:

‘A function by definition, returns 1 result from 1 input. If there’s no reuse, there is no “should”. Decomposition is for reuse, not just to decompose. Depending on the language/compiler there may be additional decision weights.

What I see from the example is you’ve gone and polluted your namespace with increasingly complex,longer,more obscure, function name mangling which could have been achieved (quick and readable) with whitespace and comments. To mirror a previous poster, I rather see a javadoc with proper commenting than to trace what’s going on for such a simplistic case. I’m afraid to ask what you plan to do when the class is more complex and symbolExpression(..) isn’t descriptive enough!’

These arguments make a good point, but they also apply to object-oriented code in general.  Reading and navigating object-oriented code can often be more difficult than reading its procedural counterpart.  We accept that cost in the hope of creating a structure that is more readable and reusable overall.

In Uncle Bob’s example, the newer, more granular methods provide a more complete and accurate view of the behaviors and capabilities of the SymbolReplacer class.  In isolation, this might look like overkill and “polluted namespaces”.  However, if you were maintaining a large codebase and needed to understand how to use (or reuse) SymbolReplacer, I believe Uncle Bob’s approach would make your task much easier.  You don’t need to read through javadoc (as one commenter prefers).  Instead, the method names are clearer, the methods are smaller and easier to override, and the class itself reads almost like English.  In my opinion, these advantages outweigh the loss of navigability.

But perhaps, as Martin Fowler mentions, “it does take a little getting used to”.  Uncle Bob said almost the same thing: “Perhaps you think this is taking things too far. I used to think so too. But after programming for over 40+ years, I’m beginning to come to the conclusion that this level of extraction is not taking things too far at all. In fact, to me, it looks just about right.”

With the wisdom of those two, I think we owe it to ourselves to set aside our skepticism and give it a real try.  We can come back later, compare results, and make a decision then.  I have found that those who are willing to try their advice never go back.  Perhaps you will find that your code gets cleaner and opportunities for reuse start showing themselves in surprising ways.