I want to discuss a question I found quite some time ago on StackOverflow, which posted a code sample (to follow) along with these questions and comments (bold highlights are mine):

(clipped)  “… this is the perfect time to refactor this puppy and make it more portable, reusable, reliable, easier to configure etc. Now that’s fine and dandy, but then I started feeling a tad overwhelmed regarding to where to start. Should it be a separate library? How should it be configured? Should it use IoC? DI?

So my [admittedly subjective] question is this — given a relatively small, but quite useful class like the one below, what is a good approach to refactoring it? What questions do you ask and how do you decide what to implement or not implement? Where do you draw the line regarding configuration flexibility?

[Note: Please don't bash this code too much okay? It was written a long time ago and has functioned perfectly well baked into an in house application.]”

Before we get into the code sample, let’s address some of the key questions…

“What is a good approach to refactoring it?  What questions do you ask…?”

The first question I would ask is:

1.  Do I need to refactor this code at all?

In this case, the author commented at the end of his question that the code “was written a long time ago and has functioned perfectly well baked into an in house application”.  This comment suggests there may not be a need to refactor at all!  If the code has survived and hasn’t been modified in a long time, it probably doesn’t make sense to change it now.  However, let’s assume we have good reasons to start (e.g. need to start making a changes)…

2.  What is my objective in refactoring the code?

The term “refactoring” is becoming quite commonplace, and it is often used interchangeably with any type of code modification.  However, in Refactoring: Improving the Design of Existing Code, Fowler defined it as

“Refactoring (noun):  a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.”

With this definition, Fowler highlights that refactoring is not any general modification to the code, but specifically a change with the intention to “make it easier to understand”.

In this episode of Hanselminutes, “Uncle Bob” Martin makes the comment:

“You are constantly mercilessly refactoring systems,
not necessarily because they need to be refactored
but because they need to remain flexible. So it’s
rather like moving the gears on a bicycle or driving
your car for no other reason than to keep it well
lubricated, and in order to make those kinds of
changes you need tests, it all boils down to tests in
the end.”

(Before we move on, I want to highlight another important fact.  You cannot be sure you haven’t changed behavior unless the code is covered by good unit tests.  So, without good unit tests, you are not really refactoring.)

With this context in mind, we can determine how to make the code more flexible, easier to understand, and cheaper to modify.

3.  What should I do?  What should I refactor?

The reason I was attracted to this particular code sample to highlight is that it is actually a fairly decent piece of code.  I can see, as the author points out, that it could have been good enough to support an existing application without many problems.  I also think this sample is reflective of the type of code I generally see in the wild:  functional, but not expressive.

Additionally, even for a small code sample, it contains quite a bit of code duplication. I feel this amount of duplication is typical (unfortunately) of the average application.

The biggest step you can make to improve your code is learning (1) how to remove duplication, and (2) how to make your code more expressive.

As part of our first DevEvening, I did a presentation on T4, the little-known code generation technology built-in to Visual Studio.  It was a pleasure meeting everyone at our first meeting, and I hope it was the first of many more to come.  I also want to give a big thank you to Alex Mackey for making it all happen – We are lucky to have you in Melbourne!  Thanks also to Clarius for their generous donation of two professional licenses of their Visual T4 Editor — congratulations Tarn and David!

The slides from the presentation are below:

This was one of my first presentations in this type of forum, but I think it went relatively well.  It’s surprising how much effort goes into making a successful presentation, and I have a newfound respect to those who do this regularly!

If you have any feedback on the presentation, please comment here or let me know on twitter.  I would appreciate any feedback to help me improve.

Relevant Links

In the latest Hanselminutes podcast, Roy Osherove describes some of his best practices in unit testing techniques.  In the discussion, Roy describes one of the key tenets from his book which says that a unit test must be “trustworthy”.

On the surface, this tenet seems fairly obvious.  Of course, we need to trust our tests!  However, as he describes some situations where this rule is often violated, I realize I would be guilty on many occasions.

For example, I always struggle with the idea to include data access layer tests in my unit tests.  Yes, I know that testing the database is really an integration test and “purists” would argue these should not be included.  However, I feel the only way to test the data access code in a meaningful way is to hit the database.  In other words, the code in the data access layer and the related code in the database are a single logical unit.

In this podcast, Roy helps me to understand the difference, which I have summarized here:

Clarification # 1 – Data Access Layer Unit Tests are not trustworthy

With a trustworthy test, you must be able trust the results 100% of the time.  If the test fails, I know the code is broken and must be fixed.  This means I shouldn’t have to ask things like “Was the database down?”, “Was the connection string OK?”, “Was the stored procedure modified?”.  By asking these questions, it shows that I don’t trust the results and I have a poorly designed “unit test”.

Clarification # 2 – Data Access Layer tests must be integration tests

In the podcast, Roy also states that the data access layer is generally a very thin layer on top of the database, invoking some execution logic within the database itself.  For example, the data access layer will issue a command/query such as a SELECT, INSERT, UPDATE, DELETE or similar set of statements using a stored procedure.  There generally is not much else that sits inside the data access code.  So, Roy also believes that this type of code must be tested as a single unit.

However, Roy also points out that these must be also be integration tests.  I had made the assumption that since they must be tested as a unit, then they must be unit tests!  However, by doing so, I have created untrustworthy tests.  Now, it makes sense to me that the data access layer will likely not include any unit tests at all!  They will only be created as integration tests.

Clarification # 3 – Separation of Unit Tests from Integration Tests

It probably seems like an obvious observation to make, but I feel its important to distinguish:  Unit Tests and Integration Tests are often both developed using a “Unit Test Framework”.  We’re using the same tools to create both types of tests.

However, the problem occurs when we blend the two types of tests together.  Instead, we should create different projects/assemblies, so they can be run easily and independently from one other.  It should be obvious and easy for someone to get all the code out of source control, build the solution, and know which tests should pass from the beginning.  If you mix integration tests into the same project, the other developer will have to determine which tests “should” pass and which “should” fail.  This makes things more difficult than necessary for other developers and may likely mean that your tests will ignored and no longer maintained.

Similar to my last post, this topic was also inspired by our discussions at this month’s Melbourne Patterns Group meeting.  In the meeting, an attendee mentioned that closures also assist in performing lazy evaluation.

Pedram Rezaei wrote an excellent article on lazy evaluation in C#, which helped me better understand lazy evaluation support in C#.  However, a seemingly minor comment from his post also struck me:

As you can see both Where and Select methods also return an instance of a type implementing IEnumerable<string> meaning that they are not executed until an enumerator is created… and its MoveNext method is called…

For some reason, I never realized returning IEnumerable<> meant the closure would not be evaluated until each MoveNext invocation.  Even though I knew that LINQ queries were generally “lazily evaluated”, I had not realized that this was the reason.

In learning this, I should expand on this statement from my last post:

** ToList( ) is only required if you want to maintain the return type as List<Employee>.  Otherwise, it is probably more appropriate to change to IEnumerable<Employee> here.

The referenced code block:

public List<employee> Managers(List<employee> emps)
{
return emps.Where(e => e.IsManager).ToList();
}

Although the statement and code above is still acceptable, it will behave in a way that wasn’t obvious to me at the time.  By converting the IEnumerable<> to a List<>, I had changed the implementation from being ‘lazily evaluated’ to being ‘eagerly evaluated’.  This isn’t necessarily a bad thing, but it is important to be aware of the difference (see this article by Sam Allen for more information).

This month’s Melbourne Patterns Group meeting included discussions on functional programming.  There were some topics and terms used in the discussion that encouraged me to do more research.

One of the terms I wanted to understand better was “closures”.  In the demonstration, I was familiar with the usages (like anonymous delegates in C# or blocks from Smalltalk), but I wanted to learn more.  My investigation led me to a relatively old post on the topic from Martin Fowler.  At the bottom of his article is another reference to the C# implementation of closures, based on C# 2.0.

As these articles were written in 2004, C# has since added more support for closures in C# 3.0 with lamba expressions, and again with LINQ and extension methods in 3.5.  I found it interesting to rewrite the same examples in both C# 3.0 and 3.5:

Example 1 – List of Employees who are Managers

Fowler’s Original Example in Ruby

def managers(emps)
  return emps.select {|e| e.isManager}
end

C# 2.0 Example from Walnes

public List<Employee> Managers(List<Employee> emps)
{
    return emps.FindAll(
        delegate(Employee e)
        {
            return e.IsManager;
        }
    );
}

C# 3.0

We can now replace the anonymous delegate with a smaller lamba expression:

public List<Employee> Managers(List<Employee> emps)
{
    return emps.FindAll(e => e.IsManager);
}

LINQ and 3.5

We can now use the extension methods on IEnumerable<T> from which List<T> derives.  In this case, the Where method is appropriate.

** ToList( ) is only required if you want to maintain the return type as List<Employee>.  Otherwise, it is probably more appropriate to change to IEnumerable<Employee> here.  [Update: see this post for more differences regarding these return types.]

public List<Employee> Managers(List<Employee> emps)
{
    return emps.Where(e => e.IsManager).ToList();
}

Now, in 3.5, the C# code is nearly identical in structure and clarity as the orginal ruby example.

Example 2 – referring to a local variable in the closure block

Fowler’s Original Example in Ruby

def highPaid(emps)
  threshold = 150
  return emps.select {|e| e.salary > threshold}
end

C# 2.0 Example from Walnes

public List<Employee> HighPaid(List<Employee> emps)
{
    int threshold = 150;
    return emps.FindAll(delegate(Employee e)
    {
        return e.Salary > threshold;
    });
}

C# 3.0

Again, we can now replace the anonymous delegate with a smaller lamba expression:

public List<Employee> HighPaid(List<Employee> emps)
{
    int threshold = 150;
    return emps.FindAll(e => e.Salary > threshold);
}

LINQ and 3.5

Again, we can now use the extension methods on IEnumerable<T> from which List<T> derives.  In this case, the Where method is appropriate.

** Again, ToList( ) is only required if you want to maintain the return type as List<Employee>.  Otherwise, it is probably more appropriate to change to IEnumerable<Employee> here.  [Update: see this post for more differences regarding these return types.]

public List<Employee> HighPaid(List<Employee> emps)
{
    int threshold = 150;
    return (List<Employee>) emps.Where(e => e.Salary > threshold).ToList();
}

Now, in 3.5, the C# code is nearly identical in structure and clarity as the orginal ruby example.

Example 3 – returning a closure from a method

Fowler’s Original Example in Ruby

def paidMore(amount)
  return Proc.new {|e| e.salary > amount}
end

C# 2.0 Example from Walnes

public Predicate<Employee> PaidMore(int amount)
{
    return delegate(Employee e)
    {
        return e.Salary > amount;
    };
}

C# 3.0

Again, we can now replace the anonymous delegate with a smaller lamba expression:

public Predicate<Employee> PaidMore(int amount)
{
    return e => e.Salary > amount;
}

There would be no change to the above in 3.5 (same as C# 3.0) here.

Example 4 – Create a function using the closure & assign to a variable

Fowler’s Original Example in Ruby

highPaid = paidMore(150)
john = Employee.new
john.salary = 200
print highPaid.call(john)

C# 2.0 Example from Walnes

Predicate<employee> highPaid = PaidMore(150);
Employee john = new Employee();
john.Salary = 200;
Console.WriteLine(highPaid(john));

There would be no change to the above in 3.0 or 3.5 (same as C# 2.0) here.

Download source here to see complete C# examples.

Follow

Get every new post delivered to your Inbox.