Mutation Driven Testing – When TDD Just Isn’t Good Enough

As someone who loves discussing Software Craftsmanship and best practices, I find Test Driven Development (TDD) a bit of a sore spot. Let me start off by saying that I love TDD’s emphasis on testing. Too many software projects skimp on testing, and the results speak for themselves many years down the road, when changes take exponentially longer to implement and people are too afraid to even touch anything.

That said, I’ve still never been a big fan of TDD. On the one hand, it’s too strict. Insisting on writing tests first often gets in the way of exploratory work: the work needed before you can iron out what the right interfaces, methods, and OO structure should be.

On the other hand, TDD is ironically too lenient. Many practitioners assume that because they are practicing TDD, their test suite is rock solid. In reality, I’ve seen far too many tests written during TDD that still suffer from gaping coverage holes, holes that can and will result in production bugs, either now or in the future. As far as testing methodology goes, plugging these coverage holes should be your top priority, overriding all other fads and recommendations.

This is why my favored testing philosophy is best captured by Mutation Driven Testing, which follows this sequence of steps:

  1. Get to a state where you have both code and successfully passing tests
    1. Whether you should write code-first or test-first is best debated elsewhere. You’re certainly welcome to use TDD to get to this point
  2. Go through your newly added/modified code, line by line, and manually inject a single bug
    1. When deciding what a “reasonable bug” is, assume carelessness, laziness, inexperience and incompetence. But not malice. Using tests to catch malicious bugs is exponentially harder, and less realistic
  3. Verify that some test now fails
  4. If a test does not fail, review what coverage holes you have in your tests, and fix them by adding new tests or updating your existing tests
    1. The better you become at writing tests, the less often this should occur
  5. Undo the bug you just injected, and verify that your tests are now passing
  6. Go back to step 2 and repeat, until you’ve injected every bug you can think of, or have run out of time
    1. As you get the hang of Mutation-Driven-Testing, you’ll develop an intuition for which bugs are most likely to slip past the average test suite. This will dramatically speed up the process, and also teach you to write more comprehensive tests
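To make the loop concrete, here is a minimal Java sketch of steps 1 to 4, using the string calculator from the example below. It is an illustration under stated assumptions, not a real mutation-testing tool: the hypothetical `add(String, IntPredicate)` helper takes the length guard as a parameter so that each hand-made mutant fits on one line (in practice you would simply edit the line), and the hypothetical `suite()` map holds just three of the author's tests, written as predicates that return true when they pass. One mutant (`> 3`) is a hypothetical boundary bug; the other (`== 3`) is Mutation 3 from later in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class ManualMutationLoop {

  // The calculator's add(), with its length guard passed in so each mutant is a
  // one-line change (a hypothetical refactoring made only for this sketch).
  static int add(String numbers, IntPredicate tooMany) {
    String[] parts = numbers.split(",");
    if (tooMany.test(parts.length)) {
      throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
    }
    int sum = 0;
    for (String part : parts) {
      if (!part.trim().isEmpty()) {
        sum += Integer.parseInt(part);
      }
    }
    return sum;
  }

  // Three of the author's tests; each returns true when it passes.
  static Map<String, Predicate<Function<String, Integer>>> suite() {
    Map<String, Predicate<Function<String, Integer>>> tests = new LinkedHashMap<>();
    tests.put("moreThan2Throws", impl -> {
      try {
        impl.apply("1,2,3");
        return false;
      } catch (RuntimeException e) {
        return true;
      }
    });
    tests.put("oneNumber", impl -> impl.apply("3") == 3);
    tests.put("twoNumbersSum", impl -> impl.apply("3,6") == 9);
    return tests;
  }

  // Steps 3 and 4: run the suite against one implementation and collect failing tests.
  static List<String> failingTests(Function<String, Integer> impl) {
    List<String> failures = new ArrayList<>();
    suite().forEach((name, test) -> {
      boolean passed;
      try {
        passed = test.test(impl);
      } catch (RuntimeException e) {
        passed = false; // an unexpected exception counts as a failing test
      }
      if (!passed) {
        failures.add(name);
      }
    });
    return failures;
  }

  public static void main(String[] args) {
    // Step 1: the unmutated code must pass every test.
    System.out.println("original fails: " + failingTests(s -> add(s, n -> n > 2)));

    // Step 2: one injected bug per mutant.
    Map<String, Function<String, Integer>> mutants = new LinkedHashMap<>();
    mutants.put("length > 3", s -> add(s, n -> n > 3));
    mutants.put("length == 3", s -> add(s, n -> n == 3));

    for (Map.Entry<String, Function<String, Integer>> mutant : mutants.entrySet()) {
      List<String> failures = failingTests(mutant.getValue());
      // Step 4: a mutant that no test catches points at a coverage hole.
      System.out.println(mutant.getKey() + ": "
          + (failures.isEmpty() ? "SURVIVED, coverage hole" : "killed by " + failures));
    }
  }
}
```

Running it should report that the original passes, the `> 3` mutant is killed by `moreThan2Throws`, and the `== 3` mutant survives, which is exactly the kind of hole step 4 asks you to plug.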

The philosophy behind Mutation-Driven-Testing is simple. The only way to assess the reliability of your test suite is to see whether it fails when a bug is present. So inject that bug yourself. Use the resulting output to find your coverage holes and plug them, not just to catch the specific bug you injected, but any other bug in the same category.

Sidenote: There are tools that attempt to automate this form of mutation testing. I look forward to the day when they become mainstream, and as comprehensive as the manual process. For now, this article will focus on manually injected mutations.

A TDD Example

If you listen closely, you can hear a million TDD proponents crying out in protest.

“But you don’t need to do Mutation Testing if you’ve actually done TDD! If you do TDD correctly, every single piece of functionality will have a dedicated test, so you will never ever have a coverage hole!”

To illustrate why this is not true, and to demonstrate the benefits of Mutation-Driven-Testing, I have put together the following example, taken from the top Google result when you search for “TDD Example”.

One might argue that the author of that article is not a good role model for TDD, and that “no true TDD practitioner” would write tests like the ones in the example below. But this is just the No True Scotsman fallacy. The author makes a concerted effort to write comprehensive tests and does a good job, and the example chosen is very simple. The reality of TDD in the workplace is that most practitioners aren’t perfect, and are always prone to some oversights: oversights that can be flagged and fixed with Mutation-Driven-Testing.

Without further ado, let’s dive into the example – creating a simple string-based calculator. For the sake of conciseness, let’s look at just the first 3 requirements in the example, along with their implementations and tests.

Requirements:

  1. The method can take 0, 1 or 2 numbers separated by commas
  2. For an empty string the method will return 0
  3. The method will return the sum of the numbers

Tests:

import org.junit.Assert;
import org.junit.Test;

public class TddExampleTest {

  private static final TddExample EXAMPLE = new TddExample();

  @Test(expected = RuntimeException.class)
  public final void whenMoreThan2NumbersAreUsedThenExceptionIsThrown() {
    EXAMPLE.add("1,2,3");
  }

  @Test
  public final void when2NumbersAreUsedThenNoExceptionIsThrown() {
    EXAMPLE.add("1,2");
    Assert.assertTrue(true); // Reached only if add() did not throw
  }

  @Test(expected = RuntimeException.class)
  public final void whenNonNumberIsUsedThenExceptionIsThrown() {
    EXAMPLE.add("1,X");
  }

  @Test
  public final void whenEmptyStringIsUsedThenReturnValueIs0() {
    Assert.assertEquals(0, EXAMPLE.add(""));
  }

  @Test
  public final void whenOneNumberIsUsedThenReturnValueIsThatSameNumber() {
    Assert.assertEquals(3, EXAMPLE.add("3"));
  }

  @Test
  public final void whenTwoNumbersAreUsedThenReturnValueIsTheirSum() {
    Assert.assertEquals(3+6, EXAMPLE.add("3,6"));
  }
}

Implementation:

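public class TddExample {

  public int add(final String numbers) {
    int returnValue = 0;
    // Note: split() drops trailing empty strings, so "1," yields ["1"]
    String[] numbersArray = numbers.split(",");
    if (numbersArray.length > 2) {
      throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
    }
    for (String number : numbersArray) {
      // Skip empty or blank entries, e.g. the single "" produced by splitting ""
      if (!number.trim().isEmpty()) { // After refactoring
        returnValue += Integer.parseInt(number); // Throws NumberFormatException for non-integers
      }
    }
    return returnValue;
  }
}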

This certainly seems like a good suite of tests that covers all the functionality. But how well does it stand up to Mutation-Driven-Testing? In the real world, I would inject bugs one at a time, and run the tests after each injection. But for the sake of conciseness, let’s walk through the relevant bugs and then inject them all at once.

Mutation 1: Empty vs Blank

if (!number.trim().isEmpty())

This one is a bit of a freebie, but worth pointing out. What if we simply removed the trim() call from our implementation? It certainly seems like a plausible oversight.

if (!number.isEmpty())
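To see which input actually separates the two versions, here is a small Java sketch, assuming a hypothetical `TrimMutation` class that holds the original method and this mutant side by side. The inputs `""`, `"3"` and `"3,6"` come from the author's tests and produce identical results for both; only a blank input such as `" "` differs, because the mutant hands it to `Integer.parseInt`, which throws a `NumberFormatException`.

```java
import java.util.function.Function;

public class TrimMutation {

  // Original: blank entries such as " " are skipped.
  static int original(String numbers) {
    String[] parts = numbers.split(",");
    if (parts.length > 2) {
      throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
    }
    int sum = 0;
    for (String part : parts) {
      if (!part.trim().isEmpty()) {
        sum += Integer.parseInt(part);
      }
    }
    return sum;
  }

  // Mutation 1: trim() removed, so " " is no longer skipped and reaches parseInt.
  static int mutant(String numbers) {
    String[] parts = numbers.split(",");
    if (parts.length > 2) {
      throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
    }
    int sum = 0;
    for (String part : parts) {
      if (!part.isEmpty()) {
        sum += Integer.parseInt(part);
      }
    }
    return sum;
  }

  // Returns the result as text, or the exception name if parsing failed.
  static String describe(Function<String, Integer> impl, String input) {
    try {
      return String.valueOf(impl.apply(input));
    } catch (NumberFormatException e) {
      return "NumberFormatException";
    }
  }

  public static void main(String[] args) {
    // The first three inputs come from the author's tests; the last one does not.
    for (String input : new String[] {"", "3", "3,6", " "}) {
      System.out.println("\"" + input + "\" -> original: " + describe(TrimMutation::original, input)
          + ", mutant: " + describe(TrimMutation::mutant, input));
    }
  }
}
```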

Mutation 2: Return 0 for Empty String

if (!number.trim().isEmpty()) { // After refactoring
  returnValue += Integer.parseInt(number);
}

The requirement says to return 0 for an empty string. Presumably, based on the author’s implementation, this means that any empty substring should be treated as 0, while the non-empty substrings are still summed. But what if the implementation did something different, and returned 0 as soon as it saw any empty substring?

if (number.trim().isEmpty()) { return 0; }
returnValue += Integer.parseInt(number);
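Which inputs can expose this mutant depends on a detail of `String.split`: with the default limit, a leading empty string is kept but trailing empty strings are dropped. The sketch below assumes Mutation 2 is applied on its own (with `Integer.parseInt` kept) and uses hypothetical class and method names. It shows that `"1,"` cannot kill the mutant, because the trailing empty entry never reaches the loop, whereas `",1"` and `"1, "` can.

```java
import java.util.Arrays;

public class EmptyEntryMutation {

  // Original: blank entries are skipped and the other numbers are still summed.
  static int original(String numbers) {
    String[] parts = numbers.split(",");
    if (parts.length > 2) throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
    int sum = 0;
    for (String part : parts) {
      if (!part.trim().isEmpty()) sum += Integer.parseInt(part);
    }
    return sum;
  }

  // Mutation 2 on its own: the first blank entry short-circuits the whole result to 0.
  static int mutant(String numbers) {
    String[] parts = numbers.split(",");
    if (parts.length > 2) throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
    int sum = 0;
    for (String part : parts) {
      if (part.trim().isEmpty()) return 0;
      sum += Integer.parseInt(part);
    }
    return sum;
  }

  public static void main(String[] args) {
    // split keeps a leading "" but drops trailing ""; a blank " " is not empty, so it is kept.
    System.out.println(Arrays.toString(",1".split(",")));  // [, 1]
    System.out.println(Arrays.toString("1,".split(",")));  // [1]
    System.out.println(Arrays.toString("1, ".split(","))); // [1,  ]

    for (String input : new String[] {"", "1,", ",1", "1, "}) {
      System.out.println("\"" + input + "\" -> original: " + original(input) + ", mutant: " + mutant(input));
    }
  }
}
```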

Mutation 3: Three Inputs Are Not Allowed

if (numbersArray.length > 2) {
  throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
}

The requirement says that the method can take “0, 1 or 2 numbers.” So… we should check for exactly 3 numbers and throw an exception?

This is admittedly a pretty foolish bug, but never underestimate how creative fools can be.

if (numbersArray.length == 3) {
  throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
}
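Isolating just the guard condition makes the gap easy to see. In this sketch, the hypothetical `originalRejects` and `mutantRejects` methods reproduce only the length checks from the original and from this mutation, and the hypothetical `numbersUpTo` helper builds inputs of increasing size. The two checks agree for 0 to 3 numbers, which covers the author's only test input for this requirement (`"1,2,3"`), and first disagree at 4 numbers.

```java
import java.util.StringJoiner;

public class LengthCheckMutation {

  // Only the guard conditions are reproduced here, not the full add() method.
  static boolean originalRejects(String numbers) {
    return numbers.split(",").length > 2;   // original check
  }

  static boolean mutantRejects(String numbers) {
    return numbers.split(",").length == 3;  // Mutation 3
  }

  // Builds "1,2,...,count"; returns "" when count is 0.
  static String numbersUpTo(int count) {
    StringJoiner joiner = new StringJoiner(",");
    for (int i = 1; i <= count; i++) {
      joiner.add(Integer.toString(i));
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    for (int count = 0; count <= 6; count++) {
      String input = numbersUpTo(count);
      boolean killsMutant = originalRejects(input) != mutantRejects(input);
      System.out.println(count + " numbers (\"" + input + "\"): "
          + (killsMutant ? "kills the mutant" : "does not distinguish them"));
    }
  }
}
```

A single input one step past the boundary, such as `"1,2,3,4"`, would have been enough to catch this mutant.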

Mutation 4: Double vs Int

returnValue += Integer.parseInt(number);

When building a “string -> numeric” calculator, with an integer final result, there are many different ways of implementing string conversions:

  1. Convert string to int, perform int operations, return an int result. Throw an exception if your input string is not an int
  2. Convert string to double, cast the double to an int, perform int operations, return an int result
  3. Convert string to double, perform double operations, cast the final result to an int and return it

These 3 approaches produce three different outputs for an input like "1.5, 1.5": option #1 throws an exception, option #2 returns 2, and option #3 returns 3. In the example, the author implemented option #1. Let’s assume that is indeed the desired behavior. But what if he had mistakenly implemented option #3?

returnValue += Double.parseDouble(number);
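The following sketch puts the three strategies side by side. The method names are hypothetical, and the empty-string and length validation is left out so that only the conversion logic differs. On integer inputs such as `"3,6"` all three return 9, which is why none of the author's tests can tell them apart; on `"1.5,1.5"` they diverge.

```java
public class ConversionStrategies {

  // Option 1: parse each entry as an int; "1.5" throws NumberFormatException.
  static int parseAsInt(String numbers) {
    int sum = 0;
    for (String part : numbers.split(",")) {
      sum += Integer.parseInt(part);
    }
    return sum;
  }

  // Option 2: parse as double, truncate each value to an int, then add ints.
  static int truncateEach(String numbers) {
    int sum = 0;
    for (String part : numbers.split(",")) {
      sum += (int) Double.parseDouble(part);
    }
    return sum;
  }

  // Option 3 (Mutation 4): add doubles, then truncate only the final result.
  static int truncateSum(String numbers) {
    double sum = 0;
    for (String part : numbers.split(",")) {
      sum += Double.parseDouble(part);
    }
    return (int) sum;
  }

  public static void main(String[] args) {
    String input = "1.5,1.5";
    try {
      System.out.println("Option 1: " + parseAsInt(input));
    } catch (NumberFormatException e) {
      System.out.println("Option 1: NumberFormatException");
    }
    System.out.println("Option 2: " + truncateEach(input)); // Option 2: 2
    System.out.println("Option 3: " + truncateSum(input));  // Option 3: 3
  }
}
```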

Putting It All Together

Combining all four mutations leaves us with the following mess:

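public int add(final String numbers) {
  double returnValue = 0;                      // Mutation 4: double arithmetic
  String[] numbersArray = numbers.split(",");
  if (numbersArray.length == 3) {              // Mutation 3: only exactly 3 numbers rejected
    throw new RuntimeException("Up to 2 numbers separated by comma (,) are allowed");
  }
  for (String number : numbersArray) {
    if (number.isEmpty()) { return 0; }        // Mutations 1 and 2: no trim(), early return
    returnValue += Double.parseDouble(number); // Mutation 4: parse as double
  }
  return (int) returnValue;                    // Mutation 4: truncate the double sum
}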

Amazingly, not a single test fails! Every test written by the author is still green, despite plausible bugs injected into every other line. This demonstrates that the test suite has the following holes:

  1. It is not testing for blank inputs
  2. It is not testing for empty/blank substrings where a number is supposed to be
  3. It is not testing for inputs with more than 3 numbers
  4. It is not testing for numeric non-integer inputs

Once you’ve identified these holes, you can start plugging them by adding more tests. The following tests fail against the mutated code, and start passing once you revert all the injected bugs:

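// These methods belong in the same test class; the last one also needs
// import java.util.concurrent.ThreadLocalRandom;

// Kills Mutation 4: non-integer input must still be rejected
@Test(expected = NumberFormatException.class)
public void doubleInputProvided_shouldThrowException() {
  EXAMPLE.add("1.5,1.5");
}

// Kills Mutation 1: a blank (not just empty) string must be treated as 0
@Test
public void blankString_shouldReturn0() {
  Assert.assertEquals(0, EXAMPLE.add(" "));
}

// Kills Mutation 2: a blank entry after a number is skipped, not a reason to return 0
@Test
public void emptyStringAfterNumbers_shouldIgnoreIt() {
  Assert.assertEquals(1, EXAMPLE.add("1, "));
}

// Kills Mutation 3: every count above 2 must be rejected, not just exactly 3
@Test
public void arbitrarilyManyNumbersProvided_shouldThrowException() {
  StringBuilder inputs = new StringBuilder("3,4");
  for (int i=0; i<10; i++) {
    inputs.append("," + ThreadLocalRandom.current().nextInt());
    try {
      int result = EXAMPLE.add(inputs.toString());
      // Assert.fail throws an AssertionError, which the RuntimeException catch does not swallow
      Assert.fail("No exception thrown. Got result: " + result + ", for input: " + inputs.toString());
    } catch (RuntimeException e) {
      Assert.assertEquals("Up to 2 numbers separated by comma (,) are allowed", e.getMessage());
    }
  }
}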
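As a final check, this sketch runs both sets of tests against the original implementation and the combined mutant. The tests are modeled as plain Java predicates rather than JUnit methods, and the random-length test is reduced to one fixed input of 4 numbers (without the message check), so treat it as an approximation of the tests above rather than a replacement. The `SuiteComparison` class and its helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

public class SuiteComparison {

  static final String TOO_MANY = "Up to 2 numbers separated by comma (,) are allowed";

  // The author's implementation.
  static int original(String numbers) {
    String[] parts = numbers.split(",");
    if (parts.length > 2) throw new RuntimeException(TOO_MANY);
    int sum = 0;
    for (String part : parts) {
      if (!part.trim().isEmpty()) sum += Integer.parseInt(part);
    }
    return sum;
  }

  // All four mutations combined, matching the "Putting It All Together" listing.
  static int combinedMutant(String numbers) {
    String[] parts = numbers.split(",");
    if (parts.length == 3) throw new RuntimeException(TOO_MANY);
    double sum = 0;
    for (String part : parts) {
      if (part.isEmpty()) return 0;
      sum += Double.parseDouble(part);
    }
    return (int) sum;
  }

  // Passes if calling impl on input throws an exception of the given type.
  static Predicate<Function<String, Integer>> throwsOn(String input, Class<? extends RuntimeException> type) {
    return impl -> {
      try {
        impl.apply(input);
        return false;
      } catch (RuntimeException e) {
        return type.isInstance(e);
      }
    };
  }

  static Predicate<Function<String, Integer>> returns(String input, int expected) {
    return impl -> impl.apply(input) == expected;
  }

  // The author's six tests.
  static Map<String, Predicate<Function<String, Integer>>> originalSuite() {
    Map<String, Predicate<Function<String, Integer>>> suite = new LinkedHashMap<>();
    suite.put("moreThan2Throws", throwsOn("1,2,3", RuntimeException.class));
    suite.put("twoNumbersNoException", impl -> impl.apply("1,2") != null); // passes if no exception
    suite.put("nonNumberThrows", throwsOn("1,X", RuntimeException.class));
    suite.put("emptyReturns0", returns("", 0));
    suite.put("oneNumber", returns("3", 3));
    suite.put("twoNumbersSum", returns("3,6", 9));
    return suite;
  }

  // The four new tests (the random-length test reduced to one fixed 4-number input).
  static Map<String, Predicate<Function<String, Integer>>> newTests() {
    Map<String, Predicate<Function<String, Integer>>> suite = new LinkedHashMap<>();
    suite.put("doubleInputThrows", throwsOn("1.5,1.5", NumberFormatException.class));
    suite.put("blankReturns0", returns(" ", 0));
    suite.put("blankAfterNumberIgnored", returns("1, ", 1));
    suite.put("fourNumbersThrow", throwsOn("3,4,5,6", RuntimeException.class));
    return suite;
  }

  // Lists the failing tests; an unexpected exception counts as a failure.
  static List<String> failingTests(Function<String, Integer> impl,
                                   Map<String, Predicate<Function<String, Integer>>> suite) {
    List<String> failures = new ArrayList<>();
    suite.forEach((name, test) -> {
      boolean passed;
      try {
        passed = test.test(impl);
      } catch (RuntimeException e) {
        passed = false;
      }
      if (!passed) failures.add(name);
    });
    return failures;
  }

  public static void main(String[] args) {
    System.out.println("original suite vs original: " + failingTests(SuiteComparison::original, originalSuite()));
    System.out.println("original suite vs mutant:   " + failingTests(SuiteComparison::combinedMutant, originalSuite()));
    System.out.println("new tests vs original:      " + failingTests(SuiteComparison::original, newTests()));
    System.out.println("new tests vs mutant:        " + failingTests(SuiteComparison::combinedMutant, newTests()));
  }
}
```

The expected outcome is that the author's six tests pass against both implementations, while all four new tests pass against the original and fail against the mutant.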

Let me be clear: the above tests are certainly not perfect. Further iterations of mutation testing can identify even more coverage holes that these tests miss. You should also be using far more sophisticated testing techniques (discussed here) to improve your test coverage concisely.

But at least this process helps us better understand where the coverage holes are, and gets us much closer to eliminating the most glaring ones.

But Is It Worth It?

Admittedly, going through the above process will take additional effort and time. And the end result will be a much more verbose suite of tests, many of which would seem redundant to the untrained eye. Is this really worth it? 

As always, it depends on your priorities. If you’re building a quick-and-dirty prototype and don’t mind smaller corner-case bugs leaking through, you’re probably fine. But if production bugs scare you, you should absolutely invest time and effort into strengthening your test suite. A rock-solid test suite with minimal coverage holes is the best defense against production bugs. And in the long run, it will actually increase your development velocity, by allowing people to refactor safely and deploy changes quickly, without spending gobs of time on manual testing.

People often talk about TDD as though it’s a silver bullet that “solves” testing. Clearly, this is not the case. Perhaps if you’re Jeff Dean or Sanjay Ghemawat, you can write a perfect test suite through reason alone. But for the rest of us mortals, the best way to identify and fix coverage holes in our test suites is to empirically put them to the test.
