Effective Code Coverage (and Metrics in General)

by Alex Ruiz on August 3, 2010

This is the best definition of code metrics I have found so far [1]:

Code metrics is a set of software measures that provide developers better insight into the code they are developing. By taking advantage of code metrics, developers can understand which types and/or methods should be reworked or more thoroughly tested. Development teams can identify potential risks, understand the current state of a project, and track progress during software development.

The key here is, in my opinion, that the purpose of code metrics is to help developers understand where their code needs to be improved. Unfortunately, code metrics, like any tool, can be easily misused, doing more harm than good. Let’s take my favorite code metric, code coverage, as an example.

As I have mentioned many times on this blog, I use code coverage tools to help me discover which parts of my code need more testing. I don’t care if my code coverage is 50, 80 or 90%. What I care about is that the most complex areas of my code are properly tested. As useful as code coverage is, it can also cause problems if misused or misunderstood. Here are some examples:

  • Management mandates X% code coverage.
    This is actually not so uncommon, based on stories from friends and my personal experience years ago in consulting. Upper management, without a clue about the health of the codebase, mandates that developers should reach X% code coverage. Period. (It gets even better when there is a deadline for reaching such a number!)

    In many cases, when the codebase was not designed with testability in mind, I’ve seen developers writing a lot of test cases, without any assertions! All the tests pass, and the code coverage goal is met. The result?

    1. developers wasted time and effort without adding any improvement to the codebase
    2. management got a false impression that the codebase is in a somewhat healthy state
  • Developers follow numbers blindly, without thinking about their meaning.
    It is too easy (and in many cases, comfortable) for developers (including QA) to fall into this trap (I’m guilty of this too!). Common examples include:

    1. Test suites containing only unit tests. We developers tend to avoid functional and integration tests because they usually run slower and need more work to write and set up. Unfortunately, unit tests only verify developers’ assumptions; they do not test that the actual application, as a whole, works as expected in front of the user.
    2. The wrong perception that “more is better.” This is probably normal human behavior: 50% code coverage must be better than 10%, or even 0%. Actually, it all depends on the quality and types of tests. For example, having 50% code coverage with tests that do not have any assertions is worse than having 0%, because it creates the illusion of a healthier codebase. Another good example is my recent experience of having 100% code coverage…and faulty software!
    3. A single metric to rule them all. Having a high code coverage number does not imply that our test suite exercises all the possible testing scenarios: potential race conditions and all UI-interaction scenarios are good examples. Working towards achieving 100% code coverage can prevent us from seeing important areas that need to be tested. In addition, we should not focus on a single tool to determine the state of our code. Code coverage is just one of the many code metrics that we can (or must?) use to determine the health of our codebases.
  • Useless metrics reports.
    Even though we have excellent code coverage tools like Cobertura, EMMA (both open source) and Clover (commercial), I have seen custom code coverage tools built in-house. There are many reasons for this, like NIH or lack of support for a specific language (e.g. Scala) in existing tools. What surprises me the most is the amount of time and effort put into building a custom code coverage tool that, in the end, does not allow developers to drill down to the most basic unit: a single line of code.

    A code coverage tool that reports that method “X” in class “Y” has 10% code coverage, without providing a way to go one or more levels deeper, makes it impossible to figure out where we should add or improve our tests. In my opinion, this is not only useless, but harmful as well: pointing out a problem without offering a way to find a solution is not constructive, and it can start a blame game within an organization.
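
To make the assertion-free scenario from the list above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and not taken from any real codebase: `shipping_cost` is an invented function with a deliberate bug, and `run` is a tiny stand-in for a test runner. Both checks execute exactly the same lines, so a line-coverage tool would report 100% for either one, but only the second notices the bug.

```python
def shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical function with a deliberate bug in the express branch."""
    cost = 5.0 + 2.0 * weight_kg
    if express:
        cost = cost - 10.0  # bug: the express surcharge should be added
    return cost


def exercise_only() -> None:
    # Runs every line of shipping_cost, so line coverage is 100%,
    # yet nothing is verified and the bug goes unnoticed.
    shipping_cost(1.0, express=False)
    shipping_cost(1.0, express=True)


def exercise_and_verify() -> None:
    # Same calls and same coverage, but the results are checked.
    assert shipping_cost(1.0, express=False) == 7.0
    assert shipping_cost(1.0, express=True) == 17.0  # fails because of the bug


def run(check) -> str:
    """Tiny stand-in for a test runner: report 'pass' or 'fail'."""
    try:
        check()
    except AssertionError:
        return "fail"
    return "pass"


if __name__ == "__main__":
    print("without assertions:", run(exercise_only))
    print("with assertions:   ", run(exercise_and_verify))
```

The coverage number is identical in both cases; only the assertions make the difference visible.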

In conclusion, for code coverage (and metrics in general) to be useful, accurate measurements are not enough. The tools also need to let developers analyze the results down to the finest detail, so they can decide where and how a codebase could be improved. At the same time, developers need to understand that blindly targeting a fixed number is not only a waste of time and effort; it can also create unrealistic perceptions about how healthy a codebase is.
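
As a rough illustration of what "down to a single line" can look like, the sketch below records which lines of one function actually ran and lists the ones that did not. It is a toy built on Python's standard `sys.settrace` and `dis.findlinestarts`, not how Cobertura, EMMA or Clover work. The helper names (`executable_lines`, `missed_lines`) and the sample function `classify` are made up for this example, and the sketch assumes the function body starts on a line after its `def` line.

```python
import dis
import sys
from typing import Any, Callable, Iterable, List, Set, Tuple


def executable_lines(func: Callable[..., Any]) -> Set[int]:
    """Line numbers in func's body that have bytecode attached."""
    code = func.__code__
    return {
        line
        for _, line in dis.findlinestarts(code)
        # Skip the def line itself (assumes the body starts on a later line).
        if line is not None and line != code.co_firstlineno
    }


def missed_lines(func: Callable[..., Any], calls: Iterable[Tuple]) -> List[int]:
    """Run func once per argument tuple; return body lines never executed."""
    code = func.__code__
    hit: Set[int] = set()

    def tracer(frame, event, arg):
        # Record 'line' events that belong to func's own code object only;
        # nested functions and callees are ignored in this sketch.
        if event == "line" and frame.f_code is code:
            hit.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return sorted(executable_lines(func) - hit)


def classify(n: int) -> str:
    if n < 0:
        return "negative"
    return "non-negative"


if __name__ == "__main__":
    first = classify.__code__.co_firstlineno
    for line in missed_lines(classify, [(5,)]):
        print(f"classify: line {line} (def line + {line - first}) never ran")
```

A report at this level points at the exact statement that still needs a test, instead of stopping at a per-method percentage.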

What do you think? Feedback is always welcome :)

1 MSDN. Code Metrics Values.

(Image taken from steeljam’s Flickr stream under a Creative Commons license)

3 comments

alexanderb August 4, 2010 at 12:41 am

Hi Alex!

Thanks for your post :) This is something I have had on my mind for a long time too :). I understand coverage reports only as information about which areas have to be improved, what is missing, and what can be removed. That’s it; the number itself is actually not so important.

I think that tracking the percentage of coverage could be really good for new projects, when the importance and usage of the coverage metric are declared to the team and everyone follows the rules. Insisting on more coverage for legacy (badly designed) code could only cause distraction or fake results.

Craig August 4, 2010 at 10:26 am

The Crap4J tool (which appears to be defunct) tied cyclomatic complexity (CC) to code coverage requirements. Under its rules, a method with a CC below 5 needed no coverage, while a method with a CC over 25 needed 100% coverage. You might argue with these thresholds, but I think this approach is good in that it focuses unit testing on more complex code.
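
A rule of this kind could be sketched as follows. The formula and the threshold of 30 are written from memory of the published CRAP metric, so treat both as assumptions and not as Crap4J's exact behavior; the function names are invented for this example, and coverage is expressed as a fraction between 0 and 1. Under this formula the exact cutoffs may differ somewhat from the ones quoted above.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """Assumed CRAP formula: comp^2 * (1 - cov)^3 + comp, with cov in [0, 1]."""
    if complexity < 1:
        raise ValueError("cyclomatic complexity must be at least 1")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


def needs_more_tests(complexity: int, coverage: float,
                     threshold: float = 30.0) -> bool:
    """Flag a method whose score exceeds the (assumed) threshold of 30."""
    return crap_score(complexity, coverage) > threshold


if __name__ == "__main__":
    # (complexity, coverage) pairs chosen only for illustration.
    for cc, cov in [(5, 0.0), (25, 0.0), (25, 1.0), (31, 1.0)]:
        print(cc, cov, crap_score(cc, cov), needs_more_tests(cc, cov))
```

With these assumptions, a simple method passes even without tests, while a complex one is flagged until its coverage is high, which is the focusing effect described above.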

Andrew Au March 3, 2011 at 12:11 am

I am also thinking hard about code coverage being a misleading metric. Once you get to a deep framework component that is called from the upper layers of the stack, any upper-layer test runs through the component’s methods, but that does not generally mean the component is robust, because upper-layer callers are usually good citizens.

Testing should be a risk-driven activity aimed at finding bugs. Good citizens cannot help find bugs, but bad ones can.
