If you’re new to this newsletter, welcome 👋. My name is Zak, and I’m on a mission to discover what makes a high-value engineer. If you enjoy this content, perhaps you might consider subscribing to this newsletter. I release an article every Tuesday, and I aim to increase my content output in the future. Your subscription is a sign of your support and is greatly appreciated. Thank you. 🙏
To mandate coverage or not to mandate coverage, that is the question. It would appear to be one of the unsolved riddles in software engineering: how much testing coverage is too much coverage, and should it be mandated? On the one hand, we want to assure the quality of our code, but on the other, we don’t want to enforce arbitrary rules that force our engineers to write tests that solely satisfy the coverage threshold. Opinions are often strong on this subject, so it’s only right that I present a balanced set of proposals.
I'm going to break down three approaches to this problem. For each, I'll point out their benefits and their flaws. As we will see, there's no perfect solution, but some are certainly better than others. Let's get into it.
Option one: Mandate 100% coverage
Writing tests to achieve 100% coverage is like building a house and then hammering nails into the walls in the interest of assuring "quality". It might feel good to have 100% coverage, but are your tests actually taking your code to task? Beyond a certain point, you are testing for the sake of the coverage number. Having said that, 100% coverage is not without its merits. Let's dig a little deeper.
The case for 100% coverage
It's important to recognise that if you're in a team that is having conversations about the efficacy of 100% coverage, then you're in a pretty good place. Sure, you might be in a world of over-testing, but better to be in that world than the alternative. 100% coverage typically demonstrates a strong testing culture within a team. It's hard to reach 100% coverage without caring about testing, so things look bright for your team.
Despite its shortcomings, 100% coverage deserves some credit. First, an enforced 100% rule automates at least part of the job of validating that tests exist. If, as an engineer, you are required to hit the 100% threshold, then there's no skipping the work—the tests have to be written.
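To make the mandate concrete: if your team happens to use Jest, for example, the rule is typically expressed as a coverage threshold in the test runner's config, so that any run falling short of 100% fails the pipeline. The snippet below is only a sketch of one possible setup, not a prescription.

```ts
// jest.config.ts: a minimal sketch, assuming Jest is the test runner.
// With these thresholds in place, `jest --coverage` fails the run (and
// therefore the CI pipeline) whenever any metric drops below 100%.
import type { Config } from 'jest';

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 100,
      functions: 100,
      lines: 100,
      statements: 100,
    },
  },
};

export default config;
```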
Second, code coverage is not an insignificant metric. It's a crude measurement, but there's utility in having every single line of your codebase executed on each test run. Parts of a larger repository can fall behind or get dusty, but with 100% coverage, at least you can guarantee that the code is executed, if not that its output is correct.
Finally, your testing strategy should be diverse enough that coverage-driven unit tests are not your only method of testing. Integration testing and end-to-end testing—and in the case of the front-end, visual regression tests—all serve to probe the quality of the software we ship.
The case against 100% coverage
We've given the 100% approach the credit it is due, but let's talk about its shortcomings. I’d argue it has one major shortcoming: 100% coverage can become a crutch when both writing and reviewing tests. When this happens, the quality of our tests can take a hit. When I come to review a PR, if the CI/CD processes all pass for that PR, then I know that even after this code change, the repository still has 100% coverage. Will I be so diligent in my review of the tests? I’d like to think so, but when it’s a busy period or a high-pressure situation, I can’t be so sure.
To hammer the point home, in the front-end world, we have a category of tests called snapshot tests. Snapshot tests are a low-resolution method of "testing" the output of a rendered element. In a React application, for example, you would write a snapshot test for a given React component in your application. The output of running the test for the first time is a snapshot file, which is the rendered output of that component. After any code change, a rerun of the tests will compare the freshly rendered output to the stored snapshot. If the two differ at all, the test fails, and the engineer must either fix their code—if the change is not expected—or update the snapshot to get the tests passing. Snapshot testing can become a crutch because it typically gives high coverage by virtue of the fact that it renders the entire component. But snapshot tests are superficial—they can never achieve the same depth as the tests we can write as engineers.
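For the avoidance of doubt, here is roughly what a snapshot test looks like in a Jest and React setup. The Greeting component is a hypothetical stand-in, defined inline purely to keep the example self-contained.

```tsx
// Greeting.test.tsx: an illustrative snapshot test, assuming Jest,
// React and react-test-renderer are available.
import React from 'react';
import renderer from 'react-test-renderer';

// A trivial, hypothetical component to snapshot.
const Greeting = ({ name }: { name: string }) => <p>Hello, {name}!</p>;

test('Greeting matches its stored snapshot', () => {
  // The first run writes a .snap file under __snapshots__; every
  // subsequent run diffs the freshly rendered output against it.
  const tree = renderer.create(<Greeting name="Ada" />).toJSON();
  expect(tree).toMatchSnapshot();
});
```

Note how little the test asserts: it passes as long as the rendered output hasn't changed, which is exactly why it inflates coverage without really probing behaviour.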
Option two: Mandate <100% coverage
If mandating 100% coverage is proving onerous, then perhaps we can lower the mandate.
The case for <100% coverage
Lowering the coverage threshold below 100% gives the engineering team some slack when writing tests. You can forgo writing tests that only serve to satisfy the coverage. Although you've lowered the threshold, the team is still aspiring to high coverage without going overboard. The team maintains a positive attitude towards testing. Unfortunately, the <100% approach has limited benefits.
The case against <100% coverage
When opting for the <100% approach, you can set the threshold to whatever your team feels appropriate. Let's say the team agrees to mandate 80% coverage—a high degree of coverage by most measurements. Let's also say the team currently has 100% coverage. Over the course of about six months, the team slowly chips away at that 100% as they forgo writing superfluous tests. One day, an engineer in the team is committing code but is prevented from pushing their changes as the coverage is now at 79.95%. The engineer needs to increase the test coverage to satisfy the 80% criterion. The engineer writes the necessary tests.
What's happened here? 80% has become the new 100%. Sure, for six months, the team successfully made use of the 20% buffer to avoid writing redundant tests, but now they must revert to writing such tests to achieve 80%. The problem was kicked down the road.
We should also acknowledge that any threshold other than zero or one hundred is arbitrary. For all their flaws, zero and one hundred are at least absolute. I'd argue that any threshold between these two extremes is crude and potentially obfuscates areas devoid of testing. For example, with our engineer above, if they need to increase the test coverage in order to satisfy the 80% rule, they could test a part of the application completely unrelated to their code change to get the coverage up. This speaks to the other fundamental problem with an arbitrary threshold: are you so sure that the <100% coverage is covering the right code? This is why I say the measurement is crude. At least with one hundred or zero, you know that all your code is covered, or none of it is.
Option three: Mandate 0% coverage—but still measure
I believe there is value in measuring test coverage. It's a helpful metric for gauging the quality of a codebase, and it reveals the areas of the code that are being executed on each test run. That said, mandating coverage is problematic. As we've seen, it often has the opposite of the desired effect. Therefore, I propose measuring code coverage but removing the mandate—zero coverage, if you will.
Why zero coverage?
In dropping the coverage mandate, you remove from your engineers the burden of hitting the threshold with every code change. Engineers can now write both quality code and quality tests for that code without being coerced into writing arbitrary tests. Let me be clear: you absolutely should continue to measure code coverage. Better yet, I'd recommend tracking coverage over time so that you can interrogate the data. For example, say a team is seeing an increase in defects raised against their code. Perhaps there's a correlation between a reduction in coverage in one area of the codebase and the number of bugs raised. So continue to track coverage, but just don't mandate it.
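As a rough sketch of what that tracking might look like, assuming Jest's json-summary coverage reporter is enabled (it writes coverage/coverage-summary.json), a small script in the pipeline could append each run's overall line coverage to a history file. The file paths here are illustrative.

```ts
// track-coverage.ts: a sketch for logging coverage over time, assuming
// Jest (or any Istanbul-based runner) with the "json-summary" reporter.
import { readFileSync, appendFileSync } from 'node:fs';

const summary = JSON.parse(
  readFileSync('coverage/coverage-summary.json', 'utf8'),
);

// "total.lines.pct" is the overall line coverage percentage.
const linesPct: number = summary.total.lines.pct;

// Append a timestamped row to an illustrative history file.
appendFileSync(
  'coverage-history.csv',
  `${new Date().toISOString()},${linesPct}\n`,
);
console.log(`Recorded line coverage: ${linesPct}%`);
```

Plot that file over a few months and you have a coverage trend you can set alongside your defect data.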
The zero coverage approach also avoids the pitfall of the <100% method, whereby we introduce an arbitrary threshold, all the while kicking the problem down the road. With zero coverage, there’s no threshold to hit—testing is at the discretion of the engineering team.
Why not zero coverage?
I said at the beginning of this article that there were no perfect solutions to this problem, and the zero coverage approach is no exception. The zero coverage method will put more pressure on the engineering team to produce quality tests. You can no longer lean on a failing pipeline to indicate insufficient testing when reviewing an engineer's code. You might consider incorporating analytics into your pipeline that track the difference in coverage between code changes. This might draw your attention to a deficiently tested area of your codebase. Better yet, can the coverage statistics be localised to a particular feature? Rather than considering code coverage over an entire repository, what about across the five modules impacted by a specific change?
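As a sketch of how that localisation might work, again assuming an Istanbul-style coverage-summary.json: the pipeline could take the list of changed files from git and report coverage for just those files. The paths and invocation below are illustrative.

```ts
// changed-files-coverage.ts: a sketch of "localised" coverage. Run as,
// say, `ts-node changed-files-coverage.ts $(git diff --name-only main)`.
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';

const summary = JSON.parse(
  readFileSync('coverage/coverage-summary.json', 'utf8'),
);

// Files touched by the change, passed in as command-line arguments.
// The summary keys files by absolute path, so resolve each one.
const changedFiles = process.argv.slice(2).map((file) => resolve(file));

for (const file of changedFiles) {
  const entry = summary[file];
  const pct = entry ? entry.lines.pct : 0; // files absent from the report count as 0%
  console.log(`${file}: ${pct}% line coverage`);
}
```

A report like this keeps the reviewer's attention on the code that actually changed, rather than on a repository-wide number.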
I’ve put forward three options for deciding whether to mandate a test coverage threshold or if a threshold is even necessary to begin with. I have no doubt there exist other brilliant ideas on this subject, and I want to hear them. Let me know below.
As always, if you read this far, I am humbled and grateful.
Onwards,
Zak