I did a EuroSTAR webinar last week on shipping products and talked about how shipping products using test case metrics is bad news. It reminded me of a story, which I'll share now.
It’s very common for testers and teams to rely on a number of test case metrics and measures to work out when they are done, or to plan the work, or to measure the progress.
This can be very misleading.
A common metric I used to rely on, and see many people rely on often, is the classic “Test Case Completion” metric.
This metric is often used for planning and for measuring completion but more scarily for working out when to release a product or service.
It goes like this:
Let’s say we have 10 testers. We also have 1000 test cases. With a bit of magic and maybe past history we can predict that each tester should be completing 10 test cases per day.
So, that gives us an elapsed time of 10 days to complete all of these test cases. Right?
So now we can plan.
“It’s going to take 10 days to complete our testing”
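The sum behind that plan is simple enough to write down. Here is a minimal sketch in Python using the numbers above; the function name `planned_days` is just an illustrative choice, and the calculation assumes (as the metric does) that every test case costs the same.

```python
import math


def planned_days(total_cases: int, testers: int, cases_per_tester_per_day: int) -> int:
    """Naive elapsed-time estimate: every test case is assumed to cost the same."""
    daily_capacity = testers * cases_per_tester_per_day
    # Round up, because a part-used day is still a day on the schedule.
    return math.ceil(total_cases / daily_capacity)


print(planned_days(1000, 10, 10))  # 10
```

Notice that nothing in the function knows anything about the test cases themselves. It only counts them.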
This happens on almost every single testing project. Test cases and test completion rates are often the guiding factor for schedules and release planning.
We can also use this metric to measure progress.
By the end of day 1 we should have completed 100 test cases. By day 5 we should have done 500 test cases.
If we don’t see these numbers trending in this way (or close to it) then we can adjust. We could *make* people work more hours, maybe achieving 15 test cases per day.
We could add more testers to the mix. Or we could even just not run some test cases.
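The tracking and "adjusting" works the same way. The sketch below is illustrative only: the function names are made up, and the figure of 250 completed test cases after day 5 is a hypothetical shortfall, not a number from a real project.

```python
def expected_completed(day: int, testers: int = 10, rate: int = 10, total: int = 1000) -> int:
    """Test cases the plan says should be done by the end of a given day."""
    return min(total, day * testers * rate)


def required_rate(completed: int, days_left: int, testers: int = 10, total: int = 1000) -> float:
    """Test cases per tester per day needed to still finish on time."""
    remaining = total - completed
    return remaining / (testers * days_left)


print(expected_completed(1))  # 100
print(expected_completed(5))  # 500

# Hypothetical: only 250 done after day 5, with 5 days left in the plan.
print(required_rate(completed=250, days_left=5))  # 15.0
```

The "fix" is always a bigger number: more cases per day, more testers, or fewer cases. None of those options say anything about how well the product has been tested.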
There’s a very obvious problem with this approach. In fact, there are lots of problems, yet that doesn’t stop this being the de facto way of planning testing.
One problem is that not all test cases are created equal. Some will take hours to run, some maybe even days and some a few minutes.
Another problem is that there is an assumption that the only testing that needs to be done is contained within the test case.
Another problem is that there is an assumption that testers are like robots who will perform the same each and every day. We all have bad days.
There’s also an assumption that the tester won’t find any problems and hence delay the running of a test case in order to investigate a bug.
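To see how much the first of those problems alone can move the plan, here is a rough sketch that compares the count-based estimate with one based on effort. The mix of durations in `CASE_MIX` is invented purely for illustration, as is the 8 hour working day.

```python
import math

# Hypothetical mix, invented for illustration: (number of cases, hours per case)
CASE_MIX = [
    (700, 0.25),  # quick checks, about 15 minutes each
    (250, 2.0),   # a couple of hours each
    (50, 8.0),    # a full day of setup and config
]


def count_based_days(case_mix, testers=10, cases_per_day=10):
    """The classic plan: treat every test case as equal."""
    total_cases = sum(count for count, _ in case_mix)
    return math.ceil(total_cases / (testers * cases_per_day))


def effort_based_days(case_mix, testers=10, hours_per_day=8.0):
    """Plan on hours of work instead of number of cases."""
    total_hours = sum(count * hours for count, hours in case_mix)
    return math.ceil(total_hours / (testers * hours_per_day))


print(count_based_days(CASE_MIX))   # 10
print(effort_based_days(CASE_MIX))  # 14
```

Same 1000 test cases, same 10 testers, and the plan is already four days out. The effort-based number is still optimistic too, because it assumes nobody finds a bug, nobody has a bad day and no testing happens outside the test cases.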
A company once used to run a giant regression phase where all test cases would be run again on the “final” build.
They would print out all 3000+ test cases and stack them on a giant table in the office.
The expectation was that each tester would complete 10 test cases per day – this would allow them to hit the magic release marker of 100% tests run.
Here’s what happened.
At about 5:30am on the day of the regression a group of testers would arrive at the office and rifle through the test cases.
They would pick the really easy ones; the ones that took just a few minutes to run.
They would pick about 50% more than they had to complete.
They did this for two reasons.
Number 1 – they figured that they would be asked to work longer hours to complete more tests – so they already had a stash of easy ones to do whilst eating pizza.
Number 2 – even if they didn’t get asked to stay late they could excel by completing more tests than other people running up to the last few days of the phase.
A second group of testers would come in at 8 am and be left with the really hard test cases. Some of these test cases would require a day’s worth of setup and config just to run.
Each day the testers would mark how many tests they had done that day on a giant matrix stuck to a wall behind the manager. The first group would mark in the number 10. The second group would be lucky to register 2 or 3.
Some of the first group would surf the web in their spare time, some would help the other testers, some would do exploration, some would go home early.
All of the first group would game the system for a variety of reasons. Yet all of them would be doing what was asked of them according to a simple metric like test case completion.
Not so strangely, the second group of testers would simply not run all of the steps of the test cases (or even mark entire test cases as done without running the checks) in order to try and run 10 per day. When faced with the reality of a one-day environment build to check one thing... what would you do?
The project shipped late. And it came back to be worked on further.
The scary thing is that this behaviour happens all the time.
When simple measures like Test Case Completion are used to measure progress or to plan projects you’re already skewing the process and opening it up for gaming, abuse and a false start.
What’s the alternative?
I’ve no doubt there are many alternatives to this problem and no system or measure is exempt from being skewed, gamed or misused.
My suggestion would be to move your testing nearer to the code by pushing for more behaviour driven testing and unit testing, which drives out the design, the code and some of the behavioural tests. And then to deliver into your test environments as soon as possible. If this process works it means you no longer need test cases (or as many of them) because the checking is automated, and therefore you don’t need test case completion metrics. Your automated checking becomes a set of results and it frees you up to explore the product and find the things the test cases (or checks) would never have caught. In other words, it frees you up to do testing.
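As a very small illustration of checks living next to the code, here is a sketch using Python's built-in `unittest` module. The `apply_discount` function and its rules are hypothetical, made up only so there is something to check; your own code and checks will look different.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical production code: take a percentage off a price."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountChecks(unittest.TestCase):
    """Automated checks that would otherwise be manual test cases."""

    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(59.99, 0), 59.99)

    def test_rejects_percent_above_100(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 150)


def run_checks() -> unittest.TestResult:
    """Run the checks and hand back the results object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountChecks)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_checks()
# Checks run, failures, errors
print(result.testsRun, len(result.failures), len(result.errors))  # 3 0 0
```

The output here is a set of results from every build, not a count of how many test cases somebody got through today.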
It’s obviously not a simple change (I know, I’ve been there) but it is possible, and small steps towards these sorts of approaches are achievable in almost any context. The real question comes down to how much you’re willing to experiment with testing, reporting and project planning.
No matter what your approach or your context it pays to be aware of the pitfalls of relying on test case completion metrics and to spot the wrong behaviour it drives. At least if you spot it, you may be able to make some changes and encourage the right behaviour by changing your process, or measuring something different.