Are we halfway there yet?

I got asked a question the other day about metrics.

 

 

“If you don’t use metrics to assess test completion, how do you know when you are halfway complete?”

 

I won’t go into all of the details of the discussion that ensued, but I thought I would share the two stories I use to help testers understand the potential flaws in using metrics to assess completeness and project deadlines.


 

Story 1 – The Regression

I used to work in a large team that would have a massive regression testing phase at the end of each monster Waterfall project.

The way this was managed was by printing out every single test case we had in the “regression pack” and putting them in “feature” stacks on a long table.  

The test team would then come in to work and grab a load of test cases and blast through them. The management insisted on everyone completing around 10 tests per day. 

 

The deadline was therefore estimated on the number of tests we had, the number of testers we had and the fact each one could complete 10 tests per day.

Unfortunately, it never quite worked that way. Here’s what would happen:

  • Some testers would roll into work at 6:00am to grab all of the easy test cases.
  • Some testers would roll into work at 9:00am and be left with the rock hard, complicated or tedious tests.
  • Some testers would complete their 50 test cases (they would grab an entire week’s worth) in one or two days and spend the rest of the week exploring or learning or surfing the web.
  • Some testers would struggle to complete more than 1 or 2 tests per day because of the complexity or setup time.
  • Some testers would find a boatload of bugs from exploring, which would bring the whole project release into doubt.
  • Some testers would pass tests without even running the test case. After all, a bonus was paid out to those who completed 10 per day!

The whole process was flawed because it gave testers a metric-driven system to game.

 

The management were not overly concerned with good testing and instead craved metrics to report further up the chain. It therefore didn’t work.

The above story shows a few things:

  • Not all tests are created equal. Some are harder, more complex, more tedious or more time-consuming than others.
  • Metrics will very rarely tell you how complete you are.
  • Regression testing by simply re-running a load of already executed test cases is a flawed idea of regression testing. Automated tests and talented testers doing exploratory testing are better (I’ll save that one for another post).
  • The “switched on” testers will always find a way to game the system, especially if you add incentives based on numbers alone.
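
To see how far the "10 tests per day" sum can drift from reality, here is a rough Python sketch of the deadline arithmetic. It is only an illustration: the size of the regression pack, the team size, the hours per test and the function names are all made up for the example, not taken from the real project.

```python
import math


def estimate_days_by_count(num_tests, num_testers, tests_per_day=10):
    """Management's estimate: every test is assumed to cost the same."""
    return math.ceil(num_tests / (num_testers * tests_per_day))


def estimate_days_by_effort(test_hours, num_testers, hours_per_day=8):
    """Estimate from a (guessed) effort in hours for each test."""
    return math.ceil(sum(test_hours) / (num_testers * hours_per_day))


# Hypothetical regression pack: 400 quick tests and 100 hard ones.
test_hours = [0.5] * 400 + [6.0] * 100
num_testers = 5

by_count = estimate_days_by_count(len(test_hours), num_testers)
by_effort = estimate_days_by_effort(test_hours, num_testers)

# Suppose the early birds have cleared 250 of the quick tests.
done_hours = test_hours[:250]
count_progress = len(done_hours) / len(test_hours)
effort_progress = sum(done_hours) / sum(test_hours)

print(f"Days by test count: {by_count}")
print(f"Days by effort:     {by_effort}")
print(f"Progress by count:  {count_progress:.0%}")
print(f"Progress by effort: {effort_progress:.0%}")
```

With these invented numbers the count says the pack is half done while most of the actual work is still sitting on the table. The effort figures are guesses too, of course, so this is another indicator rather than an answer.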


Story 2 – The Fuel Tank

 I used to own a tidy little Toyota MR2 Mk1. I loved it. A Classic. 

One thing I noticed about the MR2 (and every other Toyota I owned after this) was that I would get fewer miles from the top half of the tank than the bottom half.

“How can that be a half?” I hear you shout.

Well, technically, it wasn’t. But being precise about petrol tanks and mileage assumes that the fuel gauge in the car is 100% accurate, and that it needs to be.

I now own a Seat and I get fewer miles from the bottom half of the tank.

 

Eh?

 

Well here it is. Each car does around 300 miles from a full tank.

 

When the MR2 said half full on the dash indicator I would have covered about 100 miles of the 300.

When the Seat said half full on the dash indicator I would have covered about 190 miles of the 300.

 

Yet both would do 300 miles. For those who care, this is down to the shape of the petrol tank. For some reason, in some cars, it is easier just to say you are halfway down the tank height than to actually work out how much petrol has been used. I’m sure some cars are very accurate though, by the way…
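
To put numbers on that, here is a small Python sketch using the mileage figures above. It assumes, purely for illustration, that the miles covered change in a straight line with the needle on either side of the halfway mark. Real gauges will not be that tidy, and the function name is made up.

```python
def miles_covered(gauge_fraction, miles_at_half, full_range=300):
    """Estimate miles driven from a gauge reading (1.0 = full, 0.0 = empty).

    Assumption: one straight line from full to the half mark,
    and another from the half mark to empty.
    """
    if not 0.0 <= gauge_fraction <= 1.0:
        raise ValueError("gauge_fraction must be between 0 and 1")
    if gauge_fraction >= 0.5:
        # The top half of the gauge covers the first `miles_at_half` miles.
        return (1.0 - gauge_fraction) / 0.5 * miles_at_half
    # The bottom half covers whatever is left of the full range.
    remaining = full_range - miles_at_half
    return miles_at_half + (0.5 - gauge_fraction) / 0.5 * remaining


for car, miles_at_half in [("MR2", 100), ("Seat", 190)]:
    for reading in (1.0, 0.75, 0.5, 0.25, 0.0):
        miles = miles_covered(reading, miles_at_half)
        print(f"{car}: gauge at {reading:.2f} -> about {miles:.0f} miles covered")
```

The same "half" on the dash means two quite different things depending on the car, yet both lines end at 300 miles.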

 

 

But it doesn’t matter, because I know how many miles I can get from a tank (roughly) and I get a light on the dash indicating I have roughly 50 miles left before I run out of fuel.

They are all indicators. They are all guides for me to make a judgement.

And used in that way they are very useful indeed. 

 

And this is the same as defect and test case metrics. They are a good indicator, but in most cases should not be used as an absolute.

Metrics aren’t always evil. In fact, they can be very useful indeed.

 

But I would always suggest you think deeply about what your metrics are reporting, to whom they are being reported and whether or not that message could be misconstrued. That way you may find that you can drop some metrics, fine-tune some others or maybe start collecting a different set altogether.

10 thoughts on “Are we halfway there yet?”

  1. I read this last week and meant to comment, but I was busy and I’ve only just got round to it.

     I can’t believe no-one’s commented yet. This is one of those pieces that should be drummed into the subconscious of all testers, project managers and anyone who might be considered a stakeholder and liable to utter the dreaded and dreadful words “how many test cases did the testers get through today”.

     If managers were targeted with getting through 20 items on their “to do” list each day they’d know it was nonsense and would just encourage people to game the system. “Task 1 – get cup of coffee. Task 2 – Check laptop hasn’t been stolen whilst fetching coffee”. That would be no different from targeting testers on getting through test cases. It’s all just the bizarre notion that one has to do lots of things, anything so long as one does lots of them.

     Now I’m all for slipping some easy tasks (quick wins) onto my list just to help me feel good and build up some spurious feeling of momentum till the real thing kicks in. That works fine with testing too, but it’s a purely personal matter of constructive self-deception. Trying to extend this to some sort of “objective” measure of progress that means something significant to the project as a whole is either:

     a- laughable naivety,
     b- communal delusion,
     c- evidence that no-one really cares whether it means anything so long as they’ve got processes, reports and metrics that can demonstrate that their collective backside is bullet-proof.

     The last possibility is the most worrying because the others are susceptible to reason and education. If nobody really cares whether it’s all bollocks so long as their corporate existence is justified and secure then all I can suggest is that you gather up what remains of your sanity and jump ship.

     Yours faithfully, Disgusted of Perth

  2. Hi James,

     Many thanks for commenting and I love your examples about getting coffee and then checking the laptop wasn’t stolen. Genius.

     You hit on some excellent points and as always, insightful stuff about communal delusion and evidence no-one cares.

     Great stuff and thanks for commenting.

     Rob..

  3. Hi Del,

     I must have completely missed that blog post. It’s outstanding. Thanks for sharing it with me and thanks for taking the time to comment.

     Rob..

  4. Rob,

     Nice post. Spot on. My favorite part of it was your gas tank example because, like Jennifer Aniston in her latest movie, it “plays against type;” you could have stopped, as others have, immediately after pointing out huge, gaping problems with many metrics programs. The gas tank example, though, highlights the common truth that, while metrics systems can be highly troublesome, if you’re smart about how you’re analyzing data and metrics produced by a system, you can still make some useful informed decisions. Understanding how the system generating the metrics is working (and understanding what perverse incentives are likely at play within the system) allows you to better interpret the data.

     Then again, as you and James point out, some metrics are so fundamentally pointless to collect that everyone involved would be better off if no one tracked them.

     I’ll refer to “Rob Lambert’s excellent gas tank example” in my future conversations with testers about metrics.

     - Justin

  5. Hi Justin,

     Many thanks for your comments. I’ve always found that the Gas Tank example is a really straightforward way of explaining the misinterpretation of essentially the same concept, driven by different data and different interpretations.

     Thanks for taking the time to comment.

     Rob..
