Are we halfway there yet?

I got asked a question the other day about metrics.

 

“If you don’t use metrics to assess test completion, how do you know when you are halfway complete?”

 

I won’t go into all of the details surrounding the discussion that ensued, but I thought I would share with you the two stories I use to help testers understand the potential flaws in using metrics to assess completeness and project deadlines.


 

Story 1 – The Regression

I used to work in a large team that would have a massive regression testing phase at the end of each monster Waterfall project.

The way this was managed was by printing out every single test case we had in the “regression pack” and putting them in “feature” stacks on a long table.  

The test team would then come in to work and grab a load of test cases and blast through them. The management insisted on everyone completing around 10 tests per day. 


The deadline was therefore estimated from the number of tests we had, the number of testers we had, and the assumption that each tester could complete 10 tests per day.

Unfortunately, it never quite worked that way. Here’s what would happen:

  • Some testers would roll into work at 6:00am to grab all of the easy test cases.
  • Some testers would roll into work at 9:00am and be left with the rock hard, complicated or tedious tests.
  • Some testers would complete their 50 test cases (they would grab an entire week’s worth) in one or two days and spend the rest of the week exploring, learning or surfing the web.
  • Some testers would struggle to complete more than 1 or 2 tests per day because of the complexity or setup time.
  • Some testers would find a boatload of bugs from exploring, which would bring the whole project release into doubt.
  • Some testers would pass tests without even running the test case. After all, a bonus was paid out to those who completed 10 per day!

The whole process was flawed because it gave testers a metric driven system to game.

The management were not overly concerned with good testing and instead craved metrics to report further up the chain. It therefore didn’t work.

The above story shows a few things:

  • Not all tests are created equal. Some are harder, more complex, more tedious or more time consuming than others.
  • Metrics will very rarely tell you how complete you are (the quick sketch after this list shows why).
  • Simply re-running a load of already executed test cases is a flawed idea of regression testing. Automated tests and talented testers doing exploratory testing are better – I’ll save that one for another post.
  • The “switched on” testers will always find a way to game the system, especially if you add incentives based on numbers alone.
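To put some numbers behind those first two points, here’s a quick sketch. The test counts and effort figures are invented for illustration, not taken from the story above:

```python
# Illustrative only: made-up numbers showing how "percent complete"
# by raw test count can diverge from an effort-weighted view.

# Each entry is (estimated hours, finished?) for a hypothetical pack:
# 90 quick half-hour tests are done, 10 hard eight-hour tests are not.
tests = [(0.5, True)] * 90 + [(8.0, False)] * 10

by_count = sum(1 for _, done in tests if done) / len(tests)

done_effort = sum(hours for hours, done in tests if done)
total_effort = sum(hours for hours, _ in tests)
by_effort = done_effort / total_effort

print(f"Complete by test count: {by_count:.0%}")  # 90%
print(f"Complete by effort:     {by_effort:.0%}")  # 36%
```

Neither figure is the truth (the effort estimates are guesses too), but the gap between them is exactly why a raw count of executed tests tells you so little about how far along you really are.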


Story 2 – The Fuel Tank

I used to own a tidy little Toyota MR2 Mk1. I loved it. A classic.

One thing I noticed about the MR2 (and every other Toyota I owned after this) was that I would get fewer miles from the top half of the tank than the bottom half.

“How can that be a half?” I hear you shout.

Well, technically, it wasn’t. But being precise about petrol tanks and mileage assumes the fuel gauge in the car is 100% accurate… and that it needs to be.

I now own a Seat and I get fewer miles from the bottom half of the tank.

 

Eh?

 

Well here it is. Each car does around 300 miles from a full tank.

 

When the MR2 said half full on the dash indicator I would have covered about 100 miles of the 300.

When the Seat said half full on the dash indicator I would have covered about 190 miles of the 300.

 

Yet both would do 300 miles. For those who care, this is down to the shape of the petrol tank. In some cars it is easier simply to report that you are halfway down the tank’s height than to work out how much petrol has actually been used. I’m sure some cars are very accurate though, by the way…
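To put my rough figures in one place (these are the numbers from above, nothing more precise than that):

```python
# Rough figures from the story: both cars do about 300 miles per tank.
TANK_RANGE_MILES = 300

# Miles already covered when each dashboard gauge first read "half".
miles_at_indicated_half = {"MR2": 100, "Seat": 190}

for car, miles in miles_at_indicated_half.items():
    print(f"{car}: gauge reads 50%, fuel actually used ~{miles / TANK_RANGE_MILES:.0%}")
# MR2: gauge reads 50%, fuel actually used ~33%
# Seat: gauge reads 50%, fuel actually used ~63%
```

Same gauge reading, very different reality. That gap is the whole point of the story.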

 


But it doesn’t matter, because I know roughly how many miles I can get from a tank, and I get a light on the dash indicating I have about 50 miles left before I run out of fuel.

They are all indicators. They are all guides for me to make a judgement.

And used in that way they are very useful indeed. 

 

And it is the same with defect and test case metrics. They are a good indicator, but in most cases they should not be used as an absolute.

Metrics aren’t always evil. In fact, they can be very useful indeed.

 

But I would always suggest you think deeply about what your metrics are reporting, to whom they are being reported, and whether or not that message could be misconstrued. That way you may find that you can drop some metrics, fine-tune some others, or maybe start collecting a different set altogether.

Check what other systems allow

I’ve been in the process of switching my broadband provider this morning. When I filled in my new provider’s submission form I was asked for my Twitter username (@rob_lambert) so that they could get in touch with me whilst the broadband was being set up. Great, I thought.

 

The only problem was that the submission form would not allow anything that was not an alphanumeric character in the Twitter username. As you can see, my Twitter username has a “_” in it. Oh dear. The form wouldn’t let me save my very own Twitter username.

 

When using/integrating with another system (in this instance, Twitter) it is absolutely crucial that you do your homework.

If Twitter allows non-alphanumeric characters then so should you. It is basic stuff. Whatever I can enter in system X, I should be able to enter in system Y. Test number 1 in my testing book.

In fact, a good place to start would be to create a user on Twitter and see what they allow, and then check the same in your system. Or if you want to dig deeper – http://dev.twitter.com
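As a sketch of what that homework might look like: assuming Twitter’s documented rule (as I understand it, do check the current docs) that a username is 1–15 characters of letters, digits and underscores, the check on the receiving form is tiny. The helper name and probe values below are my own illustration:

```python
import re

# Twitter's documented username rule, as I understand it: 1-15
# characters, letters, digits and underscores only. Verify against
# the current docs before relying on this in a real form.
TWITTER_USERNAME = re.compile(r"^[A-Za-z0-9_]{1,15}$")

def is_valid_twitter_username(name: str) -> bool:
    """Accept exactly what Twitter accepts - no more, no less."""
    return bool(TWITTER_USERNAME.match(name.lstrip("@")))

# The broadband form's mistake in miniature: a plain "alphanumeric
# only" check would wrongly reject the first, perfectly valid, name.
for probe in ["rob_lambert", "rob-lambert", "a" * 15, "a" * 16, ""]:
    print(probe, "->", is_valid_twitter_username(probe))
```

Run the same probe values through Twitter’s own signup form and through your form; anywhere the answers differ, you have found a gap worth reporting.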

 

Mistakes like these are very common. Sure, time pressures, lack of documentation, bad design, yada yada are all good excuses, but poor reasons. Twitter is free, after all. A little bit of exploration and you could probably map out the valid/invalid boundaries and recreate the same checks for your system… Compare and contrast. It’s a powerful test idea.