How to calculate the effectiveness of testing and testers

A perennial question I get asked is: how do you measure the effectiveness of a tester?

I respond with “Why do you want to measure the effectiveness of the testers in your team?”

I do this to try to understand why measures of individual performance matter so much to managers (it’s typically managers who ask me).

It is indeed important to know how someone performs against their job and objectives, but the motives behind working this out matter. Is it to help the person improve, to provide training and support, or simply to measure them for budgetary reasons?

Don’t get me wrong, I like measures. We measure all sorts of things here, but we’re careful about the conclusions we take from the measures.

The typical response I get to my question is that managers (especially test managers) are trying to justify, to those holding the budget, the value a tester adds (or doesn’t add) to the team.

They are often trying to measure two things.*

Firstly, whether Tester 1 and Tester 2 add the same value as each other. In other words, can we swap them around and still fulfill the business objectives? Is one better than the other?

Secondly, they are trying to measure whether Tester 1 or Tester 2 is even adding value to the team (what would happen if we took them away?).

This often leads managers to measure crazy numbers like test case completion or defect detection rates as a way of measuring an individual’s performance.

As crazy as this sounds it’s incredibly common. Managers are after simple metrics to inform decisions about individuals.

I can see why. In fact, I know one blazingly obvious reason why this is so prevalent — it’s because managing people is really hard!

Managing people and understanding the value they add takes time and, in software development at least, cannot be measured by a single number (despite what many managers believe).

Managing people takes patience. It’s about building relationships. This is hard. Good management reveals truths about yourself as a manager that can be painful to accept.

So managers dig around for metrics to work out an individual’s performance. Relying on one number is easier than relying on a wide range of team numbers (which are naturally influenced by more than one individual), your own observations, and feedback from others.

The most common measure I hear back from test managers when it comes to measuring people’s performance is the Defect Detection Rate.

It’s an interesting measure in its own right – interesting as in “does it actually tell you about the quality of the process and testing?”

It’s especially interesting when you start to use this measure as a way of measuring an INDIVIDUAL’S performance. Can a single tester (or any other team member) control how many bugs are in the product and the rate at which the team flush them out?

The other day someone emailed me a massive calculation that she uses to measure her testers’ performance. It combined test case completion rates, the number of bugs detected, the period in which the bugs were found, the speed of bug resolution and the number of tests created. Bizarre. It was so complicated, and so open to gaming, misuse, inaccuracy and pointlessness, that it floored me.

How astounding that a person’s future and career progression can literally come down to a calculation as bad as this. But that’s how it is in some companies.

It’s common to hear of testers (and their future prosperity) being controlled by bug counts and test case completion rates. These measures are often taken within a business system that wouldn’t allow good testing to happen anyway. The system often doesn’t even support the goals and business objectives.

  • What about the team’s results?
  • What about the business value the software adds?
  • What about revenue generated by the value the software adds?
  • What about the team’s ability to solve problems?
  • What about the cycle time of work?
  • What about the feedback from the customer?
  • What about the person’s ability to grow and improve?
  • What about the feelings, feedback and emotional reactions of others to this person?

Well, these other measures are often hard to capture and don’t provide a single answer on their own. Combined, though, they tell a compelling story. That’s powerful stuff, but it’s hard for managers to gather.

So here it is.

I provide you with a calculation to rule all calculations. A calculation that renders all other calculations of team/person performance redundant.

Trust me — use this calculation and it will give you a number.

This number is the right number.

It’s the most important number you have.

It’s the only number you need.

It’s the most complete number possible.


[Graphic: The only calculation you need]

I will add this though:

PLEASE don’t use this number to make a decision about someone’s career and future.

Instead, do the work. Build the relationship. Understand the person.

Manage people, not just numbers.


Why not work out your number and leave it in the comments?


* Managers are often trying to measure many other things, and sometimes fall into the trap of using a single measure to inform a bewildering array of decisions.

27 thoughts on “How to calculate the effectiveness of testing and testers”

  1. I ran out of pens in my pack, so my effectiveness is… #DIV/0!

    I’m very thankful not to be judged by that, and even more so that we chase good software above counting subjectivity. I’ve seen the results of bad choices made from bad statistics calculated from bad data, though to be fair it does seem to give people comfort in a self-deluding, dogmatic, homoeopathic kind of way.

    1. Absolutely. It’s important to be able to rely on measures but hopefully, these measures are not being used to assess individual performance. If you ever find me doing that…call me on it 🙂

  2. Great post as always, Rob. I see this so often myself, and your spin on it has helped me articulate the message in a different way that might help. I’m interested to know your view on efficiency. For me it falls into the same hole, but I’d like to see whether you think it’s any different to effectiveness.

    1. Thanks Darren.

      I think effectiveness and efficiency are two different things. It’s entirely possible to be effective at achieving a result but inefficient in how the result is obtained. It’s equally possible to be efficient at releasing software but miss the goal, and thereby be ineffective. You could argue the two are more closely interlinked, and I think they probably are.

      However, I tend to look at solving the effectiveness problem first before addressing the efficiency.

      For example, one of the core ways we approach process improvement is to fix the problem first. Once we fix the problem and improve the process (achieve effectiveness) we then start work on making that process more efficient. Improve the process first and then make it smoother, quicker, less problematic (whatever your measure of efficient is) later.

      In my experience when I’ve tried to address both at the same time I tend to fail to achieve the results we want.


    1. lol.

      I guess leaving just a number is considered a spam comment.

      Great number by the way. Very effective 🙂

      1. Well, mine came back at -0.014… (my pens have gone missing, and I borrowed another team’s pens and they went missing too; does that count as dividing by a negative number?)

        Great post, Rob! It reminds me of an interview I conducted a while ago, where the interviewee gave some interesting answers in a small test that didn’t match the “expected answers” written by the person who created the test (which meant the interviewee technically failed). But I pushed ahead with the interview and found that he was perfect for the job, with a good mindset and a great personality that would have fitted well in the team.

        I’m glad that we didn’t concentrate on the numbers and discount him straight away, as I imagine some companies might have done. So I think this doesn’t just apply to effectiveness in the job; it applies to effectiveness in general!

        1. Thanks Dan.

          Sounds like the pragmatic view of the candidate prevailed over the number scoring – that’s great, if only more hiring teams thought that way 🙂

          Thanks for taking the time to comment, and yes, I do think you are dividing by a negative number. Poor score too. 🙂

  3. I’m always amused by people who bring up numbers like “there are x bugs per 100 lines of code”. Numbers need context, and so does testing. Some bugs you stumble into; they’re obvious or easy to find. Others require some thinking and clever design. Is the tester who finds 100 of the obvious ones better than the one who finds 5 obscure ones?

    The way we’re measured by others tells the story of the one doing the measuring, not the one being measured. People use what they can grasp (even if they use it wrongly), and for the most part we feel comfortable with numbers. They’re “easy” to read, in fact extremely easy when we fail to apply context to them. They’re convenient to report to business people because we think they need no translation, so I can see why we rely on certain metrics.

    We’re humans, we love tangible things.

    Great post Rob.

    1. Carlos – you summed it up perfectly – thanks for sharing that comment and thanks for taking the time to read the post and respond.

  4. Great article, Rob, and I agree: using a bug count to see how good a tester is isn’t much use. In my workplace, using bug count as a measure, I would say that a tester would become less effective the longer they are in the team.

    I work closely with our devs and they pick up how I test and what I test, so over time the usual suspects are all whittled out and rarely arise (they still get checked though!). As a result the bug count drops and I would appear to become less effective 🙁

    Though when a graduate joins the company, my effectiveness gets much better ;P

    1. Hi Mark,

      Thanks for taking the time to leave a comment. Great to hear how you work and that the team whittle out the repeat offenders. Great approach that works.

      Thanks again

  5. When I saw the graphic, I started laughing so hard, I snorted my coffee out.

    Awesome, exceedingly timely, valid points. THANK YOU!

    1. Always happy to have made someone laugh. Sorry about the coffee 🙂

      Thanks for commenting

  6. The number of bugs/defects found does not show the effectiveness of the tester; it shows the quality/ability of the development team.
    The only way to evaluate the tester is to count the number of bugs/defects found by users in production that were not found by the tester in the test environment.

    1. Hi Gerry,

      Thanks for commenting. I’m not sure whether the number of bugs in live would be a good evaluation of the tester. There are a lot of factors at play when a team builds software, including whether or not they are even building the right thing. I’m not a fan of measuring a tester’s effectiveness by a single number alone. I’d be interested to hear more about whether that measure of live bugs is working well for you.


  7. Number of pens aside, what other numbers have you seen, Rob? Are there actually any numbers you find useful in measuring not just “tester” effectiveness, but also *test* and *test suite* effectiveness? Do you use some sort of coverage matrix, or some kind of relative measure? What works for you?

    1. Hi Greg,

      We measure a lot of stuff, sometimes stuff that we don’t even know will be useful. But by choosing to measure some things, we also choose not to measure others.

      There are some fundamental values that we try to “measure by”:
      • All of our measures are time series. We don’t use a single number to reach a conclusion. Instead we look at trends and patterns in the data to show us areas of interest.
      • We don’t measure individual performance by any single number at all. We measure individual performance through behaviour, and when I say measure, I must also say that we don’t use numbers for this. We use feedback, observations and team outcomes.
      • Another value we hold close is that those doing the work will improve the work, and that they should own the numbers. We encourage each team to measure its own performance using the data we have available.

      That said, some important measures can tell you about the health of the systems we have in place:
      • Cycle time of work: how long does it take for work to flow through our dev process, and what can we do to improve it?
      • Velocity points: how many points have we achieved, and do variations and anomalies point to problems? (For example, mini-waterfall usually shows itself as velocity spiking every other sprint, as the previous sprint’s work is closed out.)
      • How many releases to production?
      • How many cases raised by customers?
      • How many calls flowing through our call centre software?
      • How many errors in the monitoring logs?
      • How many people are involved in the release process?
      • How many issues do our auto-checks catch?
      • How many rollbacks?
      • How many auto-tests, how much code churn, how many Exploratory Test charters, how many bugs reported?

      We observe behaviour and we do regular one-to-ones across the whole team, so we’re quick to see when someone is struggling, when personal issues are present, when a team is flying, when someone is underperforming, or when someone is unhappy or less motivated than usual.

      But we don’t use any single one of these measures or observations on their own to conclude anything. They merely point at something that needs someone to “go and see for yourself” – and this is the final value we hold dear – we (or the right person) must go and see for themselves. Is there a problem?

      It’s that final value/action that gives us the best results.

      For example – the velocity is below average for a sprint….go and see (scrum master usually)…is there a problem?
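The “go and see” trigger in this velocity example, and the mini-waterfall pattern mentioned above, can be sketched as two simple checks over a velocity time series. This is a minimal Python sketch under stated assumptions: the function names, the four-sprint baseline window and the 25% swing threshold are hypothetical choices, not the team’s actual tooling. A flagged sprint is only a prompt to go and look.

```python
from statistics import mean


def sprints_to_investigate(velocities, window=4):
    """Return indices of sprints whose velocity falls below the average of
    the preceding `window` sprints (hypothetical baseline choice).
    A flag means "go and see", never a verdict on a person."""
    flagged = []
    for i in range(window, len(velocities)):
        baseline = mean(velocities[i - window:i])
        if velocities[i] < baseline:
            flagged.append(i)
    return flagged


def looks_like_mini_waterfall(velocities, min_swing=0.25):
    """Heuristic (an assumption, not a standard test): velocity zig-zags up
    and down on consecutive sprints, and every swing is at least
    `min_swing` of the average velocity."""
    diffs = [b - a for a, b in zip(velocities, velocities[1:])]
    if len(diffs) < 3:
        return False  # too little history to call it a pattern
    alternates = all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))
    large = all(abs(d) >= min_swing * mean(velocities) for d in diffs)
    return alternates and large


print(sprints_to_investigate([20, 21, 19, 22, 12, 20]))    # [4]
print(looks_like_mini_waterfall([10, 30, 12, 28, 11, 31]))  # True
```

Comparing each sprint with a trailing window, rather than the all-time average, keeps the baseline close to the team’s current way of working, in line with looking at trends rather than single numbers.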

      Hope the above helps Greg – thanks for taking the time to comment and ask a cracking question.

      What do you measure, and what is working for you?


  8. Defect detection rate can be very useful if all or most other things are equal. If Tester A is finding 100 bugs a week working on the same product on largely the same features with the same developers as Tester B, who is finding 1 bug a week, then you need to start asking questions.

    And that’s the thing. Most metrics are better used as indicators than as targets: pointers to places you need to investigate. Maybe Tester B needs to modify his approach, but maybe Tester A is raising spurious issues and needs to modify his. By getting to root causes you’ll be able to help your employees develop.

    1. Hi Sean,

      Thanks for taking the time to comment.

      Absolutely agree. It can be used to point to a problem, and when used sensibly it *could* be useful in the context you describe: like for like. The reality is that work is rarely like this. Instead, in my experience, it’s often used across the board as an arbitrary measure. Like you say, all measures should be a pointer. I like that 🙂


  9. I agree, and yet I have found one metric that is mildly informative about the individual: time to first defect. In other words, how long after testing commences does it take the tester to report a defect.

    I would never use this as the sole means of evaluating someone’s work; that’s insane. But in my experience it does a fair job of measuring relative experience in a very broad way.

    1. Nice, that sounds like an interesting measure. I’d be keen to hear more about how you get on with it. It could be a great indicator of how rapidly a tester is learning the system and finding defects. I guess the big challenge is working out the value of the defect. Compare one tester taking 5 minutes to find a typo with minimal impact against another taking 10 minutes to find a system show-stopper. Like you said though, it’s a great indicator and not a sole measure of performance 🙂

      Thanks for sharing Blake.
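Blake’s time-to-first-defect measure, together with Rob’s caveat about the value of the defect, can be sketched in a few lines of Python. This is a minimal illustration that assumes a hypothetical list of (reported_at, severity) pairs and a made-up four-level severity scale; the function name and the example timestamps are illustrative only, not data from the discussion.

```python
from datetime import datetime

# Hypothetical severity scale; map it to your own bug tracker's values.
SEVERITY_RANK = {"trivial": 0, "minor": 1, "major": 2, "critical": 3}


def time_to_first_defect(testing_started, defects, min_severity="trivial"):
    """Time from the start of testing to the first defect reported at or
    above `min_severity`. `defects` is a list of (reported_at, severity)
    pairs. Returns None if no qualifying defect was reported."""
    threshold = SEVERITY_RANK[min_severity]
    elapsed = [
        reported_at - testing_started
        for reported_at, severity in defects
        if SEVERITY_RANK[severity] >= threshold and reported_at >= testing_started
    ]
    return min(elapsed, default=None)


# Illustrative session mirroring the typo vs show-stopper example above.
start = datetime(2024, 1, 8, 9, 0)
defects = [
    (datetime(2024, 1, 8, 9, 5), "trivial"),    # a typo
    (datetime(2024, 1, 8, 9, 10), "critical"),  # a show-stopper
]
print(time_to_first_defect(start, defects))           # 0:05:00
print(time_to_first_defect(start, defects, "major"))  # 0:10:00
```

The `min_severity` filter shows how the same session can produce very different numbers depending on what you decide counts, which is one more reason to treat the result as an indicator rather than a score.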

