Root cause analysis of bugs – 5 Whys

Introduction

Following my last blog post, “Why you should not rely on automated tests only”, I would like to write about a technique for getting down to the root cause of software bugs.

Although we have Continuous Integration and a manual checklist in place, we still find unexpected or wrong behaviour in our application through exploratory testing and early user feedback.

In addition to fixing these bugs, we also want to understand why they occur in the system in the first place, so that we can prevent them from happening again.

By doing this analysis we should get a better understanding of how we currently work, why we introduce certain bugs, and how we can improve the way we work in future.

In our case we decided to do a root cause analysis by asking the 5 Whys.

How does it work?

First of all, we took the bug cards (we use a physical board), grouped them by areas such as front-end, back-end and 3rd party integration, and ordered them by how long they took to fix. Once this was done, we looked at the individual bugs and asked the 5 Whys (or fewer) to get to the root cause of each issue.
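For teams that track bugs digitally rather than on a physical board, the grouping and ordering step could be sketched roughly as follows. This is only an illustration: the `BugCard` fields, the `group_and_order` helper, and the choice to list the longest fixes first are assumptions, not something our process prescribes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BugCard:
    # Hypothetical card structure; the field names are assumptions.
    title: str
    area: str            # e.g. "front-end", "back-end", "3rd party integration"
    hours_to_fix: float  # how long the fix took


def group_and_order(cards: List[BugCard]) -> Dict[str, List[BugCard]]:
    """Group cards by area, then order each group by time to fix.

    Assumption: the most expensive bugs come first, so they are analysed first.
    """
    grouped: Dict[str, List[BugCard]] = {}
    for card in cards:
        grouped.setdefault(card.area, []).append(card)
    for area_cards in grouped.values():
        area_cards.sort(key=lambda c: c.hours_to_fix, reverse=True)
    return grouped


if __name__ == "__main__":
    # Invented example cards, purely for illustration.
    board = [
        BugCard("Permissions not applied to element", "front-end", 6.0),
        BugCard("Wrong date format in export", "back-end", 1.5),
        BugCard("Button misaligned", "front-end", 0.5),
    ]
    for area, cards in group_and_order(board).items():
        print(area, [c.title for c in cards])
```

Starting with the bugs that took longest to fix is one possible choice; ordering the other way round would work just as well with `reverse=False`.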

Example of a bug analysis

“The permissions from the back-end system were not applied to a front-end element”
  1. Why were the permissions not applied?
    Because the code was not waiting for the permissions to come through.
  2. Why wasn’t the code waiting for the permissions to come through?
    Because the way we implemented it was working fine and all tests were passing.
  3. Why did we have the problem although the tests passed?
    Because we did not consider network latency.

In this case we got down to the root cause after three questions. All the tests passed, but we had not considered network latency, which was a problem on our client’s network but not on the network we developed and tested in. As an action, we agreed to set up WANEM to simulate network latency, and from now on we will run our tests against this environment too.
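To make this kind of bug more concrete, here is a minimal simulation of it. It is not our actual code and it does not use WANEM: the `fetch_permissions`, `Element`, `load_without_waiting` and `load_and_wait` names are hypothetical, and the latency is faked with `asyncio.sleep`. The point is only to show how a check can pass on a fast network and fail once latency is added.

```python
import asyncio
from typing import Optional, Set


async def fetch_permissions(latency_s: float) -> Set[str]:
    """Stand-in for the back-end call; latency_s simulates network delay."""
    await asyncio.sleep(latency_s)
    return {"edit"}


class Element:
    """Hypothetical front-end element that renders with its current permissions."""

    def __init__(self) -> None:
        self.permissions: Set[str] = set()
        self.rendered_with: Optional[Set[str]] = None

    def render(self) -> None:
        self.rendered_with = set(self.permissions)


async def load_without_waiting(latency_s: float, grace_s: float = 0.02) -> Element:
    """Buggy variant: renders after a short fixed delay, whether or not
    the permissions have arrived."""
    element = Element()

    async def apply_permissions() -> None:
        element.permissions = await fetch_permissions(latency_s)

    task = asyncio.create_task(apply_permissions())
    await asyncio.sleep(grace_s)  # "long enough" on a fast network only
    element.render()
    await task
    return element


async def load_and_wait(latency_s: float) -> Element:
    """Fixed variant: renders only after the permissions have come through."""
    element = Element()
    element.permissions = await fetch_permissions(latency_s)
    element.render()
    return element


if __name__ == "__main__":
    for latency in (0.0, 0.2):  # fast network vs. simulated slow network
        buggy = asyncio.run(load_without_waiting(latency))
        fixed = asyncio.run(load_and_wait(latency))
        print(latency, buggy.rendered_with, fixed.rendered_with)
```

Running the same check at both latencies is the idea behind testing against a latency-simulating environment: the buggy variant only looks correct while the network is fast.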

Conclusion

You cannot always avoid software bugs. However, when a bug occurs in the system, the worst thing you can do is fix it without understanding it. If you only fix bugs, the same mistakes will happen over and over again. It is better to learn from your mistakes, understand them, and come up with actions to prevent these issues in future.

And by the way: testing is never the root cause of a software bug. Some might blame a software tester for not finding a bug, but the tester did not normally introduce it in the code. There are many other things to consider, such as missing or unclear requirements or wrong assumptions.

If you can’t avoid a bug in the first place, then at least try to learn from it.
