• 6 hours
  • Easy


Last updated on 3/4/20

Debug Perpetually Imperfect Software Using the Scientific Method

Software Is Never Perfect: Identifying the Source of Bugs

Have you ever put together a perfect plan only to find out later that the real world wasn’t interested? Perhaps you missed a key detail or didn't realize that something would be a problem until it was too late? I have. 🖐 Most people have! It’s part of being human. As the saying goes, life has a way of interfering with the best-laid plans!

The same is true about writing code. When you write a Java program, you are putting together a plan a computer can follow to solve a problem. You’ll figure out your clients' needs, interpret them into business rules, then translate them to Java. This is done using brains, hands, and keyboards; in other words, instruments of human error! Typos, skipped logic, and mistakes happen.

A bug is a mistake in your software (causing your program to do something other than what you’d expect).

Bugs need to be fixed, but you don't have to panic. Your software is an intricate web of code, ideas, and software design, so frantically clobbering a bug might break something else. A more structured approach allows you to figure out what's creating the bug and fix its root cause.

While bugs can be stressful, they are also a great way to learn something new about your software. Hunting them down is an opportunity to understand what happened, fix it, and prevent it from happening again. That’s good!

Types of Bugs

A first step is understanding bugs better so you can prioritize which ones are worth your time. There are two main types of bugs:

Failures

Now and then, a bug will disturb the flow of your application so badly that an error bubbles up in front of the users; it may cause their program to crash, or give them an incorrect result! In extreme conditions, this could result in screams.😱 You don’t want your users to scream.

Errors that directly impact users are known as failures. Whether it’s an exception or a stalling application, when your user is negatively impacted, the bug is a priority.

Past developers who were too lazy to use try and catch gave Java applications a bad reputation for spamming users with technical-looking exceptions and long stack traces. That information indicates a software failure. It's useful to developers, but not to users, because they can't do anything about it. The lesson is to always catch and handle your exceptions, so your users don't have to!
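As a minimal sketch of that lesson, the example below catches an exception, logs the technical detail for developers, and gives the user a message they can act on. The PriceLookup class, its methods, and the wording of the messages are hypothetical and only serve as illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical example: class, method names, and messages are illustrative only.
final class PriceLookup {
    private static final Map<String, Integer> PRICES_IN_CENTS = Map.of("book", 1299, "pen", 199);

    // Throws when the product is unknown: this is the technical failure.
    static int priceInCents(String product) {
        Integer price = PRICES_IN_CENTS.get(product);
        if (price == null) {
            throw new IllegalArgumentException("No price configured for product: " + product);
        }
        return price;
    }

    // Catches the exception, logs the detail for developers,
    // and returns a message the user can actually act on.
    static String describePrice(String product) {
        try {
            int cents = priceInCents(product);
            return String.format(Locale.ROOT, "%s costs %d.%02d", product, cents / 100, cents % 100);
        } catch (IllegalArgumentException e) {
            System.err.println("Price lookup failed: " + e.getMessage());
            return "Sorry, we couldn't find a price for \"" + product + "\". Please try another item.";
        }
    }

    public static void main(String[] args) {
        System.out.println(describePrice("book"));
        System.out.println(describePrice("unicorn"));
    }
}
```

The stack trace never reaches the user, but the developer still gets the detail needed to investigate.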

Faults

Not all bugs ruin your users’ day. Faults are bugs which may exist silently, without becoming obvious to the people using your software. They cause code to behave unexpectedly, but not enough that the user is visibly inconvenienced.

For instance, an error message might be poorly worded or displayed in the wrong color. Perhaps money values are rounded to five decimal places instead of two, while users don’t care about those values in the first place.
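To make the rounding fault concrete, here is a small sketch. It assumes a specification that asks for two decimal places and half-up rounding; the MoneyDisplay class and its method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical example of a silent fault: nothing crashes, the output is just off-spec.
final class MoneyDisplay {
    // The fault: five decimal places where the (assumed) specification asks for two.
    static String faultyFormat(BigDecimal amount) {
        return amount.setScale(5, RoundingMode.HALF_UP).toPlainString();
    }

    // The behavior the (assumed) specification describes.
    static String correctedFormat(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal total = new BigDecimal("19.987654321");
        System.out.println(faultyFormat(total));    // prints 19.98765
        System.out.println(correctedFormat(total)); // prints 19.99
    }
}
```

Both versions run without error, which is exactly why a fault like this can go unnoticed until a test compares the output against the specification.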

In many cases, testing will detect such mistakes as deviations from the specification; however, these bugs occasionally slip through into production. Why does this happen? Simply because they weren’t tested for: no one considered writing a test for that particular scenario with all its nuances.

So faults are just less severe failures?

Yes. Failures are the culmination of faults. Faults can snowball into an issue that prevents your application from being used, and that issue is a failure! Remember that whatever you call it, it's something your software does that it should not.

Getting Rid of Bugs With a Scientific Process: Debugging!

Fixing software bugs is like proofreading an article. You need to find each one, make corrections, and ensure they are improvements on the original. This basic process is called debugging.  To effectively debug software, you need to use a methodical approach to identify and fix each issue. Specifically, for any bug you see, you should go through the following steps:

Step 1: Observe the Bug 

Before even assuming you're dealing with a bug, make sure it's real!

Have you ever seen software fail and suddenly start working again? Perhaps your network connection was down for two minutes, or your computer was doing an update at the same time your application spontaneously restarted. You might never see that particular issue again because it was down to a glitch and convergence of unlikely factors. Perhaps it wasn't even an issue in the first place!

Step 2: Write a Repeatable Test for the Correct Behavior You Expect

To start working on a bug, you should be able to verify that it is something you can repeat, understand, and correct, and then prove that it has been fixed. This means you need a way of repeating the issue so you can investigate it and check its behavior. What could be a reliable way to predictably set up your software to (a) do something and (b) check its outcome? That sounds like a job for an automated test! 🙂

To do this, create a test for how your software should behave without the bug. (It will fail when first run because you have a bug!) That way, when you think you've fixed the issue, you can run the test. If it passes, you'll know that you've successfully corrected it! What's more, with a test, you'll find out if anyone ever reintroduces that bug.
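The idea can be sketched in plain Java. The leap-year bug, the class, and the method names below are hypothetical, and a real project would more likely use a test framework such as JUnit than a hand-rolled check. The test describes the correct behavior, fails against the buggy code, and passes once a candidate fix is in place.

```java
import java.util.function.IntPredicate;

// Hypothetical example: a repeatable test written before the fix exists.
final class LeapYearRegressionTest {
    // Buggy version: forgets that century years must also be divisible by 400.
    static boolean buggyIsLeapYear(int year) {
        return year % 4 == 0;
    }

    // Candidate fix.
    static boolean fixedIsLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    // The test states the behavior we expect, whatever the implementation.
    static boolean passes(IntPredicate isLeapYear) {
        return isLeapYear.test(2024)
            && isLeapYear.test(2000)
            && !isLeapYear.test(1900)
            && !isLeapYear.test(2023);
    }

    public static void main(String[] args) {
        boolean before = passes(LeapYearRegressionTest::buggyIsLeapYear);
        boolean after = passes(LeapYearRegressionTest::fixedIsLeapYear);
        System.out.println("Before the fix: " + (before ? "PASS" : "FAIL")); // prints FAIL
        System.out.println("After the fix:  " + (after ? "PASS" : "FAIL"));  // prints PASS
    }
}
```

Because the test stays in the code base, it will fail again if someone reintroduces the same mistake later.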

Step 3: Propose a Theory for Why the Bug Occurred

When you start with a test, you can investigate what in the code is making the test fail. By studying the code as your test fails, you will come up with one or more ideas about what's causing the problem. List each of these ideas so you can investigate it by testing a possible solution. Guess what? Those great ideas you came up with are theories!

What if I can immediately see what the error is in the code? Do I still need to write a test and come up with theories? 

Some errors will seem obvious. On seeing the error message, you may intuitively think you know what's going on. Maybe you notice that an exception was thrown from a class you'd just modified. That seems like an obvious fix for that bug. Wouldn't it be tempting to start up your IDE, dive into the code, and find the fastest way to work around the issue?

But are you sure you know what the issue is yet? Although you might suppress the symptom (the exception) for the time being, you may not have dealt with deeper causes. By not taking the time to inspect possible root causes, there is still a risk that an underlying problem will cause havoc with your program in some other way.

Additionally, working around a bug usually means that you're adding a special case to your software, which is contrary to how your software should normally behave. By special case, I mean fiddling with the logic of your program in a way that isn't consistent with the original design of the code. This will complicate your software, and you'll have to consider unusual situations whenever you make a change.

If the error is evident, you have a theory to validate, and writing the test will prove it. I know it's hard not to jump in, but you want to pinpoint why the bug was triggered and not just how to eliminate it. By starting with a theory and an automated test to check it, you can target a specific issue and have confidence you fixed it!

Step 4: Prove or Disprove the Theory

Now try to prove your theory by investigating the code. Your goal is to make the failing test pass. With the right theory, you’ll hopefully fix the bug along with a range of other similar issues. If not, loop back to Step 3 and start again with a new theory until you've found an answer.

Step 5: Resolve the Bug

Keep in mind that bugs often come in multiples. A single mistake can manifest as a range of similar bugs. There are many times when I've fixed one bug and found it closing a range of other similar ones. Again, a good way to solve this is with further tests.

How should I apply those steps to investigating a real bug? 

Imagine that you've built an e-commerce application. You receive a bug report indicating that when a user attempts to remove a single item from her cart, she ends up with one more item rather than one less. How would you approach this using the structure above? 

Let's walk through it together:

  • Step 1: Reproduce! First, you’d want to make sure it’s a real bug. There are a few ways you can reproduce it in a running application: 

    • Have the user show you what happened, which might confirm that it's real.

    • Look at any logs you have to understand what happened in the software at that time (provided that you have good logs). If you don't, add more logging and try again.

  • Step 2: Write a test! You may know what you need to fix at this point. Perhaps there’s a + which should be a -. Rather than racing ahead, a methodical way of ensuring that you don't make another mistake would be to write a test. You could write a failing one that expects the basket to be empty when you remove the only item inside. It should fail.

  • Step 3: Come up with some theories! Look at the code and try to figure out why the test is failing. 

  • Step 4: Test your theories! Use the failing test to reproduce the bug, and based on your theories, test different fixes until it is resolved (i.e., removing the only item leaves the basket empty). The test will stay with your code and help other developers make sure they don’t break it again.

  • Step 5: Resolve your bug!  You might need some additional tests if you've uncovered more bugs. 
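The walkthrough above could look something like the sketch below. The course does not show the cart code, so the Cart, FixedCart, and CartRegressionTest classes are assumptions made for illustration. In practice you would correct the remove method in place; the subclass exists only so the buggy and fixed versions can run side by side against the same test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cart containing the reported bug.
class Cart {
    protected final Map<String, Integer> quantities = new HashMap<>();

    void add(String item) {
        quantities.merge(item, 1, Integer::sum);
    }

    // Theory under investigation: a + where a - belongs.
    void remove(String item) {
        quantities.computeIfPresent(item, (key, count) -> count + 1);
    }

    int totalItems() {
        return quantities.values().stream().mapToInt(Integer::intValue).sum();
    }
}

// Candidate fix: decrement, and drop the entry once its count reaches zero.
class FixedCart extends Cart {
    @Override
    void remove(String item) {
        quantities.computeIfPresent(item, (key, count) -> count > 1 ? count - 1 : null);
    }
}

final class CartRegressionTest {
    // Step 2: removing the only item should leave the cart empty.
    static boolean removingOnlyItemEmptiesCart(Cart cart) {
        cart.add("book");
        cart.remove("book");
        return cart.totalItems() == 0;
    }

    public static void main(String[] args) {
        // Step 2: the test fails against the buggy cart.
        System.out.println("Buggy cart passes: " + removingOnlyItemEmptiesCart(new Cart()));      // prints false
        // Step 4: the same test passes once the theory is confirmed by the fix.
        System.out.println("Fixed cart passes: " + removingOnlyItemEmptiesCart(new FixedCart())); // prints true
    }
}
```

For Step 5, further tests along the same lines (removing one of several identical items, or removing an item that isn't in the cart) would help uncover related bugs.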

This process looks like it could take time. What if I'm under the gun and the bug needs to be fixed ASAP?

We've all been there. But fixing something ASAP doesn't mean it doesn't need to be reproduced, understood, and tested. You have to do all three to stand a chance of truly fixing it.

Imagine that you decide to fix something without following the procedure above. Your first impulse may be to do manual testing (or poking around), but that can take longer than you'd think. Stop and rethink your approach. Manual testing can be slow and unreliable compared to automated tests.

If you do release a quick and dirty fix, you still have to clean up after yourself. Wouldn't you want to be sure you fixed the real issue and didn't cause another one? Would you want to find out if the bug was accidentally reintroduced in the future? I would!

Don't make the same oversight that occurred when the code was first written. Take the time to fix your bug reliably.

Living in a World of Bugs

As a software engineer, you have as much chance of eliminating bugs from your software as gardeners do from their gardens.

Bugs are a result of human error. That is, they are part of nature. We've been trying to squash them forever, but the best you can do is reduce their likelihood. Software is never perfect.

You can keep them in check by using the following pesticides:

  • Test extensively to prove that you meet your known business and technical requirements.

  • Write software to be as robust to failure as you can by using best practices. 

  • Approach bug reports calmly and find value in them as feedback about your software.

  • Take a bug as an opportunity to understand something you missed the first time around and methodically prove that it's fixed.

In this course, we're going to focus on the last two points and learn how to use tools to become familiar with the bugs that crawl out between the lines of code!

Let's Recap!

  • Testing can reduce the likelihood of bugs but is only as good as the scenarios thought of up-front.

  • Faults are mistakes in software that may lie dormant and are not always apparent.

  • Failures are caused by faults in software and impede users from using your software. 

  • To reliably resolve a bug, use the scientific method and a series of experiments against repeatable tests:

    • Write a repeatable test for the correct behavior you expect.

    • Investigate why that test fails and come up with some theories.

    • Test theories and solutions to rectify the failing test.

    • Make the failing test pass and resolve the bug.

Additional Reading:
  • On bugs, errors, faults, and failures:  Classification of Software Requirement Errors: A Critical Review, P.K. Chaurasia, R.A. Khan, (0975 – 8887), International Journal of Computer Applications, Volume 132 – No.7, December 2015.

  • Zero Bug Policies are becoming increasingly popular, with teams recognizing the importance of striving to eliminate bugs altogether. While I share this vision, bugs will always remain part of our daily reality. However, you can always strive to reduce their likelihood through sound engineering choices. Try out these #bugszero exercises in your favorite language.

  • Check out how we used a FAIL cake and other techniques to produce high-quality code: The Agile Developer’s Handbook by Paul Flewelling.

  • A study across all GitHub projects examining different languages and their propensity towards certain types of bug: A Large Scale Study of Programming Languages and Code Quality in Github, B. Ray, D. Posnett, V. Filkov, P. Devanbu, Communications of the ACM, Volume 60 Issue 10, October 2017, pages 91-100. 
