REM State

06 Jan

Squashing Bugs: Intro and Repros

Introduction

So, you have encountered erroneous behavior in some system — but you just can’t seem to fix it. You have stared at your code until your eyes are dry and itchy, and just cannot see what could possibly be going wrong in that algorithm. Or maybe a configuration file clearly indicates that things should be working one way, but things are behaving in a totally different way in real life. What to do?

Discovering a discrepancy between expectations and actuality can be a difficult job (especially for developers and testers who are not really users). The actual fixes for a lot of bugs tend to be small; with a full understanding of what is going on, writing bugfix patches can be very fast. However, there is often a disconnect between the undesirable behavior that the end user sees and the actual cause of that behavior. Depending on what type of problem is cropping up, there are several ways to go about tracking down the source of a bug so you can squash it.

This series will look at various techniques I have learned, discovered and applied over the years, along with scenarios that they are appropriate for. Note that, while some of these tricks are limited to software development, others can be applied to a wider set of troubleshooting. All of the tips, however, will be geared towards finding the cause of the bugs, as opposed to discovering the symptoms.

The “Repro”

The first step in most bug-squashing efforts should be determining a consistent set of reproduction steps (known henceforth as a “repro”). Techniques are available for finding the cause of bugs that cannot be consistently reproduced, but an inconsistent repro does tend to make life more difficult. The more consistently a repro produces the bug, the fewer real-world variables are left as possible contributors to the cause, and the easier it is for the developer to track down and fix the problem. The bottom line is that you should strive to get the clearest and most consistent repro that you can.
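One way to put a number on "consistency" is to run the repro repeatedly and count how often the bug shows up. The following is only a minimal sketch: `repro_rate` and `flaky_attempt` are hypothetical names, and it assumes the repro can be automated as a function that returns `True` when the bug appears.

```python
import random
from typing import Callable


def repro_rate(attempt: Callable[[], bool], runs: int = 20) -> float:
    """Run the repro `runs` times and return the fraction of runs that hit the bug."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    hits = sum(1 for _ in range(runs) if attempt())
    return hits / runs


if __name__ == "__main__":
    # Hypothetical stand-in for a real repro: "fails" roughly 30% of the time.
    rng = random.Random(0)

    def flaky_attempt() -> bool:
        return rng.random() < 0.3

    print(f"bug reproduced in {repro_rate(flaky_attempt, runs=50):.0%} of runs")
```

If tightening the steps moves the rate closer to 100%, the repro is probably getting closer to the real trigger.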

As a tester performing ad-hoc testing, you should keep a mental stack of procedures you have performed. Be aware of as many things as possible when you are testing: things like mouse position, order of operations, whether you performed an operation twice, and even the timing between operations can all be factors in triggering a bug. If you do come across a bug in the program, start over from the last complete task you were attempting to perform. Try to replay that task in the same manner as you performed it before. If the bug does not repro the second time, start the task again and walk through the steps more carefully: try to recall anything different, unusual, or special that you did the first time that you encountered the bug. Repeat until you can either reproduce the bug, or convince yourself that you can’t find a working repro.
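If a mental stack is hard to keep, a written one works too. As a rough sketch only (the `StepLog` class is hypothetical, not part of any testing tool), a tester could note each action as it happens and get back numbered steps with the time elapsed between them:

```python
import time
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StepLog:
    """Records tester actions along with the time elapsed since logging began."""
    steps: List[Tuple[float, str]] = field(default_factory=list)
    _start: float = field(default_factory=time.monotonic, repr=False)

    def record(self, description: str) -> None:
        # Store seconds since the log was created, plus the action taken.
        self.steps.append((time.monotonic() - self._start, description))

    def as_repro(self) -> str:
        # Render numbered steps, each with the delay since the previous step.
        lines = []
        previous = 0.0
        for number, (elapsed, description) in enumerate(self.steps, start=1):
            lines.append(f"{number}. {description} (+{elapsed - previous:.1f}s)")
            previous = elapsed
        return "\n".join(lines)


if __name__ == "__main__":
    log = StepLog()
    log.record("Open the document")
    log.record("Click Save twice in a row")
    print(log.as_repro())
```

The timing is recorded because, as noted above, the delay between operations can itself be part of the trigger.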

When you are given repros written by others, they may be incomplete, difficult to understand, or incorrect. If the bug was filed by a professional, simply direct them to the steps for a tester above. In other cases (such as with bugs filed by users), you will need to coax more information out of them. The best way to do this tends to be to have them attempt to reproduce the bug using their own instructions. If they insist that they can regularly reproduce the problem with their own steps, then send a tester to observe the repro firsthand if possible. If that is not possible, confirm with them how consistently the problem actually occurs. If a user can consistently repro a problem, and you cannot, then there may be another issue at play.

One of the major causes of inconsistent bug repros is differing machine configurations. Be suspicious of configuration if one person can frequently reproduce the problem on their machine, but others have a hard time reproducing the problem at all. Comparing configurations is a general technique, but can be very slow. Fortunately, there are heuristic shortcuts that you can use in software testing that can greatly speed the process up.

Ideally, you can get at least two machines that can reproduce the problem, and at least one machine that cannot. The reason you want two machines that can repro the bug is to help reduce the set of differences: there will undoubtedly be many things different about a single machine that can reproduce the problem, and a single machine that cannot. There will also be many things that are the same between all three machines. The only things that are important to consider, though, are the things that are (a) the same between both the machines that can reproduce the problem, and (b) different from the machine that cannot reproduce the problem.
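The three-machine comparison amounts to a filter over settings: keep only those where the two repro machines agree with each other and disagree with the machine that cannot repro. Here is a minimal sketch, assuming each machine's configuration has already been collected into a flat dictionary; the function name, the `MISSING` marker, and the sample values are all made up for illustration.

```python
from typing import Any, Dict, Tuple

Config = Dict[str, Any]
MISSING = "<missing>"  # marker for a setting that a machine does not have at all


def suspect_settings(repro_a: Config, repro_b: Config,
                     no_repro: Config) -> Dict[str, Tuple[Any, Any]]:
    """Return settings shared by both repro machines that differ on the third.

    Each result maps a setting name to (value on repro machines, value on the
    machine that cannot reproduce the problem).
    """
    suspects = {}
    for key in sorted(repro_a.keys() | repro_b.keys() | no_repro.keys()):
        a = repro_a.get(key, MISSING)
        b = repro_b.get(key, MISSING)
        c = no_repro.get(key, MISSING)
        if a == b and a != c:  # condition (a) and condition (b) from the text
            suspects[key] = (a, c)
    return suspects


# Hypothetical configurations, for illustration only.
machine_a = {"os": "Windows XP", "cpus": 2, "build": "debug", "ram_gb": 2}
machine_b = {"os": "Windows Vista", "cpus": 2, "build": "debug", "ram_gb": 4}
machine_c = {"os": "Windows XP", "cpus": 1, "build": "debug", "ram_gb": 2}

print(suspect_settings(machine_a, machine_b, machine_c))  # {'cpus': (2, 1)}
```

In this made-up example the operating system and RAM drop out because the two repro machines disagree on them, and the build type drops out because all three machines match, leaving the processor count as the only suspect.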

Experience will allow you to take shortcuts with this process. Specifically, there are many things that you’ll want to check out before attempting to do an exhaustive comparison between machines. The following are some common general issues:

  • Running in an “unclean” environment. Make sure that machines…
    1. Are running on the same version of the software from source control
    2. Don’t have extra bits that aren’t in source control
    3. Have rebuilt everything from source once the above two steps are taken care of
  • Running through different front-ends or processes. Examples include…
    • Attaching the debugger vs. not
    • Running test cases via an automatic test-running system vs. directly from a harness like NUnit
    • Using retail vs. debug bits
    • Running a program from its installed location vs. running it from the location it’s compiled to in the source tree
  • Missing software/dependencies.
  • Uniprocessor vs. multiprocessor machines.
  • ACLs and user permissions.

There can, of course, be many more application-specific discrepancies and special scenarios. Remember the problems that you encounter, because those same problems (or substantially similar ones) are likely to happen again in the future.

Once you have managed to get a consistent repro (or, for certain classes of bugs, a repro that will eventually work on any machine that follows the setup requirements), you should have enough information to start tracking down the root cause. Next week, I’ll cover the first technique for doing this: source control.

Table of contents for Squashing Bugs

  1. Squashing Bugs: Intro and Repros
  2. What did you Do!?
  3. Walking Through the Problem
  4. Locating Bugs Through Records


© 2008 REM State