REM State

27 Jan

Locating Bugs Through Records

A customer calls up having a horrible problem with your software in a live environment. They aren’t privy to the internals of your application, and don’t really want to be — they just want the problem fixed. You can’t send over a dev with a debugger, or have them try several different versions to see if they can identify one that does not have the problem: both of these would be far too disruptive. The bug doesn’t even crash the software, so getting a stack trace is impossible. How on earth can we diagnose the customer’s problem, find the offending code and make a fix? The answer lies in well-implemented logging.

Logging is similar in many aspects to using the debugger — it allows you to get a feel for the internal state of the application, and to see the flow of execution. Unlike debugging, logging is entirely passive; it’s a read-only view of the application’s progress, and you cannot examine arbitrary pieces of information. If you need more information in your logs, you will have to change your logging statements and recompile your application. Also unlike debugging, it inherently allows you to save the run information for later use; you don’t have to have an exception occur or be working with the program in real-time to capture valuable information about a bug. The main benefit of logging over debugging is that logging works everywhere — both debug and retail builds will have it, and while it might not be enabled or set to the highest verbosity by default, customers have the ability to increase the detail of logging up to debug levels.

Implementing a good logging system can be surprisingly tricky, as the mechanics of logging have several subtle design considerations. If at all possible, consider using an existing logging library (log4c, nlog, etc.) instead of writing your own. Implementing a good logging system is a prerequisite for implementing good logging, which can itself be a tricky proposition. There are three main things to consider when implementing logging:

  • Performance. More logging statements and more complex string generation means more overhead to slow a program down.
  • Utility. If a nothing is logged when an unusual event happens, or that message does not contain enough information, you decrease your ability to use the logging output to debug the problem.
  • Formatting. Messages should also follow a standard format (especially messages within the same category) so that external filtration and reporting can be done on the log files.

Balancing performance and utility can be difficult, which is why many logging systems use some form of message categorization and runtime filtering logic. “Verbosity”- or “levels”-based approaches are a common way to tackle this — each message is assigned an importance (0 usually being the most important, and larger numbers being progressively less important), and the user specifies the importance threshold for logging. Any messages that fall outside the threshold will not get logged, while the messages inside the threshold will be. This is a simple solution, and sufficient for most end-user applications. However, a full categorization mechanism may be required for server applications. Categorization requires that each message be given one (or possibly more) categories. The software is then configured with which categories get logged, allowing, for example, only trace statements to be logged.

Formatting and consistency of log output can be difficult to achieve, but it is an important component for post-processing logs. Timestamps are a must for server and server-based applications; they’re less important for desktop applications, but generally don’t hurt. As a rule of thumb, log entries should always start with (or otherwise have attached to them) a date and time. Formatting beyond this is very application specific, but the general idea is that messages of the same category or type should have a predictably similar layout. If you take an info/warning/error/fatal error approach, for example, log entries should all have the severity logged after the time, but before more message-specific information. Messages logged for “should never get here” sections of the code (and other logic defect logs) should include a unique identifier — this will generally be the file name and line number that the message was logged on. Further examples could fill a post, but the idea is that similar messages should share a layout, so that if you can parse and understand one, you can parse and understand them all. Writing a thin wrapper on top of your logging library can help with this process, e.g., a LogLogicError macro in C could be used to set up an appropriate logging call, with file name, line number, and category parameters.

Ultimately, the type of application will determine how much and what type of logging you will need to implement, and in turn, how that logging is used to track down bugs. However, there are some common logging categories that are useful in most types of applications:

Argument checking
If any arguments violate the rules of the API, the event should be logged. It is beneficial to log a stack trace if possible, to help pinpoint the code that is calling the function incorrectly.
Tracing statements
Logging function entries and exits can be beneficial if you can’t generate a stack trace for the log. However, this type of logging is extremely intensive resource-wise — it takes lots of space and (relatively) lots of CPU time to trace every function entry and exit.
“Can’t get here” situations
Sometimes certain situations, like the default section of a switch block, should never be reached. If there are sections of code that should never be reached, or which make other sorts of assumptions, these assumptions should generally be verified, and logged if the assumption is not met.
Configuration decisions & state
In a lot of cases, sections of code will make configuration decisions dynamically, or load pre-determined configuration from a file. It can be useful to log when such decisions are recorded in internal structures (generally global ones).

Note that most of the above scenarios are also appropriate places for placing asserts. Your logging code should reflect this, and both logging and assertion should be performed (preferably by a single function or macro) when appropriate.

This sums up the Squashing Bugs series; you should now have a good arsenal of tools to go hunting with.

Table of contents for Squashing Bugs

  1. Squashing Bugs: Intro and Repros
  2. What did you Do!?
  3. Walking Through the Problem
  4. Locating Bugs Through Records

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>

Comment Preview:

© 2008 REM State | Entries (RSS) and Comments (RSS)

Global Positioning System Gazettewordpress logo