Debugging is a highly variable activity: you never know how long it will take, especially if you must also debug your tests. Sometimes a single missing character will stump you for days (damn those assignment/equality operators in Pascal and C, which are so easy to confuse).


Raven: You are arguing Ad populum.

Debugging techniques

There are a thousand different debugging techniques. The following are those that I have gotten the most mileage out of over the years:

Dump Routines

When you are defining a data structure, take a few minutes to create a routine that dumps the contents of the structure, in human readable form, to stdout or stderr (or if debugging Windows® software - use OutputDebugString() as mentioned below). This may seem like a waste of time, but it makes tracking down dynamic structure allocation problems much easier.
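
As a rough sketch of what such a dump routine might look like, here is one for a small linked list. The struct item type and the item_list_dump() name are made up for illustration; the point is only that every field, including the pointers, ends up in readable form on whatever stream you pass in.

```c
#include <stdio.h>

/* Hypothetical example structure: a singly linked list of named items. */
struct item {
    int id;
    const char *name;
    struct item *next;
};

/* Dump a list to the given stream in human-readable form.
   Returns the number of nodes printed, which is handy for sanity checks. */
int item_list_dump(FILE *out, const struct item *head)
{
    int count = 0;

    fprintf(out, "item list at %p:\n", (const void *)head);
    for (const struct item *p = head; p != NULL; p = p->next) {
        fprintf(out, "  [%d] id=%d name=\"%s\" next=%p\n",
                count, p->id, p->name ? p->name : "(null)",
                (const void *)p->next);
        count++;
    }
    fprintf(out, "  (%d item%s)\n", count, count == 1 ? "" : "s");
    return count;
}
```

Calling item_list_dump(stderr, head) before and after a suspect operation is usually enough to show a node that was never linked in or a next pointer that points somewhere silly.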

Logging Tools

Once your program is released, you will inevitably find that your users report bugs. Unless you can visit each user's site, you need some way to get debugging output from your program while it is running on the user's system.

This is where logging and tracing earn their keep. Build your program from the ground up with a trace/logging facility. Design your program so that a command line argument, environment variable, or menu option will enable the trace output.

Create a set of multi-level logging routines that provide trace output from your program. The option to save trace output to a file gives you a way to get program traces from your users. Assign consistent levels to various functions. The higher the trace level, the more information gets dumped. For example:

Level 1 - Function Entrance and Exits
Level 2 - Function Return Values
Level 3 - Function Parameters
Level 4 - Intermediate results

Feel free to assign your own scheme, but do it consistently, across all modules.
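
A minimal sketch of such a facility follows, using the four levels listed above. The environment variable names MYAPP_TRACE and MYAPP_TRACE_FILE and all of the function names are assumptions made up for this example, not part of any real library.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Levels follow the scheme listed above. */
enum trace_level {
    TRACE_OFF = 0,
    TRACE_ENTRY_EXIT = 1,    /* function entrance and exits */
    TRACE_RETURNS = 2,       /* function return values */
    TRACE_PARAMS = 3,        /* function parameters */
    TRACE_INTERMEDIATE = 4   /* intermediate results */
};

static int trace_threshold = TRACE_OFF;
static FILE *trace_out = NULL;   /* NULL means "use stderr" */

void trace_set_level(int level)
{
    trace_threshold = level;
}

/* Read the level from MYAPP_TRACE and, optionally, an output file name
   from MYAPP_TRACE_FILE (both names are hypothetical). */
void trace_init(void)
{
    const char *level = getenv("MYAPP_TRACE");
    const char *path = getenv("MYAPP_TRACE_FILE");

    trace_set_level(level ? atoi(level) : TRACE_OFF);
    trace_out = path ? fopen(path, "a") : NULL;
}

/* Write one trace line if 'level' is enabled.
   Returns 1 if the message was written, 0 if it was filtered out. */
int trace(int level, const char *fmt, ...)
{
    FILE *out = trace_out ? trace_out : stderr;
    va_list ap;

    if (level <= TRACE_OFF || level > trace_threshold)
        return 0;
    fprintf(out, "[trace %d] ", level);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    fflush(out);   /* flush so the last lines survive a crash */
    return 1;
}
```

With the threshold set to 2, a call such as trace(TRACE_RETURNS, "parse returned %d", rc) is written, while a TRACE_PARAMS message is silently dropped. A user can then be asked to set the two variables, reproduce the problem, and mail you the file.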

Windows® Debugging

Make use of OutputDebugString() to output debug messages. When used in combination with a multi-level logging capability, you have a powerful tool for tracking down software weirdness. Several third party tools are available which allow you to tap into the output of this routine, and most of these tools allow you to save the output to a file.

When attempting to debug multi-threaded programs, the trace output can be very helpful when trying to resolve thread synchronization issues.
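
Since OutputDebugString() only takes a finished string, a small printf-style wrapper is convenient. The sketch below is one possible shape for it; debug_printf() is a made-up name, the 512-byte buffer is an arbitrary choice, and the stderr fallback is only there so the same code builds on other systems.

```c
#include <stdarg.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Format a message and hand it to the debugger output on Windows,
   or to stderr elsewhere. Returns the number of characters sent
   (after truncation to the buffer size), or -1 on a formatting error. */
int debug_printf(const char *fmt, ...)
{
    char buf[512];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;
    if ((size_t)n >= sizeof buf)
        n = (int)sizeof buf - 1;   /* message was truncated */

#ifdef _WIN32
    OutputDebugStringA(buf);
#else
    fputs(buf, stderr);
#endif
    return n;
}
```

Routing the multi-level trace calls through a wrapper like this means the same trace statements can feed either a debug-output viewer or a log file.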

Memory Allocation

Look at your malloc(), calloc(), realloc(), and free() calls. For each and every malloc() or calloc() call, can you easily point to its matching free() call? If not, get suspicious quick! Avoid calls to realloc() unless you really understand it (and how it behaves on your given operating system). When possible, use calloc() instead of malloc(): calloc() always initializes the allocated bytes to zero, whereas malloc() leaves them uninitialized.
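
One way to make the matching free() easy to point to is to keep allocation and release in a pair of functions. The sketch below assumes a made-up struct buffer type; it also shows the usual safe pattern for realloc(), where the result goes into a temporary so the original block is not lost if the call fails.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. */
struct buffer {
    char *data;
    size_t len;
    size_t cap;
};

/* Every allocation for a buffer happens here... */
struct buffer *buffer_create(size_t cap)
{
    struct buffer *b = calloc(1, sizeof *b);   /* all fields start at zero */

    if (b == NULL)
        return NULL;
    b->data = calloc(cap, 1);
    if (b->data == NULL && cap != 0) {
        free(b);
        return NULL;
    }
    b->cap = cap;
    return b;
}

/* ...and every release happens here. */
void buffer_destroy(struct buffer *b)
{
    if (b == NULL)
        return;
    free(b->data);
    free(b);
}

/* Grow the buffer. Returns 0 on success, -1 on failure.
   On failure b->data is still valid and still owned by b. */
int buffer_reserve(struct buffer *b, size_t new_cap)
{
    char *tmp;

    if (new_cap <= b->cap)
        return 0;
    tmp = realloc(b->data, new_cap);
    if (tmp == NULL)
        return -1;
    memset(tmp + b->cap, 0, new_cap - b->cap);  /* realloc does not zero new bytes */
    b->data = tmp;
    b->cap = new_cap;
    return 0;
}
```

Writing b->data = realloc(b->data, n) directly is the classic mistake: if realloc() returns NULL, the only pointer to the old block has just been overwritten.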

Further Reading

No Bugs!, by David Thielen, published by Addison-Wesley, ISBN 0-201-60890-1

Code Complete, by Steve McConnell, published by Microsoft Press, ISBN 1-55615-484-4

Windows is a registered trademark of Microsoft Corporation.

Debug"ing,

a. & n. from Debug, v. (Hah, missed that one, Webster 1913! What? The word didn't exist in 1913? You win this one, Webster 1913, but you'd better watch your back from now on...)
Put simply, it is the act of finding (and hopefully fixing) the bugs (i.e. errors) in your code. Often infuriating, but usually rewarding when you finally find and crush the little bugger.
Debugging normally starts when your program exhibits some abnormal behaviour. If you're lucky you will be able to reproduce the problem easily and get to work. If you are unlucky the program will behave fine every time you try. You are of course neglecting the influence of the alignment of the planets on your code.
Your first task is to gradually locate where in your code the problem is occurring. You may already have a fairly good idea of where the problem is thanks to things such as core dumps, StdLog files (generated by MacsBug on versions of the Mac OS prior to Mac OS X), crash logs etc., which provide information on the state of the program if it actually crashes. Here's one from a program I've been working on:
Date/Time:  2002-07-18 02:17:45 +0200
OS Version: 10.1.5 (Build 5S66)
Host:       localhost

Command:    Speed Download
PID:        2495

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Thread 0:
 #0   0x70000978 in mach_msg_overwrite_trap
 #1   0x7024a0e8 in SwitchContexts
 #2   0x702c9eec in YieldToThread
 #3   0x00015190 in ThreadSchedulerTimer
 #4   0x70196cbc in __CFRunLoopDoTimer
 #5   0x7017c244 in __CFRunLoopRun
 #6   0x701b70ec in CFRunLoopRunSpecific
 #7   0x7017b8cc in CFRunLoopRunInMode
 #8   0x79587904 in RunEventLoopInModeUntilEventArrives
 #9   0x7959a818 in ReceiveNextEventCommon
 #10  0x7974dbac in _AcquireNextEvent
 #11  0x795f1090 in RunApplicationEventLoop
 #12  0x00010d6c in main
 #13  0x000040ac in start
 #14  0x00003edc in start

Thread 1:
 #0   0x7000497c in syscall
 #1   0x70557600 in BSD_waitevent
 #2   0x7002054c in _pthread_body

Thread 2:
 #0   0x7003f4c8 in semaphore_wait_signal_trap
 #1   0x7003f2c8 in _pthread_cond_wait
 #2   0x705593ec in CarbonOperationThreadFunc
 #3   0x7002054c in _pthread_body

Thread 3 Crashed:
 #0   0x0002abec in HTTPValidator
 #1   0x000345fc in StartHTTPDownload
 #2   0x7027ae50 in CooperativeThread
 #3   0x7002054c in _pthread_body

PPC Thread State:
  srr0: 0x0002abec srr1: 0x0000d030                vrsave: 0x00000000
   xer: 0x20000014   lr: 0x0002abc8  ctr: 0x70002af0   mq: 0x00000000
    r0: 0x00000000   r1: 0x021974c0   r2: 0x00000000   r3: 0x01dc91b7
    r4: 0x00000000   r5: 0x00000006   r6: 0x01dc91b0   r7: 0x01dc91b4
    r8: 0x000001fc   r9: 0x8024099c  r10: 0x000bc1a0  r11: 0x84000280
   r12: 0x70002af0  r13: 0x00000000  r14: 0x00000000  r15: 0x00000000
   r16: 0x00000000  r17: 0x0005ab34  r18: 0x00000000  r19: 0x01dcec90
   r20: 0x00000000  r21: 0x0006da51  r22: 0x0006dbf0  r23: 0x00000000
   r24: 0x01dc91b7  r25: 0x00000000  r26: 0x01dcec90  r27: 0x00000000
   r28: 0x000e8ff0  r29: 0x000f20a0  r30: 0x00164000  r31: 0x0002ab34

Besides providing me with information such as the date and time, this crash log tells me what caused the crash:
Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000
This tells me that my program crashed because I tried to access some memory which I wasn't allowed to (specifically at address 0). A bit further down you can see:
Thread 3 Crashed:
 #0   0x0002abec in HTTPValidator
 #1   0x000345fc in StartHTTPDownload
 #2   0x7027ae50 in CooperativeThread
 #3   0x7002054c in _pthread_body
The log gives me a stack trace for each of my threads, and the trace for the crashed thread tells me the name of the function that was executing when my program crashed (here HTTPValidator, called from StartHTTPDownload). As you might expect, this narrows down the problem significantly.
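
An access at address 0 almost always means a NULL pointer was dereferenced. The crash log does not say which pointer it was, so the following is only an illustration of the general pattern: header_is_http() is a made-up stand-in, not the HTTPValidator from the log.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a validation routine. */
int header_is_http(const char *header)
{
    /* Without this check, strncmp() would read through a NULL pointer,
       which is the kind of access that shows up as a fault at 0x00000000. */
    if (header == NULL)
        return 0;
    return strncmp(header, "HTTP/", 5) == 0;
}
```

Once the stack trace has named the function, looking for pointers that are used without such a check is a reasonable first step.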

If the bug you are hunting doesn't actually cause a crash, then your best bet is to follow the program and see what it is doing. If you are lucky enough to have a debugger (which you will almost certainly have these days), you will usually put breakpoints at various places in your code. When the execution of your program reaches one of these points, the debugger will step in and let you examine the contents of memory and variables, and step through your code line by line. If a crash occurs, the debugger will often show you what line caused it. If your code is doing the wrong thing, you may see it happen. If you don't have a debugger, you will have to add statements to your code that output the data you are interested in. This is of course less flexible and may also interfere with the problem you are trying to fix. Depending on your operating system, you may have other tools at your disposal, for example environment variables that cause system libraries to print extra information about what they are doing (or sometimes separate "debug" versions of these libraries).
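
When you do have to fall back on output statements, a macro that can be compiled out keeps them from piling up in the release build. This is just a sketch: the DBG macro, the DEBUG_TRACE switch and the average() function are all invented for the example.

```c
#include <stdio.h>

/* Build with -DDEBUG_TRACE to enable the output; otherwise DBG() vanishes. */
#ifdef DEBUG_TRACE
#define DBG(...)                                            \
    do {                                                    \
        fprintf(stderr, "%s:%d: ", __FILE__, __LINE__);     \
        fprintf(stderr, __VA_ARGS__);                       \
        fputc('\n', stderr);                                \
    } while (0)
#else
#define DBG(...) do { } while (0)
#endif

/* Hypothetical function under investigation. */
int average(const int *values, int count)
{
    long sum = 0;

    DBG("average: count=%d", count);
    for (int i = 0; i < count; i++) {
        sum += values[i];
        DBG("  i=%d value=%d sum=%ld", i, values[i], sum);
    }
    return count > 0 ? (int)(sum / count) : 0;
}
```

The file name and line number in each message save you from hunting for where a given line of output came from.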

Debugging your program may alter the way your program works. For example, if the problem is caused by two threads trying to access the same piece of data or resource at the same time (a common problem, known as a race condition), then interrupting the execution may stop the simultaneous access from happening. Even something as innocent as adding a printf statement can alter the execution of your program in some way.
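
To make the race condition concrete, here is a small sketch using POSIX threads (an assumption; the text above does not name a threading library). Two threads increment a shared counter. With the mutex the result is always the same; remove the lock and unlock calls and updates can be lost, though often not while you are single-stepping in a debugger. All names are made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS 100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* counter++ is a read-modify-write. Without the lock, two threads can
   both read the same old value and one of the updates is lost. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run two workers and return the final counter value, or -1 if a
   thread could not be created. */
long run_two_workers(void)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

On most systems this needs to be built with the compiler's thread option (for example -pthread with GCC or Clang).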

Often the problem is simply the final result of an earlier problem. Part of your program may trash some data another part of your program relies on. A crash may happen when the second part executes, but this may give you very little information on where the actual problem occurs. Even worse is when your program is trashing the stack or the heap, which will usually cause your program to crash at seemingly random points.
Yet another type of problem is what is known as a deadlock. When this happens you don't actually get a crash; the program just locks up. This happens when part A of the program is waiting for part B to complete while part B is waiting for part A to complete. As you can see, when this happens you will wait forever.
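
A common way to get into this situation is two threads taking the same two locks in opposite order. The sketch below, again assuming POSIX threads and using made-up names, shows one widely used way out: always take the locks in a fixed order (here, by address), so that neither thread can hold one lock while waiting for the other.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared resource protected by its own lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

int account_init(struct account *a, long balance)
{
    a->balance = balance;
    return pthread_mutex_init(&a->lock, NULL);
}

/* Move 'amount' between two accounts. The locks are always taken in
   address order, so transfer(x, y) and transfer(y, x) running at the
   same time cannot each grab one lock and wait forever for the other. */
void transfer(struct account *from, struct account *to, long amount)
{
    struct account *first = from;
    struct account *second = to;

    if (from == to)
        return;
    if ((uintptr_t)from > (uintptr_t)to) {
        first = to;
        second = from;
    }
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

If each call simply locked from and then to, two opposite transfers could each acquire their first lock and then block on the second, which is exactly the A-waits-for-B, B-waits-for-A situation described above.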

All the previous types of bugs are what I might call implementation bugs: you had the right idea when you were writing your code, you just messed up when you converted your ideas into code. Equally insidious is what I call a logical bug, i.e. a bug that is caused by a fault in your logic or design. You can step through code till you're blue in the face, but it won't help much until you realise you were thinking about the task your program is doing in the wrong way. And even then you have to come up with the right way of doing it, which may involve rewriting significant amounts of code.

At some point you will probably end up looking through hundreds or thousands of lines of code trying to work out what is happening, cup of coffee in one hand, mouse in the other. You may make random changes, and keep your fingers crossed while you run the program or send it off to testers. Oh the sinking feeling when you get an email with the subject "Bug not fixed"!!

But in the end it's all worth it: the feeling of satisfaction you get when you have sent the little bugger into the other world keeps you going until the next bug is found.

I hope some of the non-developers out there have gained a brief insight into what we are actually doing staring at our screens at 2 am. I would like to finish with a few words of advice in case you ever submit a bug report:

  • Don't just say "The program crashes": it's not very helpful
  • Do be specific and give details
  • Do try and find a scenario that causes the problem to happen reliably: fixing bugs you can't reproduce is hard
  • Do be clear and articulate about the problem
  • Do give any details you have (such as core dumps etc.)
  • Don't be abusive: it doesn't help anyone
If you're thinking "Hey I'm just a user, you're the developer, it's up to you to fix all that!" then bear in mind that the better the bug report is, the easier it will be to find and fix the bug and you will have a better product sooner.

Help a geek today: send in nice bug reports!
