There are a thousand different debugging techniques. The following are those that I have gotten the most mileage out of over the years:
Dump Routines
When you define a data structure, take a few minutes to write a routine that dumps its contents, in human-readable form, to stdout or stderr (or, when debugging Windows® software, through OutputDebugString(), as mentioned below). This may seem like a waste of time, but it makes tracking down dynamic structure allocation problems much easier.
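As a rough sketch of what such a routine can look like, here is a dump function for a small linked list. The struct node type and the name node_list_dump() are made up for this illustration; substitute your own structure and fields.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical example structure; not taken from any real program. */
struct node {
    int id;
    const char *name;
    struct node *next;
};

/* Write every node of the list to `out` in human-readable form.
   Returns the number of nodes dumped. */
size_t node_list_dump(const struct node *head, FILE *out)
{
    size_t count = 0;

    assert(out != NULL);
    fprintf(out, "node list at %p:\n", (const void *)head);
    for (const struct node *n = head; n != NULL; n = n->next) {
        fprintf(out, "  [%zu] id=%d name=\"%s\" next=%p\n",
                count, n->id, n->name ? n->name : "(null)",
                (const void *)n->next);
        count++;
    }
    fprintf(out, "  %zu node(s)\n", count);
    return count;
}
```

Calling node_list_dump(head, stderr) just before and just after a suspect operation is usually enough to show a dangling or missing link.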
Logging Tools
Once your program is released, you will inevitably find that your users report bugs. Unless you can visit each user's site, you need some way to get debugging output from your program while it is running on the user's system.
This is where logging and tracing earn their keep. Build your program from the ground up with a trace/logging facility: a set of multi-level logging routines that provide trace output from your program. Design your program so that a command line argument, environment variable, or menu option will enable trace output. The option to save trace output to a file gives you a way to get program traces from your users. Assign consistent levels to various functions. The higher the trace level, the more information gets dumped. For example:
Level 1 - Function Entrance and Exits
Level 2 - Function Return Values
Level 3 - Function Parameters
Level 4 - Intermediate results
Feel free to assign your own scheme, but do it consistently, across all modules.
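A minimal sketch of such a facility follows, using the four example levels above. Every name in it (trace(), trace_set_level(), trace_init_from_env(), trace_open_file(), and the MYAPP_TRACE environment variable) is an assumption made for this illustration, not part of any existing library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Levels mirror the example scheme in the text. */
enum trace_level {
    TRACE_OFF = 0,
    TRACE_ENTRY_EXIT = 1,
    TRACE_RETURNS = 2,
    TRACE_PARAMS = 3,
    TRACE_INTERMEDIATE = 4
};

static int trace_threshold = TRACE_OFF;
static FILE *trace_out = NULL;          /* NULL means "write to stderr" */

/* Set the level directly, e.g. from a command line argument or menu option. */
void trace_set_level(int level)
{
    assert(level >= TRACE_OFF && level <= TRACE_INTERMEDIATE);
    trace_threshold = level;
}

/* Set the level from the hypothetical MYAPP_TRACE environment variable. */
void trace_init_from_env(void)
{
    const char *s = getenv("MYAPP_TRACE");
    if (s != NULL) {
        int level = atoi(s);
        if (level < TRACE_OFF) level = TRACE_OFF;
        if (level > TRACE_INTERMEDIATE) level = TRACE_INTERMEDIATE;
        trace_threshold = level;
    }
}

/* Send trace output to a file instead. Returns 0 on success, -1 on failure. */
int trace_open_file(const char *path)
{
    FILE *f = fopen(path, "a");
    if (f == NULL) return -1;
    if (trace_out != NULL) fclose(trace_out);
    trace_out = f;
    return 0;
}

/* Write one message if `level` is enabled.
   Returns 1 if the message was written, 0 if it was filtered out. */
int trace(int level, const char *fmt, ...)
{
    FILE *out = trace_out ? trace_out : stderr;
    va_list ap;

    if (level < TRACE_ENTRY_EXIT || level > trace_threshold) return 0;
    fprintf(out, "[trace %d] ", level);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    fflush(out);                        /* keep output even if we crash next */
    return 1;
}
```

A function would then call something like trace(TRACE_ENTRY_EXIT, "enter %s", "load_file") on entry and trace(TRACE_RETURNS, "load_file returned %d", rc) before returning. Flushing after every message costs speed, but it means the last line in the file is close to where things went wrong.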
Windows® Debugging
Make use of OutputDebugString() to output debug messages. When used in combination with a multi-level logging capability, you have a powerful tool for tracking down software weirdness. Several third-party tools are available which allow you to tap into the output of this routine, and most of these tools allow you to save the output to a file.
When debugging multi-threaded programs, the trace output can be very helpful in resolving thread synchronization issues.
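OutputDebugString() takes a single, already formatted string, so a small printf-style wrapper is handy. The sketch below is one way to write it; the name debug_printf() and the 512-byte buffer are assumptions for this example, and the stderr fallback is only there so the same code builds on other systems.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Format a message and send it to the debugger output on Windows,
   or to stderr elsewhere. Returns the length of the fully formatted
   message; anything beyond the buffer size is truncated on output. */
int debug_printf(const char *fmt, ...)
{
    char buf[512];
    va_list ap;
    int n;

    assert(fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0) return n;                /* formatting error */

#ifdef _WIN32
    OutputDebugStringA(buf);
#else
    fputs(buf, stderr);
#endif
    return n;
}
```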
Memory Allocation
Look at your malloc(), calloc(), realloc(), and free() calls. For each and every malloc() or calloc() call, can you easily point to its matching free() call? If not, get suspicious quick! Avoid calls to realloc() unless you really understand it (and how it behaves on your operating system). When possible, use calloc() instead of malloc(); calloc() initializes all allocated bytes to zero.
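The sketch below puts these habits together for a small growable buffer: one init routine that uses calloc(), one obviously matching free routine, and a realloc() call that keeps the old pointer until the new one is known to be good. The struct int_buf type and its functions are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable buffer of ints. */
struct int_buf {
    int *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
};

/* Allocate a zero-filled buffer. Returns 0 on success, -1 on failure. */
int int_buf_init(struct int_buf *b, size_t cap)
{
    assert(b != NULL && cap > 0);
    b->data = calloc(cap, sizeof *b->data);     /* all bytes start at zero */
    if (b->data == NULL) return -1;
    b->len = 0;
    b->cap = cap;
    return 0;
}

/* Append a value, growing the buffer when it is full.
   Returns 0 on success, -1 on failure (the buffer is left unchanged). */
int int_buf_push(struct int_buf *b, int value)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap * 2;
        int *tmp;

        if (new_cap > (size_t)-1 / sizeof *tmp) return -1;  /* size overflow */
        /* On failure realloc() returns NULL and the old block stays valid,
           so never assign the result straight back to b->data. */
        tmp = realloc(b->data, new_cap * sizeof *tmp);
        if (tmp == NULL) return -1;
        b->data = tmp;
        b->cap = new_cap;
    }
    b->data[b->len++] = value;
    return 0;
}

/* The one matching free for int_buf_init(). */
void int_buf_free(struct int_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

Note that realloc() does not zero the newly added space, so only the elements below len are meaningful after the buffer grows.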
Further Reading
No Bugs!, by David Thielen, published by Addison-Wesley, ISBN 0-201-60890-1
Code Complete, by Steve McConnell, published by Microsoft Press, ISBN 1-55615-484-4
Windows is a registered trademark of Microsoft Corporation.
Date/Time: 2002-07-18 02:17:45 +0200
OS Version: 10.1.5 (Build 5S66)
Host: localhost

Command: Speed Download
PID: 2495

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Thread 0:
#0 0x70000978 in mach_msg_overwrite_trap
#1 0x7024a0e8 in SwitchContexts
#2 0x702c9eec in YieldToThread
#3 0x00015190 in ThreadSchedulerTimer
#4 0x70196cbc in __CFRunLoopDoTimer
#5 0x7017c244 in __CFRunLoopRun
#6 0x701b70ec in CFRunLoopRunSpecific
#7 0x7017b8cc in CFRunLoopRunInMode
#8 0x79587904 in RunEventLoopInModeUntilEventArrives
#9 0x7959a818 in ReceiveNextEventCommon
#10 0x7974dbac in _AcquireNextEvent
#11 0x795f1090 in RunApplicationEventLoop
#12 0x00010d6c in main
#13 0x000040ac in start
#14 0x00003edc in start

Thread 1:
#0 0x7000497c in syscall
#1 0x70557600 in BSD_waitevent
#2 0x7002054c in _pthread_body

Thread 2:
#0 0x7003f4c8 in semaphore_wait_signal_trap
#1 0x7003f2c8 in _pthread_cond_wait
#2 0x705593ec in CarbonOperationThreadFunc
#3 0x7002054c in _pthread_body

Thread 3 Crashed:
#0 0x0002abec in HTTPValidator
#1 0x000345fc in StartHTTPDownload
#2 0x7027ae50 in CooperativeThread
#3 0x7002054c in _pthread_body

PPC Thread State:
srr0: 0x0002abec srr1: 0x0000d030 vrsave: 0x00000000
xer: 0x20000014 lr: 0x0002abc8 ctr: 0x70002af0 mq: 0x00000000
r0: 0x00000000 r1: 0x021974c0 r2: 0x00000000 r3: 0x01dc91b7
r4: 0x00000000 r5: 0x00000006 r6: 0x01dc91b0 r7: 0x01dc91b4
r8: 0x000001fc r9: 0x8024099c r10: 0x000bc1a0 r11: 0x84000280
r12: 0x70002af0 r13: 0x00000000 r14: 0x00000000 r15: 0x00000000
r16: 0x00000000 r17: 0x0005ab34 r18: 0x00000000 r19: 0x01dcec90
r20: 0x00000000 r21: 0x0006da51 r22: 0x0006dbf0 r23: 0x00000000
r24: 0x01dc91b7 r25: 0x00000000 r26: 0x01dcec90 r27: 0x00000000
r28: 0x000e8ff0 r29: 0x000f20a0 r30: 0x00164000 r31: 0x0002ab34
Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000
Thread 3 Crashed:
#0 0x0002abec in HTTPValidator
#1 0x000345fc in StartHTTPDownload
#2 0x7027ae50 in CooperativeThread
#3 0x7002054c in _pthread_body
If the bug you are hunting doesn't actually cause a crash, then your best bet is to follow the program and see what it is doing. If you are lucky enough to have a debugger (which you almost certainly will these days), you will usually put breakpoints at various places in your code. When execution reaches one of these points, the debugger steps in and lets you examine the contents of memory and variables, and step through your code line by line. If a crash occurs, the debugger will often show you which line caused it. If you are doing the wrong thing, you may see it happen. If you don't have a debugger, you will have to add statements to your code that output the data you are interested in. This is of course less flexible, and may also interfere with the problem you are trying to fix. Depending on your operating system, you may have other tools at your disposal, for example environment variables that cause system libraries to print extra information about what they are doing (or sometimes separate "debug" versions of these libraries).
Debugging your program may alter the way it works. For example, if the problem is caused by two threads trying to access the same piece of data or resource at the same time (a common problem, known as a race condition), then interrupting the execution may stop the simultaneous access from happening. Even something as innocent as adding a printf statement can alter the execution of your program in some way.
Often the problem is simply the final result of an earlier problem. One part of your program may trash data that another part relies on. A crash may happen when the second part executes, but this may give you very little information about where the actual problem occurs. Even worse is when your program is trashing the stack or the heap, which will usually cause it to crash at seemingly random points.
Yet another type of problem is what is known as a deadlock. When this happens you don't actually get a crash; the program just locks up. This happens when part A of the program is waiting for part B to complete while part B is waiting for part A to complete. As you can see, when this happens you will wait forever.
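For the developers reading along, here is a small sketch of one common way to avoid this kind of deadlock when two locks are involved: always take them in the same order. It assumes POSIX threads, and the struct account type and transfer() function are invented for the illustration. Without the ordering step, one thread moving money from a to b and another moving it from b to a could each grab their first lock and then wait forever for the other's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared resource protected by its own lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Move `amount` between two distinct accounts.
   Both locks are always taken in address order, so two threads running
   transfer(a, b) and transfer(b, a) at the same time cannot each hold
   one lock while waiting for the other. */
void transfer(struct account *from, struct account *to, long amount)
{
    struct account *first = from;
    struct account *second = to;

    assert(from != to);
    if ((uintptr_t)first > (uintptr_t)second) {
        first = to;
        second = from;
    }

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

Holding both locks while the balances change also rules out the race condition described earlier, since no other thread can see or touch either balance halfway through.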
All of the previous types of bugs are what I might call implementation bugs: you had the right idea when you were writing your code, you just messed up when you converted your ideas into code. Equally insidious is what I call a logical bug, i.e. a bug that is caused by a fault in your logic or design. You can step through code till you're blue in the face; it won't help much until you realise you were thinking about the task your program is doing in the wrong way. And even then you have to come up with the right way of doing it, which may involve rewriting significant amounts of code.
At some point you will probably end up looking through hundreds or thousands of lines of code trying to work out what is happening, cup of coffee in one hand, mouse in the other. You may make random changes, and keep your fingers crossed while you run the program or send it off to testers. Oh the sinking feeling when you get an email with the subject "Bug not fixed"!!
But in the end it's all worth it: the feeling of satisfaction you get when you have sent the little bugger into the other world keeps you going until the next bug is found.
I hope some of the non-developers out there have gained a brief insight into what we are actually doing staring at our screens at 2 am. I would like to finish with a few words of advice if you ever submit a bug report:
Help a geek today, send in nice bug reports!