The most elusive type of problem in a C program

by neil

Fri Jul 28 2000 at 9:52:26

Some would argue that the most elusive type of problem in a C program is the fact that it's written in C.

Problems that are in some ways endemic to C programs:

Memory leaks. This has to be, by far, the subtlest and hardest to fix. ``Easy enough'', you say, ``make sure you free everything you allocate.'' That works in the simple cases. In more complicated situations, it's just not possible. The problem is often in shared data structures---in a language without GC, it becomes important to have a good ownership protocol, and that's hard to do.
Buffer overflows. I don't think I need to even go into this.
Fixed-size buffers. This is a similar problem. So you found that buffer overflow, and replaced gets(buff) with fgets(buff, 80, stdin). The only problem is, now your program silently chops up lines longer than 79 characters: each call hands back at most 79 of them, and the rest shows up on the next call as if it were a new line. Lose, lose.
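For the fixed-size buffer case, one way to at least notice an over-long line is to check whether what fgets returned ends in a newline. The following is only a sketch: the helper name `read_line_checked` and its return codes are invented for illustration, and discarding the excess is just one possible policy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a standard function).
 * Reads one line into buf, which holds `size` bytes.
 * Returns  1 if the whole line fit (the newline is stripped),
 *          0 if the line was too long (the excess is read and discarded),
 *         -1 on EOF or read error with nothing read. */
int read_line_checked(char *buf, int size, FILE *in)
{
    size_t n;
    int c;
    int dropped = 0;

    if (size < 2 || fgets(buf, size, in) == NULL)
        return -1;

    n = strlen(buf);
    if (n > 0 && buf[n - 1] == '\n') {
        buf[n - 1] = '\0';              /* complete line */
        return 1;
    }

    /* No newline in buf: skip to the end of the line and remember
     * whether any real characters had to be thrown away. */
    while ((c = getc(in)) != EOF && c != '\n')
        dropped = 1;
    return dropped ? 0 : 1;
}
```

The caller can then report the line, reject it, or switch to a growing buffer instead of carrying on with half of it.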

Problems that aren't C-specific:

The whole ``I'll use it once and throw it away'' concept. Almost invariably, you don't throw it away. And so your quick-and-dirty hack that doesn't handle the case where lines are longer than 80 characters or the file couldn't be opened or whatever suddenly becomes an integral part of your company's business strategy.
Closely related to the previous is `worse is better'. Of course, that's what got Unix where it is today, and is quite probably better than never getting it done at all.

Solutions? There's no panacea. GC will correct many, but not all, memory leaks---and may introduce more if you forget to break links. Using a safer string or vector library will fix many buffer overflows (as well as arbitrary buffer-size limits), but there are many cases where you need, for whatever reason, to muck around with pointers or C arrays. As for the non-C-specific problems, they can only be fixed through brainwashing. Programmers will always write bad programs, no matter what tools they are given.


(idea)

by everyone

Tue Oct 31 2000 at 17:07:11

neil: as for fixed-sized buffers:

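#include <stdio.h>
#include <stdlib.h>

char *buf = NULL, *tmp;
size_t len = 0;
int c;

/* (inside a function) read all of stdin, growing buf one byte at a time */
while( (c = getchar()) != EOF ) {
   tmp = realloc( buf, len + 2 );   /* this char plus the final 0 */
   if( tmp == NULL ) { free( buf ); exit( EXIT_FAILURE ); }
   buf = tmp;                       /* overwrite buf only after success */
   buf[len++] = c;
}
if( buf == NULL && (buf = malloc( 1 )) == NULL )   /* empty input */
   exit( EXIT_FAILURE );
buf[len] = 0;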

No loss. That's what realloc is there for. Just don't forget to free().

Speaking of realloc, one tricky problem with it is stale pointers. realloc may move the block, so after the call you have to re-assign every pointer that pointed into it; this can be difficult in a threaded program.
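One common way to sidestep stale pointers is to hand out offsets into the buffer rather than pointers, and turn an offset back into a pointer only at the point of use. A minimal single-threaded sketch, assuming a made-up `struct gbuf` type; none of these names come from a real library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, for illustration only. */
struct gbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Appends n bytes and returns the OFFSET at which they were stored,
 * or (size_t)-1 if allocation fails (the buffer is left untouched).
 * Capacity overflow is not handled in this sketch. */
size_t gbuf_append(struct gbuf *b, const void *src, size_t n)
{
    if (n == 0)
        return b->len;
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap : 16;
        while (ncap < b->len + n)
            ncap *= 2;
        char *tmp = realloc(b->data, ncap);   /* may move the block */
        if (tmp == NULL)
            return (size_t)-1;
        b->data = tmp;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return b->len - n;
}

/* Resolve an offset to a pointer. The result is only valid until the
 * next gbuf_append call, so don't store it. */
char *gbuf_at(struct gbuf *b, size_t off)
{
    return b->data + off;
}
```

An offset stays valid however many times the block moves. In a threaded program the buffer still needs a lock around append and use; this only removes the need to patch up pointers.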

NULL pointer stuff is pretty obvious, so I won't go into that. There's also the problem of never assigning a value to a pointer, then trying to dereference it. That's pretty easy to track and fix once you find it though. There's also accidentally casting an int or some such into a pointer (ouch!), but again, that's pretty easy to track. I'd say it's definitely forgetting to free(). The rest can be fixed with a good debugger.


(idea)

by plonk plonk

Sat Dec 09 2000 at 7:41:15

Core-dumping show stoppers are usually easy to locate with the debugger and some decent detective work. After all, you have an event that provides a built-in breakpoint for your debugging. Memory leaks are just evil to find and fix in most cases. You start with little to no clue about where the problem is. You have to step through much of the code to even start to have a hope of finding the problem.

This sort of shite is one of the reasons I use C++. It isn't perfect, and you still use pointers enough to get into trouble, but it reduces their use to the point where you spend a lot less time groveling around for that leaking memory. God bless destructors!


(idea)

by illusionist

Thu Jan 25 2001 at 20:29:16

So C isn't perfect. The hardest types of errors to track down are compiler errors, and trust me, they do exist.
How do you get out of the aforementioned errors? Listen up:
Syntax:

= vs. ==: Common simple mistake. Hardly even worth making more than once, but it happens all the time, especially to people who type quickly. How do you get around it? Get in your head to think like this:
```
   if(0 == myVar)
```
Get people to stick the number on the left. What happens when you put a single equals sign? "Invalid L-value", right there, in your face, in the compiler, problem squashed.

Binary:

Memory leaks: Perform memory sub-allocation, or do your own memory management. Think of it like matter in chemical reactions: memory can neither be created nor destroyed. You shouldn't have more than zero bytes outstanding when you shut down, and you shouldn't try to free something that was never allocated. Overload new (if you are a C++ junkie), or make a debugging malloc macro that adds the size of each allocation to a global counter and subtracts it again on free. If you end up with anything (> 0) left over, track it down and get rid of it, or throw an assertion. Using a memory sub-allocation library to do all of your memory allocation (say, in 4k chunks for performance on a paged system) would serve you well, but it's a little over the top for some applications.
Crashes: GDB, VS Debugger, WinDbg, these are all your friends. Set breakpoints, trace, watch your locals.
Buffer overflows: It's definitely hard to spot a place where buffer overflows can happen. A common way to check uncharted memory is to fill it with a known pattern first, something that shows up really well in a debugger. We used to try to come up with new and interesting ones, like DEADBEEF or BADF000D. All in all, you need to use some kind of protection on your input, and be dead careful. This is something that plagues the industry, because buffer overflows happen (it's a weakness in C, since it does no bounds checking and has no language support for flexible buffers).
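The bookkeeping idea under "Memory leaks" and the fill-pattern trick under "Buffer overflows" can be sketched together as a pair of debugging wrappers. This is only an illustration under some assumptions: the `dbg_` names are invented, the fill bytes are arbitrary, the counter is not thread-safe, and `max_align_t` needs a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debugging wrappers; the names are invented for this sketch. */
static size_t dbg_outstanding = 0;    /* bytes allocated but not yet freed */

/* Header stored in front of each block. The union keeps the user data
 * aligned for any type. */
union dbg_header {
    size_t      size;
    max_align_t align;
};

void *dbg_malloc(size_t size)
{
    union dbg_header *h;

    if (size > SIZE_MAX - sizeof *h)
        return NULL;
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    dbg_outstanding += size;
    memset(h + 1, 0xCD, size);        /* uninitialized reads stand out */
    return h + 1;
}

void dbg_free(void *p)
{
    union dbg_header *h;

    if (p == NULL)
        return;
    h = (union dbg_header *)p - 1;
    dbg_outstanding -= h->size;
    memset(p, 0xDD, h->size);         /* use-after-free stands out
                                         (an optimizer may drop this) */
    free(h);
}

/* Should be 0 at shutdown; anything else is a leak. */
size_t dbg_bytes_outstanding(void)
{
    return dbg_outstanding;
}
```

Asserting `dbg_bytes_outstanding() == 0` just before exit tells you that something leaked, though not where. Recording `__FILE__` and `__LINE__` per allocation would be the next step.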

The worst errors you can run into are the compiler or API errors. For compilers, they are usually memory or optimization related, and have to do with when you have a massively complicated program you are writing. Why are they terrible? Because it's hard to prove it's the compiler. It's your code vs. theirs. API bugs are bad, because the point of most APIs is to keep the user away from the lower levels of the program. (Bummer, it died in WinExec? What the heck is that?!?).

C is as good as any other compiled language. It has its ups and downs. When working at Microsoft, a friend of mine always complained that he had to work with beta software, on a beta system, with a beta compiler. In those situations, every little line can send your program burning down in flames. Systems reach an entropy point, and sometimes I felt that we always teetered on the edge of that. If only "oh I forgot an equals sign", or "oh darn, off by one, forgot about the null terminator" was the cause of most of these problems.


(thing)

by Blue Neon Head

Mon Mar 05 2001 at 0:32:45

There is, without a doubt, nothing as infuriating as memory management problems in C or C++, certainly enough to make me long for Java or Scheme or some other such language with garbage collection facilities. At least in C++, the potential exists for reference counting (see the C++ FAQ for info), so that memory holes may be made to take care of themselves.
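Reference counting can also be done by hand in plain C, just without the automatic calls a C++ smart pointer would give you. A minimal sketch, assuming a made-up `struct shared` type; it is not thread-safe, and objects that refer to each other in a cycle will never reach zero.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object, for illustration only. */
struct shared {
    int refs;       /* how many owners currently hold this object */
    int value;      /* stand-in payload */
};

struct shared *shared_new(int value)
{
    struct shared *s = malloc(sizeof *s);
    if (s != NULL) {
        s->refs = 1;            /* the creator is the first owner */
        s->value = value;
    }
    return s;
}

/* Call whenever another owner keeps a copy of the pointer. */
struct shared *shared_retain(struct shared *s)
{
    s->refs++;
    return s;
}

/* Call when an owner is done. Returns 1 if this call freed the
 * object, 0 if other owners remain. */
int shared_release(struct shared *s)
{
    if (--s->refs > 0)
        return 0;
    free(s);
    return 1;
}
```

The discipline is that every retain is matched by exactly one release. In C nothing enforces that, which is the part C++ destructors automate.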

One major problem I had (and still do occasionally) concerned the terminating null on strings, which I would routinely forget to allocate or properly reallocate, with disastrous results. ("Dammit, why is free() segfaulting?! There must be a bug in glibc!")
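The usual shape of that fix is to always allocate `strlen` plus one. A small sketch, with `concat_strings` as a made-up helper name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of a followed by b,
 * or NULL if allocation fails. The caller must free() the result. */
char *concat_strings(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);    /* +1 for the terminating null */

    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);        /* copies b's terminator as well */
    return out;
}
```

Leave out the `+ 1` and the last write lands one byte past the block. That often corrupts the allocator's bookkeeping, which is why the crash tends to show up later, inside free().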


(thing)

by Azure Monk

Thu Apr 12 2001 at 13:08:12

A note about = vs ==: a good compiler should be able to catch this one. For instance, CodeWarrior (one of the two major MacOS compilers) will warn you if you do something like if( a = 5 ) { do_stuff(); }. However, it's fine with if( (a = 5) ) { do_stuff(); }, so you won't have to wade through warning messages when you do want to assign a value.

Similarly, it will pick up a == 5; without an if, but I don't think too many people make that error.
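GCC and Clang behave much the same way when warnings are enabled with -Wall: an assignment used directly as a condition draws a warning, and wrapping it in an extra pair of parentheses signals that it is deliberate. A small sketch of the intentional form; `count_to_eol` is a made-up function for illustration.

```c
#include <stdio.h>

/* Hypothetical example: counts the characters before the next
 * newline or EOF, consuming the newline if there is one. */
int count_to_eol(FILE *in)
{
    int c;
    int n = 0;

    /* The assignment inside the condition is intended. The inner
     * parentheses and the explicit comparison make that clear to
     * both the compiler and the next reader. */
    while ((c = getc(in)) != EOF && c != '\n')
        n++;
    return n;
}
```

Dropping the inner parentheses here would not just draw a warning: `c = getc(in) != EOF` assigns the result of the comparison to c, so the warning catches a real bug.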

As far as the most elusive type of problem goes, I have to concur with the general conclusion here: memory leaks. An unassigned pointer or buffer overflow crashes sooner or later (probably sooner), and that tends to give you a good idea of what is causing it and where it is. Hunting it down tends to be fairly direct, if time consuming, because you know where to look. They can be nasty as hell, but they aren't usually very elusive.

Memory leaks, on the other hand, don't give you any warning. Unless you specifically look for them by keeping track of allocated and freed memory, you won't even know they exist. Once you do determine that you're leaking memory, you really have no idea where to start looking. They aren't impossible to fix or particularly nasty, but they are elusive.

