We're going on a little journey here into the foothills of real hairy-chested he-man C programming country.

Anybody who's spent much time with C knows printf(), sprintf(), and fprintf(). What those three functions have in common is that they can be given a variable number of arguments. In ANSI C, we indicate that with an ellipsis: "...". man printf on a Linux box shows us the following prototypes for these popular members of the printf family:

    int printf(const char *format, ...);
    int fprintf(FILE *stream, const char *format, ...);
    int sprintf(char *str, const char *format, ...);

The named arguments are mandatory; the ellipses indicate that they may be followed by an arbitrary number of other arguments. The other arguments can be of any type. Since this is C, there is no way for the function being called to identify what their types are: There is no built-in run-time type information in C. Furthermore, they don't even have names. Picture yourself implementing printf:

    int printf(const char *format, ...)
    {
        /* You are here. */
    }

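Okay, now what? The format argument could be followed by from zero to a million other arguments (stack space permitting), of any type the language allows you to define. Where are they? It turns out that they're within easy reach after all: They're on the stack (at least with the 32-bit x86 compilers we're using here; many newer calling conventions, such as those for x86-64, pass the first few arguments in registers instead).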

How do you get to the stack from where you're sitting now? That's not so hard after all: In C, things are what they say they are, and you can take their addresses with the unary & ("address-of") operator. You've got the format argument, which is on the stack. &format will give you a pointer into the stack. That pointer is enormously useful because the layout of the stack frame is knowable (and known as well, we trust). (There may be variations from platform to platform; we'll be getting to that.) All the arguments to the function are lined up in a row: If you know where one is, the next one is right alongside it, and the next one after that, and so on.

This would all be easier if C data types were all the same size, but then again, it wouldn't be possible at all if C were that kind of language. The size problem is this: The stack isn't a series of pointers to disparate objects. It's the objects themselves, packed in like sardines, shoulder to shoulder. An int followed by a double followed by an int will look like this on a 32-bit Intel system, where the values are 1, 2.3, and 4 respectively:

    address   data
    -----------------------------
    0012FF60: 01000000
    0012FF64: 6666666666660240
    0012FF6C: 04000000

You'll notice two things, right off the bat: We're little-endian, and floating-point formats are mysterious. You'll also notice that the double is twice the size of the ints (see note 1). That's the kicker here.
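You don't have to go poking around the stack just to see those bytes, by the way. Here's a minimal, portable sketch that dumps the bytes of any object in memory order; hexbytes() is a name we made up for illustration, not a library function. It assumes nothing about the stack, but what it prints depends on your machine's byte order and floating-point format.

```c
#include <stdio.h>
#include <string.h>

/* hexbytes() is a hypothetical helper, not a library function.
   It writes the n bytes of obj as uppercase hex, lowest address first,
   into out, which must have room for 2 * n + 1 characters. */
void hexbytes( const void *obj, size_t n, char *out )
{
    const unsigned char *p = obj;
    size_t               i;

    for ( i = 0; i < n; ++i )
        sprintf( out + 2 * i, "%02X", p[i] );   /* 2 digits + '\0' */

    out[2 * n] = '\0';   /* also handles n == 0 */
}
```

On a little-endian machine with 4-byte ints and IEEE 754 doubles (any x86 box, for instance), an int holding 1 comes out as 01000000 and a double holding 2.3 comes out as 6666666666660240, the same bytes as in the stack map.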

You need to know the type of each argument on the stack, so that you can know how big it is. Otherwise, you'll be unable to make any use of it, and you'll never be able to locate the next one either. There is no good solution to this problem: You can't apply sizeof to a dereferenced void pointer, because sizeof is worked out by the compiler from a declared type, void has no size, and by run time the type information is gone anyway. There's also no way to know how many arguments there are. You need some help here. That's why the printf family uses a format string: The number and type of the format specifiers in that string tell the function what to look for on the stack. This is considered unsafe because it depends on the programmer not making a mistake. Some compilers (MSVC++ is one) know about the printf family, and they'll make a valiant effort to warn you when the arguments don't match the format string. That's not much help, really, because it doesn't get you anywhere with other functions that have a variable number of arguments. For that reason, C++ programmers are encouraged to avoid this stuff (but most of us can't let go).

Now we know, essentially, how to write a function which takes a variable number of arguments, with varying types: You get a pointer into the stack and you do some pointer arithmetic. You need several things to make it all work:

  1. A pointer into the stack: Therefore, the first argument must be named, so you can take its address.
  2. The type of each argument: You need this in order to use the value of each argument, and also to determine its size so you can find the argument that follows it (of course, you can't say anything meaningful about the value without knowing the size, either).
  3. The number of arguments: How else would you know when to stop?

Enough talk. Let's do it. Here's the function we used to generate that little stack map above:

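    #include <stdio.h>

    /* A stand-in for printhex(): prints the address, then n bytes
       in hex, lowest address first, like the stack map above. */
    void printhex( unsigned char * p, size_t n )
    {
        size_t i;
        printf( "%p: ", (void *)p );
        for ( i = 0; i < n; ++i )
            printf( "%02X", p[i] );
        putchar( '\n' );
    }

    /* Deliberately non-portable: assumes 32-bit x86 argument passing. */
    void argtest( int start, ... )
    {
        unsigned char * arg = (unsigned char *)&start;

        printhex( arg, sizeof( int ) );

        /* +=?! see note 2 */
        arg += sizeof( int );

        printhex( arg, sizeof( double ) );

        arg += sizeof( double );

        printhex( arg, sizeof( int ) );
    }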

You can see that we're just assuming that somebody's passing us an int, a double, and another int. You'd never do that in real life, but this is just a quickie to demonstrate something.

What we've got here works fine. We could buzz through a printf format string, pulling an appropriate number of bytes off the stack for each format specifier and incrementing the pointer accordingly. The problem is that life isn't really that simple. This is not portable code, because we're wiggling our fingers around in something that is not defined by the Standard. The world is full of different compilers, operating systems, and processor architectures. There are no safe assumptions to make (see note 2). This code works as intended on both of the computers that sit on our desk, but so what? There are many other computers out there. Finally, we're in sight of the real subject here.

stdarg.h is a header file which is part of the ANSI C standard. It defines an interface for what we did above, but using macros which hide the details. Here's how it works. We'll write a little function that calculates the sum of a series of integers. We keep track of how many arguments there are by asking the programmer to tell us; we keep track of their types by assuming blindly that they're all ints.

    #include <stdarg.h>

    int sum( int count, ... )
    {
        va_list args;
        int     total = 0;

        va_start( args, count );

        /* Do something useless to demonstrate va_arg() */
        while ( --count >= 0 ) {
            total += va_arg( args, int );
        }

        va_end( args );

        return total;
    }

va_list is a data type: In some way, it represents a pointer into the stack. Microsoft's implementation defines it as a char * for most processors, but for the DEC Alpha it's a struct. Remember that unsigned char * in the non-portable example above? Well, it looks like you need more than that to do the job on an Alpha.

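va_start initializes a va_list. It's called to set up your stack pointer. In our non-portable example far above, we did much the same thing by assigning &start to that unsigned char *. The difference is that va_start() leaves the pointer just past the last named argument, at the first unnamed one, which is why you hand it the name of that last named argument.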

va_arg does two things: It returns the value of the next argument, and it also advances the va_list pointer on to the next argument. It can do both of those things because you're giving it the presumed type of the current argument ("presumed" because the function being called can never know for sure). This is why it absolutely must be a macro: Data types in C aren't objects; they're just instructions to the compiler. At runtime in C, there is no such thing as a data type. Therefore, you can't pass a data type to a function. You can only "pass" it to a macro, which is expanded before the compiler even sees the code.
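To see why the type has to go to a macro, here's a toy that does the va_arg() two-step (hand back a value, then move past it) on a byte buffer we pack ourselves, so it runs anywhere. TAKE() and unpack_int_double_int() are hypothetical names; this is only an illustration of the idea, not how any real stdarg.h is written.

```c
#include <string.h>

/* TAKE() is a made-up macro, not part of stdarg.h or any library.
   Like va_arg(), it is handed a type name: sizeof( type ) says how many
   bytes to copy out of the buffer and how far to advance p afterwards.
   The preprocessor pastes "int" or "double" in before compilation,
   which no C function parameter could accept. */
#define TAKE( p, type, out )                        \
    do {                                            \
        memcpy( &(out), (p), sizeof( type ) );      \
        (p) += sizeof( type );                      \
    } while ( 0 )

/* Pack an int, a double, and an int shoulder to shoulder, like the
   32-bit stack map, then walk back through them with TAKE(). */
double unpack_int_double_int( int a, double d, int b )
{
    unsigned char  buf[sizeof( int ) + sizeof( double ) + sizeof( int )];
    unsigned char *w = buf;
    unsigned char *r = buf;
    int            x, y;
    double         z;

    memcpy( w, &a, sizeof a );  w += sizeof a;
    memcpy( w, &d, sizeof d );  w += sizeof d;
    memcpy( w, &b, sizeof b );

    TAKE( r, int, x );      /* r now points at the double */
    TAKE( r, double, z );   /* r now points at the second int */
    TAKE( r, int, y );

    return x + z + y;
}
```

The out argument is there because our buffer is packed with no alignment padding, so we copy with memcpy() instead of reading through a cast pointer, which could mean a misaligned access.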

va_end cleans up whatever needs to be cleaned up, which is usually nothing. You never know, though: Maybe some strange implementation somewhere allocates something dynamically in va_start(). It's best to be safe and play by the rules.
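Here's a sketch of all three macros working together with the format-string trick the printf family uses to learn how many arguments there are and what types they have. typed_sum() is a hypothetical function made up for this writeup: each letter of its first argument names the type of one of the arguments that follow.

```c
#include <stdarg.h>

/* typed_sum() is a hypothetical example, not a library function.
   Each character of 'types' describes one argument, the way a printf
   format string does:
     'i'  an int
     'd'  a double
     'c'  a char; chars passed through "..." are promoted to int (and
          floats to double), so it must be fetched as an int
   The string also says when to stop: one argument per character. */
double typed_sum( const char *types, ... )
{
    va_list     args;
    const char *t;
    double      total = 0.0;

    va_start( args, types );    /* types is the last named argument */

    for ( t = types; *t != '\0'; ++t ) {
        switch ( *t ) {
        case 'i':
        case 'c':               /* va_arg( args, char ) would be wrong */
            total += va_arg( args, int );
            break;
        case 'd':
            total += va_arg( args, double );
            break;
        default:
            /* Unknown letter: we can't know its size, so we can't
               skip past it either. Give up here. */
            va_end( args );
            return total;
        }
    }

    va_end( args );
    return total;
}
```

So typed_sum( "idi", 1, 2.3, 4 ) gives 7.3, and typed_sum( "ic", 1, 'A' ) gives 66 on an ASCII system. A call like typed_sum( "d", 1 ) compiles without complaint but is undefined behavior, since va_arg() reads a double where an int was passed. That's the same danger printf() lives with.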

Here's what the macros expand to, when we tell the compiler (MSVC 6) to preprocess our sum() function to a file. We broke up some of the longer lines to make it more readable.

    int sum( int count, ... )
    {
        va_list args;
        int     total = 0;

        ( args = (va_list)&count 
            + ( (sizeof(count) + sizeof(int) - 1) & ~(sizeof(int) - 1) ) );
        
        while ( --count >= 0 ) {
            total += ( *(int *)((args += ( (sizeof(int) + sizeof(int) - 1) & 
                ~(sizeof(int) - 1) )) - ( (sizeof(int) + sizeof(int) - 1) & 
                ~(sizeof(int) - 1) )) );
        }

        ( args = (va_list)0 );

        return total;
    }

All that monstrous sizeof() gibberish is from a macro called _INTSIZEOF(), defined in Microsoft's stdarg.h. I'm not going to try to figure out what the point is: ariels or JayBonci (or somebody) will eventually read this writeup and anoint me with the clue-stick anyway. The real point is, look at all that nonsense! And it's completely different on another platform: GCC 2.96 expands them to __builtin_stdarg_start(), __builtin_va_arg(), and so on. I have no idea what goes on when that gets to the compiler. And again, on an Alpha or PowerPC you'd have something radically different. Alignment is going to be an issue on some platforms and not on others. On some platforms, chars go onto the stack as ints, so that's got to be a special case: Our argtest() function works identically in Windows when we change the type of the first argument to char, but the same change breaks it completely in Linux. That breakage is some serious fun when you're playing around and investigating things, but play time is play time. You don't want to waste your life away trying to write portable code to do this stuff. Use the macros.
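For the record, the sizeof() arithmetic seems to decode like this: adding sizeof(int) - 1 and then masking with ~(sizeof(int) - 1) rounds a size up to the next multiple of sizeof(int), so every argument is treated as occupying a whole number of int-sized slots. The sketch below rebuilds that expression from the expansion under a name of our own, INT_ROUNDED_SIZE (Microsoft's is _INTSIZEOF()), and prints a few results; it is not Microsoft's header.

```c
#include <stdio.h>

/* Rebuilt from the MSVC 6 expansion of va_start()/va_arg(); the name is
   ours. Adding sizeof(int) - 1 and then clearing the low bits with
   ~(sizeof(int) - 1) rounds sizeof(n) up to a multiple of sizeof(int).
   The masking trick relies on sizeof(int) being a power of two. */
#define INT_ROUNDED_SIZE( n ) \
    ( ( sizeof( n ) + sizeof( int ) - 1 ) & ~( sizeof( int ) - 1 ) )

/* A 5-byte object, to show a size that isn't already a multiple. */
struct five {
    char c[5];
};

/* Print the step size each type would get under this rounding rule.
   With 4-byte ints, expect 4, 4, 4, 8, 8. */
void print_slot_sizes( void )
{
    printf( "char:        %u\n", (unsigned)INT_ROUNDED_SIZE( char ) );
    printf( "short:       %u\n", (unsigned)INT_ROUNDED_SIZE( short ) );
    printf( "int:         %u\n", (unsigned)INT_ROUNDED_SIZE( int ) );
    printf( "double:      %u\n", (unsigned)INT_ROUNDED_SIZE( double ) );
    printf( "struct five: %u\n", (unsigned)INT_ROUNDED_SIZE( struct five ) );
}
```

That rounding fits with chars going onto the stack as ints: even a one-byte argument gets a full int-sized slot under this scheme.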




1 There's another interesting point, but it's not quite on-topic. This is part of the code we used to generate that output:

    struct foo {
        double  d;
        int     i;
    };

    struct foo  args;

    args.d = 2.3;
    args.i = 4;

    //  argtest( int, ... ) assumes int, double, int
    argtest( 1, args );
    argtest( 1, (double)2.3, 4 );

We only gave you the result of the first call to argtest(), because both calls produce identical output (and the whole thing produced identical output with MSVC on NT4, and with GCC on Red Hat Linux, both Intel). In C, when you push a struct onto the stack, the whole thing just gets slapped in there, byte for byte. In C, furthermore, a struct is exactly what it claims to be: In memory, struct foo is a double followed by an int, and nothing else, apart from any padding the compiler may add for alignment (C allows padding between members and at the end, but never before the first member). Pushing a struct foo instance onto the stack therefore lays down the same bytes, in the same places, as passing a double followed by an int. The radical simplicity of C data types is a stumbling block for some young people when they first learn the language. We've known a few who never did get the picture.
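To see where struct foo's members actually sit with your own compiler, offsetof() and sizeof will show the layout, padding included. A small sketch; describe_foo() is a name we made up, and the numbers it prints are whatever your compiler and ABI choose (trailing padding, making the struct 16 bytes, is common where doubles are 8-byte aligned).

```c
#include <stddef.h>
#include <stdio.h>

struct foo {
    double  d;
    int     i;
};

/* Print where each member of struct foo sits and how big the whole
   struct is. C guarantees d is at offset 0 (no padding before the
   first member), but the compiler may pad between members or at the end. */
void describe_foo( void )
{
    printf( "offsetof(d) = %u\n", (unsigned)offsetof( struct foo, d ) );
    printf( "offsetof(i) = %u\n", (unsigned)offsetof( struct foo, i ) );
    printf( "sizeof(struct foo) = %u, members alone = %u\n",
            (unsigned)sizeof( struct foo ),
            (unsigned)( sizeof( double ) + sizeof( int ) ) );
}
```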

2 The usual calling convention with C specifies that the arguments are pushed onto the stack from right to left. This means that for arguments a, b, and c, they'll appear on the stack in reverse order: c, b, a, if you're travelling from the bottom of the stack to the top.

Interestingly, the x86 architecture grows the stack downwards. That means that even though we're pushing them onto a stack, they'll still appear in memory in the order we'd naïvely expect: Their left-to-right order in the source file corresponds to the lowest-to-highest order of their addresses in memory. (In the stack map above, the 1 that comes first in the call sits at the lowest address, and the 4 that comes last sits at the highest.) That's why our code above actually works: We compiled it on two different operating systems, but they were both running on x86 boxes. The following programlet investigates stack-growth direction:

    #include <stdio.h>

    void stackptr( int arg )
    {
        /* print any old pointer into the stack frame for this call */
        printf( "%p: %d\n", (void *)&arg, arg );

        if ( --arg > 0 )
            stackptr( arg );    /* recurse */
    }

    int main( void )
    {
        stackptr( 3 );
        return 0;
    }

    d:\tmp>stack2
    0012FF80: 3
    0012FF78: 2
    0012FF70: 1

    $ ./stack2
    0xbffff9d0: 3
    0xbffff9b0: 2
    0xbffff990: 1

The macros in stdarg.h let us ignore these weird issues.




References:
The C Programming Language, 2nd ed., Brian W. Kernighan and Dennis M. Ritchie
VC98\include\stdarg.h, Microsoft Corporation
The MSVC++ version 6 compiler, Microsoft Corporation
/usr/lib/gcc-lib/i386-redhat-linux/2.96/include/stdarg.h, the Free Software Foundation, Inc.
The GCC version 2.96 compiler, the Free Software Foundation, Inc.
ocelotbob's excellent stack frame writeup right here on E2.



This writeup is dedicated to our beloved printf(): Her danger only adds to her allure.
