When people talk about x86-64, they always seem to focus on the 64-bit address space (which is really only a 48-bit virtual address space; the first generation of x86-64 requires that the top 16 bits of every address be copies of bit 47, which means all zeros for ordinary user-space addresses). If they feel like discussing it in more depth, they'll mention the fact that registers are 64 bits wide. However, x86-64 has a lot of advantages beyond that. In particular, x86 is the most register-starved architecture still in use: 7 usable integer registers if you disable debugging (that is, omit the frame pointer) and don't use shared libraries, and only 5 if you allow debugging and use PIC. Many integer-heavy applications can spend 50% or more of their time just doing moves between registers and memory. Adding 8 new integer registers reduces register pressure immensely, and additionally makes things easier for the compiler.
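To make the 48-bit restriction concrete, here is a minimal sketch of the address check in C. Strictly speaking, the rule is that the upper bits must all be copies of bit 47 (so they are zero for addresses in the lower half of the address space). The name is_canonical and the VA_BITS constant are my own, and the 48-bit width is an assumption that only holds for implementations with 48 virtual address bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual address bits, as on the first
   x86-64 parts. Later implementations may use more. */
#define VA_BITS 48

/* Hypothetical helper: an address is "canonical" when bits 63..47 are
   all equal, i.e. the upper bits are a sign extension of bit 47. */
static bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (VA_BITS - 1);   /* bits 63..47 (17 bits) */
    return upper == 0 || upper == 0x1FFFF;    /* all zeros or all ones */
}
```

With this definition, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are both fine, while 0x0000800000000000 is rejected because bit 47 is set but the bits above it are not.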
In fact, 64-bit registers don't gain you much with most applications. There are a few kinds of software that can use them effectively, like crypto, some media-based programs, operating systems, and graphics. However, these same applications are the ones that use up a lot of CPU time, so doing anything to reduce that overhead can be a huge win. Personally, I can't wait to get hold of a 64x64->128 multiply instruction. But that's just me.
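To show why a 64x64->128 multiply is worth getting excited about, here is a sketch of what you have to do without one: split each operand into 32-bit halves and combine four partial products by hand. The u128 struct and the mul_64x64 name are made up for this example. On x86-64 the whole thing collapses into a single mul instruction, and GCC and Clang will usually let you reach it through their unsigned __int128 extension on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit result held as two 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128;

/* Portable 64x64->128 unsigned multiply using 32-bit pieces. */
static u128 mul_64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum of the pieces that land in bits 32..63; at most three
       32-bit values, so it cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128 r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

That is four multiplies plus a pile of shifts and adds for something the hardware can do in one instruction, which is exactly the kind of overhead that bignum-heavy crypto code spends its life in.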
Another advantage, one that could end up being the biggest performance enhancement of all, is the new ABI. The standard x86 ABI specifies that all arguments to functions are passed on the stack, so each time a function is called, the caller has to push the arguments on and the callee has to load them back out of memory. This can add up to a lot of overhead, especially when the function turns around and calls another function with the argument without actually doing anything with it itself. So the x86-64 ABI specifies that most arguments (up to 6 integer arguments and 8 SSE arguments) are passed via registers.
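For reference, the System V x86-64 ABI hands out the integer argument registers in a fixed order, and anything past the sixth integer argument goes on the stack. A small sketch of that rule follows; the table and the int_arg_location helper are invented for illustration, and this ignores the details for structs, floating-point values (which use the SSE registers), and varargs.

```c
#include <stddef.h>

/* Integer argument registers, in order, per the System V x86-64 ABI. */
static const char *const int_arg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: where does the n-th (0-based) plain integer or
   pointer argument go? Simplified: assumes no struct, floating-point,
   or variadic arguments are involved. */
static const char *int_arg_location(size_t n)
{
    size_t count = sizeof int_arg_regs / sizeof int_arg_regs[0];
    return n < count ? int_arg_regs[n] : "stack";
}
```

So a call like f(a, b) puts a in rdi and b in rsi, and nothing touches memory unless the callee decides to spill.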
Consider this C-like pseudo-code:
int bar(int c, int b);

int foo(int a, int b)
{
    int c = a + b;
    return bar(c, b);
}
The compiler can generate code for x86-64 that is much faster than x86 in this case (note that we're not taking advantage of the 64 bit types or large address space of x86-64). There are several optimizations the compiler can do that it can't do with the standard issue x86 ABI, including:
1) It knows that c will be passed as the first argument to bar, so it allocates it directly into the register used for the first function argument. Then it can compute the sum and call bar, without worrying about pushing anything onto the stack, etc.
2) It knows that b is already in the right register for being passed as the second argument to the function, so it can leave it as is.
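Here is roughly how those two optimizations play out, as a sketch. The body of bar is a stand-in I made up so the example compiles, and the assembly in the comment is only an illustration of what an optimizing compiler might emit when bar is not inlined; real output depends on the compiler and flags.

```c
/* Stand-in body (an assumption): only the signature of bar is given. */
int bar(int c, int b)
{
    return c * b;
}

int foo(int a, int b)
{
    int c = a + b;
    return bar(c, b);
}

/*
 * Illustrative code for foo, Intel syntax, assuming bar is not inlined.
 *
 * x86-64, System V ABI (a arrives in edi, b in esi):
 *     add  edi, esi        ; c = a + b, computed in the first-arg register
 *     jmp  bar             ; b is still in esi, so just tail-call bar
 *
 * 32-bit x86, a straightforward stack-based sequence:
 *     mov  eax, [esp+8]    ; load b from the stack
 *     mov  edx, [esp+4]    ; load a from the stack
 *     add  edx, eax        ; c = a + b
 *     push eax             ; push b for bar
 *     push edx             ; push c for bar
 *     call bar
 *     add  esp, 8          ; caller cleans up the arguments
 *     ret
 */
```

Two instructions versus eight, and no memory traffic at all on the x86-64 side.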
In terms of elegance, x86-64 is one ugly piece of work. But practically speaking, I think it will do a lot better than IA-64. The problem is that nobody has ever written a good VLIW compiler for general-purpose use; up until IA-64 most VLIW machines were for media processing, so that's what the compilers focused on. On the other hand, writing a compiler for a standard-looking 64-bit machine like x86-64 isn't terribly hard (or at least, it's a solved problem). Being able to run x86 binaries at full speed is a nice bonus as well.
You can find out more about x86-64, including the status of the ports of GCC, Linux, FreeBSD, and NetBSD, and the ABI documentation, at http://x86-64.org/
Thanks to call for an e2 HTML tip