This is yet another paper I wrote for college, this time for my CS251 class, Computer Organization and Systems. The majority of the information in this paper came from PDF files located on Intel's website.

Technology has always moved at a quick pace. While the focus of this advancement is usually the speed of the processor itself, Intel, jointly with Hewlett-Packard, has developed a 64-bit architecture and a powerful chip to run on it. Touted as the most significant architectural advancement since 1985, Intel's Itanium and the IA-64 architecture are able to handle much more than their IA-32 counterparts. IA-64 tries to solve several problems that have arisen due to technological advancement and architectural stagnation. Its answers include a larger and improved addressing system, predication, speculation, bundled instructions, support for multiple processors and explicit parallelism, to name a few.

The Itanium takes advantage of EPIC (Explicitly Parallel Instruction Computing) design concepts for a closer relationship between software and hardware. EPIC was designed by Hewlett-Packard in order to avoid a possible leveling off of processor speed due to overly complex superscalar ILP (instruction-level parallelism) processors. EPIC was designed with three ideals in mind: that the compiler plays a key role in designing a plan of execution (POE), that the architecture contains features that assist the compiler in exploiting statistical ILP, and that the architecture provides communication mechanisms so that the compiler can let the hardware know its POE. EPIC is an evolution of VLIW (Very Long Instruction Word) processors. In VLIW architectures, the compiler "identifies parallelism in the program and communicates it to the hardware by specifying which operations are independent of one another." This allows the hardware to know, without any further checking, which operations it can start executing in the same cycle. EPIC leads to better performance overall due to several changes in design style. It enables the software to exploit all compile-time information and deliver this information to the hardware quickly and efficiently. It also addresses several performance bottlenecks found in modern computers, such as memory latency, memory address disambiguation and control flow dependencies.
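To make the compiler's role more concrete, here is a minimal Python sketch of how a compiler might sort operations into groups whose members are independent of one another. The operation names, the dependency table and the `schedule_groups` function are all hypothetical illustrations, not part of any IA-64 tool chain. A real EPIC compiler would also have to account for instruction latencies and the number of available execution units.

```python
def schedule_groups(deps):
    """Partition operations into groups whose members are mutually independent.

    deps maps each operation name to the set of operations it depends on.
    An operation joins a group only once everything it depends on has been
    placed in an earlier group, so all members of a group could start together.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    groups = []
    while remaining:
        # Operations whose dependencies are all satisfied by earlier groups.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("unsatisfiable or cyclic dependency")
        groups.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return groups


# Hypothetical operations: two loads feed an add; a multiply is unrelated.
example_deps = {
    "load_a": set(),
    "load_b": set(),
    "mul_c": set(),
    "add_ab": {"load_a", "load_b"},
    "store": {"add_ab", "mul_c"},
}
groups = schedule_groups(example_deps)
for cycle, group in enumerate(groups):
    print(cycle, group)
```

In this toy example the two loads and the multiply land in the first group, since none of them depends on another. That is the kind of information the compiler hands to the hardware so it does not have to rediscover it at run time.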

The Itanium processor provides a 6-wide, 10-stage-deep pipeline running at 733 to 800 MHz, providing abundant resources for exploiting ILP as well as a high frequency that minimizes the latency of each instruction. In addition, each processor provides hardware for several execution units, specifically four integer ALUs, four multimedia ALUs, two extended-precision FP units, two additional single-precision FP units, two load/store units and three branch units. All this hardware allows the Itanium to fetch, issue, execute and retire six instructions each clock, so many more operations are executed per cycle. The processor's functions are divided into five groups: instruction processing, execution, control, memory subsystem and IA-32 execution.

When running programs, one issue that always hurts performance is branches and jumps. Branches disrupt the flow of a program and stall the pipeline. To keep this to a minimum, the Itanium employs a hierarchy of branch prediction structures in order to deliver high accuracy and low penalties. The branch prediction is aided by branch hint directives provided by the compiler, in the form of explicit BRP instructions, as well as hint specifiers on the branch instructions themselves. These directives provide target addresses, static hints on branch direction, and indicators that tell when to use dynamic prediction. The processor provides up to four progressive predictions and corrections to the fetch pointer: a special single-cycle branch predictor, an adaptive two-level multi-way predictor and return predictor, and two branch address calculations and corrections.
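As a rough illustration of what "adaptive two-level" prediction means, here is a small Python sketch of a generic two-level scheme: the first level records the recent outcomes of a branch, and the second level keeps a 2-bit saturating counter for each history pattern. This is a textbook-style model, not the Itanium's actual circuit. The class name, the 4-bit history length and the branch address are assumptions made for the example.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive branch predictor (illustrative only)."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = {}   # branch address -> recent outcomes packed in an int
        self.counters = {}  # (address, history) -> 2-bit saturating counter

    def predict(self, address):
        hist = self.history.get(address, 0)
        # Counters start at 1 ("weakly not taken"); 2 or 3 means "taken".
        return self.counters.get((address, hist), 1) >= 2

    def update(self, address, taken):
        hist = self.history.get(address, 0)
        key = (address, hist)
        counter = self.counters.get(key, 1)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.counters[key] = counter
        # Shift the newest outcome into the history, keeping only history_bits.
        mask = (1 << self.history_bits) - 1
        self.history[address] = ((hist << 1) | int(taken)) & mask


def measure_accuracy(outcomes, address=0x4000):
    """Feed a sequence of branch outcomes and return the fraction predicted."""
    predictor = TwoLevelPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(outcomes)


# A loop branch that is taken three times, then falls through, repeatedly.
pattern = [True, True, True, False] * 50
accuracy = measure_accuracy(pattern)
print(accuracy)
```

Because each point in the repeating pattern produces a distinct history, the predictor learns the fall-through case as well as the taken cases after a short warm-up, which a single counter per branch could not do.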

The Itanium also utilizes predication to help curb branching penalties. Predication allows the computer to “switch off” the instructions on a branch path that is untrue or not going to be run. It does this with 64 one-bit predicate registers. Instructions whose predicate is on execute normally, while instructions whose predicate is off have no effect. Predicates handle the complex control flow issues that occur when the compiler pursues overly aggressive instruction-level parallelism. Predication also reduces the number of branches and their associated mispredicts, increasing performance.
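The idea can be sketched in a few lines of Python. In this toy model every instruction is tagged with a predicate register, all instructions are issued in order, and a result is kept only when the instruction's predicate is on. The tuple layout for instructions, the register names and the `run_predicated` helper are hypothetical and only meant to show the principle, not real IA-64 encoding.

```python
def run_predicated(instructions, registers, predicates):
    """Issue every instruction in order; commit only those whose predicate is on."""
    for pred_index, dest, compute in instructions:
        value = compute(registers)
        if predicates[pred_index]:
            registers[dest] = value
    return registers


def abs_diff(a, b):
    """Compute |a - b| with no branch between the two alternatives."""
    registers = {"a": a, "b": b, "r": 0}
    predicates = [False] * 64  # 64 one-bit predicate registers

    # A compare sets a pair of complementary predicates.
    predicates[1] = a > b
    predicates[2] = not predicates[1]

    # Both sides of the original if/else are present in the instruction stream.
    program = [
        (1, "r", lambda r: r["a"] - r["b"]),  # kept when a > b
        (2, "r", lambda r: r["b"] - r["a"]),  # kept otherwise
    ]
    return run_predicated(program, registers, predicates)["r"]


print(abs_diff(7, 3), abs_diff(3, 7))
```

Since there is no jump between the two alternatives, there is nothing for the branch predictor to get wrong. The cost is that both instructions occupy issue slots even though only one result is kept.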

The main advantage of the IA-64 architecture is its 64 bits of addressing space. This allows the computer to address roughly 18 billion gigabytes (2^64 bytes) of memory. This alone helps out many websites and data storage companies. It also dissolves the need for the expensive bank switching that the Xeon, Pentium III and AMD's Athlon had to do to reach memory beyond what 32 bits can address. With bank switching a thing of the past, a good deal of software and hardware complexity and overhead is a thing of the past as well, not to mention the hit in computer performance that bank switching caused.
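The "18 billion gigabytes" figure can be checked with a little arithmetic. The short Python sketch below assumes byte-addressable memory and decimal gigabytes (10^9 bytes); the function and variable names are made up for the example.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


GB = 10 ** 9  # decimal gigabyte, an assumption for this comparison

ia32_gb = addressable_bytes(32) / GB  # a little over 4 GB
ia64_gb = addressable_bytes(64) / GB  # a little over 18 billion GB

print(ia32_gb, ia64_gb)
```

Going from 32 to 64 address bits does not double the reachable memory; it multiplies it by 2^32, or about 4.3 billion.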

The Itanium processor contains a floating point unit (FPU) capable of delivering up to 6.4 Gflops and provides full support for single, double, extended and mixed mode precision computations. Throughput for single-precision floating point computations is increased by parallel floating point instructions operating on pairs of 32-bit numbers. The FPU contains a 128-entry floating point register file with eight read and four write ports, which can support full-bandwidth operation. These eight read ports can feed two FMACs as well as two floating point stores to memory. The floating point registers are divided into two banks, an even one and an odd one, allowing increased write bandwidth into the FPU from memory.
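As a quick sanity check on the 6.4 Gflops figure, the Python snippet below works out how many floating point operations per clock cycle it implies. It assumes the peak figure applies to the 800 MHz part mentioned earlier and that a fused multiply-add counts as two operations; both are assumptions, not statements from Intel's documents.

```python
peak_flops = 6.4e9  # peak floating point rate quoted above
clock_hz = 800e6    # assumed clock speed for that figure

# Floating point operations that must complete every cycle to hit the peak.
flops_per_cycle = peak_flops / clock_hz

# If each multiply-add counts as two operations (an assumption),
# this is the number of single-precision multiply-adds per cycle.
multiply_adds_per_cycle = flops_per_cycle / 2

print(flops_per_cycle, multiply_adds_per_cycle)
```

Under those assumptions the peak works out to eight operations, or four multiply-adds, per cycle, which is consistent with parallel instructions operating on pairs of 32-bit numbers.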

The jump from a 32-bit to a 64-bit architecture leads to one main problem: how does one use the old programs on the new system? This problem is solved by an IA-32 engine built into the chip. The Itanium processor supports IA-32 applications and operating systems in either uniprocessor or multiprocessor configurations. The IA-32 engine is designed to use the registers, caches and execution resources of the EPIC machine.

The IA-64 allows for several advances for many different consumers and companies. The Itanium’s ability to handle large amounts of addressing space greatly helps e-businesses and those that specialize in data warehousing, memory databases and data mining. It also helps several aspects of technical computing, specifically electrical and mechanical design modeling and simulation, video editing, 3D rendering and scientific, financial, seismic and weather analysis. The predication in the IA-64 also helps large server applications that would normally overwhelm a traditional IA-32 processor. The IA-64 also meshes better with the Java programming language. The write barrier, a performance bottleneck found in garbage collection, is no longer a problem, as predication and the increased number of registers reduce the overhead normally associated with it. The Itanium also allows for a large working data set and intensive, loopy floating point code. Graphics applications also receive a boost, as the IA-64 architecture's divide and square-root operations allow speed and accuracy tradeoffs not otherwise enabled by equivalent hardware.

The Itanium chip, combined with the IA-64 architecture, allows for a lot of advances in technology and in computer science. By greatly increasing the power of a computer, this new technology can be useful to almost any job or task. This advancement does not come cheaply, however: the chip itself costs anywhere from $1000 for a 733 MHz chip to $4000. Yet after seven years of development, Intel and Hewlett-Packard have put together what might soon be the next standard in computing.