McKinley is Intel's code name for the newest version of the Itanium processor, which will be released as Itanium 2. It implements Intel's IA-64 Instruction Set Architecture (ISA), a VLIW-style architecture based on EPIC. Hewlett-Packard and Intel jointly developed the entire Itanium line, and much of the old Compaq Alpha processor design team now works for Intel, designing the two future Itanium processors, Madison and Deerfield.

Itanium Features

Itanium has had several design features from the beginning that allow the chip to run faster than traditional processors. Explicitly Parallel Instruction Computing (EPIC) allows for speculation, predication, prefetch/branch/cache instruction hints, and register stacking. The interconnect technology allows for easy scaling to multiprocessor systems. There is also a greatly enlarged register file, which makes it easier to resolve register conflicts.
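
As a rough illustration of predication, the sketch below contrasts a branching routine with an "if-converted" one in which both arms are issued and a predicate decides which result takes effect. This is only a Python analogy, not IA-64 code: the function names are made up for this example, and real hardware nullifies the instruction whose predicate is false rather than selecting a value afterwards.

```python
def abs_diff_branchy(a: int, b: int) -> int:
    """Conventional version: control flow depends on a branch."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    """If-converted version: no branch, each arm is guarded by a predicate."""
    p_true = a > b          # a compare writes a predicate ...
    p_false = not p_true    # ... and its complement
    result = 0
    # Both arms are "issued"; each only takes effect if its predicate holds.
    result = (a - b) if p_true else result
    result = (b - a) if p_false else result
    return result


print(abs_diff_branchy(7, 3), abs_diff_predicated(7, 3))
```

The point of the second form is that there is no branch to mispredict; the cost is that both arms occupy issue slots.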

McKinley Optimizations

McKinley runs existing IA-64 code with no recompile necessary. The biggest improvement probably comes from the shortened pipeline: McKinley has an 8-stage core pipeline, down from Itanium's 10+ stage pipeline. The core pipe is made up of the following stages:

CORE - | IPG | ROT | EXP | REN | REG | EXE | DET | WB  |
FPU  -                               | FP1 | FP2 | FP3 | FP4 | WB  |
L2   -                         | L2N | L2I | L2A | L2M | L2D | L2C | L2W |

The stages are defined as follows:
IPG - IP Generate, L1I Cache (6 inst) and TLB
ROT - Instruction Rotate and Buffer (6 inst)
EXP - Expand, Port Assignment and Routing
REN - Integer and FP Register Rename (6 inst)
REG - Integer and FP Register File Read (6 inst)
EXE - ALU Execute (6), L1D Cache and TLB
DET - Exception Detect, Branch Correction
WB  - Writeback, Integer Register update
FP1-WB - Floating Point Pipeline
L2N-L2W - Memory Access Pipeline
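
The core pipe listed above can be modelled as a simple ordered list. The sketch below assumes an idealized pipeline in which one issue group enters per clock and nothing ever stalls; the helper names are invented for this example and do not come from any Intel documentation.

```python
from typing import Optional

# Core pipeline stages in order, as listed above.
CORE_STAGES = ["IPG", "ROT", "EXP", "REN", "REG", "EXE", "DET", "WB"]


def stage_at(group: int, cycle: int) -> Optional[str]:
    """Stage occupied by issue group `group` at clock `cycle`.

    Assumes group N enters IPG at cycle N and advances one stage per clock.
    Returns None if the group is not in the pipeline at that cycle.
    """
    index = cycle - group
    if 0 <= index < len(CORE_STAGES):
        return CORE_STAGES[index]
    return None


def completion_cycle(group: int) -> int:
    """Cycle in which the group reaches WB under the same assumptions."""
    return group + len(CORE_STAGES) - 1


print(len(CORE_STAGES))                 # pipeline depth
print(stage_at(0, 0), stage_at(2, 7))   # first group in IPG; third group in EXE
```

A shorter pipe lowers `completion_cycle` for every group and, more importantly, shrinks the number of in-flight stages that must be flushed on a mispredicted branch.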

The new processor has improved cache latencies, which make a cache miss, a huge performance hit on Itanium, much less costly. A faster FSB frequency lets the processor communicate more quickly with the mainboard. Branch misprediction penalties are lower, and the core clock frequency is higher. With the shrink in transistor size, there is die area for more integer units and, overall, more ways to reach the potential 6 instructions per clock cycle.

Processor Features

McKinley has the following important design features. Items marked with an asterisk are improvements over the first Itanium processor.

System Bus
128 bits wide
200 MHz/400 MT/s
6.4 GB/s*
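
The 6.4 GB/s figure follows directly from the bus width and transfer rate given above. A minimal check, with constant and function names chosen for this example only:

```python
BUS_WIDTH_BITS = 128                 # from the list above
TRANSFERS_PER_SECOND = 400_000_000   # 400 MT/s (200 MHz, two transfers per clock)


def peak_bandwidth_gb_per_s(width_bits: int, transfers_per_second: int) -> float:
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = width_bits // 8
    return bytes_per_transfer * transfers_per_second / 1e9


print(peak_bandwidth_gb_per_s(BUS_WIDTH_BITS, TRANSFERS_PER_SECOND))
```

This is a theoretical peak: 16 bytes per transfer times 400 million transfers per second.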

Width
2 bundles per clock
6 integer units*
2 floating point units
328 total registers
2 loads and 2 stores per clock*
11 issue ports
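
Two of the numbers above can be derived from the IA-64 architecture itself. The sketch below assumes the architectural facts that an IA-64 bundle holds three instruction slots and that the register files consist of 128 general, 128 floating point, 64 predicate, and 8 branch registers; that breakdown is not stated in the source, and the names are illustrative only.

```python
BUNDLES_PER_CLOCK = 2
SLOTS_PER_BUNDLE = 3  # assumption from the IA-64 bundle format: 3 slots per bundle

# Assumed breakdown of the "328 total registers" figure.
REGISTER_FILES = {
    "general": 128,
    "floating_point": 128,
    "predicate": 64,
    "branch": 8,
}

peak_instructions_per_clock = BUNDLES_PER_CLOCK * SLOTS_PER_BUNDLE
total_registers = sum(REGISTER_FILES.values())

print(peak_instructions_per_clock)  # matches the "6 instructions per clock" figure
print(total_registers)              # matches the "328 total registers" figure
```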

Caches
L1 - 2 X 16 KB - 1 clock latency*
L2 - 256 KB - 5 clock latency
L3 - 3 MB - 12 clock latency
      32 GB/s bandwidth

Addressing
50 bit physical addressing*
64 bit virtual addressing
Maximum page size 4GB
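
To put the addressing numbers in perspective, the following sketch computes the size of the physical address space and the number of offset bits a 4 GB page consumes. It treats the virtual address as a flat 64-bit value, which is a simplification made for this example.

```python
PHYSICAL_BITS = 50
VIRTUAL_BITS = 64
MAX_PAGE_BYTES = 4 * 2**30  # 4 GB

physical_bytes = 2**PHYSICAL_BITS                # addressable physical memory
physical_pib = physical_bytes // 2**50           # same figure in PiB
offset_bits = MAX_PAGE_BYTES.bit_length() - 1    # log2 of the page size
page_number_bits = VIRTUAL_BITS - offset_bits    # bits left to select a page

print(physical_pib, offset_bits, page_number_bits)
```

With the largest page size, the low 32 bits of an address are the offset within the page, so a single translation entry covers 4 GB of memory.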

"Intel estimates McKinley based systems will deliver ~1.5X – 2X performance improvement over today’s Itanium™ based systems."


Sources: ISSCC McKinley Design Improvement Paper.
Available at: http://www.cpus.hp.com/technical_references/ia64.shtml
