The R12000 is a 64-bit processor produced by MIPS Technologies, making its debut in 1999. It is the successor to the MIPS R10000, which was launched in 1996. The key differences between the R12k and the R10k are:
- 2x increase in MRU table to 16Kbit paths. This allows better predictions when searching the secondary cache.
- 50% increase in the active register list, up from 32 to 48 entries. This allows more instructions to be either waiting in queues for execution or executing in the instruction units. When encountering a branch, the processor is able to speculatively execute more instructions.
- Extra pipeline storage
- 4x increase in the branch prediction table, up from 512 to 2048 entries, with an option to make the prediction depend on the history of previous branch paths. In the R10k, a branch causes a 1-cycle delay in the pipeline while the target address is calculated. The R12k eliminates this when the branch leads to the cache.
- 32-entry branch target address cache
- The address queue has separate pipelines for address calculation and tag checking. This improves overall efficiency: for example, if the cache tag memory is busy, the address calculation can independently determine the cache bank, saving a tag check cycle.
- Load/store instructions use the integer queue. This means that instructions can still be decoded if the address queue becomes full. After address calculation, they are removed from the integer queue and placed back on the load/store queue. This has minimal performance impact, but does simplify the design.
- 8-entry content addressable memory branch target address cache buffers
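
To illustrate why a larger branch prediction table can help, here is a minimal sketch in Python. It is a toy model, not the R12k's actual logic: the use of 2-bit saturating counters, the indexing by low address bits, the initial counter value, and the two branch addresses are all assumptions chosen for illustration. Only the table sizes (512 and 2048) come from the text above.

```python
class BranchHistoryTable:
    """Toy table of 2-bit saturating counters, indexed by low PC bits.

    Hypothetical model for illustration only.
    """

    def __init__(self, entries):
        self.entries = entries
        # Assumption: every counter starts at 1 (weakly not-taken).
        self.counters = [1] * entries

    def _index(self, pc):
        # Instructions are 4 bytes wide, so drop the two low address bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredictions(entries, trace):
    """Count wrong predictions for a list of (pc, taken) pairs."""
    table = BranchHistoryTable(entries)
    misses = 0
    for pc, taken in trace:
        if table.predict(pc) != taken:
            misses += 1
        table.update(pc, taken)
    return misses


# Two hypothetical branches 512 instructions apart:
# one is always taken, the other is never taken.
BRANCH_A = 0x1000
BRANCH_B = 0x1000 + 512 * 4
trace = [(BRANCH_A, True), (BRANCH_B, False)] * 100

small = mispredictions(512, trace)   # both branches share one counter
large = mispredictions(2048, trace)  # each branch gets its own counter
```

In this contrived trace the two branches map to the same counter in the 512-entry table and keep overwriting each other's history, so all 200 predictions are wrong. In the 2048-entry table they no longer collide, and only the very first prediction is wrong. Real code rarely behaves this badly, but the sketch shows the kind of aliasing that a larger table reduces.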
On the R10k, transferring data from the system bus to the secondary cache locked the cache controller while a complete cache line was read from main memory. Since the system bus is clocked slower than the processor, a full cache line transfer can cause the secondary cache controller to wait many cycles before completion, during which time no other requests can be submitted. On the R12k, transfer of a cache line from the system bus to the secondary cache is done in 4 or 3 blocks (for data and instruction, respectively), with idle cycles in between during which other operations can be performed by the secondary cache controller. This significantly improves performance for applications that frequently miss the processor caches.
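
The effect of splitting the transfer into blocks can be sketched with some simple arithmetic. The Python function below is a rough model under stated assumptions, not a description of the real controller: the cycle counts (8 processor cycles for a block to arrive over the bus, 2 cycles to write it into the cache) are hypothetical numbers picked for illustration, and only the block counts of 4 and 3 come from the text above.

```python
def controller_cycles(blocks, cycles_per_block, write_cycles, interleaved):
    """Return (busy, free) cycles for the cache controller during one refill.

    blocks           -- number of blocks the cache line is split into
    cycles_per_block -- processor cycles for one block to arrive over the bus
                        (hypothetical parameter)
    write_cycles     -- processor cycles the controller needs to write one
                        block into the secondary cache (hypothetical parameter)
    interleaved      -- False: controller is locked for the whole transfer
                        True:  controller is busy only while writing a block
    """
    if write_cycles > cycles_per_block:
        raise ValueError("write_cycles must not exceed cycles_per_block")
    total = blocks * cycles_per_block
    busy = blocks * write_cycles if interleaved else total
    return busy, total - busy


# Hypothetical numbers: 8 cycles per block on the bus, 2 cycles per write.
locked_data = controller_cycles(4, 8, 2, interleaved=False)
split_data = controller_cycles(4, 8, 2, interleaved=True)
split_instr = controller_cycles(3, 8, 2, interleaved=True)
```

With these made-up figures, the locked controller is busy for all 32 cycles of a 4-block transfer and has none free, while the interleaved controller is busy for 8 cycles and has 24 cycles available for other requests. The actual ratio depends on the bus and processor clocks of a given system.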
Another improvement over the R10k is the relaxed set locking of the data cache. The R10k locks a cache line whenever it is accessed by a load or store instruction. If another instruction requires access to the same line while the first is still executing, it can share the lock, but if it requires a different cache line it is stalled until the first instruction releases. This reduces performance, but is necessary to eliminate the risk of deadlock in out-of-order execution. The R10k allows only the oldest instruction to obtain a lock on the other cache line, preventing further instructions from locking it. The R12k allows a lock for the oldest instruction that does not already own a lock, so instructions that do lock will not stall instructions that only require a lock on the second line. The net result of this is to lower contention in the data cache.
The R12k is used in the SGI Octane 2 workstation at 270 and 300 MHz. Previous Octanes (with E graphics) used the R10k at 225 and 250 MHz, and Octanes prior to that (I graphics) had R10ks at 175 and 195 MHz (like the Indigo2 IMPACTs). The latest Octanes (Octane2) and Fuels use the MIPS R14000. Octane owners can upgrade their R10ks to dual 300 MHz R12ks by replacing the CPU daughterboard.
(My personal experience: An Octane with a 225 MHz R10k is roughly equal to a Sun Ultra 60 with a 360 MHz UltraSPARC-II. A 300 MHz R12k is 1.2-1.5x faster for raw CPU and should benefit from better cache management too.)