Simultaneous Multithreading (SMT) is another evolution in processor architecture that allows the CPU to process a greater number of instructions per clock cycle.

Out-of-order processors can execute instructions in a different order from the one the program specifies, as long as the data dependencies between them are respected, and can execute some instructions in parallel. Only one process or thread can run at a time, however, and a pipeline flush has to occur to switch to another thread. Often the hardware can extract only a small amount of parallelism from a single thread.
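To illustrate why a single thread often offers little parallelism, here is a minimal sketch that estimates the available instruction-level parallelism of a short instruction sequence. The function name `ilp` and the encoding of a program as a list of dependency tuples are assumptions made for this example, not part of any real tool, and every instruction is assumed to take one cycle.

```python
def ilp(program):
    """Estimate instruction-level parallelism for a non-empty program.

    program[i] is a tuple of indices of earlier instructions whose results
    instruction i needs. Each instruction is assumed to take one cycle.
    """
    depth = []  # depth[i] = earliest cycle (1-based) in which instruction i can run
    for deps in program:
        depth.append(1 + max((depth[d] for d in deps), default=0))
    # Even with unlimited execution units, the longest dependency chain
    # sets the minimum number of cycles.
    return len(program) / max(depth)


# Six instructions, but the chain 0 -> 1 -> 2 -> 5 is four instructions long.
example = [(), (0,), (1,), (), (3,), (2, 4)]
print(ilp(example))  # 1.5
```

In this toy model the hardware can never average more than 1.5 instructions per cycle on the example, no matter how many execution units it has.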

SMT combines out-of-order execution with the ability to run multiple processes or threads "at the same time." Since the threads don't use the same registers or memory space(*), the processor can run many more instructions in parallel than with a single thread: instructions from different threads have no register dependencies on each other. The additional complexity added to the processor is not trivial, but the performance increase can be very large when a single thread would leave many execution units idle.
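The effect can be sketched with a toy issue model. This is a simplified illustration under stated assumptions, not a description of any real processor: every instruction takes one cycle, the core can issue at most `width` instructions per cycle, and lower-numbered threads get priority. The name `cycles_to_finish` and the dependency-tuple encoding are hypothetical.

```python
def cycles_to_finish(threads, width):
    """Count cycles needed to run all threads on one core that issues
    up to `width` ready instructions per cycle, drawn from any thread.

    threads[t][i] is a tuple of indices of earlier instructions in
    thread t that instruction i depends on.
    """
    done = [set() for _ in threads]  # completed instruction indices per thread
    remaining = sum(len(prog) for prog in threads)
    cycles = 0
    while remaining > 0:
        issued = []
        for tid, prog in enumerate(threads):
            for idx, deps in enumerate(prog):
                if len(issued) == width:
                    break
                # Ready only if every dependency finished in an earlier cycle.
                if idx not in done[tid] and all(d in done[tid] for d in deps):
                    issued.append((tid, idx))
        for tid, idx in issued:
            done[tid].add(idx)
        remaining -= len(issued)
        cycles += 1
    return cycles


chain = [(), (0,), (1,), (2,)]  # four instructions, each needing the previous one

print(cycles_to_finish([chain], width=2))         # 4: one issue slot idles every cycle
print(cycles_to_finish([chain, chain], width=2))  # 4: the second thread fills that slot
```

In this model the second thread finishes in the cycles the first thread would have needed anyway, because its instructions use issue slots that the first thread's dependency chain left empty.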

Intel's version of SMT is called Hyperthreading, and can run 2 threads or processes simultaneously. The processor keeps the instructions from both in the same buffers, while giving each process its own register set and a few other separate buffers.

The original P4 contained a full SMT implementation that actually worked, but in a few corner cases it slowed the entire processor to a crawl. Intel decided to release the processor with "Hyperthreading" turned off until they fixed the performance issues in these exceptional cases.

(*) The SMT architecture appears as two separate processors to the operating system, so it's like having two processing units, two sets of registers, two memory spaces, etc. In reality the two share one pool of physical registers and processing units, but each program's architectural registers are mapped to different physical registers in that pool, so program A's register 1 is not the same register as program B's register 1. This means that any instruction from program A and any instruction from program B can be run simultaneously even if they use the 'same' architectural registers (with some minor exceptions regarding locks, memory access, etc). This is not as difficult as it may seem, since out-of-order processing, used in processors for many years, already renames the registers to a shared pool of registers as instructions come into the processor.
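The renaming idea in the note above can be sketched as follows. This is a deliberately simplified model: the class name `Renamer` and its methods are hypothetical, and the sketch never returns registers to the free list, which a real rename stage does when instructions retire.

```python
class Renamer:
    """Toy rename stage: one map table per thread, one shared physical pool."""

    def __init__(self, num_physical, num_threads):
        self.free = list(range(num_physical))           # shared free list
        self.maps = [{} for _ in range(num_threads)]    # arch reg -> physical reg

    def rename_dest(self, thread, arch_reg):
        """Allocate a fresh physical register for a value the thread writes.
        Raises IndexError if the shared pool is exhausted."""
        phys = self.free.pop(0)
        self.maps[thread][arch_reg] = phys
        return phys

    def lookup_src(self, thread, arch_reg):
        """Find the physical register holding the thread's current value."""
        return self.maps[thread][arch_reg]


renamer = Renamer(num_physical=8, num_threads=2)
a = renamer.rename_dest(thread=0, arch_reg="r1")  # program A writes its r1
b = renamer.rename_dest(thread=1, arch_reg="r1")  # program B writes its r1
print(a, b)  # 0 1: the same architectural name, two different physical registers
```

Because the two writes land in different physical registers, nothing downstream of the rename stage needs to know that both programs called their register "r1".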