SSE3 is the third generation of Intel's Streaming SIMD Extensions. It comprises 13 new SIMD (single instruction, multiple data) CPU instructions. Intel introduced SSE3 in 2004 with the Prescott core revision of its Pentium 4 CPU.

Here is a brief outline of what each new instruction does:

**FISTTP** (Store Integer and Pop from x87-FP with Truncation) behaves like the FISTP instruction but uses truncation, irrespective of the rounding mode specified in the floating-point control word (FCW).

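
The difference from FISTP is easiest to see with a few values. The sketch below is a pure-Python model of the rounding behaviour only, assuming the FCW holds its default round-to-nearest (ties-to-even) mode. The function names are hypothetical, and the x87 register stack, integer widths and overflow cases are not modelled.

```python
import math

# Hypothetical helpers: they model only how each instruction rounds,
# assuming the FCW is in its default round-to-nearest mode.

def fistp_round_nearest(x: float) -> int:
    """FISTP with the default x87 rounding mode: round to nearest, ties to even."""
    # Python's built-in round() also breaks ties toward the even integer.
    return round(x)

def fisttp(x: float) -> int:
    """FISTTP: always truncates toward zero, whatever the FCW says."""
    return math.trunc(x)

for value in (2.7, 2.5, -1.7):
    print(value, fistp_round_nearest(value), fisttp(value))
# 2.7 3 2
# 2.5 2 2
# -1.7 -2 -1
```

Truncation toward zero is what C requires when converting a floating-point value to an integer, which is one commonly cited motivation for FISTTP: with plain FISTP, a compiler has to switch the FCW rounding mode before the store and restore it afterwards.
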
**MOVSHDUP** loads/moves 128 bits, duplicating the second and fourth 32-bit data elements.

**MOVSLDUP** loads/moves 128 bits, duplicating the first and third 32-bit data elements.

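
A minimal pure-Python model of the two duplicating moves, assuming a 128-bit register is represented as a list of four floats with index 0 holding the lowest element (bits 31-0). The function names are hypothetical stand-ins for the instructions.

```python
from typing import List

def movshdup(src: List[float]) -> List[float]:
    """Model of MOVSHDUP: the second and fourth elements each fill their pair."""
    return [src[1], src[1], src[3], src[3]]

def movsldup(src: List[float]) -> List[float]:
    """Model of MOVSLDUP: the first and third elements each fill their pair."""
    return [src[0], src[0], src[2], src[2]]

v = [1.0, 2.0, 3.0, 4.0]
print(movshdup(v))  # [2.0, 2.0, 4.0, 4.0]
print(movsldup(v))  # [1.0, 1.0, 3.0, 3.0]
```

In C, the corresponding compiler intrinsics are `_mm_movehdup_ps` and `_mm_moveldup_ps`, declared in `<pmmintrin.h>`.
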
**MOVDDUP** loads/moves 64 bits (bits 63-0 if the source is a register) and returns the same 64 bits in both the lower and upper halves of the 128-bit result register, duplicating the 64 bits from the source.

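
The same idea for MOVDDUP, as a pure-Python sketch: a list of two floats stands in for the 128-bit result, and the source is assumed to be either a single 64-bit value or a register whose low element (index 0) is used. `movddup` is a hypothetical name.

```python
from typing import List, Sequence

def movddup(src: Sequence[float]) -> List[float]:
    """Model of MOVDDUP: the low 64-bit element fills both halves of the result."""
    low = src[0]  # the 64-bit memory operand, or bits 63-0 of a source register
    return [low, low]

print(movddup([5.0]))       # memory-style source, a single double: [5.0, 5.0]
print(movddup([5.0, 9.0]))  # register-style source, upper element ignored: [5.0, 5.0]
```
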
**LDDQU** is a special 128-bit unaligned load designed to avoid cache line splits.

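
What counts as a cache line split can be shown with a little address arithmetic. The sketch assumes 64-byte cache lines (common on x86, but not stated by the source) and only reports which 16-byte unaligned loads would straddle two lines; it does not model how LDDQU itself avoids the split. `splits_cache_line` is a hypothetical helper.

```python
CACHE_LINE_BYTES = 64  # assumption: a typical x86 cache line size

def splits_cache_line(address: int, size: int = 16, line: int = CACHE_LINE_BYTES) -> bool:
    """Return True if `size` bytes starting at `address` span two cache lines."""
    offset = address % line      # position of the first byte within its line
    return offset + size > line  # the last byte spills into the next line

# 16-byte loads at every 8-byte-aligned address in the first two lines
print([a for a in range(0, 128, 8) if splits_cache_line(a)])  # [56, 120]
```
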
**ADDSUBPS** has two 128-bit operands. The instruction performs single-precision subtraction on the first and third pairs of 32-bit data elements within the operands (subtracting the second operand's element from the first operand's) and single-precision addition on the second and fourth pairs.

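
A pure-Python model of ADDSUBPS, assuming each 128-bit operand is a list of four floats indexed from the lowest element; `addsubps` is a hypothetical name. Subtraction is taken as first operand minus second operand.

```python
from typing import List

def addsubps(a: List[float], b: List[float]) -> List[float]:
    """Model of ADDSUBPS: subtract in elements 0 and 2, add in elements 1 and 3."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]

print(addsubps([10.0, 10.0, 10.0, 10.0], [1.0, 2.0, 3.0, 4.0]))
# [9.0, 12.0, 7.0, 14.0]
```
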
**ADDSUBPD** has two 128-bit operands. The instruction performs double-precision subtraction on the first pair of quadwords (subtracting the second operand's quadword from the first operand's) and double-precision addition on the second pair.

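
One well-known use of ADDSUBPD is complex multiplication. The sketch below models the instruction in pure Python (two-element lists, index 0 low; hypothetical function names) and builds a complex product from duplicate, multiply and add/subtract steps. Real code would use SIMD duplicates, shuffles and multiplies for the intermediate steps; here they are plain list operations.

```python
from typing import List

def addsubpd(a: List[float], b: List[float]) -> List[float]:
    """Model of ADDSUBPD: subtract in the low quadword, add in the high one."""
    return [a[0] - b[0], a[1] + b[1]]

def complex_mul(x: List[float], y: List[float]) -> List[float]:
    """Multiply complex numbers stored as [real, imag] using the ADDSUBPD pattern."""
    xr = [x[0], x[0]]                                  # real part duplicated (MOVDDUP-style)
    xi = [x[1], x[1]]                                  # imaginary part duplicated
    t1 = [xr[0] * y[0], xr[1] * y[1]]                  # [xr*yr, xr*yi]
    y_swapped = [y[1], y[0]]                           # [yi, yr]
    t2 = [xi[0] * y_swapped[0], xi[1] * y_swapped[1]]  # [xi*yi, xi*yr]
    return addsubpd(t1, t2)                            # [xr*yr - xi*yi, xr*yi + xi*yr]

print(complex_mul([1.0, 2.0], [3.0, 4.0]))  # (1+2i)(3+4i) = -5+10i, so [-5.0, 10.0]
```
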
**HADDPS** performs a single-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand.

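
A pure-Python model of HADDPS (four-element lists, index 0 lowest; hypothetical names), followed by one common use: two HADDPS steps with the same vector as both operands reduce four floats to their sum.

```python
from typing import List

def haddps(a: List[float], b: List[float]) -> List[float]:
    """Model of HADDPS: add adjacent pairs, first operand's pairs then second's."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def horizontal_sum(v: List[float]) -> float:
    """Sum four floats with two HADDPS steps."""
    h = haddps(v, v)  # [v0+v1, v2+v3, v0+v1, v2+v3]
    h = haddps(h, h)  # every element now holds v0+v1+v2+v3
    return h[0]

print(haddps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))  # [3.0, 7.0, 30.0, 70.0]
print(horizontal_sum([1.0, 2.0, 3.0, 4.0]))                    # 10.0
```
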
**HSUBPS** performs a single-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from the third element of the second operand.

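
The subtraction order is easier to follow in a pure-Python model of HSUBPS (four-element lists, index 0 lowest; `hsubps` is a hypothetical name): in each adjacent pair, the higher-indexed element is subtracted from the lower-indexed one.

```python
from typing import List

def hsubps(a: List[float], b: List[float]) -> List[float]:
    """Model of HSUBPS: lower element minus upper element within each adjacent pair."""
    return [a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]]

print(hsubps([5.0, 1.0, 8.0, 2.0], [7.0, 3.0, 9.0, 4.0]))  # [4.0, 6.0, 4.0, 5.0]
```
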
**HADDPD** performs a double-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the first and second elements of the second operand.

**HSUBPD** performs a double-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the second element of the second operand from the first element of the second operand.

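
The double-precision horizontal operations follow the same pattern with one pair per operand. A pure-Python sketch, assuming two-element lists with index 0 as the low quadword; both function names are hypothetical.

```python
from typing import List

def haddpd(a: List[float], b: List[float]) -> List[float]:
    """Model of HADDPD: sum of the first operand's pair, then the second's."""
    return [a[0] + a[1], b[0] + b[1]]

def hsubpd(a: List[float], b: List[float]) -> List[float]:
    """Model of HSUBPD: low minus high within each operand's pair."""
    return [a[0] - a[1], b[0] - b[1]]

print(haddpd([1.0, 2.0], [3.0, 5.0]))  # [3.0, 8.0]
print(hsubpd([1.0, 2.0], [3.0, 5.0]))  # [-1.0, -2.0]
```
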
**MONITOR** sets up an address range used to monitor write-back stores.

**MWAIT** enables a logical processor to enter an optimized state while waiting for a write-back store to the address range set up by the MONITOR instruction.

AMD has announced that it will support the SSE3 instructions in future versions of its Athlon 64 processors.