SSE (Streaming SIMD Extensions) was introduced by Intel in the Pentium III processor. SSE was intended to compete with the 3DNow! instruction set AMD introduced with the K6-2. At the time, 3DNow! was revolutionary, providing SIMD instructions that were actually useful in real-world applications.

MMX had been intended to speed up 3D games, audio and video applications, and the like, but turned out to be totally unsuitable. In the time between MMX's development and implementation, games and codecs moved from integer to floating-point math. The first generation of 3D accelerators allowed games to offload the integer-intensive drawing operations to the 3D card, leaving only the floating-point geometry calculations for the CPU. Newer perceptual coding methods such as MP3 and MPEG-2 allowed for far greater quality in a much smaller file size, but used computationally expensive floating-point math. MMX only provided integer operations, making it all but worthless for the majority of programs by the time of its release. 3DNow!, by contrast, provided floating-point operations that accelerated MPEG and MP3 decoding, and the geometry calculations used by 3D games. The 3DNow! version of Quake 2 allowed the much cheaper, lower-clocked K6-2 to outperform Intel's flagship Pentium II. Intel returned to the drawing board and designed a new SIMD instruction set, addressing the failings of both MMX and 3DNow!.

While MMX and 3DNow! used the same registers as the FPU, SSE has its own set of registers. This means that the processor can move from floating-point/MMX mode to SSE mode and back again, without having to save and reload the floating-point registers. Each of the eight 128-bit SSE registers (XMM0 to XMM7) holds four 32-bit numbers, which the SSE unit operates on simultaneously. This means that (under optimum conditions) the SSE unit can carry out calculations at up to four times the rate of the FPU.
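As a rough illustration of "four 32-bit numbers operated on simultaneously", here is a minimal sketch in C using the compiler intrinsics from `<xmmintrin.h>`. The function names `add4_sse` and `add4_scalar` are made up for this example, and it assumes an x86 compiler with SSE enabled (the default on x86-64). It is only meant to show the shape of the idea, not how any particular application uses SSE.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: add two arrays of four floats with one packed SSE add.
   Each __m128 value corresponds to one SSE register holding four
   single-precision floats. */
void add4_sse(const float *a, const float *b, float *out)
{
    __m128 va  = _mm_loadu_ps(a);     /* load four floats (no alignment required) */
    __m128 vb  = _mm_loadu_ps(b);     /* load four more */
    __m128 sum = _mm_add_ps(va, vb);  /* one packed add performs all four additions */
    _mm_storeu_ps(out, sum);          /* write the four results back to memory */
}

/* The scalar equivalent, for comparison: four separate additions. */
void add4_scalar(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions produce the same four results; the SSE version simply does the arithmetic in a single packed instruction, which is where the "four times the rate" figure comes from under ideal conditions.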

In the original SSE, the numbers in the SSE registers are treated as single-precision floating-point numbers; the ability to treat them as packed integers, so that integer operations can be carried out without having to switch into MMX mode (and thus avoiding the hit of having to save the FPU registers), arrived with SSE2. Treating the registers as floating-point numbers, SSE provides instructions equivalent to AMD's 3DNow!. While the two instruction sets are incompatible, they are roughly as capable as each other at floating-point math. Realising that SSE was winning the 'SIMD war' (in part due to the money Intel was throwing into marketing it), and that 3DNow! support was waning in modern applications, AMD eventually licensed SSE, incorporating it into their Athlon XP and later processors.

SSE support is seen mostly in the lower levels of the operating system: inside video drivers, 3D APIs (Direct3D, OpenGL, etc.) and so on. Applications that have a particular use for SIMD (games, audio/video encoders, graphics programs, etc.) will often use SSE to improve performance. Applications that use SSE include Photoshop, Winamp, Quake 3, software DVD players such as PowerDVD, and MPEG-4 codecs such as DivX 5.0 and Xvid.

I would take everything Oscarfish says about SSE with a pinch of salt, with the possible exception of the phrases 'Pentium III' and 'Photoshop'.