3DNow! is a set of instructions that AMD added to its range of processors, starting with the K6-2. In concept it is similar to Intel's MMX instruction set, and it is in fact "backwards"-compatible with MMX: 3DNow! instructions work on the same 64-bit MMX registers.

3DNow!-tailored programs generally show a significant performance boost, much more so than with "plain" MMX. While MMX is used to do several integer operations in parallel, 3DNow! adds new floating point instructions that can operate on 1 or 2 single-precision floating point values at a time. 3DNow! supports addition, subtraction, multiplication, division, negation, absolute value, comparison, conversion to and from integers, and square root. Some of these are built from other instructions: division and square root are computed from reciprocal estimates, which can be left at reduced precision (14 or 15 bits) for extra speed.

The original 3DNow! instruction set consists of:

FEMMS
A faster version of EMMS. Always use this instead if you're targeting AMD processors specifically.

PAVGUSB
Averages unsigned bytes. This is not a floating point operation. According to AMD's docs, this is intended to speed up MPEG playback.
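
To make the byte averaging concrete, here is a minimal scalar sketch in portable C of what PAVGUSB computes per byte, assuming the rounded average (a + b + 1) / 2. The function names are made up for illustration; this is a model of the arithmetic, not the instruction itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of one PAVGUSB lane: the rounded average of
   two unsigned bytes. The sum is formed in a wider type so it cannot
   overflow. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Apply the lane operation to all eight bytes of a 64-bit, MMX-sized value.
   The result overwrites dst, as the destination register would be. */
static void pavgusb_model(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = avg_u8(dst[i], src[i]);
}
```

Averaging 1 and 2 this way gives 2 rather than 1, because the extra 1 rounds halves upward.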

PF2ID & PI2FD
Convert between packed floating point and packed 32-bit integer.

PFACC
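Adds "sideways" as compared to PFADD: the two floats in the destination are summed into the low half of the result, and the two floats in the source are summed into the high half.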
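
The difference between "sideways" and ordinary addition is easier to see side by side. The following is a rough scalar model in portable C; the packed2f type and the function names are invented for this sketch.

```c
/* Two packed single-precision floats, as held in one 64-bit MMX register.
   (Hypothetical type for illustration.) */
typedef struct { float lo, hi; } packed2f;

/* PFADD model: element-wise ("vertical") addition. */
static packed2f pfadd_model(packed2f dst, packed2f src)
{
    packed2f r = { dst.lo + src.lo, dst.hi + src.hi };
    return r;
}

/* PFACC model: "sideways" (horizontal) addition. The low result is the sum
   of the destination's two elements, the high result is the sum of the
   source's two elements. */
static packed2f pfacc_model(packed2f dst, packed2f src)
{
    packed2f r = { dst.lo + dst.hi, src.lo + src.hi };
    return r;
}
```

With dst = {1, 2} and src = {10, 20}, the PFADD model gives {11, 22} while the PFACC model gives {3, 30}. The sideways form is handy at the end of a dot product, when partial sums sitting in one register have to be combined.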

PFADD, PFSUB, & PFSUBR
Add or subtract respective elements of the source and destination. PFSUBR (subtract reverse) subtracts the destination from the source.

PFCMPEQ, PFCMPGE, & PFCMPGT
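Packed floating point compare: equal, greater than or equal, and greater than. Each result element is set to all ones if the comparison is true and all zeros if it is false, so the result can be used as a mask with the MMX logical instructions.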
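
A compare result is mostly useful as a bit mask for branch-free selection. Below is a small scalar sketch in portable C, assuming the usual all-ones/all-zeros mask convention; the function names are hypothetical, and the AND/OR step stands in for what PAND, PANDN and POR would do on the real registers.

```c
#include <stdint.h>
#include <string.h>

/* Model of one PFCMPGT lane: all ones if dst > src, else all zeros. */
static uint32_t pfcmpgt_lane(float dst, float src)
{
    return dst > src ? 0xFFFFFFFFu : 0u;
}

/* Pick a where the mask is set and b where it is clear, using only
   bitwise operations on the raw float bits. */
static float select_by_mask(uint32_t mask, float a, float b)
{
    uint32_t abits, bbits, rbits;
    float r;
    memcpy(&abits, &a, sizeof abits);
    memcpy(&bbits, &b, sizeof bbits);
    rbits = (abits & mask) | (bbits & ~mask);
    memcpy(&r, &rbits, sizeof r);
    return r;
}
```

For example, select_by_mask(pfcmpgt_lane(x, y), x, y) yields the larger of x and y without a branch.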

PFMAX & PFMIN
Select maximum or minimum of the given packed floating point values. Pretty straightforward.

PFMUL
Multiply. Note that there is no divide instruction per se; you have to multiply by the reciprocal (see PFRCP).

PFRCP, PFRCPIT1, & PFRCPIT2
Calculate the reciprocals. This can be done with one instruction for low (14-bit) accuracy or with three and an extra register for high (24-bit) accuracy.
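
The estimate-then-refine idea can be sketched in portable C. This is only an illustration of the general technique (a coarse reciprocal improved by one Newton-Raphson step), not a bit-exact model of PFRCP, PFRCPIT1 or PFRCPIT2. The coarse estimate here is faked by truncating a full-precision result to about 14 significant bits, and all function names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a low-accuracy reciprocal: compute 1/b, then clear the low
   10 of the 23 mantissa bits, leaving roughly 14 significant bits.
   (Assumption for illustration; real hardware uses a lookup table.) */
static float coarse_recip(float b)
{
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFC00u;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for 1/b: x1 = x0 * (2 - b * x0).
   Each step roughly doubles the number of correct bits. */
static float refine_recip(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}

/* Division as multiplication by the refined reciprocal. */
static float divide_model(float a, float b)
{
    return a * refine_recip(b, coarse_recip(b));
}
```

Going from about 14 good bits to single precision takes just one refinement step, which is why the low-accuracy form is worth having when the extra bits are not needed.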

PFRSQRT & PFRSQIT1
Calculate the reciprocal square root; multiply by the input values to get the square root. As with PFRCP, you can calculate the reciprocal square root to either low (15-bit) or high (24-bit) accuracy.

PMULHRW
Not a floating point operation: like the MMX instruction PMULHW, but rounds instead of truncating.
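
Rounding versus truncating comes down to adding half of the discarded range before the shift. Here is a scalar sketch in portable C of one 16-bit lane, assuming the rounding is done by adding 0x8000 to the 32-bit product; the function names are hypothetical.

```c
#include <stdint.h>

/* Truncating model (as in the MMX high multiply): keep the high 16 bits
   of the 32-bit signed product. Right-shifting a negative value is
   implementation-defined in C; this sketch assumes the usual arithmetic
   shift. */
static int16_t mulhi_trunc(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(p >> 16);
}

/* Rounding model (PMULHRW-style): add 0x8000 before taking the high
   16 bits. */
static int16_t mulhi_round(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b + 0x8000;
    return (int16_t)(p >> 16);
}
```

With a = 16384 and b = 3 the product is 0xC000, which is 0.75 when read as a 16.16 fixed-point value. The truncating version returns 0 and the rounding version returns 1.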

PREFETCH & PREFETCHW
Suggest that data be loaded into the cache ahead of the point where it is actually used. PREFETCHW additionally hints that the memory will be modified. Apart from the hint, both behave like a nop.

One benefit of 3DNow! instructions is that you can freely mix them with MMX instructions. For example, to get an absolute value, just use PAND to set the sign bit to zero.
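
The sign-bit trick works on any IEEE 754 single-precision value, so it can be tried out in plain C. A minimal sketch, with a made-up function name, where the bitwise AND plays the role of PAND with a 0x7FFFFFFF mask in each 32-bit lane:

```c
#include <stdint.h>
#include <string.h>

/* Absolute value by clearing bit 31 (the IEEE 754 sign bit) of the raw
   float representation. memcpy is used to reinterpret the bits without
   breaking C aliasing rules. */
static float abs_by_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Negation works the same way, with an exclusive OR against 0x80000000 (PXOR on the real registers) in place of the AND.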
