Very fast adder design for n-bit binary numbers. Whereas the naïve design (chaining full adders) have an O(n) component count and takes time O(n) to add, the carry lookahead adder requires O(n log n) components (which is a *lot* more), but adds in time O(log n).

The design is based on a fast method of calculating where the carry bits will be added. Suppose we're adding the 2 binary numbers `x _{n-1}...x_{0}` and

`y`. If we knew to which bits we need to add carry (define

_{n-1}...y_{0}`c`to be 1 if a 1 is carried out from adding bit j, else 0), we could just XOR each of these columns to get the result:

_{j}which takes constant time (and O(n) components).c_{n-1}c_{n-2}...c_{0}_{ }x_{n-1}...x_{1}x_{0}_{ }y_{n-1}...y_{1}y_{0}

So we need a fast way of calculating the `c _{j}`'s. To do this, we define 3 possible carry states, and assign one such to each of the n bits:

- If both
`x`and_{j}`y`are 1, then a 1 will always be carried out of that bit; this is carry state_{j}**S**("set"). - If both
`x`and_{j}`y`are 0, then 1 will never be carried out of that bit; this is carry state_{j}**C**("clear"). - If exactly one of
`x`and_{j}`y`is 1, then 1 will be carried out of that bit iff a 1 was carried into that bit; this is carry state_{j}**P**("propogate").

**S**), that 1 will never be carried out of the block (

**C**), or that 1 will be carried out if a 1 was carried in (

**P**). We define "multiplication" on the carry states so that the carry state of the concatenation of two blocks is the multiplication of their carry states (read the state on the right on the column, and the state on the left on the row).

. S C P S S C S C S C C P S C P

This operation certainly isn't commutative (nor does it define a group), but it turns out to be associative! We may calculate the carry state of each bit using the above rules in constant time and O(n) components; let *k*_{j} be the carry state for bit j. Then `c`_{j} is 1 iff *k*_{j}...*k*_{1}*k*_{0} is **S**. So using the parallel prefix algorithm, we can calculate these block values, and therefore the `c` values, using O(n log n) components and in time O(log n).

The really amazing thing about this adder design is not that it works; it's that today similar designs are used in real CPUs!