Algorithm for computing all "prefixes" of an expression `a_{1}#a_{2}#...#a_{n}`, where `#` is some associative binary operation (e.g. addition, multiplication, AND, OR, XOR, or something more exotic, such as the weird monoid used in the carry-lookahead adder). That is, *parallel prefix* is a way of calculating

a_{1}
a_{1}#a_{2}
a_{1}#a_{2}#a_{3}
...
a_{1}#a_{2}#a_{3}#...#a_{n}

You can calculate them all sequentially in linear time; the parallel algorithm does more work (uses more computational resources), but takes only logarithmic time.
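For reference, the sequential linear-time version is just a running accumulation; a minimal Python sketch (the function name is my own):

```python
def prefixes(xs, op):
    """All prefixes of xs under the associative operation op, using n-1 applications of op."""
    out = []
    acc = None
    for x in xs:
        # Extend the running combination by one element.
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out
```

For example, `prefixes([1, 2, 3, 4], lambda a, b: a + b)` gives `[1, 3, 6, 10]`.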

The algorithm below uses Θ(n) processors. For convenience, I'll assume n=2^{k}. First, calculate all these pairs:

a_{1}#a_{2}
a_{3}#a_{4}
...
a_{n-1}#a_{n}

(n/2 independent operations, time O(1)).

Now combine every adjacent pair of results, to get n/4 "quartets" (n/4 independent operations, time O(1)). Continue in this manner: after j rounds there are 2^{k-j} results, each combining a block of 2^{j} consecutive elements. Total for this stage: time O(k)=O(log n).
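This block-combination stage can be sketched as follows (a hypothetical helper, assuming the input length is a power of two):

```python
def block_levels(a, op):
    """Build the combination tree: levels[j][m] combines the block
    a[m*2**j : (m+1)*2**j]. Each round halves the number of blocks;
    with one processor per pair, each round is O(1) parallel time,
    and there are O(log n) rounds."""
    levels = [list(a)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([op(prev[2 * i], prev[2 * i + 1])
                       for i in range(len(prev) // 2)])
    return levels
```

For addition on `[1, 2, 3, 4, 5, 6, 7, 8]`, level 1 holds the pair sums `[3, 7, 11, 15]` and level 3 holds the grand total `[36]`.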

Create all the prefixes by combining the largest possible blocks, using the binary representation of each prefix length as a guide to which blocks to combine. For instance, 13_{10}=1101_{2}, so we compute

(a_{1}#...#a_{8})#(a_{9}#...#a_{12})#(a_{13})

where all the bracketed expressions have already been calculated in the previous stage. These are n independent computations, each taking time O(k)=O(log n): a prefix of length m needs at most one `#` per binary digit of m, and numbers 1,...,n have at most k+1 binary digits.
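Putting both stages together (a self-contained sketch; it repeats the block-building loop so it runs on its own, and assumes n is a power of two):

```python
def parallel_prefix(xs, op):
    """All prefixes of xs under associative op: O(n log n) work, O(log n) parallel depth."""
    n = len(xs)
    # Stage 1: levels[j][m] combines the block xs[m*2**j : (m+1)*2**j].
    levels = [list(xs)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([op(prev[2 * i], prev[2 * i + 1])
                       for i in range(len(prev) // 2)])
    # Stage 2: assemble the prefix of length m from the binary digits of m,
    # largest blocks first (e.g. 13 = 8 + 4 + 1).
    out = []
    for m in range(1, n + 1):
        acc, start = None, 0
        for j in range(len(levels) - 1, -1, -1):
            if m & (1 << j):
                block = levels[j][start >> j]  # precomputed block starting at index `start`
                acc = block if acc is None else op(acc, block)
                start += 1 << j
        out.append(acc)
    return out
```

In a real parallel setting the n iterations of the outer stage-2 loop run on separate processors; here they are serialized, so this sketch only checks correctness, not speed. `parallel_prefix([1, 2, 3, 4], lambda a, b: a + b)` gives `[1, 3, 6, 10]`.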

We see that the algorithm finishes in time O(log n), and computes all needed prefixes.

Of course, a similar result holds for circuits, since no control logic (beyond that for computing `#`) is required; just wire up each step. The resulting circuit uses O(n log n) gates and has depth O(log n).