*Parallel prefix* is an algorithm for computing all "prefixes" of an expression `a_1 # a_2 # ... # a_n`, where `#` is some associative binary operation (e.g. addition, multiplication, AND, OR, XOR, or something more exotic, such as the weird monoid used in the carry-lookahead adder). That is, it computes

`a_1`, `a_1 # a_2`, `a_1 # a_2 # a_3`, ..., `a_1 # a_2 # a_3 # ... # a_n`

You can calculate them all sequentially in linear time; the parallel algorithm uses more computational resources (does more work), but takes only logarithmic time.
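As a baseline, the sequential linear-time scan might look like this (a minimal Python sketch; the function name and the use of a Python callable for `#` are my own choices):

```python
def prefixes_sequential(a, op):
    """All prefixes of a under the associative operation op, in O(n) time."""
    out = []
    acc = None
    for x in a:
        # Extend the running prefix by one element.
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out

# With addition as `#`:
print(prefixes_sequential([1, 2, 3, 4], lambda x, y: x + y))  # [1, 3, 6, 10]
```

Note the chain of n - 1 dependent applications of `op`: each step needs the previous result, which is exactly why this version cannot be parallelized directly.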

Ω(n) processors are needed to do this. For convenience, I'll assume n = 2^k. First, calculate all these pairs:

`a_1 # a_2`, `a_3 # a_4`, ..., `a_{n-1} # a_n`

(n/2 independent operations, time O(1)).

Now combine every adjacent pair of results to get n/4 "quartets" (n/4 independent operations, time O(1)). Continue in this manner, so that after step j you have 2^(k-j) results, each combining 2^j consecutive elements. Total for this stage: time O(k) = O(log n).
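This pairwise-combining stage can be sketched as follows (a sequential simulation of the parallel rounds; the name `build_block_tree` is mine, and n is assumed to be a power of two as above):

```python
def build_block_tree(a, op):
    """Stage 1 of parallel prefix: repeatedly combine adjacent pairs.

    levels[j][i] holds the combination of the i-th block of 2^j
    consecutive elements. With n processors, the combinations within
    one level are independent, so each level costs O(1) parallel time
    and there are O(log n) levels. Assumes len(a) is a power of two.
    """
    levels = [list(a)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([op(prev[2 * i], prev[2 * i + 1])
                       for i in range(len(prev) // 2)])
    return levels

# With addition as `#` on a_1..a_8 = 1..8:
levels = build_block_tree([1, 2, 3, 4, 5, 6, 7, 8], lambda x, y: x + y)
print(levels[1])  # [3, 7, 11, 15]  (the pairs)
print(levels[3])  # [36]            (the combination of all eight elements)
```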

Create all the prefixes by combining the largest possible blocks, using the binary representation of the index as a guide to which blocks to combine. For instance, 13 = 1101 in binary, so we compute

`(a_1 # ... # a_8) # (a_9 # ... # a_12) # (a_13)`

where all the bracketed expressions have already been calculated. These are n independent operations, each taking time O(k) = O(log n) (the number of binary digits in a number between 1 and n).
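The binary decomposition that selects the blocks can be sketched in isolation (a hypothetical helper of my own; it returns 1-based index ranges to match the notation above):

```python
def block_decomposition(m):
    """Which precomputed blocks form the prefix of length m, largest first.

    Returns 1-based (start, end) ranges; for m = 13 = 8 + 4 + 1 this
    mirrors (a_1 # ... # a_8) # (a_9 # ... # a_12) # (a_13).
    """
    blocks, start = [], 1
    for j in reversed(range(m.bit_length())):  # largest power of two first
        size = 1 << j
        if m & size:
            blocks.append((start, start + size - 1))
            start += size
    return blocks

print(block_decomposition(13))  # [(1, 8), (9, 12), (13, 13)]
```

Each range here is a block already produced by the first stage, so forming the prefix needs at most one application of `#` per binary digit of m.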

We see that the algorithm finishes in time O(log n), and computes all needed prefixes.
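Putting both stages together, the whole algorithm might look like the sketch below (a sequential simulation of the parallel structure; the names and the power-of-two assumption are my own choices, following the discussion above):

```python
def all_prefixes(a, op):
    """Parallel-prefix structure, simulated sequentially.

    Stage 1: repeatedly combine adjacent pairs, so levels[j][i] holds
             the combination of the i-th block of 2^j elements.
    Stage 2: each prefix a_1 # ... # a_m combines the largest
             precomputed blocks, guided by the binary digits of m.
    With n processors, stage 1 takes O(log n) rounds and every stage-2
    prefix is an independent chain of O(log n) applications of op,
    giving O(log n) parallel time. Assumes len(a) is a power of two.
    """
    levels = [list(a)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([op(prev[2 * i], prev[2 * i + 1])
                       for i in range(len(prev) // 2)])

    def prefix(m):
        result, start = None, 0  # start = 0-based index of next uncovered element
        for j in reversed(range(len(levels))):  # largest blocks first
            size = 1 << j
            if m & size:
                block = levels[j][start >> j]  # covers a[start .. start+size-1]
                result = block if result is None else op(result, block)
                start += size
        return result

    return [prefix(m) for m in range(1, len(a) + 1)]

# With addition as `#`:
print(all_prefixes([1, 2, 3, 4, 5, 6, 7, 8], lambda x, y: x + y))
# [1, 3, 6, 10, 15, 21, 28, 36]
```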

Of course, a similar result holds for circuits, since no logic (beyond that for computing `#`) is required: just hook up the steps above. The resulting circuit uses O(n log n) gates and has depth O(log n).