Algorithm for computing all "prefixes" of an expression a1#a2#...#an, where # is some associative binary operation (e.g. addition, multiplication, AND, OR, XOR, or something more exotic, such as the weird monoid used in the carry lookahead adder). That is, parallel prefix is a way of calculating

a1
a1#a2
a1#a2#a3
...
a1#a2#a3#...#an
You can calculate them all in linear time; the parallel algorithm uses more computational resources (does more work), but takes only logarithmic time.

Ω(n) processors are needed to do this. For convenience, I'll assume n=2k. First, calculate all these pairs:

a1#a2
a3#a4
...
an-1#an
(n/2 independent operations, time O(1)).

Now combine every adjacent pair of results, to get n/4 "quartets" (n/4 independent operations, time O(1)). Continue in this manner, so as to have 2n-k results of combining 2k elements. Total for this stage: time O(k)=O(log n).

Create all the prefixes by combining the largest possible blocks, using the binary representation as a guide which blocks to combine. For instance, 1310=11012, so we compute

(a1#...#a8)#(a9#...#a12)#(a13)
where all the bracketed expressions have already been calculated. These are n independent operations, each taking time O(k)=O(log n) (the number of binary digits in a number 1,...,n).

We see that the algorithm finishes in time O(log n), and computes all needed prefixes.

Of course, a similar result holds in circuits, since no logic (beyond that for computing #) is required. Just hook up each step. It uses O(n log n) gates and runs in time O(log n).

Log in or registerto write something here or to contact authors.