parallel prefix algorithm - Everything2.com

Algorithm for computing all "prefixes" of an expression a₁#a₂#...#a_n, where # is some associative binary operation (e.g. addition, multiplication, AND, OR, XOR, or something more exotic, such as the weird monoid used in the carry lookahead adder). That is, parallel prefix is a way of calculating a₁ a₁#a₂ a₁#a₂#a₃ ... a₁#a₂#a₃#...#a_n You can calculate them all in linear time; the parallel algorithm uses more computational resources (does more work), but takes only logarithmic time.

Ω(n) processors are needed to do this. For convenience, I'll assume n=2^k. First, calculate all these pairs:

a₁#a₂
a₃#a₄
...
a_n-1#a_n

(n/2 independent operations, time O(1)).

Now combine every adjacent pair of results, to get n/4 "quartets" (n/4 independent operations, time O(1)). Continue in this manner, so as to have 2^n-k results of combining 2^k elements. Total for this stage: time O(k)=O(log n).

Create all the prefixes by combining the largest possible blocks, using the binary representation as a guide which blocks to combine. For instance, 13₁₀=1101₂, so we compute

(a₁#...#a₈)#(a₉#...#a₁₂)#(a₁₃)

where all the bracketed expressions have already been calculated. These are n independent operations, each taking time O(k)=O(log n) (the number of binary digits in a number 1,...,n).

We see that the algorithm finishes in time O(log n), and computes all needed prefixes.

Of course, a similar result holds in circuits, since no logic (beyond that for computing #) is required. Just hook up each step. It uses O(n log n) gates and runs in time O(log n).

carry lookahead adder	Transitive closure	Mathematical Games	serial monogamy
Programming by approximation	tridiagonal systems	first order linear recurrence	parallel processing
irreducible complexity	Tomasulo	chi square	Median
Omega	Ω	exotic	Floating point
complexity	binary	work