Algorithm for computing all "prefixes" of an expression a1#a2#...#an, where # is some associative binary operation (e.g. addition, multiplication, AND, OR, XOR, or something more exotic, such as the weird monoid used in the carry lookahead adder). That is, parallel prefix is a way of calculating
You can calculate them all in linear time; the parallel algorithm uses more computational resources (does more work), but takes only logarithmic time.
Ω(n) processors are needed to do this. For convenience, I'll assume n=2k. First, calculate all these pairs:
(n/2 independent operations, time O(1)).
Now combine every adjacent pair of results, to get n/4 "quartets" (n/4 independent operations, time O(1)). Continue in this manner, so as to have 2n-k results of combining 2k elements. Total for this stage: time O(k)=O(log n).
Create all the prefixes by combining the largest possible blocks, using the binary representation as a guide which blocks to combine. For instance, 1310=11012, so we compute
where all the bracketed expressions have already been calculated. These are n independent operations, each taking time O(k)=O(log n) (the number of binary digits in a number 1,...,n).
We see that the algorithm finishes in time O(log n), and computes all needed prefixes.
Of course, a similar result holds in circuits, since no logic (beyond that for computing #) is required. Just hook up each step. It uses O(n log n) gates and runs in time O(log n).