Polynomials

As a student, I always had a lot of trouble visualizing what a discrete Fourier transform actually meant. Its continuous relatives at least had some physical interpretation - a Fourier series breaks down a periodic function into sums of sine and cosine waves, while the continuous Fourier transform breaks down any function into its component frequencies. In both cases, there's a transformation from the time domain into the frequency domain. That didn't seem to be the case with their discrete cousin. There are n input values going in, and n output values coming out, usually complex numbers. I could fudge the issue somewhat, and imagine some waveform going through the n points, but the more I learnt about the DFT and its uses, the less and less I saw it having anything to do with waves at all. After all, you can start merely from a raw definition:

Given a sequence of n (possibly complex) values (a_0, a_1, a_2, ..., a_{n-1}), their discrete Fourier transform (f_0, f_1, f_2, ..., f_{n-1}) is given by

f_j = Σ_k a_k ω^{jk}
and the corresponding inverse transform is given by
a_j = (1/n) Σ_k f_k ω^{-jk}
where
ω = exp(2πi/n)
is a primitive nth root of unity.
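
To make this concrete, here is a direct, unoptimized transcription of both formulas into Python (a sketch for illustration only; the function names are mine, and real code would use an FFT library):

```python
import cmath

def dft(a):
    """Discrete Fourier transform, straight from the definition: O(n^2)."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)   # primitive nth root of unity
    return [sum(a[k] * w ** (j * k) for k in range(n)) for j in range(n)]

def idft(f):
    """Inverse transform: the same sum with w^(-jk), divided by n."""
    n = len(f)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(f[k] * w ** (-j * k) for k in range(n)) / n for j in range(n)]
```

Running idft(dft([1, 2, 3, 4])) recovers the original sequence, up to floating point rounding.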

There's some sort of rotation involved, what with all those ω terms flying around, but it still seems more like voodoo than anything meaningful. However, after I first saw one little trick, almost anything I read about the discrete Fourier transform started to make sense. The original sequence (a_0, a_1, a_2, ..., a_{n-1}) can be viewed as a polynomial A(x), using a construction called a generating function:

A(x) = a_0 + a_1 x + a_2 x^2 + ... + a_{n-1} x^{n-1}
It seems to have got us no further than using powers of x as placeholders in the sequence, but suddenly, the discrete Fourier transform has meaning:
f_j = A(ω^j)
In other words, the discrete Fourier transform is nothing more than the values of the polynomial at the n complex nth roots of unity. We can visualize the inverse transform similarly: it's given by the unique polynomial of degree less than n that takes those values at this rather special choice of n points. It's even more of a pleasant surprise, again using F(x) to denote the polynomial representation of the sequence (f_0, f_1, f_2, ..., f_{n-1}), that the inverse transform can be formulated similarly:
a_j = (1/n) F(ω^{-j})
In this case, the inverse transform is (aside from division by n) the evaluation of the corresponding polynomial over the nth complex roots of unity taken in the reverse order.
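
In code, the "transform" really is nothing but polynomial evaluation. A sketch (poly_eval is my name for it) that produces exactly the same lists as dft and idft above:

```python
import cmath

def poly_eval(coeffs, x):
    """Evaluate a_0 + a_1*x + ... + a_{n-1}*x^{n-1} using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

a = [5, 3, 1, 4]
n = len(a)
w = cmath.exp(2j * cmath.pi / n)
print([poly_eval(a, w ** j) for j in range(n)])       # identical to dft(a)
print([poly_eval(a, w ** -j) / n for j in range(n)])  # and to idft(a), likewise
```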

Multiplication

While this is very nice to know, and may help visualize what is going on, it would be better if there were something even more concrete to show for our efforts, and it turns out we don't have to look far. Suppose we wish to multiply two polynomials A(x) and B(x), to get the product P(x). If we keep the convention above of using lowercase letters with subscripts to denote the corresponding coefficients, then
p_j = Σ_k a_k b_{j-k}
taking the summation over all indices where the corresponding a and b terms are valid. This happens to be a type of sum called a (discrete) convolution, and there is a convolution theorem in the discrete world much as in the continuous: the discrete Fourier transform of a convolution is the pointwise product of the individual Fourier transforms. A little massaging is necessary before we can apply the convolution theorem. In particular, we need to pad the sequences with enough zero elements; since we're multiplying polynomials, padding each sequence out to the length of the product is enough. The result of all this padding is that we can let the sum in the convolution range over all indices, with indices wrapping around modulo n.
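
As a sketch, reusing dft and idft from earlier (and rounding at the end, which assumes integer coefficients):

```python
def poly_multiply(a, b):
    """Multiply two coefficient lists via the convolution theorem."""
    n = len(a) + len(b) - 1            # the product has this many coefficients
    a = a + [0] * (n - len(a))         # zero-pad: wrap-around now only hits zeros
    b = b + [0] * (n - len(b))
    fa, fb = dft(a), dft(b)            # dft/idft as defined in the first sketch
    return [round(c.real) for c in idft([x * y for x, y in zip(fa, fb)])]

print(poly_multiply([7, 4, 0, 2], [1, 3]))   # (7+4x+2x^3)(1+3x) -> [7, 25, 12, 2, 6]
```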

So, now we have it. To multiply our polynomials, we pad them, discrete Fourier transform them, multiply them pointwise, and inverse transform them... and this was meant to be easy? But it was. Think back to the meaning of the Fourier transform: the values of the polynomials at the complex nth roots of unity. In effect, to multiply the polynomials, we just multiply their values at these points. We now have the values taken by the product at those n points, and those values determine a unique polynomial - it must be the product we are looking for.

Note the important step was choosing n large enough. If we are trying to find a polynomial based on its values at the complex nth roots of unity, it's unique up to multiples of x^n - 1 - which vanishes at those points by definition. Our result has to be unique, so our product has to be of degree less than n for this to work.
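
You can watch the wrap-around happen: transform (1 + 2x) and (3 + 4x) at only n = 2 points, and the true product 3 + 10x + 8x^2 comes back reduced mod x^2 - 1 (again a sketch, reusing dft and idft from above):

```python
fa, fb = dft([1, 2]), dft([3, 4])
print([round(c.real) for c in idft([x * y for x, y in zip(fa, fb)])])
# -> [11, 10]: since x^2 = 1 at both points, the 8x^2 term folds onto the constant
```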

Complexity

It seemed to take a lot of work, but the problem of multiplying polynomials was reduced to the problem of multiplying n pairs of values - oh, plus the small matter of calculating those n values in the first place, and reversing the calculation afterwards. Evidently, it's the Fourier transforms that are going to dominate. Can we do better than the obvious way of multiplication (which is O(n^2)), or the Karatsuba algorithm for polynomial multiplication, which is O(n^1.585)?

It turns out we can in fact do much better, because we can use the Fast Fourier Transform (FFT), which has time complexity O(n log n). Of course, that Big-O notation hides a constant, but, nevertheless, the result comes out. For large enough n, it's faster to multiply polynomials using the Fast Fourier Transform. Add a few extra tricks, such as using bit reversal to reorder the coefficients so the transform can be computed in place, and the method is the most efficient known for very large n.
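
For the curious, here is the classic recursive radix-2 (Cooley-Tukey) form, as a sketch: it requires n to be a power of two, and a production version would be iterative and in place, which is where that bit reversal ordering comes in:

```python
import cmath

def fft(a, sign=1):
    """Radix-2 FFT; len(a) must be a power of two. sign=-1 gives the
    unscaled inverse (divide the result by n yourself)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], sign)          # transform of even-indexed coefficients
    odd = fft(a[1::2], sign)           # transform of odd-indexed coefficients
    w = cmath.exp(sign * 2j * cmath.pi / n)
    out = [0] * n
    for j in range(n // 2):
        t = w ** j * odd[j]            # the "twiddle factor" for this index
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out
```

Each level of recursion does O(n) work and halves the problem size, which is where the O(n log n) total comes from.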

And finally, to numbers!

That's all well and good, but how can we apply this to the multiplication of large numbers? Certainly, computations like the Fast Fourier Transform are not in the domain of pencil and paper calculation, nor even integer arithmetic on a computer. It is going to take quite some floating point muscle to get this to work, and we still need to work out how to get the numbers involved. To see where they come in, we go back to the polynomial representation of a sequence of values, using a generating function, and compare it with the notion of place value in a radix-based number system such as base ten or hexadecimal. Again, we can use the variable as a placeholder, so for instance, the decimal number 2047 can be represented as

A(x) = 2x^3 + 0x^2 + 4x + 7
and we can recover the original value using the substitution x = 10. From there, our multiplication algorithm is complete (a sketch in code follows the list):
  • Choose a suitable radix. We might use 10, but our computer probably would prefer some nice power of 2.
  • Using that radix, write the two numbers as polynomials.
  • Take the Fast Fourier Transform of the two polynomials. Remember to pad them out to make them long enough to contain the result.
  • Multiply the two transforms, pointwise. Effectively all we're doing is multiplying the values of the polynomials at the nth roots of unity.
  • Use the inverse transform to find the product polynomial.
  • Substitute the radix back in, to find the numerical result.
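
Putting all of the above together, a sketch of the whole pipeline (multiply is my name for it; base 10 is chosen for readability, and a serious implementation would pick the base and transform length far more carefully to keep rounding error in check):

```python
def multiply(x, y, base=10):
    """Multiply two non-negative integers via an FFT of their digit lists."""
    def digits_of(v):                  # digit list, least significant first
        ds = []
        while v:
            v, d = divmod(v, base)
            ds.append(d)
        return ds or [0]
    a, b = digits_of(x), digits_of(y)
    n = 1                              # pad to a power of two big enough to
    while n < len(a) + len(b):         # hold every digit of the product
        n *= 2
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    fa, fb = fft(a), fft(b)            # fft as in the previous sketch
    c = fft([p * q for p, q in zip(fa, fb)], sign=-1)
    digits = [round(v.real / n) for v in c]   # rounding: entries are integers
    carry = 0                          # convolution "digits" can exceed base-1,
    for i in range(len(digits)):       # so propagate carries...
        carry += digits[i]
        digits[i] = carry % base
        carry //= base
    return sum(d * base ** i for i, d in enumerate(digits))  # ...then set x = base

print(multiply(2047, 1234))   # 2525998
```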

While it sounds daunting, it turns out to be very efficient for very large numbers. There are a few pitfalls, mainly because any calculation involving floating point has to be wary of rounding error, but it can and does work. It is used extensively in areas such as primality testing, notably the search for ever-larger primes, dominated for a considerable time by the GIMPS effort to find Mersenne primes. The GIMPS software uses the Lucas-Lehmer test, and several other neat Fourier transform tricks, to do its calculations very efficiently - but those tricks are best left for another node.
