Toom-Cook multiplication (idea) by macro

Description

The Toom-Cook algorithm for multiplication of large numbers on computer is a divide and conquer approach that combines features of many other methods. Like Karatsuba multiplication, it operates by dividing the input numbers into limbs of smaller size, and expresses the larger product in terms of calculations made on the smaller pieces. Recursion can be applied to do the calculations on these smaller pieces, again using Toom-Cook, until the individual pieces are small enough to succumb to a more straightforward approach such as the same "long multiplication" performed on pencil and paper.

The method takes the names of its two discoverers. In 1963, Andrei Toom produced a construction that proved that the time complexity of multiplication had an upper bound that was considerably lower than the O(n²) of conventional long multiplication. The proof included an implicit description of the method, which Stephen Cook included, slightly modified, in his 1966 PhD thesis. One feature of the Toom-Cook method is it does not define a single solution; instead, it defines an entire family of methods for efficient multiplication. Implementors have considerable freedom to select a particular Toom-Cook method that makes most sense on a particular computer architecture, such as by making use of powers of 2 which suit the computer's binary notation.

The method has considerable commonality with multiplication using the fast Fourier Transform, in that it works on the same principles of polynomial multiplication. The input numbers are split into limbs of a given size, and each is written as a polynomial, using the limb size as radix. Instead of multiplying the polynomials directly, they are evaluated at a set of points, and the values multiplied together at those points. The product polynomial is then determined, based on the products at those points. Finally, substitution of the radix returns the final answer. The degrees of freedom available to choose an appropriate algorithm are the number of limbs the input is divided into, and the points at which the polynomials are evaluated.

An example

Let us take an example, multiplying 123,456,789 by 987,654,321. We can apply a "Toom-3" algorithm, splitting each number into 3 limbs of length 3 digits, and can choose a set of points, for example, {-2, -1, 0, 1, 2}. The calculations so far look like this:


A(x) = 123x² + 456x + 789
B(x) = 987x² + 654x + 321

x= -2  A=  369  B= 2961  AB=  1092609
x= -1  A=  456  B=  654  AB=   298224
x=  0  A=  789  B=  321  AB=   253269
x=  1  A= 1368  B= 1962  AB=  2684016
x=  2  A= 2193  B= 5577  AB= 12230361

We could have used Toom-Cook again to perform the multiplications AB, but this will do for illustrative purposes. There are efficient ways to calculate the values of A and B, such as the method of finite differences, which can be used to calculate the successive values of A and B using only addition. It now remains to find the polynomial P(x)=A(x)B(x) based on the values at the five points. There is a general solution to this problem, but we can do much better. The solution we are looking for is of the form

P(x) = p₄x⁴ + p₃x³ + p₂x² + p₁x + p₀

and we can substitute the values of x at the points to get a set of simultaneous equations that are linear in the unknowns p_i:

16p₄ - 8p₃ + 4p₂ - 2p₁ + p₀ =  1092609
  p₄ -  p₃ +  p₂ -  p₁ + p₀ =   298224
                         p₀ =   253269
  p₄ +  p₃ +  p₂ +  p₁ + p₀ =  2684016
16p₄ + 8p₃ + 4p₂ + 2p₁ + p₀ = 12230361

That's a few linear equations to solve, with a little algebra. Fortunately many of the terms are easy to separate and we get the solution:

p₄ =  121401
p₃ =  530514
p₂ = 1116450
p₁ =  662382
p₀ =  253269

which we can finally turn back to a number by substituting x=1000 in the product polynomial P(x). The easiest way to do this is to just write each coefficient with the right place value.

121,401
    530,514
      1,116,450
            662,382
                253,269
=======================
121,932,631,112,635,269

That may have seemed like hard work, but the difficult steps (evaluating the values and solving the equations) turn out to be quite easy for a computer to do, requiring simple operations such as addition, subtraction, shifts, and multiplication and division by small constants. The important part is to note that the body of the problem was reduced to five multiplies, each on numbers about one-third the size of the original inputs.

Variations

The simplest variations on the Toom algorithm rely on judicious choices of the set of points to evaluate all the polynomials at. A good choice will mean that the initial evaluations and the solution of the equations are easy (for the computer!). Some authors make a formal presentation of the method, in terms of matrix manipulations and the linear algebra solution of the equations. However, a good choice of points always supports a more efficient solution at this step.

There are always two of the coefficients which can be computed directly. The numerical coefficient of the product polynomial can be got by multiplying the trailing terms in the inputs (which corresponds to the substitution x=0), and the leading coefficient is likewise the product of the original leading terms (which corresponds to the asymptotic behavior of the function as x approaches infinity). Since these terms required one multiplication each, many authors compute those first and eliminate them from the other equations immediately.

The method generalizes immediately to splits of the number into more pieces. In general, a "Toom-K" algorithm will split the inputs into k limbs of equal width, evaluate the polynomials at (2k-1) points, and solve for the (2k-1) coefficients in the product polynomial. Knuth shows this in action with the choices of points {0, 1, ... 2k-2} or {-(k-1), -(k-2), ... k-2, k-1} in The Art of Computer Programming. The multiplication part then becomes (2k-1) calculations, at 1/k of the original size. A "Toom-2" algorithm can be shown to be equivalent to Karatsuba multiplication.

The gmp library for multiple precision arithmetic uses a Toom-3 algorithm for numbers of tens of thousands of digits, and only switches to a Fourier transform method when the numbers reach hundreds of thousands of digits. It uses the points x=0, 0.5, 1, 2 and the point at infinity to perform simple evaluations and solution of the equations.

Efficiency

The time complexity of a Toom-k algorithm is relatively easy to calculate. Since the algorithm recursively replaces a multiplication by (2k-1) smaller ones, k times smaller than the previous iteration, the theoretical complexity is O(n^{(log(2k-1))/(log k)}). This would suggest a larger k value gives a more efficient algorithm, but for larger k, the evaluation and interpolation steps can become quite hairy. (However, some authors have attempted values of k as large as 16).

Time complexity of some Toom-k sizes

 k | Notes       | Complexity for n digits big-Oh notation
============================================================
 2 | Karatsuba   |                O(n^1.585)
 3 | Most common |                O(n^1.465)
 4 | Feasible    |                O(n^1.404)
...|             |
16 | Difficult!  |                O(n^1.239)
============================================================

The table illustrates increased sizes experience diminishing returns, and while the method can approach the O(n log n) of fast Fourier Transform methods with larger values of k, the computation efforts outside of the multiplications can rapidly become significant. Nevertheless, the big-Oh notation hides a constant, and empirical results indicate that a lovingly crafted Toom algorithm will beat an FFT method for all but the most enormous calculations.

Sources

Cook, Stephen A., On the Minimum Computation Time of Functions, 1966 PhD thesis, Harvard University Department of Mathematics. Excerpt available online at http://cr.yp.to/bib/1966/cook.html , viewed on July 14, 2005.

Toom, Andrei L., The complexity of a scheme of functional elements realizing the multiplication of integers published in Soviet Math (translations of Dokl. Adad. Nauk. SSSR), 4, No. 3, 1963. Available online (in Russian) at http://www.de.ufpe.br/~toom/articles/rusmat/Multipli.pdf , viewed on July 14, 2005.

Knuth, Donald E., The Art of Computer Programming, volume 2, Seminumerical Algorithms, 3rd edition, Addison-Wesley, 1998.

Grandlund, T. Toom-Cook 3-Way Multiplication, from the GNU MP documentation. http://www.gnu.org/software/gmp/manual/html_node/Toom-Cook-3-Way-Multiplication.html , viewed on July 15, 2005.

Karatsuba multiplication	Multiplication using the Fast Fourier Transform	Egyptian multiplication	finite difference
How the FFT works	Radix	Mental multiplication	Lagrange interpolation
Why do computer geeks feel they were born in the wrong time?	Diminishing Returns	Linear algebra	theoretical
Montgomery multiplication	integer FFT	Andrei Toom	Comba multiplication
Windows XP Service Pack 2	Freedom of Choice	linear equation	Multiplication algorithm
Multiple precision arithmetic	degrees of freedom	leading coefficient	Equations