Multiple precision arithmetic

Multiple precision arithmetic is the term for algorithms and data structures which can be used to implement arithmetic operations with values larger than can be stored in a register of a CPU, which is usually between 16 and 64 bits. Usually this refers to integer arithmetic, but floating point is also possible. Some people refer to this as arbitrary precision arithmetic, which I don't like, because, as a computer is a finite state machine, it is impossible to support truly arbitrary precision. (It's things like this that make me a pedantic jerk, BTW).

These techniques are useful in a few certain areas. The only reason I know them is because I needed to implement public key cryptography for a crypto library I work on. They could also be potentially useful in things like modeling or AI. Also, some programming languages, such as Scheme, use multiple precision arithmetic at all times.

We denote a k-word integer x as being an array of words, k-1...0, with 0 being the least significant word. It is wise, for speed purposes, to choose the array size to coorespond with the size of the CPU's registers. In all cases, we assume that the base used is a power of 2. The base is the maximum value of a word, plus 1. For example, if you were using 32-bit words, the base would be 2^32.

The algorithms are written out in what is hopefully a reasonably understandable C-like pseudocode.

The algorithms assume their inputs are unsigned. It is fairly easy to figure out how to handle signs, given these algorithms, and thus is left as an excercise for the reader. Note too, that while these algorithms are reasonably fast, at least for moderately sized inputs, there are a huge number of possible optimizations. I know of some, and I'm sure there are many more that I don't yet know about. Here's a hint to think about: SIMD instructions.

Addition

Addition is done just like how you do it by hand, except it uses a larger radix (2^16 or 2^32 versus 10 for mere humans).

Input: x and y, each of size n (if they are not the same size, pad the smaller one out with 0s).

Output: w = x + y, of size (at most) n + 1 words.

carry = 0
For i from 0 to n - 1 do:
  w[i] = (x[i] + y[i] + carry) mod b
  If((x[i] + y[i] + carry) < b)
    Then carry = 0
    Else carry = 1
w[n] = carry
Return w

Subtraction

Input: x and y, each of size n (if they are not the same size, pad the smaller one out with 0s). It must be the case that x >= y.

Output: w = x - y, of size (at most) n words.

carry = 0
For i from 0 to n - 1 do:
  w[i] = (x[i] - y[i] + carry) mod b
  If((x[i] - y[i] + carry) >= 0)
    Then carry = 0
    Else carry = -1
Return w

Multiplication

This is not the fastest method of doing multiplication, but it is the easiest to describe (and it's what's listed in HAC). Consider also using Kartatsuba or Toom-Cook, which are asymptotically faster, though often slower in practice for smaller numbers. The technique of using both, using the classical algorithm for small numbers and switching over to the Karatsuba method for large numbers, is particularly effective. Comba multiplication works well for certain sized inputs (in particular, a 4x4->8 or 8x8->16 Comba multiply is very fast). The only reference I know of for Comba multiplication is Comba's paper in the IBM Systems Journal, at least until someone nodes it.

Input: x and y, where x has n digits, and y has t digits.

Output: w = x * y, of size n + t words.

For i from 0 to t-1 do:
  carry = 0
  For j from 0 to n-1 do:
    (uv) = w[i+j] + x[j] * y[i] + carry
    w[i+j] = v
    carry = u
  w[i+n+1] = u
Return w

In the above notation, (uv) refers to a "double word" value, ie one that is at least twice the size of the regular word size.

Division / Modulus

Division is pretty nasty; avoid it if at all possible. If you are doing the modulo operation many times with a single modulus, Barrett Reduction can be used to speed things up.

Input: x, of size n, and y, of size t. n >= t >= 1, y{t} != 0

Output: The quotient q = (q{n-t},...,q{1},q{0}) and the remainder r = (r{t},...,r{1},r{0}), such that x = q*y + r

While(x >= yb^(n-t)) do:
  q[n-t] = q[n-t] + 1;
  x = x - yb^(n-t)
For i from n down to (t+1) do:
  If(x[i] == y[t])
    Then q[i-t-1] = b-1
    Else q[i-t-1] = floor((x[i]*b + x[i-1]) / y{t})
  While(q[i-t-1]*(y[t]*b + y[t-1]) > x[i]*b^2 + x[i-1]*b + x[i-2])
    q[i-t-1] = q[i-t-1] - 1
  x = x - q[i-t-1]*y*b^(i-t-1)
  If(x < 0)
    Then x = x + y*b^(i-t-1)
         q[i-t-1] = q[i-t-1] - 1
r = x
Return (q,r)

Some other algorithms useful when using multiple precision algorithms are the binary GCD algorithm, the binary extended GCD algorithm, the square and multiply exponentiation algorithm, and the Miller-Rabin primality test.

I took these algorithms from Chapter 14 of the Handbook of Applied Cryptography. For many more details on multiple precision arithemetic, read it.

I probably made a transcription error or two in here, if you're going to implement these algorithms, double check them against the original source. Please /msg me if you notice any errors.

Comba multiplication	Babylonian square root algorithm	Karatsuba multiplication	arbitrary precision
fast exponentiation	Barrett Reduction	Handbook of Applied Cryptography	Standard Cryptographic Algorithm Naming
Wilson's test for primality	computing the inverse	Fast 6502 multiplication	Binary GCD algorithm
public key cryptography	RSA	Montgomery multiplication	Toom-Cook multiplication
Rabin-Miller primality test	null-terminated number	Botan	modular arithmetic
chord multiplication property	Division Algorithm	SIMD	limb