Cryptography and the Discrete Logarithm Problem
A crucial part of modern cryptography is the use of one-way functions for public key systems. These (roughly) are mathematical rules which are very easy to perform in one direction, but significantly (ideally, exponentially) harder to reverse. One popular approach is the RSA cryptosystem; its main rival is the elliptic curve discrete logarithm problem (ECDLP), which has found its way into various government standards due to certain advantages it affords over RSA.
However, no functions have yet been proven to be one-way. The discrete logarithm problem (DLP) is not specific to elliptic curves; it can be applied to any group. This is a mathematical structure with a rule, the group operation ⊕, for combining two group objects into another. Of course, if we can do this once, we can do it again, which gives rise to a notion of scalar multiplication: write g for g⊕g, g for g⊕g⊕g and so on. We can rapidly compute h=[n]g in log2(n) group operations. The DLP asks us to do the reverse: given g and h with h=[n]g, recover n. In the abstract setting of a group with a prime number of elements N, it can be proven that this problem requires at least of the order of sqrt(N) group operations to solve; that is, exponentially longer than scalar multiplication in the forward direction. This is no contradiction to my earlier assertion; any cryptographic implementation requires the choice of a particular group, and there may be additional structure there which can be exploited to solve the DLP more quickly. Thus there could be a shortcut through the ECDLP despite the proof that there can be none for the DLP in general.
Baby Steps, Giant Steps
The Baby-Step, Giant-Step (BSGS) algorithm is a generic algorithm that falls into the class of square root algorithms - that is, those with best possible performance without exploiting properties of a particular group. Let's consider an example first.
Suppose we know that [n]g=h, with n being in the range 0...99. Then the naive approach to determine n would be to simply start with the identity g, then perform successive group operations to find g, g, g etc., comparing each to h until a match is found. At worst, this will take 100 steps; on average, 50.
But if n is a two digit number, we can think of it as 10a+b for a and b in the range 0...9. Then we have
(where [-b]g=[b](-g) for -g the group inverse of g)
Thus we can consider instead all possible values of both the left- and right-hand sides of this expression, knowing that there will be a common value. How is this of any benefit? There are only 10 possibilities for h⊕[-b]g, since b ranges from 0 to 9. We describe these as the baby steps h, h⊕-g, h&oplus[-2]g... h⊕[-9]g. Then we can start computing giant steps for the right hand side- g, g, g etc.. until we obtain a match.
Each step (baby or giant) requires only a single group operation (let u=g, then the giant steps are u,u,u...), so although we commit to 10 group operations up front to tabulate the baby steps, we require at most 20 and on average 15 operations to recover n.
In general (by the remainder theorem), for a fixed s there will be always be a,b such that n=as+b and b is in the range 0..s; thus we can proceed by forming s baby steps for each b before generating giant steps sg, 2sg,3sg...- at most N/s, where N is a bound on n (at worst, the cardinality of the group can be used). To balance the time spent forming baby steps against the speedup of having larger (and hence fewer) giant steps, the optimal choice of s is around sqrt(N), with at worst around 2sqrt(N) group operations then being required.
Notice that there is a time-space tradeoff required: the naive approach only stored one group element at a time, whereas we require s=sqrt(N) elements - the baby steps - to be stored. Optimal performance will also depend on an efficient way to compare giant steps to these values, such as hash tables.
Input: An element g of a group of order N, and some h from the group with h=[n]g.
Output: The value of n.
- Set s=floor(sqrt(n))+1.
- Set b=h, and store (b,0) in a hash table.
- For j from 1 to s:
set b=b⊕(-g), and store (b,j) in the hash table
(Hash Table now contains baby step pairs (b_j=h⊕[-j]g,j) for j from 0 to s)
- Set i to 0 and G to the group identity (g)
- While (true) do:
- If G=b_j for some j (hash table lookup of G hits), return is+j
- Set i=i+1 and G=G⊕[s]g
Variants and refinements
The BSGS can be adapted in situations where more information is available, such as when n is constrained to lie in a particular interval, or its equivalence class modulo some other value is known. Gains can also be made from parallelisation, by splitting the generation of baby steps and then testing of giant steps amongst multiple processors or nodes.
Adaptive width variants also allow for a BSGS search when no bound is known on n, Starting with a small value of s and periodically increasing it, topping up the Baby Steps as necessary. A particularly elegant approach along these lines (Terr, 2000) uses the triangular numbers 0,1,3,6,10... as giant steps, thus requiring only one additional baby step per giant step.
The basic strategy of BSGS lends itself to other areas: in fact, it was originally documented by Shanks (1971) for computing the order of ideal class groups. Finding element orders can be viewed as an instance of the discrete logarithm problem, but very recently (Sutherland, 2007) a variant of BSGS has been developed for this which beats the lower bound for the DLP, making the order problem easier than the general DLP.
BSGS is not the only square root algorithm; a popular alternative is Pollard's Rho method, based on the birthday paradox, which has smaller memory requirements.