Simply put, coding cryptographic algorithms like DES and SHA-1 is hard to do right. The problem is that the slightest mistake, which would only be an tiny problem in most other kinds of algorithms, will cause your code to act in a way that is totally incompatible with how that algorithm is supposed to work. It gets easier with experience, but it's still a very trivial manner to screw things up.

So, how to ensure that everyone is on the same page when it comes to using a particular block cipher or hash function? The key is test vectors. The nice thing about crypto, is that while most of it is very complicated internally, the inputs and outputs are usually just byte strings. Simply by describing those inputs and outputs, a programmer can check to make sure their implementation is working correctly. Test vectors are, as far as I know, unique to crypto; at least, I can't think of any other field where all inputs and outputs to and from an algorithm can be described easily as byte strings. An especially important point is that, due to the general randomizing requirements on the algorithm (a single bit change in any input should change, on average, half of the output bits), the slightest mistake is very likely to be noticed, particularly if many such vectors are tested.

Of course, test vectors don't guarantee that your code is correct. For example, in some early Blowfish implementations, there was a bug that made Blowfish much easier to attack than normal, but if the key was made up entirely of ASCII characters, it wouldn't be noticed. Since the Blowfish test vectors of the time were all ASCII strings, the code would seem to be correct despite having a fatal flaw. For this reason, test vectors should be randomly generated, though with hash functions a certain set of "standard" inputs have formed, such as "a", "abc", and a million "a" characters.

I feel that I must emphasize that if you are working with a black box algorithm (for example, a binary-only library), it is impossible to guarantee that the algorithm is implemented as advertised, no matter how many test vectors you try. The problem being that it is possible for this black box to implement the algorithm as normal most of the time, and then do something evil1 only in certain circumstances (for example, when a particular key is used).

It is expected that any new algorithm proposal include at least one or two test vectors these days. Most algorithms, particularly the ones that are commonly used, have a large body of such vectors available for such testing.

Of course this assumes that the test vectors themselves are correct. Early implementations of the Tiger were deemed "wrong" (including the authors' own reference implementation), which of course screwed everyone up. Some implementations are still wrong, and this had certainly not helped Tiger's chances of becoming a widely used algorithm. And Serpent (designed by the same people), printed the test vectors backwards, which caused a great deal of confusion for a while, because while the reference implementation was correct, anyone working off just the specification and the test vectors would get thrown completely off track.

I have no idea if anyone on e2 is a professional cryptographer, but if you are, some words of advice from a guy who spends his time coding this crap:

  1. Make sure your reference implementation is correct.
  2. Generate lots of test vectors, using random inputs. If you algorithm supports parameters (variable key size, number of rounds, etc), provide plenty (at least 250) for each parameter combination.
  3. Publish at least a few in the paper presenting your algorithm.
  4. Make all of your test vectors available online, along with the reference implementation.
  5. Make sure your reference implementation is correct!

1: In this situation, I would consider anything that doesn't do exactly what the algorithm is supposed to do, evil.

Log in or register to write something here or to contact authors.