RC6 is a block cipher which was submitted to the AES process, and made it into the final 5 out of the 15 original submissions. It has a 128 bit block size (though a 256-bit block variant also exists), and can accepts keys of any length between 0 and 2040 bits in increments of 8 (thanks to alisdair for pointing me at the RC6 FAQ for the upper limit).

The key schedule converts the key, consisting of a basically arbitrary length octet string, into a sequence of 32-bit words, denoted by S. The precise method of doing this isn't actually very important -- all that matters to to make sure that different keys are mapped to different S arrays, and that knowing some subset of S does not allow you to figure out any other values of S, or any bits of the original key.

The reason for the huge upper limit on the key size (2040 bits, as compared to 256 bit maximums for most other ciphers) is convenience, not security. In fact, the paper presenting RC6 is careful to mention that very large keys are probably not more secure than normally sized (128 bit or 256 bit) keys. When doing something like Diffie-Hellman key agreement, the shared secret is very large, and there needs to be a way to hash that down to something useful. But in resource constrained environments (like smartcards) the overhead of having an implementation of SHA-1 could be quite prohibitive. With RC6, it is not necessary; just toss the whole thing into RC6's key schedule as-is.

The algorithm breaks each input block into 4 32-bit variables (using the little-endian convention). These 4 variables are usually denoted as A, B, C, and D. They are run through a series of rounds, which is by default 20. Each round looks like this (using fairly standard C-like notation):

   T1 = rotate_left(B*(2*B+1), 5);
   T2 = rotate_left(D*(2*D+1), 5);
   A = rotate_left(A ^ T1, T2 % 32) + S[2*round];
   C = rotate_left(C ^ T2, T1 % 32) + S[2*round+1];
   (A,B,C,D) = (B,C,D,A);

Where rotate_left(X, N) does a cyclic rotation of X by N bits to the left (moving bits upward). In addition to these 20 rounds, there is some masking before the first round and after the last round, where an entry of the S array is added to a subset of {A,B,C,D}.

The intent of the rotation by 5 on T1 and T2 is to ensure that the 5 high bits of T1 and T2 are used to determine the rotation amount in the next steps. Variable rotations are at the heart of RC6's security, and so "optimizing" the unpredictability of the rotations is paramount.

RC6's use of variable rotations and 32-bit multiplications made it unattractive in dedicated hardware (where a general 32x32->32 multiplier is expensive) and in small systems like smartcards. In addition, benchmarks done on early IA-64 hardware showed that RC6's performance scaled very poorly in comparison to other finalists (in particular Serpent, Rijndael, and Twofish) on future hardware. This was hardly a surprise, given that previous ciphers designed by Ron Rivest, particularly RC4 and RC5, have had great performance on their target processors (486 and Pentium, respectively), but have not scaled as well as other algorithms on newer CPUs. It did a great deal of damage to RC6's chances, when compared with other ciphers which only use simple operations requiring little investment in hardware. In the final vote to decide among the 5 finalists, RC6 placed last.