My mate Dan

Cache Behaviour Analysis

CBA is a side-channel attack aimed at cryptosystems. There are a number of approaches to defeating cryptosystems, this is a modern approach investigating the exploitation of systems with caches.

Requirements

Differential power analysis requires intimate power tracing to perform difference analysis. Cache behaviour analysys requires only hit/miss behaviour. This can be gleaned from much more gross power analysis traces or line-activation-probes (when memory is 'used'). Clearly, a side channel attack requires physical access to the cryptosystem to make the measurements.

The goal is to make a cheaper search space for the key. If we can say "there are ten candidate keys" -- we've saved a huge pile of work from the 2^128 keys that comprised the original search space.

Example

We will start with a very simple cipher and a very simple cache. The implementation of which are not totally real-world, but for the sake of explanation are not too frar fetched. For a real example, the breaking of DES with CBA is presented in a report by My Mate Dan.

The cipher we will use is something-like-RC4 . This is a stream cipher as opposed to the block cipher broken in the report. The part of the cipher we need look at to attack it is simply the key-schedule, where the key is injected into the substitution-box (sbox). The amount of code we need is as follows:

char sbox={0,1,2,3,4,5,6,7,8,9,10, ..., 253, 254, 255};

void keyInit(char key[16])
{
  int i, j, temp; 
  for(i=0; i<256; i++)
   {
     j = (j + sbox [i] + key[i % keylen]) % 256;
     //Swapping
     temp = sbox[i];
     sbox[i] = sbox[j];
     sbox[j] = temp;
   }
}

We don't need to worry about anything more. The cipher itself is suspect to known plaintext attacks using xor operatations dependent only upon the state of the sbox. The cheat to make the example simple is that the sbox initialised by the compiler -- it's put in the initialised data segment of the final executable image. Details, details, what this means is that the contents of sbox are correct at the start and they have not been through the cache.

We'll start with a one element per-line cache -- that's one sbox item per line in the cache. How far this is from something real (or at least contemporary is for discussion elsewhere). We can describe easily how this becomes more representative with simple expansion. Anyway, the other assumption is that the cache is big enough to hold the whole sbox (and the key). That's not big assumption -- basically no replacement misses. Oh, and that the caching of key doesn't affect the caching of sbox (no mutual caching benefit -- this is automatic for one element per line, but is required for multi-element lines below).

Finally, all scalars are assumed to be in registers. If your compiler doesn't do this, you're using something funny like a stack-based machine and I'm not going to get into that.

Analysis

This is where we detail what goes on in the execution and how this ties down the keyspace. The memory system sees a bunch of requests. We only see the first sixteen requests for key and some distribution of sbox according to the contents of key. The gross heuristic is as follows:
  • Cache miss --> sbox[i] is a new element, therefore one of the set of sbox elements we have not yet seen/used.
  • Cache hit --> sbox[i] is something we've seen before.
It's the set of things we've seen before (slowly grows from the empty set). Whether we get a hit or a miss, we can narrow the search space for sbox[i]. This implicity narrows the key space since that generated i. This'll make sense in a worked example.

The memory reference sequence starts with sbox[0]. This is a definite miss. The first thing. We know the content of this sbox element since the sbox is still in the initial state -- it was zero. We add this element to the list of things we've seen. If we get hits, we know it was in this set of things. The next reference is key[0], definitely a miss (the key references are all misses for the first sixteen, nothing to glean from these references at all).

The second reference is sbox[j]. At this stage we know that a miss means j!=0. So, we could say something simple like j is in the set {1,...,255}. However, we have an equation for j that puts a tight bound on j -- the assignment statement. This in turn puts a bound on key[0]. Basically, this means that we can definitely say that key[0] is in the set {1,...,255} which is ever so slightly smaller than the full eight bit search space we started with for key[0]. This will only improve as we continue. Finally, we add this sbox element to the things we've seen (with it's constraint).

A hit at this stage gives us an even tighter bound. We can say that j=0. More than that, from the assignment to j, we can say that key[0]=0 -- reducing the eight bits of search space for this key byte to zero. We don't add anything to the set of things seen, but we can reduce the space of key elements in the already worked search space. This is the first round, there are no other searched performed, so we can't simplify other stuff.

Iteration two is the important step, indicating the narrowing of the search space for the key components. The first reference is to sbox[1]. If this is a miss, we can say that sbox[1] remains the same, initial value of 1. It means that the miss for sbox[j] in the previous iteration did not reference element 1. If the previous iteration did not determine that key[0] was 0, only that it was in the set {1,...,255} then we can now say, since sbox[1] was missed, j in the previous iteration was also not 1. Transitively, this means that key[0] is now in the smaller set {2,...,255}. returning to this iteration, we can say that key[1] is in the set {2,...,255}. If, however, the previous iteration determined that key[0] was 0, then we can only say that key[1] is in the set {1,...,255}.

If the reference sbox[1] is a hit, it must be from the set of elements we have thus far seen. We have seen sbox[0] and potentially one other. In fact, we must have seen one other and on the first iteration, j was not zero. We must have then seen sbox[1]. This gives j=1 in the previous iteration. The swap operation means that sbox state now has swapped elements zero and one. That aside, this means key[0]=1. This will reduce it from the current larger set {1,...,255}. This reduction (from the set {iterator,...,255} to a subset of {0,...,iterator}) occurs for each hit after a miss.

So where is this going? This is a lot of talk for only getting into the second byte of the key. Using equations to describe constraints on parts of the key makes a much more comapct (but perhaps less easy to understand for lay-people). Equations are used to bind bits of the key to each other (this comes from the cipher algorithm) and their hit and miss behaviour defines their value (or range). You can automate the generation of candidate keys that are derived from knowledge of an algorithm and the cache profile. For exmaple, the CBA empirical study in the source publication uses only a couple of iterations of DES (doesn't need the full trace) to reduce the 56bit search space to eight candidate keys (that's a three bit search space). The cost of generating the candidate search space obviously has to be less than simply grunting through the search space that you would otherwise address with a brute force attack. Fortunately, this is the case.

Product

The product of this particular exercise is a regular-expression-like specification of the key. For each byte of the key, there will be a domain that it must be a member of. The first byte will have a domain of {0,...,255} or smaller (depending on feedback of later hit/miss correlation). Each of the other bytes will have smaller domains. Some key-bytes may be equal. With this expression for the candidate key set, you can then use the brute force approach to try each candidate in turn -- if you have no other options. The reference paper for CBA required only a couple of iterations (rounds in crypto speak) of hit/miss profile to reduce a 56bit DES keyspace to eight candidate keys. Brute forcing eight candidates clearly costs much less than 2^56 keys.

Shortcommings of the example

It may be pretty obvious that there are a few assumptions made for the sake of describing the idea. The one that's most worth discussing is the one-element-per-line. If we generalise this to 2^n-elements per line, then hits become more coarse in that instead of saying, for example, that a hit means key[0]=0, we instead say key[0] is in the set {0,1,2,3}. As for the other rough edges, I'll acknowledge that the reader is kind enough to let me use the example for illustration.

Conclusion

Like other side channel attacks this requires physical access to the device. It does not require tampering with the device tho'. Gross power analysis and timing will generate the hit/miss profile. CBA has been demonstrated (elsewhere) to considerably reduce the cost of breaking DES.

Side channel attacks have made the smart card community incredibly nervous and umpteen mechanisms are being developed to prevent them. Attacks on implementation will continue while the mathematical basis of the cipher remains strong. The physical tampering attacks have moved onto the less expensive 'scope & PC DPA attacks. It's also, unfortunately, sometimes cheaper to simply bribe someone to breach security. It will simply boil down to how much something is worth.

Log in or register to write something here or to contact authors.