A Data Dependency is roughly a condition where the value of a calculation in the future is based upon a calculation being made now. (or conversely a value of a calculation being made now depends on a value calculated in the past).

Data dependencies play a big part in much of Computer Science/Engineering. Data dependencies (data hazards) can be created in pipelined code, required data forwarding or stalling of a modern pipelined microprocessor. Data dependencies can cause race conditions in parallelized code. Also, compilers nowadays frequently try to unroll loops, and detecting data dependencies can allow the copiler to perform several optimizations.

There are four basic types of data dependency, frequently called different things based on what part of computer technology they are being used in.

  1. Flow Dependence - A value is written then read
  2. Anti Dependence - A value is read then written
  3. Output Dependence - A value is written, then written over
  4. Read Dependence - A value is read, then read again. It is not an issue, included for thoroughnes
Here is a code example of a flow dependence in pseudo C code.

do i 1 to n
   x[i] = y[i] + 1
   z[i] = x[i-1] + a[i]
end do
Note how the value x[3] will be written on one iteration and then read on the next. This flow dependency prevents the code from being trivially parallelizable.

Now here is an example of an anti-depedency.

do i 1 to n
   z[i] = x[i+1] + a[i]
   x[i] = y[i] + 1
end do
Again, note how x[3] is read in iteration "2" and then written in iteration "3".

Neither of the above examples can be directly parallelized. If you parallelize a large loop to make it run faster, you do not directly control which processor or thread runs which iteration. These two dependencies require that iteration "2" run before iteration "3", otherwise one of the calculations will recieve the wrong value of x[ ].

Log in or register to write something here or to contact authors.