Superscalability is a phenomenon which is occasionally observed when parallelizing an application (i.e. taking an application designed to run on a single processor and changing it to take advantage of parallel processors (i.e. multiple processors)). When a program runs more than N times faster on N processors than it does on one processor then it has exhibited superscalability1. Such behaviour not only appears to violate Amdahl's Law, it's just plain weird and can easily cause one to make many MANY carefully measured runs of the program before one is convinced that what one is seeing is actually happening and not just a case of bad luck (e.g. the serial version was slowed down because the processor was "distracted" in some unexplained way).

Upon closer analysis, the explanation for the superscalable behaviour of the program is usually (always?) that the parallel version of the program benefited from cache effects that weren't available to the scalar version. Specifically, the performance of the serial version suffered because the cache on the single processor wasn't large enough to hold an "appropriate" amount of the program's data (i.e. the processor had to spend an excessive amount of time re-fetching from main memory data which was recently in the processor's cache). In contrast, the parallel version gained an "unfair" advantage as the cache on each of the individual processors was large enough to hold an "appropriate" amount of the portion of the program's data assigned to each processor. The end result is that the program might run more than N times faster on N processors than it does on one processor (it might not - see Amdahl's Law for more information).


  • if a program exhibits superscalability then it is often possible to improve the performance of the serial version by restructuring it to properly take into account the size of the cache in a single processor. This isn't always possible as the parallel version really is running on a "conceptual" processor with a larger cache than the serial version (i.e. there may be fundamental advantages to be gained by running with a larger cache vs running with a smaller cache).

  • a program which exhibits superscalability isn't really violating Amdahl's Law as the parallel program isn't actually doing the same work as the serial version (i.e. the serial version is having to perform more fetches from main memory than the parallel version). Amdahl's Law only applies if all of the work performed by the serial version of the program is also performed by the parallel version.

  • a program might, of course, benefit from the sort of cache effects described above without actually running more than N times faster on N processors than it runs on one processor (due to the impact of Amdahl's Law or just plain sloppy program design and/or implementation).

1Obviously, the comparison is meaningless unless all the processors are equivalent (i.e. same clock speed, same model, same memory subsystem, etc. etc. etc).


  • Personal experience.