In computer systems, Amdahl's Law will show how much performance increase one can expect from any kind of enhancement; improved CPU architecture, optimising compiler technology, parallelism, higher speed memory, or anything else. The law itself is fairly intuitive, but it's results are surprising and often very disappointing.

Simply stated, assuming one can test a computer system without the enhancement, where E is the fraction of the original execution time that could influenced by the improvement, and S is the expected speedup given by the improvement, the overall speed-up is given by:

1/((1-E)+(E/S))

To give an example, suppose that 95% of our execution time is due to a program segment that can be improved by using a smarter compiler that will give it a 100x speed-up. This seems like quite a generous and unlikely situation, and one would expect a massive, near 100x overall speed-up. But, substituting E=0.95 and S=100 gives a speed-up of only 17 times. With S=10,000, we only get a 20x boost! Ack!

Qualitatively, one might say that as we improve the performance of the original 95%, we increase the significance of the unimproved 5%.

It's not too hard to apply this law to your software development tasks. Once you've coded your program and you've begun testing its performance, this law really comes into play.

The use of a profiler is how you gather the measurements of how much of the total execution time of a program is spent in each segment. Thus by combining this law with the output of your profiler, you can make better decisions about what parts of your program to optimize most aggressively.

The theorem that has come to be known as Amdahl's Law was first described by Gene Amdahl in his 1967 paper titled "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". The key statement appears to be:
If F is the fraction of a calculation that is sequential, and (1-F) is the fraction that can be parallelised, then the maximum speed-up that can be achieved by using P processors is 1/(F+(1-F)/P).
A few examples are probably in order:
  • if 90% of a calculation can be parallelized (i.e. 10% is sequential) then the maximum speed-up which can be achieved on 5 processors is 1/(0.1+(1-0.1)/5) or roughly 3.6 (i.e. the program can theoretically run 3.6 times faster on five processors than on one).
  • If 90% of a calculation can be parallelized then the maximum speed-up on 10 processors is 1/(0.1+(1-0.1)/10) or 5.3 (i.e. investing twice as much hardware speeds the calculation up by about 50%).
  • If 90% of a calculation can be parallelized then the maximum speed-up on 20 processors is 1/(0.1+(1-0.1)/20) or 6.9 (i.e. doubling the hardware again speeds up the calculation by only 30%).
  • if 90% of a calculation can be parallelized then the maximum speed-up on 1000 processors is 1/(0.1+(1-0.1)/1000) or 9.9 (i.e. throwing an absurd amount of hardware at the calculation results in a maximum theoretical (i.e. actual results will be worse) speed-up of 9.9 vs a single processor).
  • if 99% of a calculation can be parallelized then the maximum speed-up on 1000 processors is 1/(0.01+(1-0.01)/1000) or 91.
  • if 99.9% of a calculation can be parallelized then the maximum speed-up on 1000 processors is 1/(0.001+(1-0.001)/1000) or 500.
The point that Amdahl was trying to make was that using lots of parallel processors was not a viable way of achieving the sort of speed-ups that people were looking for. i.e. it was essentially an argument in support of investing effort in making single processor systems run faster.


There are at least two somewhat colloquial ways of expressing the essence of Amdahl's Law but in a more general context:

The performance of any system is constrained by the speed or capacity of the slowest point.

For example, if the "system" is assembling cars and the "system" is able to build one car body every ten minutes and four wheels every five minutes then speeding up the production of wheels won't make any difference to how fast the "system" is able to produce completed cars (assuming that each car requires four wheels).

The point of constraint of a system is often referred to as the system's bottleneck.

and
The impact of an effort to improve the performance of a program is primarily constrained by the amount of time that the program spends in parts of the program not targeted by the effort.

Realizing that this is true is a VERY sobering experience (see below).

Roughly speaking, this is the form of Amdahl's Law described by spiregrain above (see his writeup for details).


Speaking from personal experience, both as a developer of software on 1000+ processor systems and as someone training people to develop/port software to 1000+ processor systems, developing a true understanding of Amdahl's Law is a very sobering experience. The reasons for this include:

  • being able to parallelize 99.9% of any program is often a MASSIVE undertaking and a program which is 99.9% parallel has a maximum theoretical speed-up of only 500 on 1,000 processors.

    If the goal is to achieve a speed-up of close to 1000 on 1000 processors then one must parallelize essentially the entire program. For example, parallelizing 99.9% of the program will yield a maximum theoretical speed-up of only about 500 on 1000 processors while parallelizing 99.99% will only get you a maximum theoretical speed-up of about 900 and parallelizing 99.999% (i.e. an impossible task in almost all situations) will get you a MAXIMUM theoretical speed-up of about 990.

  • Amdahl's Law is a statement of the MAXIMUM THEORETICAL SPEED-UP you can ever hope to achieve. The actual speed-ups are always less than the speed-up predicted by Amdahl's Law.

    The reasons why the actual speed-up is (almost) always less than the theoretical speed-up are many including:

    • distributing work to the parallel processors and collecting the results back together is extra work required in the parallel version which isn't required in the serial version (i.e. the parallel version has to do more computation than the serial version which results in an increased execution time for the actual parallel version over the theoretical speed of the parallelized serial version).

    • in order to reduce the amount of time spent communicating with other processors, the program is often modified so that certain computations which were performed once in the serial code are performed once by each of the parallel processors. The end result is that these computations are, for all practical purposes, running at the speed of a single processor (i.e. these computations represent parts of the program which havn't been parallelized).

    • even when the program is executing in the parallel parts of the code, it is unlikely that all of the processors will be computing all of the time as some of them will likely run out of work to do before others are finished their part of the parallel work. This is called the straggler problem because some of the processors are straggling behind when others have finished their work.

      As more and more processors are assigned to the program, the total amount of time wasted by processors which run out of work increases quite rapidly. For example, one processor out of 1,000 with work left to do means that the other 999 processors are not contributing to the speed of the program. Again, this effectively means that less of the program has actually been parallelized than might originally have been thought.

The bottom line is very simple: anyone intending to write parallel software simply MUST have a very deep if not intuitive understanding of Amdahl's Law if they are to avoid having unrealistic expectations of what parallelizing a program/algorithm can achieve and if they are to avoid underestimating the effort required to achieve their performance expectations.


Sources

  • Gene Amdahl, "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities", AFIPS Conference Proceedings, (30), pp. 483-485, 1967
  • personal experiences as a developer (see Myrias Research Corporation) and as a trainer of developers of parallel software on massively parallel computer systems

Log in or registerto write something here or to contact authors.