01 September 2011
Parallel computing brings some new issues to the table. In some cases, more is not always better.
As the computer industry becomes more parallel, Amdahl's law of parallel computing places limits on the speed-ups one can expect. Perhaps you recall "the law" but don't really think it applies to you. Well, in very short order, it will apply to everyone. As an example, the market is embracing parallel computing in almost every sector. The need for parallelism is due to the "Gigahertz Wall" facing the microprocessor manufacturers; the higher the clock frequency (measured in Gigahertz), the hotter processors become. Heat creates problems and can ultimately lead to premature chip failure. Rather than employ expensive and sophisticated cooling technologies, processor vendors have decided to "go wide" and add more cores to the processors. Instead of scaling up in clock frequency, vendors are scaling out with cores.
Aside from introducing a new (and complicated) software programming model, the multi-core approach also falls prey to what many in the High Performance Computing community have known for years. That is, you cannot keep adding cores to a problem and expect the speed-up to continue apace forever. This limit was noted by Gene Amdahl and is appropriately called Amdahl's Law. Mathematically it is expressed as follows:
Speedup = 1 / (S + (P/N))
where S is the serial portion of the program, P is the parallel portion of the program, and N is the number of parallel processors. Basically, Amdahl's law indicates that as N gets larger (adding cores), the parallel part of the program gets faster, and thus the serial portion begins to dominate the equation. In the above equation, the P/N term goes to zero as N, the number of processors, increases. Thus, it is a fancy way of saying that you can't go faster than your slowest step.
There is good news and bad news about Amdahl's's law. The good news is that for some problems, the P (parallel portion) scales independently of S, the serial portion. It is then possible to achieve very good speed-up on large problems because the serial portion is effectively reduced as the problem gets bigger. Often these programs show "linear" speed-up as processors (cores) are added. Keep in mind, however, that these highly scalable problems are often impractical to run on a small number of processors due to the size of the problem (amount of data) and single core processing speeds.
The bad news comes from cases where S is related to P. This situation is often overlooked by many programmers. For instance, suppose one examines a program and concludes "my program spends 5% of its time doing sequential processing and 95% of the time doing a loop that can be parallelized. If I use an eight node cluster, Amdahl's law states that I should get a six times speed-up." The programmer re-writes the program only to find the speed-up is actually less than four!
Clearly, something did not go as planned. After some analysis, the poor result can be attributed to an increase in the serial component (S). Programmers often forget about the parallelization overhead that is required before the parallel portions can begin their work. This overhead can also include serial work that must be done after the parallel portions are complete. Several factors can contribute to this overhead. These include the extra bookkeeping needed to create the parallel portions, communication overhead (moving data to and from the worker nodes), and final processing of parallel results.
The communication overhead is usually the largest contributor and is very dependent on the interconnect between cluster nodes. A fast interconnect can significantly reduce the communication overhead and improve speed-up. The communication overhead can grow as the problem size grows, and thus slow the speed-up. There are similar considerations for SMP (multi-core) systems as well. Instead of communication overhead, there is memory access overhead, which can take the form of locks and semaphores.
Understanding Amdahl's law will help set reasonable expectations rather than leaving you guessing your way through parallel computing. Because we now live and work in a "parallel computing world," there is a new law, and programmers have no choice but to obey it. The best option is to acknowledge and work with it rather than ignore it and hope for the best.


