Let say it takes 6 minutes to do a computation on a set of data in a single processor. Then you have 3 processors altogether. Would your task finish in 2 minutes (6 divided by 3)? NO IT WON'T!
Why? Because your algorithm is a sequential algorithm. Even if with 100 processors your computation would still need 6 minutes to complete. To be able to exploit multiprocessors you would also need PARALLEL ALGORITHM. That said algorithm that was designed from the ground up with parallelism in mind.
Now let's see the other way around. Let say it takes 2 minutes to finish a computation using a parallel algorithm on 3 processors. Would it take 6 minutes if you run that algorithm in a single processor? Maybe. If you are lucky. Chances are it would take more than 6 minutes. Parallel algorithm that runs in a single processor may have worst performance compared to a sequential algorithm runs in a single processor (I'll show you the example on future posts). It is because a parallel algorithm was designed to run on multiprocessors, not on single processor. And it maybe the case that your parallel algorithm will choke in a single processor.
Summary of paragraphs above:
-
The number of processors is irelevant if your algorithm is not a parallel algorithm, you would need a parallel algorithm to exploit multiprocessors.
-
Parallel algorithm is superior only on multiprocessors, in single processor it may not run properly, or it will run with worst performance compared to its sequential algorithm counterpart.
So, algorithm is still the most important factor, more important than the number of processors itself.
It is also dangerous to believe that it is a good thing if a language construct/compiler/API can adapt/switch an algorithm to run on multiprocessors or single processor automatically. The optimal sequential and parallel algorithm for the same problem maybe totally different. The parallel algorithm works best on multiprocessors, and the sequential algorithm works best on single processor. If your language construct/API can make your parallel algorithm to run on a single processor, then it may not be better compared to algorithm designed with sequential in mind. Similar with that, if your algorithm was designed with sequential in mind, then using a language construct/API that automatically make it run on multiprocessors may not result program that runs faster than program with algorithm designed with parallelism in mind.
Again, there's no silver bullet. If you will use multiprocessors, design/use a parallel algorithm. If you use only a single processor, use the best sequential algorithm there is. It is the algorithm that matters, language construct/compiler/API is just a helper.
Design your algorithm properly!
Speed Up & Efficiency
Now I'd like to talk about some terms you may need to understand to explore further on parallelism.
Let's start with the definition of Speed Up. Speed Up is defined as:
SN = Ts/Tp
Where,
SN = Speed Up
Ts = Time taken to run the fastest sequential algorithm on a single processor
Tp = Time taken by a parallel algorithm on N processors
As one can see, Speed Up is determined by algorithm used. Not by number of processors. SpeedUp is about comparing a parallel algorithm runs on N processors with the fastest sequential allgorithm runs on a single processor.
Then we define Efficiency as follow:
EN = SN/N
Where,
EN = Efficiency of a parallel algorithm on N processors.
SN = Speed Up (as defined above).
N = number of processors.
As one can see, the efficiency here means the efficiency of the parallel algorithms. For a given number of processors, two different parallel algorithm may have their own Speed Up and Efficiency.
So, these metrics are measuring algorithms!
100% Efficiency is only achieved when Speed Up is equal with the number of processors, or in other words, when Time Taken by a parallel algorithm run on N processors is equal to 1/N time taken by fastest sequential algorithm. Which may never be achieved due to several factors that limit the Speed Up as follow:
-
Software Overhead; there will always be softare overhead, even with a completely equivalent algorithm with sequential algorithm. As an example, there may be additional index calculations needed by the manner in which data are "split up" among process. There is genrally more lines of code to be executed in parallel progam compared to sequential progam.
-
Load Balancing Overhead; Speed Up is generally limited by the slowest node. So, it is an important consideration to balance the load of work thru out all nodes. It also means a load balancing mechanism must be applied, and it means additional work.
-
Communication Overhead; It is clear that communicating data between processors degrades the Speed Up. So, it is important to keep all processors busy but at the same time reduce the amount of communications. Larger grain size of Tasks is generally better. There is a metric related to Communication cost called Machine Dependent Ratio (MDR) defined as:
MDR = Tcomm/calc
Where,
Tcomm = Time to transfer a single word (byte) between nodes
Tcalc = Time to perform some floating point calculation
The lower the MDR, the better it is. We become less dependent to the machine (lower communication cost).
-
Amdahl's Law; I will make a dedicated post on this. For now I can summarize it as follow: not all portion of algorithm can be made parallel. There is still a fraction that needs to stay sequential. So, parallelism is really be of help if the fraction that can be made parallel is bigger. If most portion cannot be made parallel, then parallelism is not so helpful on that kind of problem. In short, again, parallelism is not silver bullet.
As one can see, there are overheads you must pay for parallelism (Software Overhead, Load Balancing Overhead and Communication Overhead). Ensure what you pay is worth it. Parallelism is useful only if the Speed Up produced by a parallel algorithm is far bigger than these overheads you have to pay. In addition to that, Amdahl's Law helps you to stay aware whether the parallel fraction of your problem is significant. Otherwise, you make this parallel portion to run faster (with big efforts) but then you only get insignificant improvement in total (in case the sequential part is a lot bigger).
Then it is clear that n processors won't make your program runs n times faster. It depends on the algorithm you use, whether parallelism cost is cheaper than the Speed Up gain, and it depends to the nature of the problem, whether the parallel fraction is significant.
So, now you can be more critical to those who sells multicore machines or APIs that says parallelism can solve all your problems faster. They are just trying to sell you their machines or API.
Machines & APIs cannot compensate algorithm design ignorance.
Reference:
-
Seminar, Kudang B., "TM5 Speed Up & Efficiency", Lecture Notes, Deparment of Computer Science - Faculty of Mathematics and Natural Sciences, IPB, Bogor.