Going Parallel? See how big your Serial Fraction First!
As promised in my earlier post, here I'd like to talk about Amdahl's Law, thru which you'll get the formal idea that speed up by parallelism only really works for certain kind of problems. Problems which algorithms to solve have small serial fractions. My goal with this post is to give you a fair understanding that you won't subscribe to the overrated view on parallelism, especially from those who sell multicore machines and Parallel APIs.
Another thing is that to make you see that learning and optimizing sequential algorithm is still very important even if we have all multicore machines in the market.
First, let's look at what is defined as Serial Fraction.
In an algorithm to solve a problem, there is always part that has to be done sequentially (serial), and there is part that can be made to run in parallel. So, the time needed to run this algorithm on a single processor can be expressed as follow:
TSeq = S + P
Where,
TSeg = Time spent to run an algorithm on a single processor
S = Time spent on the sequential part of the algorithm on a single processor
P = Time spent on the parallel part of the algorithm on a single processor
Now, if you run the algorithm in N processors you would have:
TPar = S + (P/N)
Note that the number of processors only affect to the parallel part of the algorithm.
Then the Speed Up that you will get is:
SN = TSeq/TPar
= (S+P) / (S+(P/N))
Lets back to TSeq = S + P, we divide both sides by TSeq we will have:
1 = (S/TSeq) + (P/TSeq)
Now we define S/TSeq as Serial Fraction and name it F.
F = S/TSeq
We now have 1 = F + (P/TSeq) or we can put it this way:
P = (1-F)*TSeq
Let's substitute this back to Speed Up, so we have:
SN = (F*TSeq + (1-F)*TSeq) / (F*TSeq + ((1-F)*TSeq)/N)
= (F + (1-F)) / (F + (1-F)/N)
SN = 1 / (F + (1-F)/N)
Here we have Speed Up as a function of Serial Fraction and number of processors.
So,
-
Ideal Speed Up only happens if F = 0 (there's no Serial Fraction in the algorithm) as it will give you N times Speed Up.
-
If F = 1 (the whole part is Serial), then no matter how many processors you use you won't get any Speed Up, as the Speed Up is 1.
As an example, your program is made up of 0.7 Serial Fraction and 0.3 Parallel Fraction. You use 4 processors to run this algorithm (as if your Parallel Fraction is reduced into 1/4th, you get Speed Up by 1/(0.7+(0.3/4)) that is 1.29 times faster compared to if it was run on a single processor. If you can tweak your Serial Fraction by come up with better Sequential Algorithm and let say you only get it twice faster (as if your Serial Fraction is reduced by half), then even if with still just one processor you would have 1/(0.35+0.3) Speed Up that is 1.54. Faster. Better than running it on 4 processors without working on the sequential algorithm.
What does it tell you?
-
Parallelism only provides significant Speed Up if your algorithm has small Serial Fraction. The bigger the Serial Fraction, the more the number of processors become not really matters.
-
Even if in this new era of multicore machines in the market, it is still important to learn and optimize sequential algorithms (as there are many problems that need to be solve in sequential by nature).
In other words, for problems that has bigger Serial Fraction, working on the algorithm introduce more siginificant speed up compared to adding more processors. As I always say, algorithm is always be the corner stone of computing. Not machines, not Products/APIs, not programming languages. It is Algorithm.
Please note as well that we're still not considering load balancing and communication overhead here. If we include them, then we will get lower Speed Up for the Parallel Fraction. The P/N is too ideal.
So, put it all in one sentence: Speed Up of Parallel Algorithm is effectively limited by the number of operations that need to be done sequentially, i.e. its Serial Fraction.
That's Amdahl's Law!