Wirawan Winarto

Microsoft Student Partner
See also: Other Geeks@INDC

Shuffling Is Not Just Randomizing !!

Exam is over. Now I have time to write again.

We learnt about sorting. We learnt about searching. But almost none of us have ever had class who taught us about shuffling. The story went back days ago when some my classmates were developing applications. Some require us to generate shuffled sequences, such as : generating random song playlists, creating list of winning lottery numbers, or shuffling a deck of cards.

What surprised me is : most of them did it wrong. I guess some of us also have already fallen into the same mistakes without realizing it. Here are some of mistakes we possibly made :

Mistake One : Forget to Randomize Seeds

Without randomize(seed), we will get the same random numbers everytime we run it.


Mistake Two : Wrong Number Range

Calling a random(x), in most languages today, will return a random number between 0 and x-1. Some of my friends forget this. If they want to get a random number between 0 and x, then they should use random(x+1) instead.


Mistake Three : Not So Good Modulus

How to retrieve random indices in array? Even my teacher once coded like this :

X = random(MAXINT) mod ARRAY.LENGTH()

This one is incorrect despite the fact that it will return random number between 0 and last indice. Why? Because MAXINT is not always able to be divided evenly by array.length.

In a small array, it seems not a big deal. But in a bigger array, it can lead into bias.

For example, we have an array of 15000 elements. Assuming we use Int16, then it generates random number between 0 and 32766. That means the number 0 to 2766 are having 50% more chance to be selected than the rest!

We must avoid this because the correct algorithms should assure that each member has equal chances to be put in any positions. In a case like generating list of winning lottery numbers, this can be considered as unfair (a.k.a cheating).


Mistake Four : The Wrong Knuth

Some friends also did this when shuffling a deck of cards. The misintepreted algorithm is derived from Donald Knuth’s book – The Art of Computer Programming Vol 2 :

First it looks okay since every element has chance to be swapped with any elements. But think about it, if we loop 0 to array.length-1 as many as array.length times. Assuming we are having four elements, then there are 4 x 4 x 4 x 4 = 256 possible outputs.

In reality, there should be only 4! or 4 x 3 x 2 x 1 = 24 possible results. And since 256 cannot be divided evenly with 24, then there are must be some biases.

The correct one is what is described by Donald Knuth himself in his book :

Compare this with what I have written earlier. I guess you know what is wrong.


Mistake Five : Easy Way Is Not Easy

For those who think shuffling by swapping is very mistake-prone might consider of using easier algorithm. Many friends of mine use this easy method : assign a random value for each member in array, then sort them accordingly. No possible mistake, right?

I don’t think so.

Mistake happens when some assigned random numbers are in the same value. This also can lead into bias because a normal sorting algorithm does not randomly put items with the same value. This will cause a shuffled list is not “well shuffled”.

What if we use unstable sorting algorithm?

Still incorrect. Even though the positions of same-valued data in an unstable sorted list cannot be predicted, does not mean that it is random.


Mistake Six : Shuffle Again and Again

Not really a mistake, but just useless. People think if we shuffle a sequence again and again then it will produce more randomness. It is not true. If your algorithm is correct, then it will not make any difference (in theory of probability) whether you shuffle it once, twice, or ten thousands times!


Lucky for us, there are two good news :

  1. No need to take this too seriously, because producing a purely random number itself is impossible. Even random function in .NET is just a pseudo-random generator. (usually on other systems it is based on system clock, but I am not sure yet about .NET)

  2. Because the algorithm is expected to produce random results, those flaws are hard to detect. So, the good news is : most of biased algorithm just go unnoticed.


Compared to sorting and searching, shuffling is more poorly implemented by students. Maybe we should begin to teach them how to shuffle correctly.

Share this post: | | | |
Posted: Jan 23 2008, 09:30 PM by wirawan | with no comments
Filed under:

Comments

No Comments