## Elegant Algorithms using Randomization

Randomized algorithms are those algorithms that make use of something random (for example a random permutation of an array, a random number generator, etc) to calculate the answer fast with certainty or it gives the approximate answer which on an average case is correct.

Let us skip the philosophical/mathematical question of the ability of a computer to generate something truly random and assume that we have an RNG(random number generator) that provides a number between [a,b] with uniform distribution(for all practical purposes).

## Why should we use Randomized algorithms?

After all deterministic algorithms are so intuitive and predictable. We can understand their time complexities and predict the output quite easily. So what is the need for randomization? We use randomization to come up with faster algorithms (on an average case) or to avoid highly complicated deterministic algorithms. In some cases randomized algorithms are the only feasible solution to some problems.

Let me show you the beauty of Randomization with a few examples:

## Problem 1:

Problem Statement: Given an array of integers of length $N$ (where $N$ is even), you are given that there are $\frac{N}{2}$ integers that are distinct and $\frac{N}{2}$ integers that are the same (For example {$1,6,2,6,6,3$}, 3 distinct elements and 3 copies). Find the element that repeats $\frac{N}{2}$ times.

Deterministic Solution: You can run two nested for loops and check for every $(i,j)$ if $arr[i] == arr[j]$ in which case $arr[i]$ would be the answer. The time complexity of this algorithm is $O(N^2)$ which is not very efficient. This can be improved to $O(N)$with knowledge of hash-maps or hash-tables or even smart partitioning. But in any case it can be proved that any deterministic algorithm takes at least $O(N)$ time to solve this problem. That is because since our program is predictable we can give the array in such a way that it takes at least $\frac{N}{2}$ steps because the first $\frac{N}{2}$ integers are the unique ones.

Randomized Solution: The solution is quite elegant and much more simpler AND faster than the deterministic one. The solution is, take any 2 random elements of the array. If they are equal then we have found our repeated element, if they are not equal then just repeat the process again. The algorithm is just:

while(true){
int i1 = random_number(0,n-1), i2 = random_number(0,n-1);
if(i1 == i2) continue;
if(arr[i1] == arr[i2]){
cout << "Repeated element is: " << arr[i1] << endl;
break;
}
}


That’s it. No hash-map or any advanced concepts. This quite literally the simplest and fastest algorithm to solve the problem. It’s surely simple but why is it fast? I mean in theory this algorithm could run forever, imagine it just kept taking the distinct elements. But lets do some maths and find the probability with which our algorithm takes those distinct integers each time. Number of favourable pairs (pairs if chosen our program would end) = $(N/2)\cdot(N/2-1)$ Total number of pairs = $N^2$.

So the probability that our program would end in the first iteration is,(say) $P$= $\frac{N \cdot(N-2)}{4 \cdot N^2}$ = $\frac{1}{4} - \frac{1}{2 \cdot N}$ which is greater than $\frac{1}{5}$ for all $N \geq 10$, so $P \geq \frac{1}{5}$ which implies that the probability that the program does not end on the first step (say) $Q$ = $1-P$, so $Q$ $\leq 4/5$

So we have found out that if we only performed one iteration there is a $80$% chance that we would not get our desired answer. Yikes that’s extremely bad. Well whats the probability that we don’t get our answer after 2 steps? Its just $Q^2$ because there is no difference between the first and second iteration. and $Q^2 \leq 16/25$, so we have a $64$% chance that we don’t get our answer within 2 steps. Well its still bad but at least its better.

So let our program run for 100 iterations. Then $Q^{100} \leq 2\cdot 10^{-10}$ which is such a small number that for all practical purposes that this will never happen. So our algorithm is so fast that given an array of $2$ million integers any deterministic algorithms would take at least $1$ million steps to find the answer whereas the above one would require at most $100$ (for all practical purposes).

There is a separate method for denoting and calculating time complexities of randomised algorithms. But they are not straightforward so for brevity’s sake I will skip it.

## Problem 2:

Problem Statement

The solution is greedy. Algorithm: Let $str$ be the string we have inputted and $N$ be the length of the string. Here is the solution below. Only the replace() function needs to be written.

## Conclusion

Not all problems can be solved using randomization. But do keep a lookout on those that can optimised/simplified using randomness. This just scratches the surface of randomization.

There are data structures that use randomness called probabilistic data structures(Monte Carlo type) and other data structures like Treaps (Las Vegas type) that depend on randomness for its fast running time. Quick Sort is one of the fastest sorting algorithms because of the property of random numbers. There is also a class of algorithms called Genetic Algorithms(Monte Carlo type) that simulate survival-of-the-fittest concept to find the optimal answer. So the scope of this topic is quite large and highly practical.

• Wikipedia

• Computer Algorithms in C++, Computer Science Press, division of W.H. Freeman, New York, September 1996 (with S. Sahni and Sanguthevar Rajasekaran)