Quicksort is an elegant and efficient sorting algorithm. Analysis of quick sort is a lot of fun. The core part of quick sort algorithm is partition the original array into two parts. This part of code is shown below (in java):
private void quicksort(int[] a, int lo, int hi) {
if (hi <= lo) return;
int i = lo - 1, j = hi;
int t, v = a[hi];
while (true) {
while (a[++i] < v);
while (v < a[--j])
if (j == lo) break;
if (i >= j) break;
t = a[i]; a[i] = a[j]; a[j] = t;
}
t = a[i];
a[i] = a[hi];
a[hi] = t;
quicksort(a, lo, i - 1);
quicksort(a, i + 1, hi);
}
The 10th line of the above code is to swap two values.
Q: Can we analyze accurately how many times Line 10 will be executed if the data is uniformly distributed?
Modeling the problem
Let’s suppose the original array’s length is N and the WLOG, the data is a permutation of 1,2,3, \dots, N, uniformly.
After a full run of one partition, the array is divided into two parts, and the pivot is a[hi]. Since there are no duplicated values in the array, a[hi]
‘s final position is determined by its value. One general case can be shown as:
Then those values of the first k elements that are larger than the pivot a[hi]
should be swapped. And this is just the number of the 10th line of code’s execution times.
Let’s name the first k-1 elements partA and the last N-k elements partB. Let m be the number of elements in partA that are larger than the pivot, such that there are k-m-1 elements in partA that are smaller than the pivot.
And we know that a[hi]
is the kth largest elements so that in the whole array, there are k-1 elements smaller than the pivot and N-k elements larger than the pivot.
It is easy to verify that in partB, m elements are smaller than the pivot, N-k-m elements are larger than the pivot.
Compute the probability
Based on the pre-analysis in the previous section, we can formula the number of this kind of permutations like:
{N-k \choose m} {k-1 \choose k-1-m} (k-1)! (N-k)!- Term {N-k \choose m} means among all the N-k elements larger than the pivot, we want to choose m of them to put into partA
- Term {k-1 \choose k-1-m} means that, after the first step, partA only remains k-1-m slots, and we have to fill those slots with values smaller than the pivot. Among all the k-1 values that are smaller than the pivot, we want to choose k-1-m of them
- After the first two steps, we have divided values into all parts they should be in
- We are dealing with permutation problem, the last two terms are permutations
So the probability that the Line 10 to be executed m times is:
P(m) = \sum_{k}\frac{ {N-k \choose m} {k-1 \choose k-1-m} (k-1)! (N-k)!}{N!}And the expectation of m should be
\sum_{m}P(m)=\sum_{m}m\sum_{k}\frac{ {N-k \choose m} {k-1 \choose k-1-m} (k-1)! (N-k)!}{N!}The key to the above summation is one equation from Knuth’s concrete mathematic:
\sum_k {n \choose k} {s \choose k}k = s {n+s-1 \choose n-1}.
Then the remaining work is easy, finally we get the answer:
A: Line 10 will be executed \frac{N-2}{6} times