Analysis of the number of swapping in partition of Quicksort

Quicksort is an elegant and efficient sorting algorithm. Analysis of quick sort is a lot of fun. The core part of quick sort algorithm is partition the original array into two parts. This part of code is shown below (in java):

private void quicksort(int[] a, int lo, int hi) {
    if (hi <= lo) return;
    int i = lo - 1, j = hi;
    int t, v = a[hi];
    while (true) {
        while (a[++i] < v);
        while (v < a[--j])
            if (j == lo) break;
        if (i >= j) break;
        t = a[i]; a[i] = a[j]; a[j] = t;
    }
    t = a[i];
    a[i] = a[hi];
    a[hi] = t;
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}

The 10th line of the above code is to swap two values.

Q: Can we analyze accurately how many times Line 10 will be executed if the data is uniformly distributed?

Modeling the problem

Let’s suppose the original array’s length is N and the WLOG, the data is a permutation of 1,2,3, \dots, N, uniformly.

After a full run of one partition, the array is divided into two parts, and the pivot is a[hi]. Since there are no duplicated values in the array, a[hi]‘s final position is determined by its value. One general case can be shown as:

\underbrace{\dots}_\text{k-1} a[h_i] \underbrace{\dots}_\text{N-k}

Then those values of the first k elements that are larger than the pivot a[hi] should be swapped. And this is just the number of the 10th line of code’s execution times.

Let’s name the first k-1 elements partA and the last N-k elements partB. Let m be the number of elements in partA that are larger than the pivot, such that there are k-m-1 elements in partA that are smaller than the pivot.

And we know that a[hi] is the kth largest elements so that in the whole array, there are k-1 elements smaller than the pivot and N-k elements larger than the pivot.

It is easy to verify that in partB, m elements are smaller than the pivot, N-k-m elements are larger than the pivot.

Compute the probability

Based on the pre-analysis in the previous section, we can formula the number of this kind of permutations like:

{N-k \choose m} {k-1 \choose k-1-m} (k-1)! (N-k)!
  • Term {N-k \choose m} means among all the N-k elements larger than the pivot, we want to choose m of them to put into partA
  • Term {k-1 \choose k-1-m} means that, after the first step, partA only remains k-1-m slots, and we have to fill those slots with values smaller than the pivot. Among all the k-1 values that are smaller than the pivot, we want to choose k-1-m of them
  • After the first two steps, we have divided values into all parts they should be in
  • We are dealing with permutation problem, the last two terms are permutations

So the probability that the Line 10 to be executed m times is:

P(m) = \sum_{k}\frac{ {N-k \choose m} {k-1 \choose k-1-m} (k-1)! (N-k)!}{N!}

And the expectation of m should be

\sum_{m}P(m)=\sum_{m}m\sum_{k}\frac{ {N-k \choose m} {k-1 \choose k-1-m} (k-1)! (N-k)!}{N!}

The key to the above summation is one equation from Knuth’s concrete mathematic:

\sum_k {n \choose k} {s \choose k}k = s {n+s-1 \choose n-1}.

Then the remaining work is easy, finally we get the answer:

A: Line 10 will be executed \frac{N-2}{6} times

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.