Coupon collector’s problem: sample without replacement

Problem

About one year ago, in the post Coupon collector’s problem:An infinite-series perspective I come up with a solution to asymptotically answer the following problem:

The cardinality of the data is n, for each distinct group, there are a duplicates. Suppose we want to use hash aggregation algorithm to do some computation, each time you uniformly sample one data and push it in the hash table (capacity is just n), exactly right after sampling m data, for the first time you find that the hash table is full. What is the expectation of m?

The above problem is a model of sampling without replacement. One year ago’s solution is with replacement. Recently, I am reading the book Probability Theory: The Logic of Science. It gives me some idea so I come up with a solution.

Continue reading

Analysis of the number of swapping in partition of Quicksort

Quicksort is an elegant and efficient sorting algorithm. Analysis of quick sort is a lot of fun. The core part of quick sort algorithm is partition the original array into two parts. This part of code is shown below (in java):

private void quicksort(int[] a, int lo, int hi) {
    if (hi <= lo) return;
    int i = lo - 1, j = hi;
    int t, v = a[hi];
    while (true) {
        while (a[++i] < v);
        while (v < a[--j])
            if (j == lo) break;
        if (i >= j) break;
        t = a[i]; a[i] = a[j]; a[j] = t;
    }
    t = a[i];
    a[i] = a[hi];
    a[hi] = t;
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}
Continue reading

Coupon collector’s problem:An infinite-series perspective

Background

If you are not familiar with Coupon collector’s problem, please first read the following page:

Six years ago when I was in graduate school, I encountered this problem. I did not want to use Indicator variable method to solve it, so I tried my best to solve it using infinite series. I came up with a complicated formula which was not accurate and I am not satisfied because this means it is wrong.

the wrong formula of six years ago
Continue reading

Jump Consistent Hash Algorithm

One of my colleagues introduced me a new consistent hash algorithm: jump consistent hash. It is a concise algorithm, just in 5 line of code. The paper can be found here:  A Fast, Minimal Memory, Consistent Hash Algorithm. The proofs in the paper are not that detailed, so in this blog, I will try to provide a more formal way to help you understand it and build it from scratch.

Continue reading