Elementary sampling theory

Contents hide

The problems are from the Book Jaynes, Edwin T. Probability theory: The logic of science. Cambridge university press, 2003. Chapter 3 of the book is on the topic of Elementary sampling theory. These problems can be easily solved using the symbolic method of analytical combinatorics.

De-arrangement Problem

For more information on de-arrangement problem, please refer to https://en.wikipedia.org/wiki/Derangement.

In high school, we can solve this via Inclusion–exclusion principle. But now we have better tools: symbolic method of analytical combinatorics.

Permutations can be thought of a set of cycles. I first read this idea from Page 261 Knuth, Donald Ervin, and Oren Patashnik. Concrete mathematics: a foundation for computer science. Addison-Wesley, 2003, and fully developed in Flajolet, Philippe, and Robert Sedgewick. Analytic combinatorics. cambridge University press, 2009 for exponential generating functions.

P(z) = SET(CYL(z))

De-arrangement just adds constraints on the above symbolic construction: the inner cycles have to contain more than 1 element, and we have only $M$ urns. So we have $D(z) = SET(CYL_{>1}(z))=SET(CYL_2(z) + CLY_3(z)+\dots)$ .It is quite straightforward to get the exponential generating function:

D(z) = e^{\frac{z^2}{2}+\frac{z^3}{3}+\dots} = exp(\ln(\frac{1}{1-z})-z)=\frac{e^{-z}}{1-z}

The above generating function can be taken as $e^{-z}$ times $\frac{1}{1-z}$ , and the final sequence should be the convolution of the two parts’ corresponding sequence: $[z^M]D(z) = [z^M]e^{-z} \circledast [z^M]\frac{1}{1-z} = \frac{(-1)^M}{M!} \circledast \{1,1,1,\dots\}$ . Convolution with const sequence of $1$ is just partial summation of the original sequence. So we have

[z^M]D(z) = \sum_{k=0}^{M}\frac{(-1)^k}{k!}

Since we are using exponential generating function, we get the total numbers of de-arrangement when there are $M$ urns is $M!\sum_{k=0}^{M}\frac{(-1)^k}{k!}$ , so the probability that we have at least one ball matching the urn (not de-arrangement) is

1 - M!\sum_{k=0}^{M}\frac{(-1)^k}{k!} / M! = \sum_{k=1}^{M}\frac{(-1)^{k+1}}{k!}

Based on the above analysis, it is easy to show the asymptotic result: $1-\frac{1}{e}$ .

Word model

The balls in urns model can be taken as a sequence with length $M$ of set. So if there are no constraints, the symbolic method is $W_M(z) = SEQ_M(SET(z)) = (e^z)^M=e^{Mz}$ . That is why we have $N![z^N]W_M(z) = M^N$ ways.

Now consider one constraint: at least on ball in each urn. This will change the above symbolic method’s inner constructor to $SET_{\ge 1}(z)$ . Thus we will get a new exponential generating function:

$B_M(z) = SEQ_M(SET_{\ge 1}(z)) = (e^z-1)^M$ .

This leads to the number of possibility to be:

N![z^N]B_M(z) = M! \genfrac\{\}{0pt}{0}{N}{M}

And the probability is:

\frac{M! \genfrac\{\}{0pt}{0}{N}{M}}{M^N}

If $M > N$ , Stirling number of the second kind is $0$ , and this is consistent with our common sense.

Python script to check

We can use the following python script to verify the result roughly:

computed = {}

import math
from random import randint

def sterling2(n, k):
    key = str(n) + "," + str(k)
    if key in computed.keys():
        return computed[key]
    if n == k == 0:
        return 1
    if (n > 0 and k == 0) or (n == 0 and k > 0):
        return 0
    if n == k:
        return 1
    if k > n:
        return 0
    result = k * sterling2(n - 1, k) + sterling2(n - 1, k - 1)
    computed[key] = result
    return result

def prob(M, N):
    return math.factorial(M) * sterling2(N,M) / (M**N)

def mc_sim(M, N, r=1000000):
    def test_once():
        s = set(randint(1, M) for i in range(N))
        return int(len(s) == M)
    return float(sum(test_once() for i in range(r)))/r

if __name__ == "__main__":
    M = 5
    for N in range(5, 10):
        p_acc = prob(M, N)
        p_sim = mc_sim(M, N)
        diff = abs(p_acc - p_sim) / (p_acc) * 100
        print("For M=%s, N=%s, Formula computes: %.4lf, Simluation shows: %.4lf, Relative Diff is %.4lf%%" % (M, N, p_acc, p_sim, diff))

And a demo run output is shown below:

For M=5, N=5, Formula computes: 0.0384, Simluation shows: 0.0385, Relative Diff is 0.3880%
For M=5, N=6, Formula computes: 0.1152, Simluation shows: 0.1150, Relative Diff is 0.2014%
For M=5, N=7, Formula computes: 0.2150, Simluation shows: 0.2155, Relative Diff is 0.2316%
For M=5, N=8, Formula computes: 0.3226, Simluation shows: 0.3234, Relative Diff is 0.2536%
For M=5, N=9, Formula computes: 0.4271, Simluation shows: 0.4266, Relative Diff is 0.1052%

Math proof

(e^z-1)^M = \sum_{m=0}^M {M \choose m}e^{zm}(-1)^{(M-m)} \implies \\ [z^N](e^z-1)^M = \sum_{m=0}^M{M \choose m}(-1)^{(M-m)}[z^N]e^{zm} = \sum_{m=0}^M{M \choose m}(-1)^{(M-m)} \frac{m^N}{N!}

Then we can have

N! [z^N]B_M(z) = \sum_{m=0}^M{M \choose m}(-1)^{(M-m)} m^N

Then we just need to prove

\sum_{m=0}^M (-1)^{(M-m)} \frac{m^N}{m!(M-m)!} = \genfrac\{\}{0pt}{0}{N}{M}

The above equation is just equation 6.19 of Knuth’s concrete mathematical. It is easy to verify the equation satisfies the following recurrent relation:

\genfrac\{\}{0pt}{0}{N}{M} = M \genfrac\{\}{0pt}{0}{N-1}{M} + \genfrac\{\}{0pt}{0}{N-1}{M-1}

Pay more attention to the boundary values, we know it is just the Stirling number of second kind.

Thoughts on Computing

英雄无岁，江湖无辈

De-arrangement Problem

Word model

Python script to check

Math proof

Leave a ReplyCancel reply

De-arrangement Problem

Word model

Python script to check

Math proof

Leave a ReplyCancel reply

Discover more from Thoughts on Computing