# Elementary sampling theory

The problems are from the Book Jaynes, Edwin T. Probability theory: The logic of science. Cambridge university press, 2003. Chapter 3 of the book is on the topic of Elementary sampling theory. These problems can be easily solved using the symbolic method of analytical combinatorics.

## De-arrangement Problem

In high school, we can solve this via Inclusion–exclusion principle. But now we have better tools: symbolic method of analytical combinatorics.

Permutations can be thought of a set of cycles. I first read this idea from Page 261 Knuth, Donald Ervin, and Oren Patashnik. Concrete mathematics: a foundation for computer science. Addison-Wesley, 2003, and fully developed in Flajolet, Philippe, and Robert Sedgewick. Analytic combinatorics. cambridge University press, 2009 for exponential generating functions.

P(z) = SET(CYL(z))

De-arrangement just adds constraints on the above symbolic construction: the inner cycles have to contain more than 1 element, and we have only M urns. So we have D(z) = SET(CYL_{>1}(z))=SET(CYL_2(z) + CLY_3(z)+\dots).It is quite straightforward to get the exponential generating function:

D(z) = e^{\frac{z^2}{2}+\frac{z^3}{3}+\dots} = exp(\ln(\frac{1}{1-z})-z)=\frac{e^{-z}}{1-z}

The above generating function can be taken as e^{-z} times \frac{1}{1-z}, and the final sequence should be the convolution of the two parts’ corresponding sequence: [z^M]D(z) = [z^M]e^{-z} \circledast [z^M]\frac{1}{1-z} = \frac{(-1)^M}{M!} \circledast \{1,1,1,\dots\}. Convolution with const sequence of 1 is just partial summation of the original sequence. So we have

[z^M]D(z) = \sum_{k=0}^{M}\frac{(-1)^k}{k!}

Since we are using exponential generating function, we get the total numbers of de-arrangement when there are M urns is M!\sum_{k=0}^{M}\frac{(-1)^k}{k!}, so the probability that we have at least one ball matching the urn (not de-arrangement) is

1 - M!\sum_{k=0}^{M}\frac{(-1)^k}{k!} / M! = \sum_{k=1}^{M}\frac{(-1)^{k+1}}{k!}

Based on the above analysis, it is easy to show the asymptotic result: 1-\frac{1}{e}.

## Word model

The balls in urns model can be taken as a sequence with length M of set. So if there are no constraints, the symbolic method is W_M(z) = SEQ_M(SET(z)) = (e^z)^M=e^{Mz} . That is why we have N![z^N]W_M(z) = M^N ways.

Now consider one constraint: at least on ball in each urn. This will change the above symbolic method’s inner constructor to SET_{\ge 1}(z). Thus we will get a new exponential generating function:

B_M(z) = SEQ_M(SET_{\ge 1}(z)) = (e^z-1)^M.

This leads to the number of possibility to be:

N![z^N]B_M(z) = M! \genfrac\{\}{0pt}{0}{N}{M}

And the probability is:

\frac{M! \genfrac\{\}{0pt}{0}{N}{M}}{M^N}

If M > N, Stirling number of the second kind is 0, and this is consistent with our common sense.

### Python script to check

We can use the following python script to verify the result roughly:

computed = {}

import math
from random import randint

def sterling2(n, k):
key = str(n) + "," + str(k)
if key in computed.keys():
return computed[key]
if n == k == 0:
return 1
if (n > 0 and k == 0) or (n == 0 and k > 0):
return 0
if n == k:
return 1
if k > n:
return 0
result = k * sterling2(n - 1, k) + sterling2(n - 1, k - 1)
computed[key] = result
return result

def prob(M, N):
return math.factorial(M) * sterling2(N,M) / (M**N)

def mc_sim(M, N, r=1000000):
def test_once():
s = set(randint(1, M) for i in range(N))
return int(len(s) == M)
return float(sum(test_once() for i in range(r)))/r

if __name__ == "__main__":
M = 5
for N in range(5, 10):
p_acc = prob(M, N)
p_sim = mc_sim(M, N)
diff = abs(p_acc - p_sim) / (p_acc) * 100
print("For M=%s, N=%s, Formula computes: %.4lf, Simluation shows: %.4lf, Relative Diff is %.4lf%%" % (M, N, p_acc, p_sim, diff))



And a demo run output is shown below:

For M=5, N=5, Formula computes: 0.0384, Simluation shows: 0.0385, Relative Diff is 0.3880%
For M=5, N=6, Formula computes: 0.1152, Simluation shows: 0.1150, Relative Diff is 0.2014%
For M=5, N=7, Formula computes: 0.2150, Simluation shows: 0.2155, Relative Diff is 0.2316%
For M=5, N=8, Formula computes: 0.3226, Simluation shows: 0.3234, Relative Diff is 0.2536%
For M=5, N=9, Formula computes: 0.4271, Simluation shows: 0.4266, Relative Diff is 0.1052%


### Math proof

(e^z-1)^M = \sum_{m=0}^M {M \choose m}e^{zm}(-1)^{(M-m)} \implies \\ [z^N](e^z-1)^M = \sum_{m=0}^M{M \choose m}(-1)^{(M-m)}[z^N]e^{zm} = \sum_{m=0}^M{M \choose m}(-1)^{(M-m)} \frac{m^N}{N!}

Then we can have

N! [z^N]B_M(z) = \sum_{m=0}^M{M \choose m}(-1)^{(M-m)} m^N

Then we just need to prove

\sum_{m=0}^M (-1)^{(M-m)} \frac{m^N}{m!(M-m)!} = \genfrac\{\}{0pt}{0}{N}{M}

The above equation is just equation 6.19 of Knuth’s concrete mathematical. It is easy to verify the equation satisfies the following recurrent relation:

\genfrac\{\}{0pt}{0}{N}{M} = M \genfrac\{\}{0pt}{0}{N-1}{M} + \genfrac\{\}{0pt}{0}{N-1}{M-1}

Pay more attention to the boundary values, we know it is just the Stirling number of second kind.

This site uses Akismet to reduce spam. Learn how your comment data is processed.