快速排序性能分析——Analytical Combinatorics方法

多键值排序是把快速排序(Quick Sort)和基数排序(Radix Sort)结合起来的算法。论文为 Fast Algorithms for Sorting and Searching Strings (J. Bentley, R. Sedgewick) 。Greenplum 6里面有实现,本文是我自己按论文搜索资料,一步一步复习、学习、整理的记录。

教授的讲座资料:

Generating Functions基本准备

A(z) = \sum_{n=0}^{\infty} A_n z^n 是序列A_n对应生成函数。显而易见的生成函数和序列对应:

\frac{1}{1-z} \iff \{1,1,\dots,\} ; \frac{1}{1-z} = \sum_{n=0}^{\infty} z^n

还有一些重要的微积分性质和卷积性质:

A(z) = \sum_{n=0}^{\infty} A_n z^n \implies A'(z) = \sum_{n=1}nA_nz^{n-1} \implies \\ zA'(z) = \sum_{n=1}nA_nz^n = \sum_{n=0}nA_nz^n \implies \\
zA'(z) \sim nA_n; zA'(z)+A(z) \sim (n+1)A_n \\
\implies z\frac{d}{dz}\Big(\frac{1}{1-z}\Big)  = \frac{z}{(1-z)^2}\sim n; \frac{1}{(1-z)^2} \sim (n+1) \\
\implies z\frac{d}{dz}\Big( \frac{1}{(1-z)^2} \Big) = \frac{2z}{(1-z)^3} \sim n(n+1)
A(z) = \sum_{n=0}^{\infty}A_nz^n \implies \int_{0}^z A(z) dz = \sum_{n=0}^{\infty} A_n \int_{0}^z z^n dz = \sum_{n=0}^{\infty} A_n\frac{1}{n+1}z^{n+1} \\
 \implies  \int_{0}^z A(z) dz  \sim \frac{a_{n-1}}{n} (n>0, \text{otherwise }0)
A(z) \sim A_n, B(z) \sim B_n \implies A(z)B(z) = \Big(\sum_{n=0}^{\infty}A_nz^n\Big)\Big( \sum_{k=0}^{\infty}B_kz^k \Big) \\ \implies A(z)B(z) = \sum_{n=0}^{\infty} (\sum_{k=0}^{n}A_kB_{n-k})z^n \iff A(z)B(z) \sim \sum_{k=0}^{n}A_kB_{n-k} \\
\implies \frac{1}{1-z}A(z) \sim \sum_{k=0}^nA_k

快速排序的分析——Generating Functions

参考资料(资料的reference还有资料):

快速排序是典型的分治策略的计算机算法,通过线性复杂度的划分算法找到枢纽点,然后递归。号称纯函数编程语言只需要三行就可以实现一个快速排序(当然性能不够优化),下面的例子是Erlang

quicksort(L) when length(L) =< 1 -> L;
quicksort([P|L]) ->
   Left = [X || X <- L, X < P],
   Right = [X || X <- L, X >= P],
   quicksort(Left) ++ [P] ++ quicksort(Right).

快速排序的算法性能分析的递推式子(具体数学,公式2.12)

C_0 = 0\\
C_n = n + 1 + \frac{2}{n}\sum_{k=0}^{n-1}C_k

我们利用生成函数的办法,看看能不能得到和具体数学里一模一样的结果:

nC_n = (n+1)n + 2\sum_{k=0}^{n-1}C_k \implies \\
nC_n + 2C_n = (n+1)n + 2\sum_{k=0}^{n}C_k \text{   }(\text{这个式子n=0也满足}) \implies \\
zC'(z) + 2C(z) = \frac{2z}{(1-z)^3} + \frac{2}{1-z}C(z) \\
\implies C'(z) = \frac{2C(z)}{1-z} + \frac{2}{(1-z)^3}

上面最后是一个典型的常微分方程(可以参考https://en.wikipedia.org/wiki/Ordinary_differential_equation)对应的模式的:

  • P(z) = \frac{-2}{1-z}
  • Q(z) = \frac{2}{(1-z)^3}

按照通解的公式先求:

\int^{z} P(z) dz = \int^{z} \frac{-2}{1-z} dz = \ln{(1-z)^2}

再求

\int^z e^{\int^zP(z)dz} Q(z) dz = \int^z (1-z)^2 \frac{2}{(1-z)^3} dz = 2\ln{\frac{1}{1-z}}

因此通解是

\frac{1}{(1-z)^2}(2\ln{\frac{1}{1-z}}+ \text{Const})

结合边界条件C_0=0,可以得出上面的Const是0, 所以有:

C(z) = \frac{2}{(1-z)^2}\ln{\frac{1}{1-z}}

我们用wolframalpha展开看看结果:

根据具体数学(2.14) C_n = 2(n+1)H_n-2n, 结合上图,n\le 6是全部吻合的。其中H_n = 1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n} , n > 0

由于\int_{0}^z \frac{1}{1-z} dz = \ln{\frac{1}{1-z}},可以知道:

B(z) = \frac{1}{1-z}\ln{\frac{1}{1-z}} = \sum_{n=1}^{\infty} H_n z^n \implies B_n = H_n, n \ge 1 \\
\implies \frac{1}{1-z}B(z) \sim \sum_{k=0}^nB_k=\sum_{k=1}^nH_k

即利用生成函数得到的复杂度的通项是C_n=2\sum_{k=1}^{n}H_k,交换求和顺序可以证明上面结果等价。利用Euler–Maclaurin formula和斯特林公式,可以获取渐进结果。

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.