Analysis of the number of swapping in partition of Quicksort

Quicksort is an elegant and efficient sorting algorithm. Analysis of quick sort is a lot of fun. The core part of quick sort algorithm is partition the original array into two parts. This part of code is shown below (in java):

private void quicksort(int[] a, int lo, int hi) {
    if (hi <= lo) return;
    int i = lo - 1, j = hi;
    int t, v = a[hi];
    while (true) {
        while (a[++i] < v);
        while (v < a[--j])
            if (j == lo) break;
        if (i >= j) break;
        t = a[i]; a[i] = a[j]; a[j] = t;
    }
    t = a[i];
    a[i] = a[hi];
    a[hi] = t;
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}
Continue reading

Understand distkey in Greenplum

Greenplum database is the world’s leading open-source MPP database. MPP database means a distributed system. They very first thing you have to deal with in distributed system is that how to divide data into pieces and split them around machines. Greenplum supports different kinds of data distribution policy: hash, random, replicated. Among these kinds of policy, hash-distribution is the most important one since many intermediate results are hash-distributed. Please visit https://greenplum.cn/ for more details.

Continue reading

On dynamic hash expansion

Dynamic hash table will expand when it is full. If we simply use modulo to map hash value to the entry position, there are some properties on the data movement while expanding.

Let’s start with some questions:

  1. If at the very first, the table size is n, and for the entry at ith place ( k \mod n = i), what is the possible place for it after expanding?
  2. Can we describe the moving process?
Continue reading

Coupon collector’s problem:An infinite-series perspective

Background

If you are not familiar with Coupon collector’s problem, please first read the following page:

Six years ago when I was in graduate school, I encountered this problem. I did not want to use Indicator variable method to solve it, so I tried my best to solve it using infinite series. I came up with a complicated formula which was not accurate and I am not satisfied because this means it is wrong.

the wrong formula of six years ago
Continue reading

锅哥的12小时飓风赛复盘

4月12日参加了第一次斯巴达12小时的飓风赛,反正是完成了,比较虐。想求得的答案也求得了。下面讲细细的回味这个完整过程的来龙去脉。

从《士兵突击》开始

2017~2018的春节,我看了《红海行动》,还挺喜欢的,然后看影评有人说挺多《士兵突击》剧组出来的,比如张译,然后发现《士兵突击》有人可以刷十几遍。顿时来了兴趣,然后就入了兰晓龙的坑,也开始一遍遍刷《士兵突击》和《我的团长我的团》。

Continue reading

锅哥的年夜饭

小慕慕才五个月,锅哥今年又在北京过年。独在异乡为异客,通过年夜饭来遥望家乡。

  • 冷盘:酱牛肉
  • 汤:花胶炖鸡
  • 热菜
    • 干烧臭鳜鱼
    • 海参小米粥
    • 白灼大虾
    • 冬笋炒火腿
    • 徽州炒粉丝
    • 清炒小油菜
    • 莴笋炒肉片
  • 主食:
    • 米饭
    • 徽州豆黄粿
    • 烧饼

Prolog Examples: Serialize DMLs for GreenplumDB

BackGround

Greenplum DB is an open-source massively parallel data platform for analytics, machine learning and AI.

Greenplum DB 6.0 is close to release and in GPDB6, Global Deadlock Detector (GDD) is introduced so that DMLS (insert, update, delete) on the same table can be executed concurrently. According to a performance test report by Alibaba:

Alibaba in China has done some initial testing on OSS GP in preperation for Greenplum 6.  Here is the test result:

TEST CASE: Each transaction 3 updates, 1 select, 1 insert
  * Greenplum 5 , Transactions per Second: 15
* Greenplum 6,  Transactions per Second: 2900


Continue reading