In recent years I have focused on developing Greenplum; my other projects can be found on my GitHub page.

My Greenplum commits:
Improve cost model of hash aggregation for Greenplum
- Implement an algorithm to estimate the number of tuples that may be spilled to disk when a two-stage hash aggregation plan is chosen. The algorithm is based on tools from analytic combinatorics
- Implement an algorithm to compute the number of groups on each segment when the group keys do not match the distribution keys
- Make the Greenplum planner choose intelligently between 1-stage and 2-stage aggregation plans
- Code: https://github.com/greenplum-db/gpdb/pull/8439
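
To give a feel for the "number of groups on each segment" problem, here is a minimal sketch of a textbook balls-in-bins estimate. It is not the analytic-combinatorics algorithm from the PR; it assumes tuples are spread uniformly at random across segments and that every group has the same number of tuples. The function name and parameters are hypothetical.

```python
def expected_groups_per_segment(num_tuples: float, num_groups: float, num_segments: int) -> float:
    """Expected number of distinct groups seen on one segment.

    Assumptions (for illustration only): each tuple lands on a given segment
    independently with probability 1/num_segments, and each group contains
    num_tuples / num_groups tuples.
    """
    if num_tuples <= 0 or num_groups <= 0 or num_segments <= 0:
        raise ValueError("all arguments must be positive")
    tuples_per_group = num_tuples / num_groups
    # Probability that at least one tuple of a given group lands on this segment.
    p_present = 1.0 - (1.0 - 1.0 / num_segments) ** tuples_per_group
    return num_groups * p_present


if __name__ == "__main__":
    # Many tuples per group: almost every group shows up on every segment,
    # so a first aggregation stage on each segment shrinks the data a lot.
    print(expected_groups_per_segment(1_000_000, 100, 8))
    # One tuple per group: each segment sees about 1/8 of the groups,
    # so a first aggregation stage removes nothing.
    print(expected_groups_per_segment(1_000_000, 1_000_000, 8))
```

Under these assumptions, the ratio between this estimate and the number of tuples on a segment hints at how much a first aggregation stage can reduce the data before redistribution.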
Merge with PostgreSQL v12
- Help with performance analysis
- Fix test cases to make the pipeline green
- A huge milestone; it was my honor to work with Heikki and make some contributions
- Code: https://github.com/greenplum-db/gpdb/pull/10862
Global Deadlock Detector
- Implement the global deadlock detector and prove the correctness of the algorithm
- Downgrade the lock level on master so that Greenplum can be used in OLTP scenarios
- A TPC-B-like UPDATE/DELETE benchmark becomes about 100 times faster
- A patent
- Code: https://github.com/greenplum-db/gpdb/pull/4810 (there are several later commits, but this is the main one)
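
The idea of global deadlock detection can be sketched as follows. This is a simplified illustration, not the algorithm from the PR: it assumes each segment reports wait-for edges as hypothetical `(waiter, holder)` transaction-id pairs, merges them into one graph, and repeatedly drops edges that cannot be part of a cycle. It ignores details a real detector must handle, such as locks that may be released before a transaction ends.

```python
def find_global_deadlock(local_edges):
    """Return the set of transactions left after reducing the global wait-for graph.

    local_edges: dict mapping a segment id to an iterable of (waiter, holder)
    pairs, meaning transaction `waiter` is blocked by `holder` on that segment.
    An empty result means no deadlock was found.
    """
    # Merge the per-segment graphs into one global edge set.
    edges = set()
    for seg_edges in local_edges.values():
        edges.update(seg_edges)

    changed = True
    while changed and edges:
        waiters = {w for w, _ in edges}
        holders = {h for _, h in edges}
        # An edge can be on a cycle only if its holder is itself waiting
        # and its waiter is itself blocking some other transaction.
        kept = {(w, h) for w, h in edges if h in waiters and w in holders}
        changed = kept != edges
        edges = kept

    return {txn for edge in edges for txn in edge}


if __name__ == "__main__":
    # Neither segment sees a cycle locally, but together they form one.
    print(find_global_deadlock({0: [(1, 2)], 1: [(2, 1)]}))
    # A plain waiting chain is not a deadlock.
    print(find_global_deadlock({0: [(1, 2)], 1: [(2, 3)]}))
```

The first call shows why detection has to be global: each segment's local graph is acyclic, and the cycle only appears once the edges are combined.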
Consistent hashing in distributed systems and the Pivotal hash algorithm
- Introduce jump consistent hashing in Greenplum to make cluster expansion faster
- Develop a new consistent hash algorithm, Pivotal hash, which is much better than Google's Maglev. This is the best consistent hash algorithm based on a lookup table
- Develop an index-based algorithm to speed up scanning for the tuples that need to move
- Code: https://github.com/greenplum-db/gpdb/pull/5426 (for the Pivotal hash and index algorithms, please refer to my blog posts)
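
For readers unfamiliar with jump consistent hashing, here is a minimal Python sketch of the published algorithm by Lamping and Veach. It is not Greenplum's C implementation, and it does not cover Pivotal hash. It illustrates why expansion gets cheaper: when the bucket count grows from n to n+1, a key either stays where it is or moves to the new bucket.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, used as a per-key pseudo-random source.
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump ahead to the next bucket count at which this key would move.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


if __name__ == "__main__":
    keys = range(10000)
    moved = sum(
        jump_consistent_hash(k, 4) != jump_consistent_hash(k, 5) for k in keys
    )
    # Roughly one fifth of the keys should move when going from 4 to 5 buckets.
    print(moved / len(keys))
```

In an expansion scenario, only the keys whose bucket changes need to be moved to the new segment; everything else stays in place.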
Optimization of locks and OLTP performance for Greenplum
- Look deeply into SELECT statements with a locking clause in an MPP database and optimize some simple cases
- Look deeply into the split-update technique and adapt it for an MPP database
- Look deeply into locks, partitioned tables, and transactions
- Code: https://github.com/greenplum-db/gpdb/pull/7635 (there are also later commits, but this is the main one)
Serialize DMLs for GreenplumDB
- Develop an algorithm that can serialize different DMLs in Greenplum
- Use Prolog to find the best lock mode solution
- Code: https://github.com/kainwen/gpdb/tree/serialize_updatewith_motion_on_qd
Online expansion for Greenplum
- Expand a Greenplum cluster without restarting it
- Introduce numsegments in each table's distribution information so that queries can still take advantage of the co-location of the original tables; this also lets the planner generate correct distributed plans
- Implement an algorithm, based on split-update, to reshuffle only the tuples that need to move
- Implement an algorithm, based on two writer gangs, to reshuffle only the tuples that need to move
Improve performance of join for Greenplum
- Refactor the core part of the distributed planner in Greenplum to make the code cleaner
- Improve the algorithm that adds motions to make some joins much faster
- Code: https://github.com/greenplum-db/gpdb/pull/7148