Understand distkey in Greenplum

Greenplum database is the world’s leading open-source MPP database. MPP database means a distributed system. They very first thing you have to deal with in distributed system is that how to divide data into pieces and split them around machines. Greenplum supports different kinds of data distribution policy: hash, random, replicated. Among these kinds of policy, hash-distribution is the most important one since many intermediate results are hash-distributed. Please visit https://greenplum.cn/ for more details.

Distributed by columns

To create table distributed on specific columns, you can use the following syntax:

create table t(a int, b int, c int) distributed by(a, b);

The above SQL creates a table with 3 int type columns, and the distributed columns are a and b.

What does this mean mathematically? We have to understand how Greenplum put tuples on segments.

Cdbhash

Hash in Greenplum has two meaning(two steps):

  • given a tuple and its hash columns, based on these information, we use mathematical tools to compute a value. The aim of this step is to map original tuple uniformly into 64 bit integer space.
  • Based on the hash value provided in step 1, use a specific algorithm (gpdb6 supports jump consistent hash) to determine which segment this tuple belongs to.

There is one very important thing I have to emphasize here on the first step of hashing(please read the source code the the function `cdbhash` to really understand it):

We compute each hash columns hash value separately in order, and at each step we turn hash value into a new state.

So, what do we mean by distributed by columns?

Distributed by columns define a formula to compute the target segment of the tuple.

What if we instance some var with const? For example, we t is distributed by (c1, c2, c3), what about select * from t where c3 = 1? Without any data movement, we claim that when given c3 = 1, t is distributed by (c1, c2, 1). This is very useful to reduce motion in distributed plan.

Reduce motion

Let’s see an example:

create table t1(a int, b int, c int) distributed by (a, b);
create table t2(a int, b int, c int) distributed by (a, b);

explain select * from t1, t2 where t1.a = t2.a and t1.b = 1;

At the first glance, we might choose to add redistribute motion over two tables to make them both distributed by (a), and then do the join. That is what ORCA does.

But based on this blog’s theorem, Distributed by columns define a formula to compute the target segment of the tuple. So given t1.b =1, we can know that t1 now distributed on (a, 1) (We do not need any data movement to reach here). Thus we could just add one redistributed motion by (a, 1) on t2, and then do the join. That is what planner does.

gpadmin=# show optimizer;
 optimizer
-----------
 off
(1 row)

gpadmin=# explain (costs off) select * from t1, t2 where t1.a = t2.a and t1.b = 1;
                         QUERY PLAN
------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)
   ->  Hash Join
         Hash Cond: (t2.a = t1.a)
         ->  Redistribute Motion 3:3  (slice1; segments: 3)
               Hash Key: t2.a, 1
               ->  Seq Scan on t2
         ->  Hash
               ->  Seq Scan on t1
                     Filter: (b = 1)
 Optimizer: Postgres query optimizer
(10 rows)

gpadmin=# set optimizer = on;
SET
gpadmin=# show optimizer;
 optimizer
-----------
 on
(1 row)

gpadmin=# explain (costs off) select * from t1, t2 where t1.a = t2.a and t1.b = 1;
                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)
   ->  Hash Join
         Hash Cond: (t1.a = t2.a)
         ->  Redistribute Motion 3:3  (slice1; segments: 3)
               Hash Key: t1.a
               ->  Seq Scan on t1
                     Filter: (b = 1)
         ->  Hash
               ->  Redistribute Motion 3:3  (slice2; segments: 3)
                     Hash Key: t2.a
                     ->  Seq Scan on t2
 Optimizer: Pivotal Optimizer (GPORCA) version 3.83.0
(12 rows)

1 thought on “Understand distkey in Greenplum

  1. Pingback: 探囊取物——理解,诊断,修复Greenplum中的Opclass和Opfamily相关问题 | Thoughts on Computing

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.