Worktable as Continuations

Not sure why I came again to this interesting paper One WITH RECURSIVE is Worth Many GOTOs. Daily work on Greenplum makes me think and inspires me to learn knowledge in different areas of computer science. This paper was accepted in the same SIGMOD as Greenplum’s HTAP paper and I have a vague impression of it. Recently three tasks in my work made me peruse the paper:

  1. Fix a customer JIRA on recursive CTE
  2. Answer questions from my colleague on the implementation of KNN in Greenplum (I advise using lateral join with some enhancement of Optimizer)
  3. My own spike on the labels as values skill in Greenplum(Postgres)’s expression valuation

This paper uses recursive CTE, Lateral Join and Union All with filters to implement control structure such as Loop, Condition in PLPGSQL. So now is a good time to start a new tour.

A brief introduction to recursive CTE

Greenplum (Postgres) uses Recursive Union to implement recursive CTE. For an interesting introduction to it please refer to my previous blog 三国赵云传数据+Postgres解析一篇SIGMOD2019论文的agg递归CTE. Comments in the executor code ExecRecursiveUnion with the following simple case can help understand. worktable is the key concept here.

explain (costs off) with recursive cte(n) as (
   select 1
   union all
   select n + 1 from cte where n < 10
select * from cte ;
                QUERY PLAN
 CTE Scan on cte
   CTE cte
     ->  Recursive Union
           ->  Result
           ->  WorkTable Scan on cte cte_1
                 Filter: (n < 10)

/* ----------------------------------------------------------------
 *		ExecRecursiveUnion(node)
 *		Scans the recursive query sequentially and returns the next
 *		qualifying tuple.
 * 1. evaluate non recursive term and assign the result to RT
 * 2. execute recursive terms
 * 2.1 WT := RT
 * 2.2 while WT is not empty repeat 2.3 to 2.6. if WT is empty returns RT
 * 2.3 replace the name of recursive term with WT
 * 2.4 evaluate the recursive term and store into WT
 * 2.5 append WT to RT
 * 2.6 go back to 2.2
 * ----------------------------------------------------------------
static TupleTableSlot *
ExecRecursiveUnion(PlanState *pstate)

WorkTable contains information for the next round of execution of the recursive part. If we have a jump table and every time WorkTable contains the next index of instructions to execute, then WorkTable can be thought as a continuation. Also this is almost the same pattern of Postgres’ skill in ExecInterpExpr.

/* struct for jump target -> opcode lookup table */
typedef struct ExprEvalOpLookup
	const void *opcode;
	ExprEvalOp	op;
} ExprEvalOpLookup;

/* to make dispatch_table accessible outside ExecInterpExpr() */
static const void **dispatch_table = NULL;

/* jump target -> opcode lookup table */
static ExprEvalOpLookup reverse_dispatch_table[EEOP_LAST];

#define EEO_SWITCH()
#define EEO_CASE(name)		CASE_##name:
#define EEO_DISPATCH()		goto *((void *) op->opcode)
#define EEO_OPCODE(opcode)	((intptr_t) dispatch_table[opcode])

Returning from a function call is just like calling another function: feed the return value to the continuation. For references of continuaion, please read Professor Dan Friedman’s EOPL.

Condition Expression in SQL

The if-then-else expression in general programming language can be compiled to SQL like:

-- if a then b else c

select if_body.x
from         (select a) if_cond(x), 
     lateral  (
             select b where if_cond.x
             union all
             select c where not if_cond.x
     ) if_body;

If there is no case: more than one row returned by a subquery used as an expression, then I think the above structure can be replaced by simple case-when expression.

NOTE: the union all with filter as dispatch skill (also used in next section) works and will not execute the excluded plan because the filter in each clause is pseudoconstant and Postgres planner will create a Result node with one-time filter. If the run-time filter evaluates to false, then the below plan will not be executed at all. A simple example is:

gpadmin=# explain (costs off) with recursive cte(label, run, val) as (
   select 'f', true, 0::int
   union all
   select body.*
   from cte, lateral (
     select 'g', true, val + count(1)::int from t1 where cte.label = 'f'
     union all
     select 'e', true, val + count(1)::int from t1 where cte.label = 'g'
     union all
     select 'h', false, val + count(1)::int from t1 where cte.label = 'e'
   ) body
select * from cte where not run;
                                  QUERY PLAN
 CTE Scan on cte
   Filter: (NOT run)
   CTE cte
     ->  Recursive Union
           ->  Result
           ->  Nested Loop
                 ->  WorkTable Scan on cte cte_1
                 ->  Append
                       ->  Aggregate
                             ->  Result
                                   One-Time Filter: (cte_1.label = 'f'::text)
                                   ->  Seq Scan on t1
                       ->  Aggregate
                             ->  Result
                                   One-Time Filter: (cte_1.label = 'g'::text)
                                   ->  Seq Scan on t1 t1_1
                       ->  Aggregate
                             ->  Result
                                   One-Time Filter: (cte_1.label = 'e'::text)
                                   ->  Seq Scan on t1 t1_2
(20 rows)

Continuations in SQL

We can build continuation using worktable:

  1. force worktable only have one row output
  2. there is a column of worktable serves as the type of continuation
  3. specific data of a continuation will be different columns for the worktable
  4. next round of recursive union all execution is just like to apply the continuation
  5. to apply a continuation is just a dispatch using union all skill

Below is a case from the book Compiling with Continuations: (with tiny modification)

fun prodprimes(n) = if n=1
                    then []
                    else if isprime(n)
                    then n::prodprimes(n-1)
                    else prodprimes(n-1)

It can be transformed into Continuation-Passing-Style:

fun prodprimes(n,c) =
    if n = 1
    then c([])
    else let fun k(b) =
                 if b = true
                 then let fun j(p) =
                              let val a=n::p
                              in c(a)
                          val m = n-1
                      in prodprimes(m, j)
                 else prodprims(n-1, c)
         in isprime(n, k)

There are 3 types of continuation in the above program:

  1. the final continuation which is just returning the value
  2. the function j which models the control context’s growth
  3. the function itself

We can use an array to keep the prime numbers to model the second continuation.

My attempt is as below:

create language plpython3u;
create function isprime(n int) returns boolean as
for i in range(2, n+1):
  if i * i > n: return True
  if n % i == 0: return False
return True
language plpython3u;

\set n 20

with recursive continuation(label, v1, v2, v3) as
  select 's', :n, array[]::int[], false
  union all
  select dispatch.*
  from continuation, lateral (
      select body.*
      from (select continuation.v1 = 1) as if_cond(b), lateral (
        select 'c', 1, continuation.v2, continuation.v3 where if_cond.b
        union all
        select 'k', continuation.v1, continuation.v2, isprime(continuation.v1) where not if_cond.b
      ) body
      where continuation.label = 's'
    ) union all
      select body.*
      from (select continuation.v3) as if_cond(b), lateral (
        select 's', continuation.v1 - 1, array_append(continuation.v2, continuation.v1), false where if_cond.b
        union all
        select 's', continuation.v1 - 1, continuation.v2, false where not if_cond.b
      ) body
      where continuation.label = 'k'
    ) union all
      select null, null, continuation.v2, null
      where continuation.label = 'c'
  ) dispatch

select * from continuation where label is null;

-- test result
 label | v1 |          v2           | v3
       |    | {19,17,13,11,7,5,3,2} |
(1 row)

What we have leaned and what’s more?

  • How to better generate plans for KNN in MPP database? (take advantage of index and lateral)
  • Recursive CTE’s locus should be dynamic determined in MPP database
  • Result node’s onetime filter skill
  • Compile functional programming language

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.