Not sure why I came again to this interesting paper One WITH RECURSIVE is Worth Many GOTOs. Daily work on Greenplum makes me think and inspires me to learn knowledge in different areas of computer science. This paper was accepted in the same SIGMOD as Greenplum’s HTAP paper and I have a vague impression of it. Recently three tasks in my work made me peruse the paper:
- Fix a customer JIRA on recursive CTE
- Answer questions from my colleague on the implementation of KNN in Greenplum (I advise using lateral join with some enhancement of Optimizer)
- My own spike on
the labels as values
skill in Greenplum(Postgres)’s expression valuation
This paper uses recursive CTE, Lateral Join and Union All with filters to implement control structure such as Loop, Condition
in PLPGSQL
. So now is a good time to start a new tour.
A brief introduction to recursive CTE
Greenplum (Postgres) uses Recursive Union to implement recursive CTE. For an interesting introduction to it please refer to my previous blog 三国赵云传数据+Postgres解析一篇SIGMOD2019论文的agg递归CTE. Comments in the executor code ExecRecursiveUnion
with the following simple case can help understand. worktable
is the key concept here.
explain (costs off) with recursive cte(n) as (
select 1
union all
select n + 1 from cte where n < 10
)
select * from cte ;
QUERY PLAN
-------------------------------------------
CTE Scan on cte
CTE cte
-> Recursive Union
-> Result
-> WorkTable Scan on cte cte_1
Filter: (n < 10)
/* ----------------------------------------------------------------
* ExecRecursiveUnion(node)
*
* Scans the recursive query sequentially and returns the next
* qualifying tuple.
*
* 1. evaluate non recursive term and assign the result to RT
*
* 2. execute recursive terms
*
* 2.1 WT := RT
* 2.2 while WT is not empty repeat 2.3 to 2.6. if WT is empty returns RT
* 2.3 replace the name of recursive term with WT
* 2.4 evaluate the recursive term and store into WT
* 2.5 append WT to RT
* 2.6 go back to 2.2
* ----------------------------------------------------------------
*/
static TupleTableSlot *
ExecRecursiveUnion(PlanState *pstate)
{
WorkTable
contains information for the next round of execution of the recursive part. If we have a jump table and every time WorkTable
contains the next index of instructions to execute, then WorkTable
can be thought as a continuation
. Also this is almost the same pattern of Postgres’ skill in ExecInterpExpr
.
/* struct for jump target -> opcode lookup table */
typedef struct ExprEvalOpLookup
{
const void *opcode;
ExprEvalOp op;
} ExprEvalOpLookup;
/* to make dispatch_table accessible outside ExecInterpExpr() */
static const void **dispatch_table = NULL;
/* jump target -> opcode lookup table */
static ExprEvalOpLookup reverse_dispatch_table[EEOP_LAST];
#define EEO_SWITCH()
#define EEO_CASE(name) CASE_##name:
#define EEO_DISPATCH() goto *((void *) op->opcode)
#define EEO_OPCODE(opcode) ((intptr_t) dispatch_table[opcode])
Returning from a function call is just like calling another function: feed the return value to the continuation
. For references of continuaion
, please read Professor Dan Friedman’s EOPL.
Condition Expression in SQL
The if-then-else
expression in general programming language can be compiled to SQL like:
-- if a then b else c
select if_body.x
from (select a) if_cond(x),
lateral (
select b where if_cond.x
union all
select c where not if_cond.x
) if_body;
If there is no case: more than one row returned by a subquery used as an expression
, then I think the above structure can be replaced by simple case-when
expression.
NOTE: the union all
with filter
as dispatch skill (also used in next section) works and will not execute the excluded plan because the filter in each clause is pseudoconstant
and Postgres planner will create a Result
node with one-time filter. If the run-time filter evaluates to false, then the below plan will not be executed at all. A simple example is:
gpadmin=# explain (costs off) with recursive cte(label, run, val) as (
select 'f', true, 0::int
union all
select body.*
from cte, lateral (
select 'g', true, val + count(1)::int from t1 where cte.label = 'f'
union all
select 'e', true, val + count(1)::int from t1 where cte.label = 'g'
union all
select 'h', false, val + count(1)::int from t1 where cte.label = 'e'
) body
)
select * from cte where not run;
QUERY PLAN
------------------------------------------------------------------------------
CTE Scan on cte
Filter: (NOT run)
CTE cte
-> Recursive Union
-> Result
-> Nested Loop
-> WorkTable Scan on cte cte_1
-> Append
-> Aggregate
-> Result
One-Time Filter: (cte_1.label = 'f'::text)
-> Seq Scan on t1
-> Aggregate
-> Result
One-Time Filter: (cte_1.label = 'g'::text)
-> Seq Scan on t1 t1_1
-> Aggregate
-> Result
One-Time Filter: (cte_1.label = 'e'::text)
-> Seq Scan on t1 t1_2
(20 rows)
Continuations in SQL
We can build continuation
using worktable
:
- force
worktable
only have one row output - there is a column of
worktable
serves as the type ofcontinuation
- specific data of a
continuation
will be different columns for theworktable
- next round of recursive union all execution is just like to apply the
continuation
- to apply a
continuation
is just a dispatch usingunion all
skill
Below is a case from the book Compiling with Continuations: (with tiny modification)
fun prodprimes(n) = if n=1
then []
else if isprime(n)
then n::prodprimes(n-1)
else prodprimes(n-1)
It can be transformed into Continuation-Passing-Style
:
fun prodprimes(n,c) =
if n = 1
then c([])
else let fun k(b) =
if b = true
then let fun j(p) =
let val a=n::p
in c(a)
end
val m = n-1
in prodprimes(m, j)
end
else prodprims(n-1, c)
in isprime(n, k)
end
There are 3 types of continuation
in the above program:
- the final
continuation
which is just returning the value - the function
j
which models the control context’s growth - the function itself
We can use an array to keep the prime numbers to model the second continuation
.
My attempt is as below:
create language plpython3u;
create function isprime(n int) returns boolean as
$$
for i in range(2, n+1):
if i * i > n: return True
if n % i == 0: return False
return True
$$
language plpython3u;
\set n 20
with recursive continuation(label, v1, v2, v3) as
(
select 's', :n, array[]::int[], false
union all
select dispatch.*
from continuation, lateral (
(
select body.*
from (select continuation.v1 = 1) as if_cond(b), lateral (
select 'c', 1, continuation.v2, continuation.v3 where if_cond.b
union all
select 'k', continuation.v1, continuation.v2, isprime(continuation.v1) where not if_cond.b
) body
where continuation.label = 's'
) union all
(
select body.*
from (select continuation.v3) as if_cond(b), lateral (
select 's', continuation.v1 - 1, array_append(continuation.v2, continuation.v1), false where if_cond.b
union all
select 's', continuation.v1 - 1, continuation.v2, false where not if_cond.b
) body
where continuation.label = 'k'
) union all
(
select null, null, continuation.v2, null
where continuation.label = 'c'
)
) dispatch
)
select * from continuation where label is null;
-- test result
label | v1 | v2 | v3
-------+----+-----------------------+----
| | {19,17,13,11,7,5,3,2} |
(1 row)
What we have leaned and what’s more?
- How to better generate plans for KNN in MPP database? (take advantage of index and lateral)
- Recursive CTE’s locus should be dynamic determined in MPP database
- Result node’s onetime filter skill
- Compile functional programming language