# 三国赵云传数据+Postgres解析一篇SIGMOD2019论文的agg递归CTE

Contents

## 递归CTE查询

explain with recursive t(n) as (
select
1
union all
select
n + 1
from
t
where
n < 10
)
select
*
from
t;

QUERY PLAN
-----------------------------------------
CTE Scan on tt
CTE tt
->  Recursive Union
->  Result
->  WorkTable Scan on tt tt_1
Filter: (n < 10)


Postgres用worktable scan实现递归CTE表的扫描，用Recursive Union这个plannode去实现fixpoint语义。Recursive Union这个plannode很可能是唯二的同时有两个子树的plannode。大致了流程如下:

1. 用非递归部分语句初始化worktable，记为wk = w0
2. 基于wk的数据执行worktablescan，产生的结果记为w1, 替换wk = w1
3. 如果wk是空集，则go to 4, 否则go to 2
4. 把所有w0, w1, w2, … wn用union拼接起来

SQL的标准似乎(这里用似乎是因为我没有去搜索标准)不支持在递归部分写Agg，关于这一点原因，我也没有去调研，但我和同事们有一些不成熟的推测，这里就不表了。比如下面的语句就会变报错:

gpadmin=# explain (costs off) with recursive tt(n) as (
select
1
union all
select
max(n + 1)
from
tt
where
n < 10
)
select
*
from
tt;
ERROR:  aggregate functions are not allowed in a recursive query's recursive term
LINE 6:     max(n + 1)
^


## 三国赵云传的数据例子

gpadmin=# select * from peifang where part = '龙胆枪';
part  | subpart
--------+----------
龙胆枪 | 化影戟
龙胆枪 | 紫水晶
龙胆枪 | 白虎之魂
龙胆枪 | 龙卷风笺
(4 rows)

gpadmin=# select * from peifang where part = '化影戟';
part  | subpart
--------+----------
化影戟 | 方天画戟
化影戟 | 浑元画戟
(2 rows)

gpadmin=# select * from base limit 1;
part  | round
--------+-------
金锁甲 |     9
(1 row)


## 传统的SQL如何处理

with recursive tt(part, round) as (
select part, round from base
union
select peifang.part, tt.round
from tt, peifang
where tt.part = peifang.subpart
)
select part, max(round) from tt group by part;


## 论文的贡献

with recursive tt(part, max() round) as (
select part, round from base
union
select peifang.part, tt.round
from tt, peifang
where tt.part = peifang.subpart
)
select part, round from tt;


• 对投影处理：如果投影列被声明了agg，如上文的max() round, 则但凡出现了round位置的地方自动使用max
• 对groupby的处理：只要没有被声明agg的列，一律必须出现在group by里

with recursive tt(part, round) as (
select part, max(round) from base group by part
union
select peifang.part, max(tt.round)
from tt, peifang
where tt.part = peifang.subpart
group by peifang.part
)
select part, max(round) from tt group by part;


gpadmin=# with recursive tt(part, round) as (
select part, max(round) from base group by part
union
select peifang.part, max(tt.round)
from tt, peifang
where tt.part = peifang.subpart
group by peifang.part
)
select part, max(round) from tt group by part;
ERROR:  aggregate functions are not allowed in a recursive query's recursive term
LINE 4:   select peifang.part, max(tt.round)
^


## 瞒天过海：Postgres真的做不到么？

• Postgres的处理递归CTE查询计划框架，完全没有限制agg，明明是可以支持的
• 我在用Postgres编程Dijkstra算法的时候，不是也用agg，我还用了窗口函数呢

explain
with recursive sp(a, d) as
(
select a, d from known
union all
select a, d
from
(
select
a, min(d) as d
from
(
select
sp.a as a,
sp.d as d
from
) x
group by a
)y
)
select * from sp limit 1;


explain
with recursive tt(part, round) as (
select part, max(round) as round from base group by part
union
select part, round
from
(
select part, max(round) as round
from
(
select peifang.part as part, tt.round as round
from tt, peifang
where tt.part = peifang.subpart
)x
group by part
)y
)
select part, max(round) from tt group by part;

QUERY PLAN
----------------------------------------------------------------
GroupAggregate
Group Key: tt.part
CTE tt
->  Recursive Union
->  HashAggregate
Group Key: base.part
->  Seq Scan on base
->  HashAggregate
Group Key: peifang.part
->  Hash Join
Hash Cond: (tt_1.part = peifang.subpart)
->  WorkTable Scan on tt tt_1
->  Hash
->  Seq Scan on peifang
->  CTE Scan on tt


  part  | max
--------+-----
龙胆枪 |  16


## 论文其他部分的思考

Greenplum对递归CTE的支持还有限，在语义分析阶段会禁止掉非常多的功能。如果在MPP数据库里完美的支持递归CTE，有时间的话，还需要系统的思考研究。目前我的经验是，Greenplum可能需要修改Postgres为此设计的执行器，因为Postgres是单机数据库，Recursive Union这个执行器节点和下面的worktable scan的执行器部分在一个进程里，Postgres的执行器代码依赖了这一点。Greenplum的motion会切片查询计划，有可能违背这一点。

## 1 thought on “三国赵云传数据+Postgres解析一篇SIGMOD2019论文的agg递归CTE”

This site uses Akismet to reduce spam. Learn how your comment data is processed.