Computer science might be said born at 1930s, in the math college of Princeton University. Church, Turing, Gödel independently discovered the root of computer science:
- Lambda Calculus
- Turing Machine
- Gödel incompleteness theorem
They all relates to the problem: self reference or self description. We might call it recurrence too.
SQL standard gets this feature in to enhance the power of query. The idea is from Datalog/Prolog I believe. Without this feature, it is hard to implement tree-structure query, like given a relation of huge family tree, query what is the descendant of some specific person.
I am not going to discuss the semantics here. Please search on the Internet for tutorial, below is some recommences:
- The executor source code `ExecRecursiveUnion` of Postgres
A quick example:
EXPLAIN (costs OFF) WITH RECURSIVE ints (n) AS (SELECT 1 UNION ALL SELECT n+1 FROM ints WHERE n+1<= 5 ) SELECT n FROM ints; QUERY PLAN --------------------------------------------- CTE Scan on ints CTE ints -> Recursive Union -> Result -> WorkTable Scan on ints ints_1 Filter: ((n + 1) <= 5) WITH RECURSIVE ints (n) AS (SELECT 1 UNION ALL SELECT n+1 FROM ints WHERE n+1<= 5 ) SELECT n FROM ints; n --- 1 2 3 4 5 (5 rows)
Dijkstra algorithm in SQL
Algorithm is what we can program do. Many many algorithms can be implemented in iteration or recurrence style. And many many algorithms follow the pattern (invariant loop). This is also true for mathematical proofs of induction.
The invariant loop of Dijkstra algorithm is that the knowledge set always contains the correct shortest path information. Each iteration step, we get some new node into the knowledge set, we have to maintain such invariance. This is the core of the correctness, also the core of the implementations.
SQL code framework
We have to use recursive CTE to implement the algorithm. And let’s review the recursive CTE semantic again before continuing.
- Non recursive part query should contain the very beginning knowledge, to focus on the deduction step, I simplify this step to give the data manually (the closest neighbor node)
- The recursive part can only be a join of worktable and the graph data.
- Many limitations of recursive CTE in Postgres:
- no order by
- worktable cannot be in subquery
- cannot scan twice
- cannot in except
So the code has to be like:
with recursive sp(a, b, d) as ( select * from init_knowledge union xxxx (select * from sp join roadmap) ) select * from sp limit xxx;
Finish code step by step
To implement the algorithm, we can ignore the recursive CTE at first, and focus on the deduction part to make logic correct.
How to leave all knowledge to next step? I use a trick here: insert a null tuple in the roadmap, so when join worktable with roadmap, old tuples in worktable will join with nulls and can be identified.
But old tuples should not be considered for ranking the new min. Worktable cannot be scanned twice. How to?
My another trick is adding a flag(huge number) for old tuples and 0 for new potential tuples, and using window function to select out the new min one.
The complete code
create table roadmap(a int, b int, d int); insert into roadmap values (1, 2, 7), (1, 3, 9), (1, 6, 14), (2, 3, 10), (2, 4, 15), (3, 4, 11), (3, 6, 2), (4, 5, 6), (5, 6, 9); create table roadmap_sym(like roadmap); insert into roadmap_sym select * from roadmap; insert into roadmap_sym select b,a,d from roadmap; insert into roadmap_sym values (null, null, null); create table known(like roadmap); insert into known values (6, 3, 2); with recursive sp(a,b,d) as ( select * from known union all select a, b, d from ( select a, b, d, s, rank() over (order by (d+s)) as rk from ( select a, b, d, s from ( select a, b, min(d) as d, sum(t) as s from ( select sp.a as a, case when rm.a is null then sp.b else rm.b end as b, case when rm.a is null then sp.d when sp.a = rm.a then rm.d else sp.d + rm.d end as d, case when rm.a is null then 1000 else 0 end as t from sp, roadmap_sym as rm where rm.a is null or ( (sp.a = rm.a or sp.b = rm.a) and sp.a <> rm.b ) ) x group by a, b )y )z )q where rk = 1 or s = 1000 ) select * from sp where b = 4 limit 1;