Dijkstra via SQL: a glance at Recursive CTE

Computer science might be said born at 1930s, in the math college of Princeton University. Church, Turing, Gödel independently discovered the root of computer science:

  • Lambda Calculus
  • Turing Machine
  • Gödel incompleteness theorem

They all relates to the problem: self reference or self description. We might call it recurrence too.

Recursive CTE

SQL standard gets this feature in to enhance the power of query. The idea is from Datalog/Prolog I believe. Without this feature, it is hard to implement tree-structure query, like given a relation of huge family tree, query what is the descendant of some specific person.

I am not going to discuss the semantics here. Please search on the Internet for tutorial, below is some recommences:

A quick example:

EXPLAIN (costs OFF) WITH RECURSIVE ints (n) AS
  (SELECT 1
   UNION ALL SELECT n+1
   FROM ints
   WHERE n+1<= 5 )
SELECT n
FROM ints;

                 QUERY PLAN
---------------------------------------------
 CTE Scan on ints
   CTE ints
     ->  Recursive Union
           ->  Result
           ->  WorkTable Scan on ints ints_1
                 Filter: ((n + 1) <= 5)

WITH RECURSIVE ints (n) AS
  (SELECT 1
   UNION ALL SELECT n+1
   FROM ints
   WHERE n+1<= 5 )
SELECT n
FROM ints;

 n
---
 1
 2
 3
 4
 5
(5 rows)

Dijkstra algorithm in SQL

Algorithm is what we can program do. Many many algorithms can be implemented in iteration or recurrence style. And many many algorithms follow the pattern (invariant loop). This is also true for mathematical proofs of induction.

The invariant loop of Dijkstra algorithm is that the knowledge set always contains the correct shortest path information. Each iteration step, we get some new node into the knowledge set, we have to maintain such invariance. This is the core of the correctness, also the core of the implementations.

SQL code framework

We have to use recursive CTE to implement the algorithm. And let’s review the recursive CTE semantic again before continuing.

  • Non recursive part query should contain the very beginning knowledge, to focus on the deduction step, I simplify this step to give the data manually (the closest neighbor node)
  • The recursive part can only be a join of worktable and the graph data.
  • Many limitations of recursive CTE in Postgres:
    • no order by
    • worktable cannot be in subquery
    • cannot scan twice
    • cannot in except

So the code has to be like:

with recursive sp(a, b, d) as (
     select * from init_knowledge
     union
     xxxx
     (select * from sp join roadmap)
)
select * from sp limit xxx;

Finish code step by step

To implement the algorithm, we can ignore the recursive CTE at first, and focus on the deduction part to make logic correct.

How to leave all knowledge to next step? I use a trick here: insert a null tuple in the roadmap, so when join worktable with roadmap, old tuples in worktable will join with nulls and can be identified.

But old tuples should not be considered for ranking the new min. Worktable cannot be scanned twice. How to?

My another trick is adding a flag(huge number) for old tuples and 0 for new potential tuples, and using window function to select out the new min one.

The complete code

create table roadmap(a int, b int, d int);
insert into roadmap values
(1, 2, 7),
(1, 3, 9),
(1, 6, 14),
(2, 3, 10),
(2, 4, 15),
(3, 4, 11),
(3, 6, 2),
(4, 5, 6),
(5, 6, 9);
create table roadmap_sym(like roadmap);
insert into roadmap_sym select * from roadmap;
insert into roadmap_sym select b,a,d from roadmap;
insert into roadmap_sym values (null, null, null);

create table known(like roadmap);
insert into known values (6, 3, 2);

with recursive sp(a,b,d) as
(
select * from known
union all
select a, b, d
from
(
  select
    a, b, d, s, rank() over (order by (d+s)) as rk
  from
  (
    select a, b, d, s
    from
    (
      select
      a, b, min(d) as d, sum(t) as s
      from
      (
        select
          sp.a as a,
          case
	    when rm.a is null then sp.b
	    else rm.b
	  end as b,
          case
	    when rm.a is null then sp.d
            when sp.a = rm.a then rm.d
            else sp.d + rm.d
          end as d,
	  case
	    when rm.a is null then 1000
	    else 0
	  end as t
        from
          sp, roadmap_sym as rm
        where
	  rm.a is null
	  or
	  (
            (sp.a = rm.a or
             sp.b = rm.a) and
            sp.a <> rm.b
	  )
      ) x
      group by a, b
    )y
  )z
)q where rk = 1 or s = 1000
)
select * from sp where b = 4 limit 1;

1 thought on “Dijkstra via SQL: a glance at Recursive CTE

  1. Pingback: Big Query Dijkstra - JTuto Mercure

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.