Question about self join
StarGazerM opened this issue · 1 comments
In the compiled RAM of query contains self join for example:
path(from, to) :- edge(from, to).
path(from, to) :- path(from, mid), path(mid, to).
Recursive rule will be compiled into:
LOOP
DEBUG "path(from,to) :- \n path(from,mid),\n path(mid,to).\nin file tc.dl [10:1-10:50]"
QUERY
IF ((NOT ISEMPTY(@delta_path)) AND (NOT ISEMPTY(path)))
FOR t0 IN @delta_path
FOR t1 IN path ON INDEX t1.0 = t0.1
IF ((NOT (t0.0,t1.1) IN path) AND (NOT (t0.1,t1.1) IN @delta_path))
INSERT (t0.0, t1.1) INTO @new_path
END QUERY
END DEBUG
DEBUG "path(from,to) :- \n path(from,mid),\n path(mid,to).\nin file tc.dl [10:1-10:50]"
QUERY
IF ((NOT ISEMPTY(path)) AND (NOT ISEMPTY(@delta_path)))
FOR t0 IN path
FOR t1 IN @delta_path ON INDEX t1.0 = t0.1
IF (NOT (t0.0,t1.1) IN path)
INSERT (t0.0, t1.1) INTO @new_path
END QUERY
END DEBUG
The first RAM query is joining delta version of path with full version path and the second is reverse.
I have 2 question:
- why there is a
(NOT (t0.1,t1.1) IN @delta_path))
in first query - why the self join is not simple as join delta with delta and then delta with full? this should also contains all result, iterate over full seems unnecessary?
The rules are evaluating semi-naive step i+1
. The test NOT (t0.1,t1.1) IN @delta_path
is NOT (mid,to) IN @delta_path
, it is a way to simulate path
at step i-1
.
I reckon it is confusing to have to mix values from t0
and t1
but it's probably due to lowering and t1.0 = t0.1
. It is equivalent to (NOT (t1.0, t1.1) IN @delta_path
.
If you are wondering why we need path
at step i-1
during evaluation of step i+1
, it's an optimization of the semi-naive evaluation well described in the literature. R <- R1 ... Rn, T1 ... Tm
is replaced with tempR(i+1) <- R1 ... Rn, T1(i) .. Tj-1(i), ΔTj(i), Tj+1(i-1) ... Tm(i-1)
for j
in 1..m
.
This definition requires the relations T
at steps i
and i-1
. Since we don't want to keep history of 2 steps in Soufflé, we only store T
at step i
and we say T(i-1) = T(i) - Δ(i)
.
There is a second optimization here, Soufflé does not compute tempR(i+1)
instead it computes the ΔR(i+1)
(that we call @new
) by making sure we are not inserting facts that already exist in R(i)
: (NOT (from, to) IN path)
.
Eventually at the end of each LOOP
iteration, Soufflé computes R(i+1) = R(i) + ΔR(i+1)
:
QUERY
FOR t0 IN @new_anc
INSERT (t0.0, t0.1) INTO anc
END QUERY