zhouqingqing/qpmodel

where exists or exists exception

Closed this issue · 11 comments

select a.a1 from a where exists (select * from b where a.a2 = b.b2) or exists (select * from c where a.a2 = c.c2) order by 1

fails because the first exists subquery has no physical plan, it is possible that the second one doesn't either.
When OR is changed to AND it produces correct results.

Another query which has an OR also fails.

select 1 from a where a.a1 > (select b1 from b where b.b2 > (select c2 from c where c.c2=b2) or b.b1 > ((select c2 from c where c.c2=b2)))

We didn't handle OR systematically

another example:

            sql = "select a1 from a where a.a1 = (select b1 from b bx where bx.b2 = a.a2 OR bx.b1 = (select b1 from b bz where bz.b2 = 2*a1 and bz.b3>1) and bx.b2<3)";

This is the cause of this issue. In practical, there should be no "or expression" in semi-join expression if you want to unnest the subquery.

 if (queryOpt_.optimize_.enable_subquery_unnest_)
                    {
                        // use the plan 'root' containing the subexpr 'x'
                        var replacement = oneSubqueryToJoin(root, x);
                        newroot = (LogicNode)newroot.SearchAndReplace(root,
                                                                replacement);
                    }

But the codes here don't consider this condition, so it failed to handle subquery with "or expression".
I will handle it next Monday.

Blow is the postgreSQL's plan of SQL(explain
select
a.a1
from a
where
exists
(select * from b where a.a2 = b.b2)
or exists
(select * from c where a.a2 = c.c2)
clearly, it didn't be unnested .
image

Blow is the plan of SQL(explain
select
a.a1
from a
where
exists
(select * from b where a.a2 = b.b2)
and exists
(select * from c where a.a2 = c.c2)
clearly, it was unnested .
image

During handling this issue. I find another issue.
"select a.a1 from b" will return result "null" instead of Error,
It might need to opening anothor issue if necessary.

This is very bad. I mean unnest or otherwise returning correct results even if it is slow if preferable than returning outrageously incorrect results. I will open an issue and assign it to myself.

In practical, there should be no "or expression" in semi-join expression if you want to unnest the subquery.

This is not true. The sub-query related papers explicitly give examples the transformations with singlejoin can handle OR.

Blow is the postgreSQL's plan of SQL

We shall get the same PG plan with enable_subquery_unnest_ = false. PG can't do complex subquery decorrelation as us.

The paper Unnesting Arbitrary Queries
To the best of our knowledge, no existing system can unnest such a query. And indeed, unnesting this query is hard:

The routine of oneSubqueryToJoin(root, x) indeed doesn't process the situation of "or expression" in subquery as @zhouqingqing refered.

We didn't handle OR systematically

I originally think the issue is just the problem of "when there is a or in subquey, we don't need to unnest it".
So I have to read this paper to understand and realize it.

@9DemonFox where are we with this issue?

I take over this issue from zhourui today. And locate the codes causing this BUG as bellow.

            LogicFilter Filter = new LogicFilter(markjoin, topfilter);

20201026-142208(eSpace)
it wil handel @1 OR @2 to the plan to the Fig A, i.e. embedding the @2 to a lower level. For AND, it will be ok to do so.
I try to handle it to the plan like Fig B.

agree fig B is the right plan: for AND, the predicate can further push down becoming figure A, for OR, we can't push it down.