OR should be handled in scalarToSingleJoin and inToMarkJoin
Opened this issue · 6 comments
we only handle it in existToMarkJoin
PhysicFilter (actual rows=0)
Output: a.a1[0]
Filter: a.a1[0]=bo.b1[1]
-> PhysicSingleJoin Left (actual rows=0)
Output: a.a1[0],bo.b1[2]
Filter: b.b2[3]=a.a1[0]*2
-> PhysicScanTable a (actual rows=0)
Output: a.a1[0],#a.a2[1]
-> PhysicFilter (actual rows=0)
Output: bo.b1[0],b.b2[1]
** Filter: (bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3))
-> PhysicSingleJoin【2】 Left (actual rows=0)
Output: bo.b1[0],b.b2[2],bo.b2[1],b.b1[3]
-> PhysicScanTable b as bo (actual rows=0)
Output: bo.b1[0],bo.b2[1]
-> PhysicScanTable b (actual rows=0)
Output: b.b2[1],b.b1[0]
Filter: b.b3[2]>1
the filter with ** have to stay with the SingleJoin 【2】, but the Expr bo.b2[2]=?a.a2[1] can't be deparameter here.
what's the query?
select a1 from a where a.a1 = (select b1 from b bo where b2 = a2 or b1 = (select b1 from b where b2 = 2*a1 and b3>1) and b2<3);
(bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3)) will be deparameter after meet scan tabel a, i.e., as below.
PhysicFilter (actual rows=0)
Output: a.a1[0]
Filter: a.a1[0]=bo.b1[1]
-> PhysicSingleJoin Left (actual rows=0)
** Filter: (bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3))
Output: a.a1[0],bo.b1[2]
Filter: b.b2[3]=a.a1[0]*2
-> PhysicScanTable a (actual rows=0)
Output: a.a1[0],#a.a2[1]
-> PhysicFilter (actual rows=0)
Output: bo.b1[0],b.b2[1]
-> PhysicSingleJoin【2】 Left (actual rows=0)
Output: bo.b1[0],b.b2[2],bo.b2[1],b.b1[3]
-> PhysicScanTable b as bo (actual rows=0)
Output: bo.b1[0],bo.b2[1]
-> PhysicScanTable b (actual rows=0)
Output: b.b2[1],b.b1[0]
Filter: b.b3[2]>1
PhysicFilter (actual rows=1)
Output: a.a1[0]
Filter: a.a1[0]=bo.b1[1]
-> PhysicSingleJoin Left (actual rows=3)
Output: a.a1[0],bo.b1[2]
Filter: b.b2[3]=a.a1[0]*2
-> PhysicScanTable a (actual rows=3)
Output: a.a1[0],#a.a2[1]
-> PhysicFilter (actual rows=4, loops=3)
Output: bo.b1[0],b.b2[1]
** Filter: (bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3))
-> PhysicSingleJoin Left (actual rows=9, loops=3)
Output: bo.b1[0],b.b2[2],bo.b2[1],b.b1[3]
-> PhysicScanTable b as bo (actual rows=3, loops=3)
Output: bo.b1[0],bo.b2[1]
-> PhysicScanTable b (actual rows=3, loops=9)
Output: b.b2[1],b.b1[0]
Filter: b.b3[2]>1
singleJoin should produce only one row, if we pull up the **Filter, the singleJoin will produce more than one row, whitch is conflicting to singleJoin.
if we leave the filter inside the SingleJoin, as the OrExpr can not know value of bo.b2[2]=?a.a2[1], so it will also produce more lines like marker join, i.e. (bo.b2[2]=?a.a2[1] or #marker) .
In a words, we have to know the value of bo.b2[2]=a.a2[1] before excute single join, i.e. it is a nested subquery.
hyperDB seems to has a complex plan to handle it.
select n_name from nation where nation.n_regionkey = (select region_o.r_regionkey from region region_o where region_o.r_name = nation.n_name or region_o.r_regionkey = (select region.r_regionkey from region where region.r_regionkey = 2*nation.n_regionkey and region.r_regionkey<3));