Produce Leading and join method hints with ANY_subquery alias for IN-subquery
mtakahar opened this issue · 0 comments
Postgres optimizer tries to convert an ANY- and IN- subquery expression to an inner join when it satisfies certain conditions, and when it does so, it assigns an internally generated table alias: ANY_subquery
to the subquery.
Example:
An IN-subquery (the same as =ANY (...) ) expression, c2 in (select c1 from t100 where ...)
was converted to an inner join, but the Leading hint was not followed. The optimizer actually generated a plan: Leading (( t10000w ( t1000 t100 ) )) NestLoop(t10000w t100) NestLoop(t10000w t100 t1000)
.
explain /*+ Leading (( ( t10000w t100 ) t1000 )) HashJoin(t10000w t100) HashJoin(t10000w t100 t1000) SeqScan(t10000w) SeqScan(t100) SeqScan(t1000) */select * from t10000w where c2 in (select c1 from t100 where exists (select 0 from t1000 where c1 = t100.c2));
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop Semi Join (cost=20000000000.00..20000017610.50 rows=400 width=16396)
Join Filter: (t10000w.c2 = t100.c1)
-> YB Seq Scan on t10000w (cost=0.00..1000.00 rows=10000 width=16396)
-> Materialize (cost=10000000000.00..10000001610.75 rows=100 width=4)
-> Nested Loop (cost=10000000000.00..10000001610.25 rows=100 width=4)
Join Filter: (t100.c2 = t1000.c1)
-> YB Seq Scan on t1000 (cost=0.00..100.00 rows=1000 width=4)
-> Materialize (cost=0.00..10.50 rows=100 width=8)
-> YB Seq Scan on t100 (cost=0.00..10.00 rows=100 width=8)
(9 rows)
We can get HashJoins by specifying ANY_subquery
in place of t100
, however, the Leading
hint is still not followed:
explain /*+ Leading (( ( t10000w ANY_subquery ) t1000 )) HashJoin(t10000w ANY_subquery) HashJoin(t10000w ANY_subquery t1000) SeqScan(t10000w) SeqScan(t100) SeqScan(t1000) */select * from t10000w where c2 in (select c1 from t100 where exists (select 0 from t1000 where c1 = t100.c2));
QUERY PLAN
----------------------------------------------------------------------------------
Hash Semi Join (cost=118.50..1149.20 rows=400 width=16396)
Hash Cond: (t10000w.c2 = t100.c1)
-> YB Seq Scan on t10000w (cost=0.00..1000.00 rows=10000 width=16396)
-> Hash (cost=117.25..117.25 rows=100 width=4)
-> Hash Join (cost=11.25..117.25 rows=100 width=4)
Hash Cond: (t1000.c1 = t100.c2)
-> YB Seq Scan on t1000 (cost=0.00..100.00 rows=1000 width=4)
-> Hash (cost=10.00..10.00 rows=100 width=8)
-> YB Seq Scan on t100 (cost=0.00..10.00 rows=100 width=8)
(9 rows)
We need more investigations before we can implement this to the framework.