Orca nestloop join should not broadcast co-located tables
hsyuan opened this issue · 2 comments
hsyuan commented
Repro:
create table R(a int, b int) distributed by(a);
create table S(a int, b int) distributed by(a);
create index s_idx_b on S(b);
set optimizer_enable_hashjoin=off;
explain select * from R, S where R.a=S.a;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..1324032.79 rows=1 width=16)
-> Nested Loop (cost=0.00..1324032.79 rows=1 width=16)
Join Filter: r.a = s.a
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=8)
-> Table Scan on r (cost=0.00..431.00 rows=1 width=8)
-> Table Scan on s (cost=0.00..431.00 rows=1 width=8)
Settings: optimizer=on
Optimizer status: PQO version 2.53.9
(8 rows)
explain select * from R join S on R.a=S.a and R.b=S.b;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..437.00 rows=1 width=16)
-> Nested Loop (cost=0.00..437.00 rows=1 width=16)
Join Filter: true
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=8)
-> Table Scan on r (cost=0.00..431.00 rows=1 width=8)
-> Index Scan using s_idx_b on s (cost=0.00..6.00 rows=1 width=8)
Index Cond: s.b = r.b
Filter: r.a = s.a
Settings: optimizer=on
Optimizer status: PQO version 2.53.9
(10 rows)
We should not broadcast table R
in both queries, because R and S are co-located.
vraghavan78 commented
vraghavan=# explain select * from R, S where R.a=S.a;
QUERY PLAN
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1324032.35 rows=1 width=16)
-> Nested Loop (cost=0.00..1324032.35 rows=1 width=16)
Join Filter: (r.a = s.a)
-> Table Scan on r (cost=0.00..431.00 rows=1 width=8)
-> Table Scan on s (cost=0.00..431.00 rows=1 width=8)
Planning time: 55.002 ms
Optimizer: PQO version 3.6.0
(7 rows)