greenplum-db/gporca

Orca nestloop join should not broadcast co-located tables

hsyuan opened this issue · 2 comments

Repro:

create table R(a int, b int) distributed by(a);
create table S(a int, b int) distributed by(a);
create index s_idx_b on S(b);
set optimizer_enable_hashjoin=off;

explain select * from R, S where R.a=S.a;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..1324032.79 rows=1 width=16)
   ->  Nested Loop  (cost=0.00..1324032.79 rows=1 width=16)
         Join Filter: r.a = s.a
         ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
               ->  Table Scan on r  (cost=0.00..431.00 rows=1 width=8)
         ->  Table Scan on s  (cost=0.00..431.00 rows=1 width=8)
 Settings:  optimizer=on
 Optimizer status: PQO version 2.53.9
(8 rows)

explain select * from R join S on R.a=S.a and R.b=S.b;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..437.00 rows=1 width=16)
   ->  Nested Loop  (cost=0.00..437.00 rows=1 width=16)
         Join Filter: true
         ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
               ->  Table Scan on r  (cost=0.00..431.00 rows=1 width=8)
         ->  Index Scan using s_idx_b on s  (cost=0.00..6.00 rows=1 width=8)
               Index Cond: s.b = r.b
               Filter: r.a = s.a
 Settings:  optimizer=on
 Optimizer status: PQO version 2.53.9
(10 rows)

We should not broadcast table R in both queries, because R and S are co-located.

Solved for simple NLJ with pr #409

vraghavan=# explain select * from R, S where R.a=S.a;
QUERY PLAN

Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1324032.35 rows=1 width=16)
-> Nested Loop (cost=0.00..1324032.35 rows=1 width=16)
Join Filter: (r.a = s.a)
-> Table Scan on r (cost=0.00..431.00 rows=1 width=8)
-> Table Scan on s (cost=0.00..431.00 rows=1 width=8)
Planning time: 55.002 ms
Optimizer: PQO version 3.6.0
(7 rows)