citusdata/cstore_fdw

cstore_fdw should have better support for joins

mtuncer opened this issue · 1 comments

When cstore_fdw table is involved in a join, postgres pulls all data from cstore_fdw table. It makes queries unnecessarily longer.

We are getting more requests to make this work. I do not know the involved work at the moment. We should definitely investigate this for future enhancement.

Here's a quick setup to tell the story:

Create data

CREATE FOREIGN TABLE cstore_tmp (X integer,Y integer) SERVER cstore_server;
INSERT INTO cstore_tmp 
SELECT s1, s2
FROM generate_series(1,3000) s1, generate_series(1,3000) s2;

Query with 1:N join (same as no join)

EXPLAIN SELECT count(b.*)
FROM (
  SELECT 300 minx, 300 miny, 600 maxx, 600 maxy
) a
JOIN cstore_tmp b
ON b.x BETWEEN a.minx AND a.maxx
AND b.y BETWEEN a.miny AND a.maxy;

Result:

Aggregate  (cost=189072.56..189072.57 rows=1 width=8)
  ->  Foreign Scan on cstore_tmp b  (cost=0.00..189072.00 rows=225 width=32)
        Filter: ((x >= 300) AND (x <= 600) AND (y >= 300) AND (y <= 600))
        CStore File: /var/data/postgresql/10/main/cstore_fdw/16384/191674758
        CStore File Size: 74310480

Query with N:N join

SELECT count(b.*)
FROM (
  SELECT 300 minx, 300 miny, 500 maxx, 500 maxy
  UNION ALL
  SELECT 400 minx, 400 miny, 600 maxx, 600 maxy
) a
JOIN cstore_tmp b
ON b.x BETWEEN a.minx AND a.maxx
AND b.y BETWEEN a.miny AND a.maxy;

Result:

Aggregate  (cost=504627.60..504627.61 rows=1 width=8)
  ->  Nested Loop  (cost=0.00..504072.05 rows=222222 width=32)
        Join Filter: ((b.x >= "*SELECT* 1".minx) AND (b.x <= "*SELECT* 1".maxx) AND (b.y >= "*SELECT* 1".miny) AND (b.y <= "*SELECT* 1".maxy))
        ->  Foreign Scan on cstore_tmp b  (cost=0.00..99072.00 rows=9000000 width=40)
              CStore File: /var/data/postgresql/10/main/cstore_fdw/16384/191674758
              CStore File Size: 74310480
        ->  Materialize  (cost=0.00..0.05 rows=2 width=16)
              ->  Append  (cost=0.00..0.04 rows=2 width=16)
                    ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=16)
                          ->  Result  (cost=0.00..0.01 rows=1 width=16)
                    ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=16)
                          ->  Result  (cost=0.00..0.01 rows=1 width=16)