EnterpriseDB/mysql_fdw

[Feature request] Support Subquery pushdown

Opened this issue · 0 comments

Motivation

Examples:

  1. Simple case:
SELECT 
  id,
  (SELECT COUNT(*) FROM tbl2 WHERE tbl2.parent_id = tbl1.id) 
FROM tbl1;  -- + WHERE, ...

The query plain currently looks somewhat like this:

 Foreign Scan on tbl1
   Output: tbl1.id, (SubPlan 1)
   Remote query: SELECT `id` FROM `s`.`tbl1`   -- + WHERE, ORDER BY, ...
   SubPlan 1
     ->  Foreign Scan
           Output: (count(tbl2.parent_id))
           Relations: Aggregate on (s.tbl2)
           Remote query: SELECT count(`id`) FROM `s`.`tbl2` WHERE ((`parent_id` = ?))

I.e. for each row of tbl1 a separate query is send to the remote database.

  1. Similar case (worse performance):
SELECT 
  id,
  EXISTS (SELECT FROM tbl2 WHERE tbl2.parent_id = tbl1.id) 
FROM tbl1;

Query Plan:

 Foreign Scan on tbl1
   Output: tbl1.id, (hashed SubPlan 2)
   Remote query: SELECT `id` FROM `s`.`tbl1`   -- + WHERE, ORDER BY, ...
   SubPlan 2
     ->  Foreign Scan on s.tbl2
           Output: tbl2.parent_id
           Remote query: SELECT `parent_id` FROM `s`.`tbl2`

This unfortunately tries to download the complete tbl2!

  1. Another version (might be more difficult to implement?):
SELECT 
  id,
  r1.*
FROM tbl1,
LATERAL (SELECT COUNT(*) FROM tbl2 WHERE tbl2.parent_id = tbl1.id) r1;

Query Plan:

 Nested Loop
   Output: tbl1.id, (count(tbl2.parent_id))
   ->  Foreign Scan on s.tbl1
         Output: tbl1.id, ... more fields...
         Remote query: SELECT `id` FROM `s`.`tbl1`   -- + WHERE, ORDER BY, ...
   ->  Foreign Scan
         Output: (count(*))
         Relations: Aggregate on ("s".tbl2)
         Remote query: SELECT count(*) FROM `s`.`tbl2` WHERE ((`parent_id` = ?))

This is basically the same as 1.

Partial workaround:

SELECT
  tbl1.id,
  COUNT(tbl2.id)
FROM tbl1
LEFT JOIN tbl2 ON tbl2.parent_id = tbl1.id
GROUP BY 1

Query Plan: Single Foreign Scan, everything is pushed down

However, this no longer works when additional fields, e.g. tbl1.*, should be retrieved.

One might try this:

WITH t0 AS ( -- NOT MATERIALIZED does not help
  SELECT
    tbl1.id,
    COUNT(tbl2.id)
  FROM table
  LEFT JOIN tbl2 ON tbl2.parent_id = tbl1.id
  GROUP BY 1
  ORDER BY 1
)
SELECT
  tbl1.*,
  count
FROM t0
JOIN tbl1 ON tbl1.id = t0.id

But the resulting query plan (Merge Join between Foreign Scan of tbl1 and Materialize of Foreign Scan from t0) is not feasible for bigger tables, although SELECT * FROM t0 and SELECT * FROM tbl1 are quite fast by themselves...

Expected Result

The aggregation is evaluated on the remote mysql server without requiring a separate query for each row of tbl1.
Also, EXISTS should probably keep the WHERE, instead of downloading the whole table.