[Feature request] Support Subquery pushdown
Opened this issue · 0 comments
Motivation
Examples:
- Simple case:
SELECT
id,
(SELECT COUNT(*) FROM tbl2 WHERE tbl2.parent_id = tbl1.id)
FROM tbl1; -- + WHERE, ...
The query plain currently looks somewhat like this:
Foreign Scan on tbl1
Output: tbl1.id, (SubPlan 1)
Remote query: SELECT `id` FROM `s`.`tbl1` -- + WHERE, ORDER BY, ...
SubPlan 1
-> Foreign Scan
Output: (count(tbl2.parent_id))
Relations: Aggregate on (s.tbl2)
Remote query: SELECT count(`id`) FROM `s`.`tbl2` WHERE ((`parent_id` = ?))
I.e. for each row of tbl1
a separate query is send to the remote database.
- Similar case (worse performance):
SELECT
id,
EXISTS (SELECT FROM tbl2 WHERE tbl2.parent_id = tbl1.id)
FROM tbl1;
Query Plan:
Foreign Scan on tbl1
Output: tbl1.id, (hashed SubPlan 2)
Remote query: SELECT `id` FROM `s`.`tbl1` -- + WHERE, ORDER BY, ...
SubPlan 2
-> Foreign Scan on s.tbl2
Output: tbl2.parent_id
Remote query: SELECT `parent_id` FROM `s`.`tbl2`
This unfortunately tries to download the complete tbl2
!
- Another version (might be more difficult to implement?):
SELECT
id,
r1.*
FROM tbl1,
LATERAL (SELECT COUNT(*) FROM tbl2 WHERE tbl2.parent_id = tbl1.id) r1;
Query Plan:
Nested Loop
Output: tbl1.id, (count(tbl2.parent_id))
-> Foreign Scan on s.tbl1
Output: tbl1.id, ... more fields...
Remote query: SELECT `id` FROM `s`.`tbl1` -- + WHERE, ORDER BY, ...
-> Foreign Scan
Output: (count(*))
Relations: Aggregate on ("s".tbl2)
Remote query: SELECT count(*) FROM `s`.`tbl2` WHERE ((`parent_id` = ?))
This is basically the same as 1.
Partial workaround:
SELECT
tbl1.id,
COUNT(tbl2.id)
FROM tbl1
LEFT JOIN tbl2 ON tbl2.parent_id = tbl1.id
GROUP BY 1
Query Plan: Single Foreign Scan, everything is pushed down
However, this no longer works when additional fields, e.g. tbl1.*
, should be retrieved.
One might try this:
WITH t0 AS ( -- NOT MATERIALIZED does not help
SELECT
tbl1.id,
COUNT(tbl2.id)
FROM table
LEFT JOIN tbl2 ON tbl2.parent_id = tbl1.id
GROUP BY 1
ORDER BY 1
)
SELECT
tbl1.*,
count
FROM t0
JOIN tbl1 ON tbl1.id = t0.id
But the resulting query plan (Merge Join between Foreign Scan of tbl1 and Materialize of Foreign Scan from t0) is not feasible for bigger tables, although SELECT * FROM t0
and SELECT * FROM tbl1
are quite fast by themselves...
Expected Result
The aggregation is evaluated on the remote mysql server without requiring a separate query for each row of tbl1.
Also, EXISTS should probably keep the WHERE, instead of downloading the whole table.