Enhancing query pushdown in Postgres...
MrPowers opened this issue · 1 comment
MrPowers commented
A lot of queries are pushed down to the database level when Snowflake is used, as described in this blog post: https://www.snowflake.com/blog/snowflake-spark-part-2-pushing-query-processing/
Joins, aggregations, and SQL functions are all pushed down and performed in the Snowflake database before data is sent to Spark.
I know some stuff gets pushed down to Postgres (column pruning), but are joins and aggregations being pushed down? @nvander1 - do you know what gets pushed down to Postgres? Is this something we could improve?
Some analyses could do a lot of stuff at the database level, only send a fraction of the data to the Spark cluster, and then probably perform a lot faster. Spark isn't the best at joins, so pushing those down to the database level would probably help a lot...
nvander1 commented
For JDBC, Spark can only really push down filters (like a WHERE clause) and maybe column pruning:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-12126
It sounds like a lot of the work they are doing on DataSource V2 is geared toward writing JDBC sources that can do more sophisticated pushdowns. Right now, however, you can manually give the JDBC reader a SQL string.
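As a rough sketch of the "manually give it a SQL string" workaround: Spark's JDBC source accepts a parenthesized subquery with an alias as the `dbtable` option, so a join or aggregation written in SQL runs in Postgres and only the result rows reach Spark. The helper names (`as_dbtable`, `read_pushed_down`) and the `orders` / `customers` tables are hypothetical, made up for illustration; `spark` is assumed to be an existing SparkSession with the Postgres JDBC driver on its classpath.

```python
def as_dbtable(sql: str, alias: str = "pushed_down") -> str:
    """Wrap a SQL query so it can be passed as the JDBC `dbtable` option.

    The JDBC source expects something valid in a FROM clause, so the query
    is parenthesized and given an alias. A trailing semicolon is dropped.
    """
    return f"({sql.strip().rstrip(';')}) AS {alias}"


def read_pushed_down(spark, url: str, sql: str, user: str, password: str):
    """Load the result of `sql` through Spark's JDBC source.

    `spark` is an existing SparkSession (assumption: created elsewhere).
    The whole query executes in the database; Spark only sees its output.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", as_dbtable(sql))
        .option("user", user)
        .option("password", password)
        .load()
    )


# Hypothetical tables and columns, for illustration only.
ORDER_TOTALS_SQL = """
    SELECT c.country, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.country
"""

# Usage (needs a running Postgres and a SparkSession, so it is left commented):
# df = read_pushed_down(spark, "jdbc:postgresql://localhost:5432/mydb",
#                       ORDER_TOTALS_SQL, "user", "secret")
```

The trade-off is that the pushed-down part has to be written by hand in SQL rather than being derived from DataFrame operations, which is the gap the Snowflake connector closes automatically.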
Replying to Matthew Powers's comment of Sun, 19 May 2019, 14:03.