MrPowers/spark-daria

Enhancing query pushdown in Postgres...

MrPowers opened this issue · 1 comment

When Snowflake is used, a lot of queries are pushed down to the database level, as described in this blog post.

Joins, aggregations, and SQL functions are all pushed down and performed in the Snowflake database before data is sent to Spark.

I know some operations get pushed down to Postgres (column pruning, for example), but are joins and aggregations pushed down too? @nvander1 - do you know what gets pushed down to Postgres? Is this something we could improve?
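One way to check what actually reaches Postgres is to look at the query plan Spark prints for a JDBC read. The snippet below is only a sketch: the connection URL, credentials, and the `orders` table with `customer_id` and `amount` columns are made-up placeholders, and the exact plan text varies by Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object InspectPushdown {
  // Hypothetical connection details, not taken from this issue.
  val jdbcUrl = "jdbc:postgresql://localhost:5432/mydb"

  // Builds the options for Spark's built-in JDBC data source.
  def jdbcOptions(table: String): Map[String, String] = Map(
    "url" -> jdbcUrl,
    "dbtable" -> table,
    "user" -> "spark",       // placeholder
    "password" -> "secret",  // placeholder
    "driver" -> "org.postgresql.Driver"
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inspect-pushdown")
      .master("local[*]")
      .getOrCreate()

    // "orders" is an assumed table name used only for illustration.
    val orders = spark.read.format("jdbc").options(jdbcOptions("orders")).load()

    // Print the logical and physical plans. In the JDBC scan node, the
    // selected columns and any pushed filters show what Postgres will run;
    // everything above the scan node is executed by Spark.
    orders
      .select("customer_id", "amount")
      .filter(col("amount") > 100)
      .explain(true)

    spark.stop()
  }
}
```

Running the same check on a query with a join or a `groupBy` should show whether those steps appear inside the JDBC scan or as separate Spark operators above it.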

Some analyses could do most of their work at the database level and send only a fraction of the data to the Spark cluster, which would probably make them a lot faster. Spark isn't the best at joins, so pushing those down to the database level would probably help a lot...
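As a rough illustration of the idea, a join and an aggregation can already be pushed down by hand: pass a subquery as the `dbtable` option so Postgres runs it and Spark only receives the result. This is a manual workaround, not the automatic pushdown the Snowflake connector does. The `orders` and `customers` tables, their columns, and the `ManualPushdown` helper are hypothetical names for this sketch.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ManualPushdown {
  // Wraps a SQL statement as an aliased derived table, which is the form
  // the JDBC source accepts in the "dbtable" option.
  def asDerivedTable(sql: String, alias: String): String = s"($sql) AS $alias"

  // Join + aggregation written in SQL so that Postgres executes it.
  // Table and column names are assumptions for illustration only.
  val totalsSql: String =
    """SELECT c.region, SUM(o.amount) AS total_amount
      |FROM orders o
      |JOIN customers c ON c.id = o.customer_id
      |GROUP BY c.region""".stripMargin

  // Returns a DataFrame holding only the aggregated rows, so a small
  // result set crosses the wire instead of both full tables.
  def regionTotals(
      spark: SparkSession,
      url: String,
      user: String,
      password: String
  ): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", asDerivedTable(totalsSql, "region_totals"))
      .option("user", user)
      .option("password", password)
      .load()
}
```

The downside is that the SQL lives in a string, so a helper that did this transparently is presumably what would be worth exploring here.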