Does Spark read from BigQuery multiple times when joining?
Closed this issue · 2 comments
PacoBahena commented
My question is the following:
When reading from BigQuery into a Spark DataFrame and then using that DataFrame in multiple joins, does Spark hit BigQuery multiple times?
Note: Assume a single action at the end.
davidrabinowitz commented
That depends on the query plan and whether the DataFrame is cached or not. The best approach is to run .explain() on the result.
PacoBahena commented
> That depends on the query plan and whether the DataFrame is cached or not. The best approach is to run .explain() on the result.
So, looking at the physical plan in the Spark UI, would it be correct to say that the connector hits BigQuery once for every time I see the following in the plan?
Scan com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation@4753eb60
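
For reference, here is a rough sketch of how the check described above could be scripted in PySpark rather than read off the Spark UI. It captures what .explain() prints and counts the plan lines that contain a scan over the relation quoted above. The helper names (plan_text, count_bigquery_scans) are made up for illustration, and the marker string is simply taken from the plan line in this thread, so it may differ between connector versions. The result is a count of scan nodes in the plan and may not equal the number of actual reads, for example when the DataFrame is cached.

```python
import contextlib
import io

# Substring taken from the scan node quoted in this thread. Assumption:
# other connector versions may print a different relation class name.
BQ_RELATION_MARKER = "DirectBigQueryRelation"


def plan_text(df):
    """Return what df.explain() prints, as a string.

    PySpark's DataFrame.explain() prints the physical plan to stdout
    instead of returning it, so the output is captured here.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


def count_bigquery_scans(plan, marker=BQ_RELATION_MARKER):
    """Count plan lines that contain a scan over a BigQuery relation."""
    return sum(
        1
        for line in plan.splitlines()
        if "Scan" in line and marker in line
    )
```

With a joined DataFrame in hand (say a hypothetical joined_df), count_bigquery_scans(plan_text(joined_df)) gives the number of such scan lines. Comparing that number before and after calling .cache() on the BigQuery DataFrame is one way to see how caching changes the plan.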