GoogleCloudDataproc/spark-bigquery-connector

Direct write method not working in Databricks for Spark 3.5

jainshasha opened this issue · 5 comments

Hi,

I am using Databricks with Spark 3.5 to write a DataFrame into a BigQuery table.
As per the documentation for the spark-bigquery connector, I am using the "direct" write method to write the data into the table and avoid the temporary GCS bucket. However, even after setting the direct write method, the connector still asks for a temporaryGcsBucket name and creates temporary files on GCS.

Is this a known bug that is still open?

I am using the following command:

```scala
finalDF.write
  .format("bigquery")
  .option("writeMethod", "direct")
  .option("temporaryGcsBucket", bucket)
  .save(table)
```
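For reference, this is the form I expected to work per the documentation, i.e. without the bucket option (same `finalDF` and `table` as above):

```scala
// Expected direct write: the BigQuery Storage Write API should be used,
// with no temporary GCS bucket involved.
finalDF.write
  .format("bigquery")
  .option("writeMethod", "direct")
  .save(table)
```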

Which version of the connector are you using?

Hi @davidrabinowitz, thanks for replying.
I am currently using Databricks Runtime 14.3, which ships with Spark 3.5.
How can I check the version of the spark-bigquery connector on that runtime? Can you please help me with that?

I would really appreciate your help with this.

If you are using the built-in BigQuery connector, then the Databricks release notes should have this information.
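Alternatively, one quick way to check from a Scala notebook is to ask the JVM where the connector's data source class was loaded from; the jar file name typically embeds the connector version. This is a sketch, assuming the class is on the driver classpath and the classloader exposes its code source:

```scala
// Locate the jar that provides the "bigquery" data source.
// BigQueryRelationProvider is the connector's DataSource registration class.
// Depending on the classloader, getCodeSource may return null; in that case,
// inspecting /databricks/jars directly is a fallback.
val provider = Class.forName("com.google.cloud.spark.bigquery.BigQueryRelationProvider")
println(provider.getProtectionDomain.getCodeSource.getLocation)
```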

Hi @davidrabinowitz,
I debugged inside Databricks and found that we are using this jar:

```
ls -lrt /databricks/jars/----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar*
```

If I understand correctly, the jar is version 0.22.2, which is very old. For Spark 3.5 we recommend using the spark-3.5-bigquery connector; the latest version is 0.36.1. Direct write is certainly supported there.
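For example, with the spark-3.5-bigquery connector attached to the cluster, a direct write looks roughly like this (a minimal sketch; the table identifier and the sample DataFrame are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// A small stand-in DataFrame for finalDF.
val finalDF = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Direct write uses the BigQuery Storage Write API,
// so no temporaryGcsBucket option is required.
// "myproject.mydataset.mytable" is a placeholder table identifier.
finalDF.write
  .format("bigquery")
  .option("writeMethod", "direct")
  .mode("append")
  .save("myproject.mydataset.mytable")
```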