GoogleCloudDataproc/spark-bigquery-connector

Direct write method not working in Databricks for Spark 3.5

jainshasha opened this issue · 5 comments

Hi,

I am using Databricks with Spark 3.5 to write a DataFrame into a BigQuery table.
As per the documentation for the spark-bigquery connector, I am using the "direct" write method to write the data into the table and avoid the temporary GCS bucket. However, even after setting the direct write method, the connector still asks for a temporaryGcsBucket name and creates temporary files on GCS.

Is this a known bug that is still open?

I am using the following command:

```scala
finalDF.write
  .format("bigquery")
  .option("writeMethod", "direct")
  .option("temporaryGcsBucket", bucket)
  .save(table)
```
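For reference, this is the form I expected to work per the documentation, i.e. without the bucket option (same `finalDF` and `table` as above):

```scala
// Expected direct write: the BigQuery Storage Write API should be used,
// with no temporary GCS bucket involved.
finalDF.write
  .format("bigquery")
  .option("writeMethod", "direct")
  .save(table)
```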

Which version of the connector are you using?

Hi @davidrabinowitz, thanks for replying.
I am currently using Databricks Runtime 14.3, which ships with Spark 3.5.
How can I check the version of the spark-bigquery connector on that runtime? Can you please help me with that?

I would really appreciate your help with this.

If you are using the built-in BigQuery connector, then the Databricks release notes should have this information.
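Alternatively, one quick way to check from a Scala notebook is to ask the JVM where the connector's data source class was loaded from; the jar file name typically embeds the connector version. This is a sketch, assuming the class is on the driver classpath and the classloader exposes its code source:

```scala
// Locate the jar that provides the "bigquery" data source.
// BigQueryRelationProvider is the connector's DataSource registration class.
// Depending on the classloader, getCodeSource may return null; in that case,
// inspecting /databricks/jars directly is a fallback.
val provider = Class.forName("com.google.cloud.spark.bigquery.BigQueryRelationProvider")
println(provider.getProtectionDomain.getCodeSource.getLocation)
```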

Hi @davidrabinowitz,
I debugged inside Databricks and found that we are using this jar:

```
ls -lrt /databricks/jars/----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar*
```

If I understand correctly, the jar is version 0.22.2, which is very old. For Spark 3.5 we recommend using the spark-3.5-bigquery connector; the latest version is 0.36.1. Direct write is certainly supported there.
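For example, with the spark-3.5-bigquery connector attached to the cluster, a direct write looks roughly like this (a minimal sketch; the table identifier and the sample DataFrame are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// A small stand-in DataFrame for finalDF.
val finalDF = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Direct write uses the BigQuery Storage Write API,
// so no temporaryGcsBucket option is required.
// "myproject.mydataset.mytable" is a placeholder table identifier.
finalDF.write
  .format("bigquery")
  .option("writeMethod", "direct")
  .mode("append")
  .save("myproject.mydataset.mytable")
```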