GoogleCloudDataproc/spark-bigquery-connector

Writing Spark timestamp_ntz/timestamp data to a BigQuery DATETIME column

Closed this issue · 1 comment

My use case is to load data from a Databricks Delta table into a BigQuery table using Spark. The target column type is DATETIME, which has no direct Spark equivalent. Following this repo's documentation, which suggests the value can be passed as a BigQuery string literal, I cast the source data to string, but I still get an error that the schema does not match.

My question is: does the connector support writing correctly formatted date-and-time strings into a BigQuery DATETIME column?

Error:
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Provided Schema does not match Table <<project>>.<<data set>>.<<table name>>. Field insert_dttm has changed type from DATETIME to STRING

My code:

spark.sql (f""" create or replace table {src_table_name} ( insert_dttm timestamp) """)
spark.sql (f""" insert into {src_table_name} values ('2014-09-27T12:30:00.45') """)

query = f"""
SELECT
cast(insert_dttm as string) as insert_dttm
FROM foundation.test_table_delta
"""

spark.sql(query)

(df.write.format("bigquery")
        .mode("append")
        .option("temporaryGcsBucket", gcs_temp)
        .option("table", tgt_table_name)
        .option("project", bq_project_id)
        .option("parentProject", bq_project_id)
        .save())

Hi @ne250040, Spark 3.5 supports TimestampNTZType, which the connector maps to BigQuery's DATETIME type. Please use the spark-3.5-bigquery connector.
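
For reference, a minimal sketch of that approach, assuming Spark 3.5+ with the spark-3.5-bigquery connector on the classpath and an existing SparkSession named spark; the bucket and target table names below are placeholders:

from pyspark.sql import functions as F

df = spark.sql("SELECT insert_dttm FROM foundation.test_table_delta")

# Cast to TIMESTAMP_NTZ (timestamp without time zone); the Spark 3.5
# connector writes this type to a BigQuery DATETIME column instead of
# TIMESTAMP, so no string conversion is needed.
df_ntz = df.withColumn("insert_dttm", F.col("insert_dttm").cast("timestamp_ntz"))

(df_ntz.write.format("bigquery")
    .mode("append")
    .option("temporaryGcsBucket", "my-temp-bucket")  # placeholder bucket
    .option("table", "my_dataset.my_table")          # placeholder table
    .save())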