DATETIME parsing in PySpark
satybald opened this issue · 0 comments
satybald commented
Currently, the documentation states that Spark parses DATETIME as TimestampType [1]. However, in practice this type gets parsed as a plain string in version 0.32 of the connector [2].

Would it be possible to clarify what the intended behaviour is here? And would it be possible to cast DATETIME values to TimestampType?
Reproducible example:
```python
>>> spark = SparkSession.builder \
...     .master('local[*]') \
...     .appName('Top Shakepeare words') \
...     .config('spark.jars.packages', 'com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.32.2') \
...     .getOrCreate()
>>> spark.conf.set("materializationDataset", "spark_temp_dataset")
>>> word_count = spark.read \
...     .format('bigquery') \
...     .load('SELECT DATETIME("2023-08-12")')
>>> word_count.printSchema()
root
 |-- f0_: string (nullable = true)
```
[1] https://github.com/GoogleCloudDataproc/spark-bigquery-connector/#data-types
[2] https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/master/spark-bigquery-connector-common/src/main/java/com/google/cloud/spark/bigquery/SchemaConverters.java#L387-L392