GoogleCloudDataproc/spark-bigquery-connector

Map column of a complex type in values causes error "Data type not expected: struct<...>"


Using connector version 0.35.0 with Spark 3.5.0 on Dataproc:

When writing a Dataset of a case class (using ORC as the intermediate format, though I don't think that's important):

case class NestedCaseClass(a: Int)
case class WrapperCaseClass(
  nested: Map[Int, NestedCaseClass],
)
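
For completeness, the write looks roughly like this (a minimal sketch; the bucket and table names are placeholders, not our exact code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

val ds = Seq(WrapperCaseClass(Map(1 -> NestedCaseClass(2)))).toDS()

// Indirect write through GCS, with ORC as the intermediate format
ds.write
  .format("bigquery")
  .option("writeMethod", "indirect")
  .option("intermediateFormat", "orc")
  .option("temporaryGcsBucket", "some-bucket") // placeholder
  .save("some_dataset.some_table")             // placeholder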

I am getting:

java.lang.IllegalArgumentException: Data type not expected: struct<...>
[info]   at com.google.cloud.spark.bigquery.SchemaConverters.toBigQueryType(SchemaConverters.java:607)
[info]   at com.google.cloud.spark.bigquery.SchemaConverters.createBigQueryColumn(SchemaConverters.java:510)
[info]   at com.google.cloud.spark.bigquery.SchemaConverters.sparkToBigQueryFields(SchemaConverters.java:468)
[info]   at com.google.cloud.spark.bigquery.SchemaConverters.toBigQuerySchema(SchemaConverters.java:456)

It seems that in SchemaConverters.toBigQueryType the connector expects the value of a Map to be a simple type, although:

  1. In BQ it is valid to have a Map with a complex (struct) type as the value
  2. This worked in prior versions such as 0.25.0, so it appears to be a regression

I will need to dig into this more, as I have just discovered that our code transforms Maps to Arrays. Something has changed, but it seems the original description of this issue is not accurate. I will open another issue if needed.
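
For reference, the Map-to-Array transformation I mentioned is roughly the following sketch, using Spark's built-in map_entries on the ds from the snippet above:

import org.apache.spark.sql.functions.map_entries

// map_entries turns Map[Int, NestedCaseClass] into an array of
// key/value structs, i.e. Array[Struct[key: Int, value: NestedCaseClass]],
// which sidesteps the Map handling in SchemaConverters entirely
val flattened = ds.withColumn("nested", map_entries($"nested"))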