GoogleCloudDataproc/spark-bigquery-connector

Add a way to disable map type support.

GrzegorzSmardzewskiAllegro opened this issue · 1 comment

The Map read support added in #914 breaks backwards compatibility. We have arrays of records with fields named "key" and "value", and the connector now incorrectly assumes those should be serialized as maps, which causes serialization errors because our code expects an array there.
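For illustration, a minimal PySpark sketch of the kind of schema involved (the column and field names are made up):

```python
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# A plain array of records whose fields just happen to be named "key" and
# "value". This is an array of structs, not a map, and downstream code
# expects an array here.
schema = StructType([
    StructField("tags", ArrayType(StructType([
        StructField("key", StringType()),
        StructField("value", StringType()),
    ])))
])
```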

There should be a way to disable that behavior, via a property for example.
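For example, an opt-out read option along these lines; the option name `mapTypeSupport` is purely hypothetical and only illustrates the shape of the request (assumes an existing SparkSession `spark`):

```python
# "mapTypeSupport" is hypothetical -- it does not exist in the connector
# today, it only sketches the kind of knob being requested.
df = (spark.read.format("bigquery")
      .option("table", "project.dataset.table")
      .option("mapTypeSupport", "false")
      .load())
```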

Hi @davidrabinowitz, bumping this because we saw this backwards-compatibility break impact production customers. I think the offending section of code might be this:

Spark MapTypes don't have actual named "key"/"value" attributes, so I'm not sure why the code recognizes fields with those names and identifies the enclosing type as a MapType, unless this is a BigQuery convention or something. MapTypes hold key/value pairs, but they don't explicitly have attributes named "key" or "value".
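To illustrate the distinction, a standalone PySpark sketch (not connector code):

```python
from pyspark.sql.types import ArrayType, MapType, StringType, StructField, StructType

# A genuine Spark map: keys and values are type parameters, not named fields.
map_schema = MapType(StringType(), StringType())

# An array of structs whose fields merely happen to be called "key" and "value".
array_schema = ArrayType(StructType([
    StructField("key", StringType()),
    StructField("value", StringType()),
]))

# These are distinct types; Spark itself does not conflate them.
print(map_schema.simpleString())    # map<string,string>
print(array_schema.simpleString())  # array<struct<key:string,value:string>>
```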

I think something that merely looks like a map, i.e. a struct with literal "key" and "value" fields, should be treated as an array of structs, as was the previous behavior.

Either way, this definitely breaks backwards compatibility and the previously expected interface, and I don't see a workaround at the moment.

I can provide PySpark examples demonstrating the expected behavior if that would be helpful.
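For example, a sketch along these lines (the table name is illustrative, SparkSession setup is assumed, and the schema printouts are abbreviated; this shows what I'd expect, not output verified against a specific connector version):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assume the BigQuery table (name illustrative) has a REPEATED
# STRUCT<key STRING, value STRING> column called "tags".
df = (spark.read.format("bigquery")
      .option("table", "project.dataset.tags_example")
      .load())

df.printSchema()
# Expected (pre-#914 behavior, nullable flags omitted):
#  |-- tags: array
#  |    |-- element: struct
#  |    |    |-- key: string
#  |    |    |-- value: string
#
# Observed after #914:
#  |-- tags: map (key: string, value: string)
```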