Azure/spark-cdm-connector

Schema drift: The number of columns in CSV/parquet file is not equal to the number of fields in Spark StructType.

Closed this issue · 1 comments

rgk85 commented

I have encountered an issue where the schema includes more columns than my actual data; reading the entity throws an error saying as much:

The number of columns in CSV/parquet file is not equal to the number of fields in Spark StructType. Either modify the attributes in manifest to make it equal to the number of columns in CSV/parquet files or modify the csv/parquet file


I was reading the documentation on unsupported scenarios, and as far as I understood, the scenario where the actual data has more columns than specified in the schema is not supported. Am I missing something in the documentation, or am I doing something wrong? Is there perhaps a workaround?

spark-cdm-connector 0.19.1
Databricks 6.4
Spark 2.4.5

This may be related to the issue we were facing earlier, discussed in:
#84

Things to try:

  • using the permissive mode option
    .option("mode", "permissive")
  • resyncing from dataverse side
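The first suggestion above can be sketched as follows. This is a hypothetical read in PySpark, assuming the option names used by spark-cdm-connector 0.19.x (`storage`, `manifestPath`, `entity`); the account, container, manifest path, and entity name are placeholders, and the snippet needs a Databricks/Spark session with the connector attached:

```python
# Hedged sketch: read a CDM entity in permissive mode so rows whose
# column count differs from the declared schema do not fail the read.
# All values in angle brackets are placeholders, not real resources.
df = (spark.read.format("com.microsoft.cdm")
      .option("storage", "<account>.dfs.core.windows.net")              # ADLS Gen2 account (placeholder)
      .option("manifestPath", "<container>/<path>/default.manifest.cdm.json")  # manifest location (placeholder)
      .option("entity", "<EntityName>")                                 # entity to read (placeholder)
      .option("mode", "permissive")                                     # tolerate schema/column-count mismatch
      .load())
```

Note that permissive mode tolerates malformed rows rather than fixing the manifest; if the manifest itself declares the wrong number of attributes, resyncing from the Dataverse side (the second suggestion) is the more durable fix.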