Azure/spark-cdm-connector

Schema drift: The number of columns in CSV/parquet file is not equal to the number of fields in Spark StructType.

Closed this issue · 1 comments

rgk85 commented

I have encountered an issue where the schema includes more columns than my actual data; reading the entity throws an error saying as much:

The number of columns in CSV/parquet file is not equal to the number of fields in Spark StructType. Either modify the attributes in manifest to make it equal to the number of columns in CSV/parquet files or modify the csv/parquet file


I was reading the documentation on unsupported scenarios, and as far as I understood, the scenario where the actual data has more columns than specified in the schema is not supported. Am I missing something in the documentation, or am I doing something wrong? Is there perhaps a workaround?

spark-cdm-connector 0.19.1
Databricks 6.4
Spark 2.4.5

This may be related to the issue we were facing earlier, discussed in:
#84

Things to try:

  • using the permissive mode option
    .option("mode", "permissive")
  • resyncing from dataverse side
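The first suggestion above can be sketched as follows. This is a hypothetical read in PySpark, assuming the option names used by spark-cdm-connector 0.19.x (`storage`, `manifestPath`, `entity`); the account, container, manifest path, and entity name are placeholders, and the snippet needs a Databricks/Spark session with the connector attached:

```python
# Hedged sketch: read a CDM entity in permissive mode so rows whose
# column count differs from the declared schema do not fail the read.
# All values in angle brackets are placeholders, not real resources.
df = (spark.read.format("com.microsoft.cdm")
      .option("storage", "<account>.dfs.core.windows.net")              # ADLS Gen2 account (placeholder)
      .option("manifestPath", "<container>/<path>/default.manifest.cdm.json")  # manifest location (placeholder)
      .option("entity", "<EntityName>")                                 # entity to read (placeholder)
      .option("mode", "permissive")                                     # tolerate schema/column-count mismatch
      .load())
```

Note that permissive mode tolerates malformed rows rather than fixing the manifest; if the manifest itself declares the wrong number of attributes, resyncing from the Dataverse side (the second suggestion) is the more durable fix.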