pkiraly/metadata-qa-api

Ignore fields from the schema that are not in the data

Opened this issue · 3 comments

At the moment, the code fails if a CSV contains a column that is not in the schema. Better behaviour would be to simply ignore these columns and post a warning. If none of the columns match the schema, the result should be empty, but the code should not throw an exception.

Dear @mielvds, could you please write an example for this error? I am not able to reproduce it. I've created a new method which reads column names from the CSV header. See details in #58.

TBH I opened this issue a bit too quickly. I'll see if I can reproduce it... but I think #58 does indeed solve it.

@pkiraly I was able to reproduce this. The issue was that I set the CsvReader with the schema header like this:

this.calculator.setCsvReader(
                new CsvReader()
                        .setHeader(((CsvAwareSchema) schema).getHeader()));

Imagine you have a schema that configures the fields A, B, C, but your CSV contains the columns A, B, C, D.
You'd get a java.lang.IllegalArgumentException: The size of columns are different than the size of headers when running calculator.measureAsList(strings), because the reader holds three header names from the schema while each row has four cells.
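
To make the mismatch and the requested behaviour concrete, here is a minimal, standalone sketch. It does not use metadata-qa-api at all: the class SchemaColumnFilter and its methods strictPair and project are hypothetical names invented for this illustration, and the real CsvReader may work differently. It assumes the CSV's own header row is available, as in the approach mentioned for #58.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of metadata-qa-api.
class SchemaColumnFilter {

    // Pairs header names with row cells and fails on a size mismatch,
    // which is the kind of check that produces the exception above.
    static Map<String, String> strictPair(List<String> header, List<String> row) {
        if (header.size() != row.size()) {
            throw new IllegalArgumentException(
                "header has " + header.size() + " names but row has " + row.size() + " cells");
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < header.size(); i++) {
            record.put(header.get(i), row.get(i));
        }
        return record;
    }

    // Pairs the row with the CSV's own header, then keeps only the columns
    // the schema knows about. Every dropped column is reported as a warning.
    static Map<String, String> project(List<String> schemaFields,
                                       List<String> csvHeader,
                                       List<String> row,
                                       List<String> warnings) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : strictPair(csvHeader, row).entrySet()) {
            if (schemaFields.contains(cell.getKey())) {
                kept.put(cell.getKey(), cell.getValue());
            } else {
                warnings.add("Ignoring column not in schema: " + cell.getKey());
            }
        }
        return kept; // empty if no column matches the schema
    }

    public static void main(String[] args) {
        List<String> schemaFields = List.of("A", "B", "C");
        List<String> csvHeader = List.of("A", "B", "C", "D");
        List<String> row = List.of("1", "2", "3", "4");

        List<String> warnings = new ArrayList<>();
        System.out.println(project(schemaFields, csvHeader, row, warnings)); // {A=1, B=2, C=3}
        System.out.println(warnings); // [Ignoring column not in schema: D]
    }
}
```

Calling strictPair with the three schema fields and the four-cell row reproduces the size mismatch, while project returns only A, B and C and records a warning for D. If no CSV column is in the schema, project returns an empty map instead of throwing.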