osiegmar/FastCSV

CsvReader fails to parse file with some quoted fields

oflege opened this issue · 3 comments

Describe the bug
CsvReader with errorOnDifferentFieldCount(true) fails on a particular CSV file. If I change anything in the file before the problematic line 228 (delete a row, or add or remove a character in a field), or if I replace the double quotes in line 228 with single quotes, the CsvReader does not fail.

To Reproduce
JUnit test to reproduce the behavior:

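        import de.siegmar.fastcsv.reader.CsvReader;
        import java.io.File;
        import java.nio.charset.StandardCharsets;

        // ';'-separated file read as ISO-8859-1; fail if rows differ in field count
        try (CsvReader r = CsvReader.builder().fieldSeparator(';').errorOnDifferentFieldCount(true)
                .build(new File("a.csv").toPath(), StandardCharsets.ISO_8859_1)
        ) {
            r.iterator().forEachRemaining(System.out::println);   // print every parsed row
        }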

Test data: a.csv (attached to the issue)

Thanks for reporting this issue! With your test code and data, I was able to reproduce and fix it.

The problem was caused by a combination of two things:

  • A quote character within an unquoted field (nonconforming data per RFC 4180, section 2, rule 5)
  • The need to refill the input buffer while parsing such a field
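To illustrate why that combination is fragile, here is a minimal, hypothetical sketch. It is not FastCSV's actual code: the class LenientCsvSketch and its methods are invented for illustration. It reads through a deliberately small buffer and keeps all parser state (inside quotes, field started with a quote, just saw a closing quote) in variables that outlive each refill. A stray quote inside an unquoted field is kept as a literal character. Because no state is lost at a refill, parsing the same input with any buffer size should produce identical rows.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT FastCSV's implementation: a lenient CSV parser
// whose state survives buffer refills.
public class LenientCsvSketch {

    static List<List<String>> parse(Reader in, char sep, int bufSize) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;    // inside a quoted section
        boolean quotedField = false; // current field began with a quote
        boolean afterQuote = false;  // just left a quoted section; a following '"' is an escaped quote
        char[] buf = new char[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {      // every read() is a buffer refill
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (inQuotes) {
                    if (c == '"') {
                        inQuotes = false;       // tentatively closes the quoted section
                        afterQuote = true;
                    } else {
                        field.append(c);        // separators and newlines are data here
                    }
                    continue;
                }
                if (afterQuote) {
                    afterQuote = false;
                    if (c == '"') {             // "" inside quotes: literal quote, reopen
                        field.append('"');
                        inQuotes = true;
                        continue;
                    }
                }
                if (c == sep || c == '\n') {
                    row.add(field.toString());
                    field.setLength(0);
                    quotedField = false;
                    if (c == '\n') {
                        rows.add(row);
                        row = new ArrayList<>();
                    }
                } else if (c == '\r') {
                    // ignore CR outside quotes so CRLF line endings work
                } else if (c == '"' && field.length() == 0 && !quotedField) {
                    inQuotes = true;            // opening quote of a quoted field
                    quotedField = true;
                } else {
                    // includes a stray quote inside an unquoted field (RFC 4180
                    // section 2, rule 5 violation): kept as a literal character
                    field.append(c);
                }
            }
        }
        if (field.length() > 0 || quotedField || !row.isEmpty()) {
            row.add(field.toString());          // last row without trailing newline
            rows.add(row);
        }
        return rows;
    }

    static List<List<String>> parse(String csv, char sep, int bufSize) {
        try {
            return parse(new StringReader(csv), sep, bufSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // StringReader does not fail in practice
        }
    }

    public static void main(String[] args) {
        // Last record has a stray quote inside the unquoted field: 5" disk
        String csv = "id;name;note\n1;\"quoted;name\";ok\n2;5\" disk;plain\n";
        List<List<String>> reference = parse(csv, ';', 1024);
        for (int size = 1; size <= 8; size++) {  // force refills at every position
            if (!parse(csv, ';', size).equals(reference)) {
                throw new AssertionError("buffer size " + size + " changed the result");
            }
        }
        int expected = reference.get(0).size();  // same idea as errorOnDifferentFieldCount
        for (List<String> r : reference) {
            if (r.size() != expected) {
                throw new IllegalStateException("different field count: " + r);
            }
        }
        System.out.println(reference);
    }
}
```

Running main prints `[[id, name, note], [1, quoted;name, ok], [2, 5" disk, plain]]` for every buffer size. A parser that instead derived its state from the current buffer position, for example by re-checking whether a field "started with a quote" after a refill, could misclassify the stray quote and shift the field count. That matches the symptom where editing anything before line 228 made the error disappear.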

Could you try the develop branch to see whether it fixes the problem with your real data?

Thanks a lot for the quick fix! I just tested the develop branch with our current set of CSV files, and all of them were parsed successfully.

Thanks! Fixed in 2.2.1, which was just released.