osiegmar/FastCSV

NamedCsvReader should trim header fields when it reads first line

azharrnaeem opened this issue · 4 comments

Is your feature request related to a problem? Please describe.
If a header field is preceded by a space after the delimiter (i.e. ", "), the field is read with a leading space. For example, if the header line is column1, column2, column3, then the map returned by de.siegmar.fastcsv.reader.NamedCsvRow#getFields contains keys with a leading space.
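The effect can be reproduced without the library. The following is a minimal sketch that uses a naive comma split as a stand-in for a CSV parser; the class and method names (HeaderSpaceDemo, splitHeader) are hypothetical and this is not FastCSV's implementation (it ignores quoting, for instance). It only shows that a parser which keeps spaces yields header names that no longer match their trimmed form.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderSpaceDemo {

    // Naive comma split that keeps every character, including spaces.
    // Illustration only: a hypothetical stand-in, not FastCSV's parser.
    static List<String> splitHeader(String headerLine) {
        List<String> fields = new ArrayList<>();
        for (String field : headerLine.split(",", -1)) {
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> header = splitHeader("column1, column2, column3");
        System.out.println(header.contains("column2"));  // false
        System.out.println(header.contains(" column2")); // true
    }
}
```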

Describe the solution you'd like
While reading the header line, the values could be trimmed. Perhaps a new feature flag in the builder would be a better option to enable/disable it.

Describe alternatives you've considered
None.

RFC 4180 compliance
To the best of my knowledge, it should not contradict compliance.

Per section 2.4 of RFC 4180:

Spaces are considered part of a field and should not be ignored.

Anyway, I'm still considering trim support but I'm not so sure if header records need special treatment.

illdd1 commented

How can we deal with whitespace at the end of a file when using the named CSV reader? It picks up the whitespace at the end of the file.
When looking up a value whose name has trailing whitespace, it cannot be found.

Example
Looking for ("name") cannot find ("name ")

@osiegmar

@illdd1 I'm not sure if I understood you correctly / how your post differs from what @azharrnaeem requests.

Currently, FastCSV conforms exactly to the RFC. If you have a column header "name ", you have to access it via getField("name "). In addition, all recognized header fields (including their spaces) are included in getFields().
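Until trimming is supported by the library, one possible workaround is to normalize the keys on the caller's side. The following is a minimal sketch, assuming the row's fields are available as a Map<String, String> (as the getFields() call mentioned above suggests); the class and helper names (TrimmedKeys, trimKeys) are hypothetical, and a plain map is used here as a stand-in for the row data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TrimmedKeys {

    // Returns a copy of the map with trimmed keys, preserving iteration order.
    // If two keys collide after trimming, the first one wins.
    static Map<String, String> trimKeys(Map<String, String> fields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            result.putIfAbsent(entry.getKey().trim(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the field map of a named row (assumption for illustration).
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name ", "Alice");
        fields.put(" city", "Berlin");

        Map<String, String> trimmed = trimKeys(fields);
        System.out.println(trimmed.get("name")); // Alice
        System.out.println(trimmed);             // {name=Alice, city=Berlin}
    }
}
```

Copying the map for every row adds allocations, so this is probably only reasonable where convenience matters more than throughput.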

Strictly speaking, your CSV file is broken. But I understand that people have to deal with such files. I just need to make sure that changes (like trimming fields, case-insensitive lookups, duplicate header fields, ...) do not sacrifice the high performance of FastCSV, which is the number 1 design goal of this library!

Trimming fields will be possible with FastCSV 3 (soon to be released).

To simply trim all fields you can just call:

CsvReader.builder().fieldModifier(FieldModifier.TRIM).build(data);

For special treatment you could implement a custom FieldModifier:

// Call .trim() and .toUpperCase() for the first line only
FieldModifier headerTrimUpperModifier = (originalLineNumber, fieldIdx, comment, quoted, field) ->
    originalLineNumber == 1 ? field.trim().toUpperCase() : field;

var csvBuilder = NamedCsvReader.builder()
    .fieldModifier(headerTrimUpperModifier);

for (NamedCsvRecord csvRecord : csvBuilder.build(" h1 , h2 \nfoo,bar")) {
    System.out.println(csvRecord.getFieldsAsMap());
}

prints: {H1=foo, H2=bar}

(Syntax may change until the official release!)