jabranr/csv-parser

Document assumed field and line separators


Please document the assumptions made about the format of the input.

There are numerous ways to construct CSV data and hardly any consensus, so saying "CSV" without further specification will always be rather subjective (see https://en.wikipedia.org/wiki/Delimiter-separated_values and https://en.wikipedia.org/wiki/Comma-separated_values).

For example, in my locale (Danish) the comma is used as the decimal separator, so "our CSV" (really DSV) almost always uses semicolon (;) as the field delimiter to avoid ambiguity between field separators and decimal values.
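To illustrate the point, here is a minimal Python sketch using the standard `csv` module with `delimiter=";"`. The sample data, the helper names `parse_danish_dsv` and `to_float`, and the assumption that `.` is only ever a thousands separator are all hypothetical and chosen for illustration; this is not code from the csv-parser library.

```python
import csv
import io

# Hypothetical Danish-style sample: ';' separates fields, ',' is the decimal
# mark and '.' is the thousands separator.
SAMPLE = "vare;pris\nkaffe;42,50\nmaskine;1.299,00\n"


def parse_danish_dsv(text: str) -> list[list[str]]:
    """Split semicolon-delimited text into rows of string fields."""
    return list(csv.reader(io.StringIO(text), delimiter=";"))


def to_float(value: str) -> float:
    """Convert a Danish-formatted number such as '1.299,00' to a float.

    Assumes '.' only appears as a thousands separator and ',' as the decimal mark.
    """
    return float(value.replace(".", "").replace(",", "."))


rows = parse_danish_dsv(SAMPLE)
prices = [to_float(price) for _, price in rows[1:]]
print(prices)  # [42.5, 1299.0]

# Splitting the same line on ',' instead cuts the price in two.
print("kaffe;42,50".split(","))  # ['kaffe;42', '50']
```

With `;` as the delimiter every decimal value stays in one field; a parser that always splits on `,` would instead break each price across two fields.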

Maybe something like this in README.md:

CSV assumptions

Line Separation Detection

  • If any non-printable characters (such as \n or \t) are detected in the input, each of them will be used as a line separator
  • If only printable characters are found, semicolon (;) will be used as the line separator

Field Separation

  • Fields are always separated by comma (,)
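The rules proposed above can be sketched in Python roughly as follows. This is a hypothetical reconstruction of the described behaviour only, not the library's actual implementation; the function names `split_lines` and `parse` are made up, and treating consecutive separators (such as "\r\n") as a single break is an added assumption.

```python
def split_lines(data: str) -> list[str]:
    # Rule 1: if the input contains any non-printable character
    # (newline, carriage return, tab, ...), every such character
    # acts as a line separator.
    if any(not ch.isprintable() for ch in data):
        lines: list[str] = []
        current: list[str] = []
        for ch in data:
            if ch.isprintable():
                current.append(ch)
            elif current:
                # Assumption: runs of separators (e.g. "\r\n") do not
                # produce empty lines.
                lines.append("".join(current))
                current = []
        if current:
            lines.append("".join(current))
        return lines
    # Rule 2: only printable characters, so ';' separates lines.
    return data.split(";")


def parse(data: str) -> list[list[str]]:
    # Fields are always separated by ','; quoting is not handled here.
    return [line.split(",") for line in split_lines(data)]


print(parse("a,b\nc,d"))      # [['a', 'b'], ['c', 'd']]
print(parse("a,b;c,d"))       # [['a', 'b'], ['c', 'd']]
print(parse("kaffe;42,50"))   # [['kaffe'], ['42', '50']]
```

The last call shows why these assumptions matter for semicolon-delimited data: under the stated rules, a single Danish-style line is treated as two lines, and the decimal value is split into two fields.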

@mikini That is great input, thank you. I will try to update the README ASAP and also look for a way to support that format.