Make row length check optional

Question

lessless opened this issue 8 years ago · 7 comments

Hello,

Thank you for the wonderful library. It's a pleasure to use except for one pain point: the hard check on row length.
If there are no strong objections, making that check optional would be a wonderful addition.

Answer 1 · 2016-12-07T20:48:35.000Z

Interesting, thanks for raising this. If I understand correctly, that would be something like CSV.decode(expected_row_length: 5)?

Answer 2 · 2016-12-08T01:46:10.000Z

Rather CSV.decode(fixed_length: false), so that the check isn't performed at all and dealing with incomplete or overly long lines becomes the application's responsibility.

As a result, this input

Bruce,Wayne,bruce@wayne.com
Peter,
James, Howlett,james@howlett.com,49

will be parsed into

[
  ["Bruce", "Wayne", "bruce@wayne.com"],
  ["Peter", nil],
  ["James", "Howlett", "james@howlett.com", "49"]
]
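
The behaviour requested above can be sketched outside the library. This is a minimal illustration in Python's standard `csv` module, not the Elixir library's API; `parse_ragged` is a hypothetical helper name. It returns each row with whatever number of fields it has and leaves any length handling to the caller. Note that Python yields an empty string for the trailing empty field, where the example above shows nil.

```python
import csv
import io

def parse_ragged(text):
    """Parse CSV text without enforcing a common row length (hypothetical helper)."""
    # skipinitialspace drops the blank after a comma, as in "James, Howlett".
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Blank lines come back as empty lists; skip them, keep every other row as-is.
    return [row for row in reader if row]

sample = (
    "Bruce,Wayne,bruce@wayne.com\n"
    "Peter,\n"
    "James, Howlett,james@howlett.com,49\n"
)
rows = parse_ragged(sample)
for row in rows:
    print(len(row), row)
```

With no length check, the three rows come back with 3, 2, and 4 fields respectively, and the application decides what to do with the short and long ones.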

Answer 3 · 2016-12-10T10:05:42.000Z

Interesting suggestion. What program is used to encode CSV files while omitting the separators? If this is the intended encoding, is there a reason to do it that way (e.g. disk space limitations)?

This is challenging and can lead to interesting situations like header rows being shorter than data rows, which would throw away data. I can see where you're coming from; however, I would be inclined to suggest encoding the files properly before feeding them in.

Answer 4 · 2016-12-11T06:58:03.000Z

I think I have the same kind of problem. I have to deal with some weird CSV (pipe as separator, and no escape character) that I don't produce and can't fix. On the master branch you made it easy for me to identify these lines, which is great because now I can filter them and generate nice error reports by pattern matching on the lines with {:error, "..."}.

In the Ruby program I tried to port to Elixir, the Ruby CSV library doesn't seem to check the length, so when I have extra columns that shouldn't be there, Ruby still gives me the row. These extra columns are at the end of lines and are export errors, I think. But without the extra column at the end, these lines are valid and contain valuable info that I could extract.

So maybe when you return the error {:error, "Row has length 30 - expected length 29 on line 45"}, you could insert the row in the tuple? Then I would have a chance of doing something with it and extracting the info I want.
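
To make the suggestion concrete, here is a rough Python sketch of a decoder that keeps the offending row next to the error message. It is an illustration only and not how the Elixir library is implemented: `decode_with_rows` and `salvage` are hypothetical names, the expected length is assumed to come from the first row, and the pipe separator with no quoting mirrors the files described above.

```python
import csv
import io

def decode_with_rows(text, delimiter="|"):
    """Yield ("ok", row) or ("error", message, row) for each line (hypothetical)."""
    # QUOTE_NONE: the files described have no quote or escape character.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    expected = None
    results = []
    for line_no, row in enumerate(reader, start=1):
        if expected is None:
            expected = len(row)  # assumption: the first row defines the width
        if len(row) == expected:
            results.append(("ok", row))
        else:
            msg = f"Row has length {len(row)} - expected length {expected} on line {line_no}"
            results.append(("error", msg, row))  # the row travels with the error
    return results

def salvage(result, expected):
    """Recover a row with extra trailing columns by truncating it; None otherwise."""
    if result[0] == "ok":
        return result[1]
    row = result[2]
    return row[:expected] if len(row) > expected else None

sample = "id|name\n1|Ann\n2|Bob|\n"
results = decode_with_rows(sample)
recovered = [salvage(r, 2) for r in results]
```

Because the row is part of the error result, the caller can still build an error report from the message and separately decide whether the row is worth salvaging.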

Answer 5 · 2016-12-28T09:03:12.000Z

Actually, after doing some research I found that all rows must be the same length: https://www.ietf.org/rfc/rfc4180.txt

Each line should contain the same number of fields throughout the file.

So a line that is shorter or longer than the others is a clear violation of the standard, and throwing an error is legitimate behavior.

Also, because that error can easily be caught with a rescue clause, and because changing the format of the error message is orthogonal to the original subject, I'm closing this issue.

Answer 6 · 2020-04-10T12:22:16.000Z

It is not very flexible to just look at the spec. I am also getting files from a supplier that are missing separators at the end of the line, so now I have to manually go through each line and add them myself.

Having an option that just skips the line length check would be easy to add and, for users of the lib, a good way to get around the issue.
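
The manual workaround described here, appending the missing separators before handing the file to a strict parser, might look roughly like the Python sketch below. `pad_missing_separators` is a hypothetical helper, and it assumes that separators never appear inside quoted fields, since it simply counts them per line.

```python
def pad_missing_separators(lines, width, sep=","):
    """Append the separators each line lacks so it has `width` fields (hypothetical).

    Assumption: no separator occurs inside a quoted field, so counting
    separators is enough to know how many fields a line has.
    """
    padded = []
    for line in lines:
        missing = width - (line.count(sep) + 1)
        # Lines that are already long enough (or too long) are left untouched.
        padded.append(line + sep * max(missing, 0))
    return padded

lines = ["a,b,c", "d", "e,f"]
fixed = pad_missing_separators(lines, width=3)
print(fixed)
```

A pre-processing pass like this keeps the strict parser unchanged, at the cost of an extra step that an option to skip the check would make unnecessary.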

Answer 7 · 2020-04-15T12:06:58.000Z

@MarkNijhof is validate_row_length: false working for your case? If not, can you post an example of the data you're dealing with?