qri-io/dataset

Replace stdlib encoding/csv with something more robust to bad input

Closed this issue · 1 comments

b5 commented

While working on EPA TRI data, our CSV parser choked on input that is, well, bad, but that most other CSV parsers are able to work with. We've run into this a few times, and the Go standard library has no plans to implement anything beyond RFC 4180. To make qri "just work", it's time to investigate using a different CSV parser.

Some options I've found so far:

The gocarina/gocsv package looks far more robust and uses an MIT license. As a start, we should test whether that library can handle a sample of TRI data. Benchmarks would be a plus, but no guarantees.

b5 commented

OK, after some investigation: the problem isn't with our CSV parsing library, but with the way the detect package configures structure.FormatConfig, which makes me happy.
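
For anyone landing here later, a rough illustration of why reader configuration matters more than the choice of parser. This is only a sketch using the standard library's encoding/csv directly; it does not show the detect package or structure.FormatConfig, and which settings were actually involved in the TRI case is not stated above. The sample input and the parseCSV helper are made up for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// messyCSV is a made-up sample, not real TRI data: the second row has a bare
// quote inside an unquoted field, and the third row has an extra column.
const messyCSV = "id,name\n1,5\" pipe\n2,valve,extra\n"

// parseCSV is a hypothetical helper. It reads input with encoding/csv; when
// lenient is true it relaxes two of the reader's checks, otherwise it uses
// the package defaults.
func parseCSV(input string, lenient bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	if lenient {
		r.LazyQuotes = true    // allow a bare quote inside an unquoted field
		r.FieldsPerRecord = -1 // allow rows with differing field counts
	}
	return r.ReadAll()
}

func main() {
	// With default settings the reader rejects the bare quote on row two.
	if _, err := parseCSV(messyCSV, false); err != nil {
		fmt.Println("default reader:", err)
	}

	// With the relaxed settings the same bytes parse into three rows.
	rows, err := parseCSV(messyCSV, true)
	fmt.Println("lenient reader:", len(rows), "rows, err =", err)
}
```

The same parser accepts or rejects identical input depending on two fields of the reader, which is consistent with the fix living in configuration rather than in a replacement library.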