mgcrea/node-xlsx

data parsing issue when loading a CSV

pcross616 opened this issue · 1 comments

As we know, CSVs don't have a true standard. I recently ran into a problem when loading a fairly ordinary CSV such as:

ID, Title, Author, Pages
1234, My Really Cool Book, Bobo T Clown, 124
4512,A Book with a Quote " in the Title, That Guy, 515
2345,Some Other Book, Someone Else, 42
2555,Some Other Book " Part 2, Someone Else, 42

The quote is messing with the data in the sheets. I have a much larger data set that shows this, but that's the gist of it. Could we have a way to specify whether fields are quoted or not? Some data feeds do not escape ", and in this case quotes are not wrapping each field. Excel handles this file fine as is, but when using node-xlsx the columns all end up misaligned.
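To make the request concrete, here is a rough sketch of what an "unquoted" mode could do: split on commas only and treat " as an ordinary character. The `splitUnquoted` helper is hypothetical and is not part of node-xlsx or SheetJS. It assumes fields never contain commas, and it trims the whitespace around each field, which is a choice for this sketch and not necessarily what Excel does.

```javascript
// Hypothetical helper, not part of node-xlsx or SheetJS.
// Splits CSV text on commas only and treats " as a literal character.
// Assumption: no field contains a comma, so no quoting rules are needed.
function splitUnquoted(text) {
  return text
    .split(/\r?\n/)                                       // one row per line
    .filter(function (line) { return line.length > 0; })  // skip blank lines
    .map(function (line) {
      return line.split(',').map(function (field) {
        return field.trim();                              // drop padding around fields
      });
    });
}

var rows = splitUnquoted([
  'ID, Title, Author, Pages',
  '4512,A Book with a Quote " in the Title, That Guy, 515',
  '2555,Some Other Book " Part 2, Someone Else, 42'
].join('\n'));

// Every row keeps four columns, and the stray quote stays inside the title.
console.log(rows.map(function (row) { return row.length; }));
console.log(rows[1][1]);
```

With an option along these lines, the rows containing a stray quote would stay in four columns like the rest of the file.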

I did a little bit more digging. It looks like XLSX.utils.sheet_to_json is the issue. When using the underlying API and looking at the data in

var wb = XLSX.read(buf, {type:'buffer'});
wb.Sheets["Sheet1"]; // the raw worksheet object

all the data is parsed and handled correctly. So it looks like this is an upstream issue.