Remove byte-order markers from CSV files?

Question

robinhouston opened this issue 7 years ago · 9 comments

I’ve noticed that Excel now saves UTF-8 CSV files with a BOM. (I’m using Microsoft Excel for Mac version 15.33, saving in “CSV UTF-8” format.)

When such files are parsed with csvParse, the key corresponding to the first column has a zero-width no-break space (U+FEFF, the BOM) as its first character. As a result, d["keyName"] is undefined even though keyName appears when you print out d!
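The symptom can be reproduced without Excel or d3-dsv. This is only a rough sketch: naiveParse is a hypothetical stand-in for csvParse that ignores quoting and escaping, and the file contents are simulated with a string literal.

```javascript
// Simulated contents of a "CSV UTF-8" export: U+FEFF followed by the data.
const text = "\uFEFFkeyName,value\nfoo,1\n";

// Hypothetical stand-in for a CSV parser (assumption: no quoting or escaping).
// Note: String.prototype.trim is avoided on purpose, because it treats
// U+FEFF as whitespace and would hide the problem.
function naiveParse(csv) {
  const lines = csv.split("\n").filter((line) => line.length > 0);
  const [header, ...rows] = lines.map((line) => line.split(","));
  return rows.map((row) =>
    Object.fromEntries(header.map((key, i) => [key, row[i]]))
  );
}

const [d] = naiveParse(text);
const firstKey = Object.keys(d)[0];

console.log(d["keyName"]);                        // undefined
console.log(firstKey.length);                     // 8, one more than "keyName".length
console.log(firstKey.charCodeAt(0).toString(16)); // "feff"
```

Printing d shows a key that looks like keyName because U+FEFF has no visible glyph; checking the key's length or first code unit is what reveals it.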

I’m not sure whether you think this should be addressed in the parser. If not, I think it should at least be documented.

Answer 1 · 2017-05-17T16:01:05.000Z

Can you attach an example file I can use for testing purposes?

Answer 2 · 2017-05-17T16:09:28.000Z

Sure! GitHub won’t let me attach a .csv file, so I’ve zipped it.
Workbook1.csv.zip

Answer 3 · 2017-05-17T16:15:37.000Z

As TXT: Workbook1.txt

Answer 4 · 2017-05-17T16:24:33.000Z

Interestingly, if you use FileReader.readAsText, it automatically strips the BOM bytes for you, per the Encoding specification.
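The same behaviour can be observed with TextDecoder, which is also defined by the Encoding specification. It is used here as a stand-in for FileReader (an assumption made so the snippet runs outside a browser); the byte values are a hand-written example, not the attached file.

```javascript
// UTF-8 bytes for a BOM (EF BB BF) followed by "a,b".
const bytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x61, 0x2c, 0x62]);

// Default: ignoreBOM is false, so the decoder consumes the leading BOM.
const stripped = new TextDecoder("utf-8").decode(bytes);

// With ignoreBOM: true, U+FEFF is kept as the first character.
const kept = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);

console.log(stripped.length);               // 3
console.log(kept.length);                   // 4
console.log(kept.charCodeAt(0) === 0xfeff); // true
```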

Answer 5 · 2017-05-17T16:34:50.000Z

Seems like XMLHttpRequest and Fetch also automatically strip the BOM. Here’s a CORS-accessible URL I tested:

https://rawgit.com/mbostock/3fe6055309cff87cba4103837d914fee/raw/48cec3b15411fe2a9d9f678c5988d03b3988f498/test.csv

So my question is: how are you getting a string with the BOM still in it? It seems like the BOM stripping should happen earlier, before the string gets to d3-dsv.

Answer 6 · 2017-05-17T16:43:34.000Z

Sorry, I should have included a complete repro. I’m getting this in Node, by calling fs.readFile(filename, "utf8", …). It looks as though the Node developers have decided against stripping BOMs by default.

It’s okay if I should handle this in the app: I just thought I should flag it.
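Handling it in the app can be as small as the sketch below. stripBom is a hypothetical helper name, and the Buffer is a stand-in for what fs.readFile would return for a BOM-prefixed file (Buffer's "utf8" decoding keeps the BOM as well).

```javascript
// Remove a single leading U+FEFF, if present; leave everything else untouched.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Stand-in for file contents: BOM (EF BB BF) followed by "k,v".
const raw = Buffer.from([0xef, 0xbb, 0xbf, 0x6b, 0x2c, 0x76]).toString("utf8");
const clean = stripBom(raw);

console.log(raw.length);                // 4
console.log(clean);                     // "k,v"
console.log(stripBom(clean) === clean); // true
```

The cleaned string can then be passed to csvParse as usual. Only the first code unit is checked, so a U+FEFF that appears later in the data is left alone.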

Answer 7 · 2017-05-17T16:50:46.000Z

Okay. I’m going to close this issue. If you want to submit a pull request with an edit to the README suggesting that Node users use strip-bom, that would be 💯.

Answer 8 · 2017-05-17T16:51:10.000Z

Great, will do!

Answer 9 · 2018-10-19T16:07:10.000Z

Gah, this just got me too. Could we consider adding it directly to d3-dsv? I think the code is considerably shorter than the comment in the README, plus I wasted a good ten minutes. Thanks, Excel!